Can You Scrape JavaScript Websites? The Complete Answer (2026)
Yes, you can scrape JavaScript websites. But it requires a fundamentally different approach than scraping static HTML. JavaScript-rendered pages serve empty HTML shells to bots, then populate content dynamically via client-side JavaScript. Traditional scrapers (requests + BeautifulSoup) see nothing but empty `<div>` tags.
This guide covers every approach to scraping JavaScript websites, from headless browsers to scraping APIs.
Key Takeaways
- Static scraping tools (requests, BeautifulSoup) cannot extract content from JavaScript-rendered pages
- Headless browsers (Playwright, Puppeteer, Selenium) render JavaScript but are slow and resource-heavy
- Scraping APIs (SearchHive ScrapeForge, Firecrawl) handle JS rendering server-side with proxy rotation
- The key signals of a JS-rendered site: empty HTML on inspection, data in `<script>` tags, content that loads with a delay
- For AI/LLM workflows, scraping APIs that return Markdown are the most practical choice
- Modern JavaScript frameworks (React, Next.js, Vue, Angular) are the primary reason JS rendering is needed
What makes JavaScript websites hard to scrape?
Traditional scraping works like this: send HTTP request, receive HTML, parse with BeautifulSoup. This works when the server sends complete HTML.
JavaScript frameworks (React, Vue, Angular, Svelte) change the game:
- Server sends a minimal HTML shell (often just `<div id="root"></div>`)
- Browser downloads JavaScript bundles
- JavaScript executes and renders the actual content into the DOM
- The visible content never appears in the initial HTTP response
```python
import requests
from bs4 import BeautifulSoup

# This will NOT work on a JavaScript-rendered site
resp = requests.get("https://react-app.example.com/products")
soup = BeautifulSoup(resp.text, "html.parser")

print(soup.find("div", class_="product-list"))
# Output: None (the div doesn't exist in the initial HTML)
```
How do I know if a website uses JavaScript rendering?
Three quick tests:
- View page source (Ctrl+U): If the HTML is mostly empty divs and script tags, the content is JS-rendered
- Disable JavaScript: Open DevTools > Settings > Disable JavaScript. If the page goes blank, it is JS-rendered
- curl the page: If `curl URL | grep "your-content"` returns nothing, the content is loaded by JavaScript
Most modern web apps are JS-rendered. Frameworks like React, Next.js (client components), Vue, and Angular all default to client-side rendering.
How to scrape JavaScript websites with Playwright
Playwright is the most modern headless browser for Python scraping. It renders JavaScript, waits for content to load, and gives you full DOM access:
```python
from playwright.sync_api import sync_playwright

def scrape_js_website(url: str) -> list[dict]:
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url)

        # Wait for the content to render
        page.wait_for_selector(".product-list", timeout=10000)

        # Extract text content
        products = page.query_selector_all(".product-item")
        results = []
        for product in products:
            title = product.query_selector(".title").inner_text()
            price = product.query_selector(".price").inner_text()
            results.append({"title": title, "price": price})

        browser.close()
        return results

data = scrape_js_website("https://react-app.example.com/products")
```
Playwright pros: Full browser rendering, handles SPAs, can interact with elements (click, scroll)

Playwright cons: Slow (1-5 seconds per page), high memory usage, needs browser binaries, no built-in proxy rotation
How to scrape JavaScript websites with Selenium
Selenium is the older but still widely-used approach:
```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
import time

options = Options()
options.add_argument("--headless")
options.add_argument("--disable-gpu")

driver = webdriver.Chrome(options=options)
driver.get("https://react-app.example.com/products")

# Wait for JS to render
time.sleep(3)  # Simple but unreliable

products = driver.find_elements(By.CLASS_NAME, "product-item")
for product in products:
    print(product.text)

driver.quit()
```
Selenium pros: Mature ecosystem, lots of tutorials, supports all browsers
Selenium cons: Slower than Playwright, verbose API, time.sleep() is fragile (better to use explicit waits)
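An explicit wait blocks only until the target elements actually exist, instead of sleeping a fixed three seconds and hoping. A sketch using Selenium's `WebDriverWait` and `expected_conditions` (the `.product-item` selector is the hypothetical one from the example above; imports are kept inside the function so the sketch can be loaded without a browser installed):

```python
def scrape_with_explicit_wait(url: str) -> list[str]:
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    options = Options()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until at least one product card is in the DOM,
        # up to 10 seconds -- fails fast with a TimeoutException otherwise.
        WebDriverWait(driver, 10).until(
            EC.presence_of_all_elements_located((By.CLASS_NAME, "product-item"))
        )
        return [el.text for el in driver.find_elements(By.CLASS_NAME, "product-item")]
    finally:
        driver.quit()
```

The `try/finally` also guarantees the browser is closed even when the wait times out, which matters when you run hundreds of these in a loop.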
Why scraping APIs are better for JavaScript websites
Running headless browsers at scale is an infrastructure problem. Each browser instance consumes 200-500MB of RAM. Scraping 1,000 JS-rendered pages with Playwright means managing 1,000 browser launches, proxy rotations, and potential crashes.
Scraping APIs handle all of this on their infrastructure:
```python
import requests

# SearchHive ScrapeForge -- JS rendering handled automatically
response = requests.post(
    "https://api.searchhive.dev/v1/scrapeforge",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "url": "https://react-app.example.com/products",
        "format": "json",
        "renderer": "js"
    }
)

data = response.json()
# Returns structured JSON with all rendered content
```
| Approach | Speed | Reliability | Maintenance | Cost at 10K pages |
|---|---|---|---|---|
| Playwright (self-hosted) | 2-5s/page | Medium | High | Infrastructure only |
| Selenium (self-hosted) | 3-8s/page | Medium | High | Infrastructure only |
| SearchHive ScrapeForge | 1-3s/page | High | Zero | ~$1 (10K credits) |
| Firecrawl | 1-3s/page | High | Zero | ~$8.30 |
| Jina AI Reader | 1-2s/page | Medium | Zero | Free (1M tokens/day) |
SearchHive ScrapeForge automatically detects whether a page needs JavaScript rendering and applies it when necessary. You do not need to specify -- just send the URL and get structured data back.
Can I extract data from JavaScript variables without rendering?
Some sites embed data in `<script>` tags as JSON objects. You can extract this without rendering:
```python
import re
import json
import requests

resp = requests.get("https://example.com/products")

# Find JSON data embedded in script tags
match = re.search(r'window\.__INITIAL_STATE__\s*=\s*({.*?});', resp.text)
if match:
    data = json.loads(match.group(1))
    products = data.get("products", [])
    print(products)
```
This technique (sometimes called "HTML scraping with JSON extraction") is fast and does not require a browser. It works on Next.js apps that use getServerSideProps, many Gatsby sites, and any site that embeds structured data in script tags.
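Next.js pages, for example, typically embed their page props in a `<script id="__NEXT_DATA__">` tag, which you can pull out with BeautifulSoup instead of a regex. A sketch, with an inline HTML sample standing in for a real fetched page (the `products` key is a hypothetical payload shape):

```python
import json
from bs4 import BeautifulSoup

# Stand-in for resp.text from a real Next.js page
html = '''
<html><body>
<script id="__NEXT_DATA__" type="application/json">
{"props": {"pageProps": {"products": [{"title": "Widget", "price": "9.99"}]}}}
</script>
</body></html>
'''

soup = BeautifulSoup(html, "html.parser")
tag = soup.find("script", id="__NEXT_DATA__")
products = []
if tag:
    data = json.loads(tag.string)
    # The exact path into pageProps depends on the site
    products = data["props"]["pageProps"].get("products", [])
print(products)
```

Because `__NEXT_DATA__` is valid JSON in a dedicated tag, this is more robust than regexing over `window.__INITIAL_STATE__`, which breaks when the object contains a `};` inside a string.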
What about infinite scroll and lazy loading?
JavaScript sites often load more content as you scroll. Headless browsers handle this:
```python
from playwright.sync_api import sync_playwright

def scrape_infinite_scroll(url: str, max_scrolls: int = 10) -> list[str]:
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url)

        for _ in range(max_scrolls):
            page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
            page.wait_for_timeout(1000)  # Wait for new content to load

        # All content is now in the DOM; extract text before closing the browser
        items = page.query_selector_all(".item")
        texts = [item.inner_text() for item in items]
        browser.close()
        return texts
```
With SearchHive ScrapeForge, you can pass `wait_for_selector` or `scroll_depth` parameters to handle infinite scroll without writing browser automation code.
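A sketch of such a request, mirroring the ScrapeForge call shown earlier; the exact field values here (the `.item` selector, the scroll count) are illustrative assumptions, so check the API docs for the supported shapes:

```python
import requests

def scrape_with_scroll(url: str, api_key: str) -> dict:
    # Infinite-scroll page via ScrapeForge: the API scrolls and waits
    # server-side instead of you driving a local headless browser.
    resp = requests.post(
        "https://api.searchhive.dev/v1/scrapeforge",
        headers={"Authorization": f"Bearer {api_key}"},
        json={
            "url": url,
            "format": "json",
            "wait_for_selector": ".item",  # wait until this element exists
            "scroll_depth": 10,            # number of scroll passes
        },
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()
```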
SearchHive vs alternatives for JavaScript scraping
| Feature | SearchHive ScrapeForge | Firecrawl | Jina Reader | Scrapy + Playwright |
|---|---|---|---|---|
| JS Rendering | Auto-detected | Yes | No | Manual setup |
| Markdown Output | Yes | Yes | Yes | No (HTML) |
| JSON Output | Yes | Yes | No | No (HTML) |
| Proxy Rotation | Built-in | Built-in | No | Manual |
| Infinite Scroll | Configurable | No | No | Manual |
| CAPTCHA Handling | Yes | Partial | No | No |
| Free Tier | 500 credits | 500 credits | 1M tokens/day | Free |
| Pricing | $0.0001/credit | $0.83/1K | Free/$0.6/1M tokens | Infrastructure cost |
Frequently Asked Questions
Can BeautifulSoup scrape JavaScript-rendered pages? No. BeautifulSoup parses HTML that has already been received. It cannot execute JavaScript. You need to pair it with a headless browser or use a scraping API that renders JS first.
Is Puppeteer better than Playwright? Playwright is generally better maintained, supports more browsers (including WebKit), and has a cleaner async API. Puppeteer is Chrome-only and has fallen behind in features.
How much slower is JS rendering vs static scraping? JS rendering typically adds 1-3 seconds per page compared to static HTML scraping (under 1 second). At scale, this adds up -- which is why scraping APIs with pooled browsers are more efficient.
Can I scrape React/Vue/Angular apps? Yes, all of them. Modern scraping APIs handle these frameworks automatically. The specific framework does not matter to the scraper -- it just needs a full browser render.
Summary
JavaScript-rendered websites require headless browsers or scraping APIs that render JS on the server. Playwright and Selenium work for small projects, but scaling them is an infrastructure headache. SearchHive ScrapeForge handles JS rendering, proxy rotation, and CAPTCHAs automatically, returning clean Markdown or JSON for a fraction of the cost of self-hosted solutions.
Get started free with 500 credits and scrape your first JavaScript site in under 5 minutes.