Modern websites render content with JavaScript — React, Vue, Svelte, Angular. When you fetch a JS-heavy page with requests.get(), you get an empty HTML shell with a bundle of scripts and no actual data. The content loads after the initial page load, meaning traditional HTTP scraping fails silently.
This guide covers five methods for scraping JavaScript-rendered pages, from simple API interception to full browser automation, with working Python code for each approach.
## Key Takeaways
- The fastest approach is intercepting the underlying API calls — skip the browser entirely
- SearchHive's ScrapeForge handles JS rendering server-side, so you don't manage browsers
- Playwright is the most reliable browser automation option for complex SPAs
- Selenium still works but is slower and more verbose than Playwright
- Always check for hidden APIs before reaching for browser automation
## Method 1: Intercept Underlying API Calls (Fastest)
Most JavaScript-heavy sites load data via API calls. Instead of rendering the page and extracting from the DOM, call the API directly.
```python
import requests

# Most SPAs fetch data from a JSON API
resp = requests.get(
    "https://api.example.com/products",
    headers={"Accept": "application/json"},
)
products = resp.json()

for product in products["items"]:
    print(product["name"], product["price"])
```
How to find hidden APIs:
- Open DevTools (F12) → Network tab
- Reload the page
- Look for XHR/Fetch requests returning JSON
- Right-click → Copy as cURL → translate to Python
This is always your first move. It's 10-100x faster than browser automation and doesn't break when the frontend changes.
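Hidden APIs are usually paginated. Here is a minimal sketch of walking every page; the `items` array and `next` URL field are assumptions for illustration — check the real field names in the DevTools Network tab:

```python
import requests

def fetch_json(url: str) -> dict:
    resp = requests.get(url, headers={"Accept": "application/json"}, timeout=30)
    resp.raise_for_status()
    return resp.json()

def iter_all_items(first_url: str, fetch=fetch_json):
    """Yield items across pages by following each response's `next` link."""
    url = first_url
    while url:
        page = fetch(url)
        yield from page.get("items", [])
        url = page.get("next")  # None or absent on the last page
```

Injecting `fetch` as a parameter keeps the pagination logic testable without hitting the network.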
## Method 2: SearchHive ScrapeForge (No Browser Management)
ScrapeForge handles JavaScript rendering server-side. Send a URL, get markdown back.
```python
import requests

API_KEY = "your_searchhive_key"

resp = requests.post(
    "https://api.searchhive.dev/v1/scrape",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "url": "https://example.com/products",
        "render_js": True,      # Enable headless browser
        "format": "markdown",   # Clean markdown output
        "wait_for": 3000,       # Wait 3s for JS to render
    },
)
data = resp.json()
print(data["markdown"])
```
Why use this over self-hosted Playwright: No browser installation, no server resources, no proxy management. ScrapeForge handles anti-bot detection, retries, and concurrency. You pay per credit but skip the entire infrastructure burden.
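In production you'll also want to handle transient failures. A sketch of a retry wrapper around the same endpoint; the status-code policy here is a generic assumption, not documented ScrapeForge behavior:

```python
import time
import requests

RETRYABLE = {429, 500, 502, 503, 504}  # rate limits and server hiccups

def should_retry(status_code: int) -> bool:
    return status_code in RETRYABLE

def scrape_with_retries(url: str, api_key: str, attempts: int = 3) -> dict:
    for attempt in range(attempts):
        resp = requests.post(
            "https://api.searchhive.dev/v1/scrape",
            headers={"Authorization": f"Bearer {api_key}"},
            json={"url": url, "render_js": True, "format": "markdown"},
            timeout=60,
        )
        if resp.ok:
            return resp.json()
        if not should_retry(resp.status_code):
            resp.raise_for_status()  # permanent error: fail fast
        time.sleep(2 * (attempt + 1))  # linear backoff before retrying
    raise RuntimeError(f"Scrape failed after {attempts} attempts")
```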
## Method 3: Playwright (Most Reliable Browser Automation)
Playwright is the modern standard for browser automation. It supports Chromium, Firefox, and WebKit.
```python
from playwright.sync_api import sync_playwright

def scrape_js_page(url: str) -> list:
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")

        # Wait for specific elements to load
        page.wait_for_selector(".product-list")

        # Extract data
        results = []
        for product in page.query_selector_all(".product-card"):
            name = product.query_selector(".name").inner_text()
            price = product.query_selector(".price").inner_text()
            results.append({"name": name, "price": price})

        browser.close()
        return results

data = scrape_js_page("https://example.com/products")
for item in data:
    print(f"{item['name']}: {item['price']}")
```
Installation:
```shell
pip install playwright
playwright install chromium
```
Pros: Full browser control, handles any JS framework, screenshot support. Cons: Heavy resource usage, slow at scale, needs proxy rotation for volume.
## Method 4: Selenium (Legacy but Still Works)
Selenium is the oldest browser automation tool. It works, but Playwright is generally better.
```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

options = Options()
options.add_argument("--headless")
options.add_argument("--disable-gpu")

driver = webdriver.Chrome(options=options)
try:
    driver.get("https://example.com/products")

    # Wait for JS-rendered content
    wait = WebDriverWait(driver, 10)
    products = wait.until(
        EC.presence_of_all_elements_located((By.CLASS_NAME, "product-card"))
    )

    for product in products:
        name = product.find_element(By.CLASS_NAME, "name").text
        price = product.find_element(By.CLASS_NAME, "price").text
        print(f"{name}: {price}")
finally:
    driver.quit()  # Always release the browser, even on errors
```
## Method 5: Requests-HTML (Lightweight Hybrid)
Requests-HTML adds basic JS rendering to the familiar requests library via Pyppeteer.
```python
from requests_html import HTMLSession

session = HTMLSession()
resp = session.get("https://example.com/products")

# Render JavaScript
resp.html.render(timeout=30, sleep=3)

# Extract with CSS selectors
for product in resp.html.find(".product-card"):
    name = product.find(".name", first=True).text
    price = product.find(".price", first=True).text
    print(f"{name}: {price}")
```
Pros: Familiar API for requests users. Cons: Slower than Playwright, limited browser features, less maintained.
## When to Use Each Method
| Method | Speed | Complexity | Best For |
|---|---|---|---|
| API Interception | Fastest | Low | Any site with JSON APIs |
| ScrapeForge | Fast | Lowest | Production scraping at scale |
| Playwright | Medium | Medium | Complex SPAs, interaction needed |
| Selenium | Slow | Medium | Legacy projects |
| Requests-HTML | Slow | Low | Quick scripts |
## Handling Infinite Scroll and Dynamic Loading
Many JS sites load more content as you scroll. Here's how to handle it with Playwright:
```python
from playwright.sync_api import sync_playwright

def scrape_infinite_scroll(url: str) -> list:
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url)

        items = []
        seen_urls = set()

        while True:
            # Collect items currently in the DOM, skipping ones we've seen
            for item in page.query_selector_all(".item"):
                link = item.query_selector("a").get_attribute("href")
                if link and link not in seen_urls:
                    seen_urls.add(link)
                    items.append({
                        "title": item.query_selector(".title").inner_text(),
                        "url": link,
                    })

            # Scroll down and give the page time to load more
            count_before = len(page.query_selector_all(".item"))
            page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
            page.wait_for_timeout(2000)

            # Stop when scrolling produced no new items
            if len(page.query_selector_all(".item")) == count_before:
                break

        browser.close()
        return items
```
## Common JS Scraping Pitfalls
Race conditions. Elements may not exist when you try to find them. Always use explicit waits — never time.sleep().
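For instance, wrap the wait in a small helper so a missing element raises loudly instead of silently reading an empty DOM. The page object is Playwright's; the selector is a placeholder:

```python
def read_text(page, selector: str, timeout_ms: int = 10_000) -> str:
    # Explicit wait: blocks until the element exists (or raises a
    # TimeoutError), unlike a fixed sleep that races the render
    page.wait_for_selector(selector, timeout=timeout_ms)
    return page.query_selector(selector).inner_text()
```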
Anti-bot detection. Headless browsers leave fingerprints. Use stealth plugins or a managed service like ScrapeForge that handles this.
Memory leaks. Browser automation consumes RAM fast. Always close browsers in finally blocks or use context managers.
Rate limiting. Even headless browsers trigger rate limits. Add delays between requests or use proxy rotation.
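A common pattern is exponential backoff between retries; the defaults below are illustrative, not magic values:

```python
import random

def throttle_delay(attempt: int, base: float = 1.0, cap: float = 30.0,
                   jitter: float = 0.0) -> float:
    # 1s, 2s, 4s, 8s, ... capped at `cap` seconds, plus optional random
    # jitter so concurrent workers don't retry in lockstep
    return min(cap, base * (2 ** attempt)) + random.uniform(0, jitter)
```

Call time.sleep(throttle_delay(attempt)) between attempts, and set a nonzero jitter when several workers hit the same host.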
## Getting Started
For production scraping, start with SearchHive's free tier — 500 credits, no credit card. Test ScrapeForge on your target URLs. If JS rendering works via the API, you've saved yourself weeks of browser infrastructure work.
For local development, Playwright is the standard. Install it, write your scrapers, then port to ScrapeForge when you need to scale.
See also: how to scrape without getting blocked and how to extract structured data from HTML.