Modern websites render content with JavaScript — React, Vue, Svelte, Angular. When you fetch a JS-heavy page with requests.get(), you get an empty HTML shell with a bundle of scripts and no actual data. The content loads after the initial page load, meaning traditional HTTP scraping fails silently.
This guide covers five methods for scraping JavaScript-rendered pages, from simple API interception to full browser automation, with working Python code for each approach.
## Key Takeaways
- The fastest approach is intercepting the underlying API calls — skip the browser entirely
- SearchHive's ScrapeForge handles JS rendering server-side, so you don't manage browsers
- Playwright is the most reliable browser automation option for complex SPAs
- Selenium still works but is slower and more verbose than Playwright
- Always check for hidden APIs before reaching for browser automation
## Method 1: Intercept Underlying API Calls (Fastest)
Most JavaScript-heavy sites load data via API calls. Instead of rendering the page and extracting from the DOM, call the API directly.
```python
import requests

# Most SPAs fetch data from a JSON API
resp = requests.get(
    "https://api.example.com/products",
    headers={"Accept": "application/json"},
)
products = resp.json()

for product in products["items"]:
    print(product["name"], product["price"])
```
How to find hidden APIs:
- Open DevTools (F12) → Network tab
- Reload the page
- Look for XHR/Fetch requests returning JSON
- Right-click → Copy as cURL → translate to Python
This is always your first move. It's 10-100x faster than browser automation and doesn't break when the frontend changes.
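Hidden APIs are usually paginated. Here is a minimal sketch of walking every page; the `items` array and `next` URL field are assumptions for illustration — check the real field names in the DevTools Network tab:

```python
import requests

def fetch_json(url: str) -> dict:
    resp = requests.get(url, headers={"Accept": "application/json"}, timeout=30)
    resp.raise_for_status()
    return resp.json()

def iter_all_items(first_url: str, fetch=fetch_json):
    """Yield items across pages by following each response's `next` link."""
    url = first_url
    while url:
        page = fetch(url)
        yield from page.get("items", [])
        url = page.get("next")  # None or absent on the last page
```

Injecting `fetch` as a parameter keeps the pagination logic testable without hitting the network.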
## Method 2: SearchHive ScrapeForge (No Browser Management)
ScrapeForge handles JavaScript rendering server-side. Send a URL, get markdown back.
```python
import requests

API_KEY = "your_searchhive_key"

resp = requests.post(
    "https://api.searchhive.dev/v1/scrape",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "url": "https://example.com/products",
        "render_js": True,      # Enable headless browser
        "format": "markdown",   # Clean markdown output
        "wait_for": 3000,       # Wait 3s for JS to render
    },
)
data = resp.json()
print(data["markdown"])
```
Why use this over self-hosted Playwright: No browser installation, no server resources, no proxy management. ScrapeForge handles anti-bot detection, retries, and concurrency. You pay per credit but skip the entire infrastructure burden.
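In production you'll also want to handle transient failures. A sketch of a retry wrapper around the same endpoint; the status-code policy here is a generic assumption, not documented ScrapeForge behavior:

```python
import time
import requests

RETRYABLE = {429, 500, 502, 503, 504}  # rate limits and server hiccups

def should_retry(status_code: int) -> bool:
    return status_code in RETRYABLE

def scrape_with_retries(url: str, api_key: str, attempts: int = 3) -> dict:
    for attempt in range(attempts):
        resp = requests.post(
            "https://api.searchhive.dev/v1/scrape",
            headers={"Authorization": f"Bearer {api_key}"},
            json={"url": url, "render_js": True, "format": "markdown"},
            timeout=60,
        )
        if resp.ok:
            return resp.json()
        if not should_retry(resp.status_code):
            resp.raise_for_status()  # permanent error: fail fast
        time.sleep(2 * (attempt + 1))  # linear backoff before retrying
    raise RuntimeError(f"Scrape failed after {attempts} attempts")
```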
## Method 3: Playwright (Most Reliable Browser Automation)
Playwright is the modern standard for browser automation. It supports Chromium, Firefox, and WebKit.
```python
from playwright.sync_api import sync_playwright

def scrape_js_page(url: str) -> list:
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")

        # Wait for specific elements to load
        page.wait_for_selector(".product-list")

        # Extract data
        results = []
        for product in page.query_selector_all(".product-card"):
            name = product.query_selector(".name").inner_text()
            price = product.query_selector(".price").inner_text()
            results.append({"name": name, "price": price})

        browser.close()
        return results

data = scrape_js_page("https://example.com/products")
for item in data:
    print(f"{item['name']}: {item['price']}")
```
Installation:
```shell
pip install playwright
playwright install chromium
```
Pros: Full browser control, handles any JS framework, screenshot support. Cons: Heavy resource usage, slow at scale, needs proxy rotation for volume.
## Method 4: Selenium (Legacy but Still Works)
Selenium is the oldest browser automation tool. It works, but Playwright is generally better.
```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

options = Options()
options.add_argument("--headless")
options.add_argument("--disable-gpu")

driver = webdriver.Chrome(options=options)
try:
    driver.get("https://example.com/products")

    # Wait for JS-rendered content
    wait = WebDriverWait(driver, 10)
    products = wait.until(
        EC.presence_of_all_elements_located((By.CLASS_NAME, "product-card"))
    )

    for product in products:
        name = product.find_element(By.CLASS_NAME, "name").text
        price = product.find_element(By.CLASS_NAME, "price").text
        print(f"{name}: {price}")
finally:
    driver.quit()  # Always release the browser, even on errors
```
## Method 5: Requests-HTML (Lightweight Hybrid)
Requests-HTML adds basic JS rendering to the familiar requests library via Pyppeteer.
```python
from requests_html import HTMLSession

session = HTMLSession()
resp = session.get("https://example.com/products")

# Render JavaScript
resp.html.render(timeout=30, sleep=3)

# Extract with CSS selectors
for product in resp.html.find(".product-card"):
    name = product.find(".name", first=True).text
    price = product.find(".price", first=True).text
    print(f"{name}: {price}")
```
Pros: Familiar API for requests users. Cons: Slower than Playwright, limited browser features, less maintained.
## When to Use Each Method
| Method | Speed | Complexity | Best For |
|---|---|---|---|
| API Interception | Fastest | Low | Any site with JSON APIs |
| ScrapeForge | Fast | Lowest | Production scraping at scale |
| Playwright | Medium | Medium | Complex SPAs, interaction needed |
| Selenium | Slow | Medium | Legacy projects |
| Requests-HTML | Slow | Low | Quick scripts |
## Handling Infinite Scroll and Dynamic Loading
Many JS sites load more content as you scroll. Here's how to handle it with Playwright:
```python
from playwright.sync_api import sync_playwright

def scrape_infinite_scroll(url: str) -> list:
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url)

        items = []
        seen_urls = set()

        while True:
            # Collect items currently in the DOM, skipping ones we've seen
            for item in page.query_selector_all(".item"):
                link = item.query_selector("a").get_attribute("href")
                if link and link not in seen_urls:
                    seen_urls.add(link)
                    items.append({
                        "title": item.query_selector(".title").inner_text(),
                        "url": link,
                    })

            # Scroll down and give the page time to load more
            count_before = len(page.query_selector_all(".item"))
            page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
            page.wait_for_timeout(2000)

            # Stop when scrolling produced no new items
            if len(page.query_selector_all(".item")) == count_before:
                break

        browser.close()
        return items
```
## Common JS Scraping Pitfalls
Race conditions. Elements may not exist when you try to find them. Always use explicit waits — never time.sleep().
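For instance, wrap the wait in a small helper so a missing element raises loudly instead of silently reading an empty DOM. The page object is Playwright's; the selector is a placeholder:

```python
def read_text(page, selector: str, timeout_ms: int = 10_000) -> str:
    # Explicit wait: blocks until the element exists (or raises a
    # TimeoutError), unlike a fixed sleep that races the render
    page.wait_for_selector(selector, timeout=timeout_ms)
    return page.query_selector(selector).inner_text()
```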
Anti-bot detection. Headless browsers leave fingerprints. Use stealth plugins or a managed service like ScrapeForge that handles this.
Memory leaks. Browser automation consumes RAM fast. Always close browsers in finally blocks or use context managers.
Rate limiting. Even headless browsers trigger rate limits. Add delays between requests or use proxy rotation.
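A common pattern is exponential backoff between retries; the defaults below are illustrative, not magic values:

```python
import random

def throttle_delay(attempt: int, base: float = 1.0, cap: float = 30.0,
                   jitter: float = 0.0) -> float:
    # 1s, 2s, 4s, 8s, ... capped at `cap` seconds, plus optional random
    # jitter so concurrent workers don't retry in lockstep
    return min(cap, base * (2 ** attempt)) + random.uniform(0, jitter)
```

Call time.sleep(throttle_delay(attempt)) between attempts, and set a nonzero jitter when several workers hit the same host.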
## Getting Started
For production scraping, start with SearchHive's free tier — 500 credits, no credit card. Test ScrapeForge on your target URLs. If JS rendering works via the API, you've saved yourself weeks of browser infrastructure work.
For local development, Playwright is the standard. Install it, write your scrapers, then port to ScrapeForge when you need to scale.
See also: how to scrape without getting blocked and how to extract structured data from HTML.