# Scraping Dynamic Content: Common Questions Answered
Modern websites load content dynamically using JavaScript. Most e-commerce sites, SaaS platforms, and news sites render at least some of their content client-side, so if your web scraper returns empty pages or missing data, dynamic content rendering is almost certainly the reason. This FAQ covers the most common questions about scraping dynamic content, with practical solutions using SearchHive's ScrapeForge API.
## Frequently Asked Questions
### What is dynamic content scraping?
Dynamic content scraping means extracting data from websites that load their content via JavaScript after the initial HTML response. Unlike static HTML pages where all data is present in the source code, dynamic pages use JavaScript frameworks (React, Vue, Angular) to fetch and render content after the page loads. A standard HTTP request only gets the initial HTML shell -- the actual data is loaded separately via API calls.
### Why does my scraper return empty pages?
Your scraper is likely fetching the initial HTML before JavaScript executes. Most scrapers built on `requests` or `httpx` only get the server-rendered HTML. On JavaScript-heavy sites, that HTML is just `<div id="root"></div>` with no actual content. The content loads moments later when JavaScript runs in a browser environment.
### How do I scrape JavaScript-rendered pages?
You have three main options:
1. Use ScrapeForge (recommended for most cases):
```python
import httpx

response = httpx.post(
    "https://api.searchhive.dev/v1/scrapeforge",
    headers={"Authorization": "Bearer sh_live_..."},
    json={
        "url": "https://example.com/products",
        "render_js": True,
        "format": "markdown"
    }
)
content = response.json().get("content", "")
print(content)
```
ScrapeForge uses headless browsers with proxy rotation, so it handles JS rendering and bot detection in one call.
2. Use Selenium or Playwright:
```python
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://example.com/products")
    page.wait_for_selector(".product-card")  # Wait for dynamic content
    content = page.content()
    browser.close()
```
This gives you full control but requires managing browsers, proxies, and infrastructure yourself.
3. Find the underlying API calls:
Open DevTools > Network tab, reload the page, and find the XHR/Fetch requests that load the data. Call those API endpoints directly with `httpx` or `requests` -- no browser needed.
### What is the difference between server-side and client-side rendering?
- Server-side rendering (SSR): The server sends fully populated HTML. Content is available in the initial HTTP response and is easy to scrape with `requests`.
- Client-side rendering (CSR): The server sends minimal HTML. JavaScript in the browser fetches data and renders it into the DOM. Scraping requires a browser engine or API discovery.
- Hybrid: Common in modern frameworks like Next.js and Nuxt. Initial HTML has some content, but interactive elements load dynamically.
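Not sure which kind of page you are facing? A quick, admittedly rough heuristic is to fetch the raw HTML and check how much visible text survives once scripts and tags are stripped -- a CSR shell is mostly `<script>` tags around an empty root div. The 200-character threshold below is an arbitrary guess; tune it for the sites you scrape:

```python
import re

def looks_client_rendered(html: str) -> bool:
    """Rough heuristic: strip scripts/styles and tags, then measure what's left."""
    # Drop script/style bodies, then all remaining tags
    text = re.sub(r"(?s)<(script|style)[^>]*>.*?</\1>", "", html)
    text = re.sub(r"<[^>]+>", " ", text)
    visible = " ".join(text.split())
    return len(visible) < 200  # Almost no visible text -> likely a CSR shell

shell = '<html><body><div id="root"></div><script src="/app.js"></script></body></html>'
article = "<html><body><article>" + ("Server-rendered text. " * 20) + "</article></body></html>"
print(looks_client_rendered(shell))    # True
print(looks_client_rendered(article))  # False
```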
### How much does dynamic content scraping cost?
It depends on the approach:
| Method | Cost per Page | Infrastructure |
|---|---|---|
| ScrapeForge (SearchHive) | ~$0.001 (Builder plan) | None needed |
| Playwright self-hosted | $0 + server costs | 1 server + proxies |
| Scraper API services | $0.01-$0.05 | None needed |
| Selenium cloud | $0.02-$0.10 | None needed |
SearchHive's Builder plan ($49/month for 100K credits) handles dynamic scraping at a fraction of the cost of alternatives. Each ScrapeForge call uses roughly 5-10 credits depending on page complexity.
### Can I scrape single-page applications (SPAs)?
Yes. SPAs built with React, Vue, or Angular load all content dynamically. ScrapeForge renders the full page in a headless browser and returns the rendered content. Set `render_js: True` and optionally specify a `wait_for` selector to ensure content has loaded:
```python
import httpx

response = httpx.post(
    "https://api.searchhive.dev/v1/scrapeforge",
    headers={"Authorization": "Bearer sh_live_..."},
    json={
        "url": "https://spa-example.com/dashboard",
        "render_js": True,
        "wait_for": ".data-loaded",  # CSS selector to wait for
        "format": "markdown"
    }
)
```
### How do I handle infinite scroll pages?
Infinite scroll pages load more content as you scroll down. ScrapeForge supports scroll-based content loading:
```python
import httpx

response = httpx.post(
    "https://api.searchhive.dev/v1/scrapeforge",
    headers={"Authorization": "Bearer sh_live_..."},
    json={
        "url": "https://example.com/feed",
        "render_js": True,
        "scroll_count": 5,     # Scroll down 5 times
        "scroll_delay": 1000,  # Wait 1s between scrolls
        "format": "markdown"
    }
)
```
This simulates a user scrolling through the page, triggering the lazy-loading JavaScript to fetch additional content.
### What about CAPTCHAs on dynamic sites?
Dynamic sites often deploy CAPTCHAs to block automated scraping. ScrapeForge includes proxy rotation and browser fingerprint management to minimize CAPTCHAs. For high-volume scraping of heavily protected sites, the Unicorn plan ($199/month) adds residential proxies and advanced anti-detection.
If you are scraping at low volumes (under 100 pages/day), ScrapeForge's standard proxy rotation is usually sufficient. CAPTCHAs typically trigger when the same IP makes many rapid requests.
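If you manage request pacing yourself, adding jitter between fetches helps avoid the fixed-interval patterns that rate limiters look for. A minimal sketch -- the helper name and defaults are illustrative, not a SearchHive feature:

```python
import random
import time

def polite_delay(base_seconds: float = 2.0, jitter: float = 0.5) -> float:
    """Sleep for base_seconds +/- jitter so requests don't arrive on a fixed beat."""
    delay = max(base_seconds + random.uniform(-jitter, jitter), 0.0)
    time.sleep(delay)
    return delay

# Between page fetches:
# for url in urls:
#     scrape(url)
#     polite_delay()
```

Randomized spacing will not defeat a determined anti-bot system on its own, but combined with proxy rotation it keeps any single IP's request pattern looking less mechanical.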
### How do I wait for specific content to load?
Use the `wait_for` parameter with a CSS selector that matches the element you need:
```python
import httpx

# Wait for a specific product element
response = httpx.post(
    "https://api.searchhive.dev/v1/scrapeforge",
    headers={"Authorization": "Bearer sh_live_..."},
    json={
        "url": "https://shop.example.com/product/123",
        "render_js": True,
        "wait_for": ".price-display",
        "format": "html"
    }
)
```
This ensures the scrape does not complete until the target element is present in the DOM.
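If you run your own headless browser or poll an API instead, a wait-for-condition step boils down to a simple poll loop. A minimal generic sketch -- the helper name and defaults are our own, not part of any SearchHive SDK:

```python
import time

def wait_until(check, timeout: float = 10.0, interval: float = 0.5):
    """Poll check() until it returns a truthy value or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = check()
        if result:
            return result
        time.sleep(interval)
    raise TimeoutError("condition not met within timeout")

# Usage sketch: poll until a selector appears in freshly fetched HTML
# wait_until(lambda: ".price-display" in fetch_html(url), timeout=15.0)
```

Browser automation libraries implement the same loop internally; writing it out makes the timeout and polling interval explicit when you need custom conditions.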
### Can I extract structured data from dynamic pages?
Yes. Use SearchHive's DeepDive API after ScrapeForge to extract structured data:
```python
import httpx
import json

# DeepDive handles JS rendering automatically
response = httpx.post(
    "https://api.searchhive.dev/v1/deepdive",
    headers={"Authorization": "Bearer sh_live_..."},
    json={
        "url": "https://example.com/products/123",
        "extract": {
            "title": {"type": "string", "description": "Product title"},
            "price": {"type": "number", "description": "Product price"},
            "reviews": {"type": "integer", "description": "Number of reviews"},
            "rating": {"type": "string", "description": "Star rating"},
            "in_stock": {"type": "string", "description": "Availability status"}
        }
    }
)
data = response.json().get("data", {})
print(json.dumps(data, indent=2))
```
### How do I scrape content behind login walls?
For pages requiring authentication, pass cookies or headers in your ScrapeForge request:
```python
import httpx

response = httpx.post(
    "https://api.searchhive.dev/v1/scrapeforge",
    headers={"Authorization": "Bearer sh_live_..."},
    json={
        "url": "https://app.example.com/dashboard",
        "render_js": True,
        "cookies": [
            {"name": "session_id", "value": "your_session_token"}
        ]
    }
)
```
Note: Session tokens expire. For automated scraping behind login, you need to handle session management in your pipeline.
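One way to structure that session handling is a small wrapper that logs in lazily and retries once when a request comes back unauthorized. This is a sketch of the pattern only -- the login flow, token shape, and the 401 convention are all assumptions about the target site, and `login`/`fetch` are injected callables rather than any real client:

```python
class SessionManager:
    """Re-login when a request comes back unauthorized (hypothetical pattern)."""

    def __init__(self, login, fetch):
        self._login = login   # () -> session token
        self._fetch = fetch   # (url, token) -> (status, body)
        self._token = None

    def get(self, url):
        if self._token is None:
            self._token = self._login()
        status, body = self._fetch(url, self._token)
        if status == 401:  # Session expired: log in once more and retry
            self._token = self._login()
            status, body = self._fetch(url, self._token)
        return status, body
```

Against a real site, `fetch` would pass the current token as the session cookie in a ScrapeForge request like the one above.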
## Summary
Scraping dynamic content does not need to be complicated. The key decisions are:
- Can you call the underlying API directly? Check DevTools Network tab first -- this is the fastest approach.
- Do you need rendered HTML? Use ScrapeForge with `render_js: True` -- it handles browsers, proxies, and CAPTCHAs for you.
- Do you need structured data? Use DeepDive with an extraction schema -- AI-powered extraction adapts to layout changes.
SearchHive handles all three scenarios starting at $0 (500 free credits). Sign up at searchhive.dev and start scraping dynamic content in minutes.
See also: /blog/how-to-scrape-a-website-without-getting-blocked for anti-detection strategies, or /compare/firecrawl for a comparison with other JS rendering services.