The Python ecosystem for web scraping is massive — BeautifulSoup, Scrapy, Playwright, Selenium — but when you need to scale past a few hundred requests, managing proxies, CAPTCHAs, and JavaScript rendering yourself becomes a full-time job.
Scraping APIs handle that infrastructure for you. You send a URL, you get clean data back. The question is which one fits your project's volume, budget, and technical requirements.
This guide focuses specifically on services with official or well-maintained Python SDKs — because if you're working in Python, you want pip install and a documented client, not raw HTTP calls.
## Key Takeaways
- ZenRows offers the best value at $49/month for 250,000 requests with a mature Python SDK
- SearchHive ScrapeForge has the lowest barrier to entry at $9/month and returns markdown directly
- ScrapingBee has the cleanest Python SDK design — closest to idiomatic Python
- ScraperAPI is the best choice for async batch jobs at high volume
- Apify uses a unique compute-unit model with the largest marketplace of pre-built scrapers
- ScrapeGraphAI is the only option for LLM-powered extraction with Python support
- SDK quality varies significantly — some are wrappers around requests, others provide proper async support and type hints
## What Makes a Good Python SDK for Scraping
Before diving into specific services, here's what separates a good scraping SDK from a bad one:
- Type hints and IDE autocomplete. You should be able to discover parameters without reading docs.
- Async support. Scraping is I/O-bound. A sync-only SDK leaves throughput on the table.
- Error handling. Retries, timeouts, and rate limit handling should be built in.
- Response parsing. Returning parsed objects, not raw strings, saves you code.
- Context manager support. `with client:` patterns for resource cleanup.
Most scraping APIs fail on at least two of these. The ones below are ranked by how well their Python SDKs actually work in practice.
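As a concrete reference point, here is a hypothetical client skeleton that ticks those boxes: type hints, built-in retries with backoff, and context-manager cleanup. It is an illustration of the checklist, not any vendor's real SDK.

```python
from __future__ import annotations

import time


class ScrapeClient:
    """Illustrative client showing the checklist above. Not a real SDK."""

    def __init__(self, api_key: str, max_retries: int = 3) -> None:
        self.api_key = api_key
        self.max_retries = max_retries
        self._closed = False

    def __enter__(self) -> "ScrapeClient":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        self.close()

    def close(self) -> None:
        # Real SDKs would close connection pools here
        self._closed = True

    def get(self, url: str) -> str:
        # Built-in retry with exponential backoff on transient failures
        for attempt in range(self.max_retries):
            try:
                return self._fetch(url)
            except ConnectionError:
                time.sleep(2 ** attempt)
        raise ConnectionError(f"gave up after {self.max_retries} attempts")

    def _fetch(self, url: str) -> str:
        # Stand-in for the real network call
        return f"<html>fetched {url}</html>"
```

With a shape like this, your IDE can autocomplete every parameter, and cleanup happens even when a scrape raises mid-run.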
## 1. ZenRows — Best Value with SDK
PyPI: `pip install zenrows` (v1.4.0)
ZenRows provides the best requests-per-dollar ratio at the starter tier ($49 for 250,000 requests). The Python SDK is well-maintained with regular updates.
```python
from zenrows import ZenRowsClient

client = ZenRowsClient("YOUR_API_KEY")

# Basic scrape with anti-bot bypass
response = client.get(
    "https://news.ycombinator.com",
    params={"antibot": True}
)
print(response.status_code)
print(response.text)

# JavaScript rendering
response = client.get(
    "https://spa-site.com/data",
    params={
        "js_render": True,
        "wait": 3000  # Wait 3 seconds for JS to load
    }
)

# Geotargeted requests
response = client.get(
    "https://google.com/search?q=test",
    params={"geolocation": "us"}
)
```
The SDK handles all HTTP details — headers, retries, connection pooling. Parameter names map directly to ZenRows API options, so the API docs and SDK docs are effectively the same.
Async support: The SDK is synchronous only. For concurrent requests, use asyncio + aiohttp directly with the ZenRows API, or use concurrent.futures.ThreadPoolExecutor.
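Since the SDK is synchronous, a thread pool is the simplest route to concurrency. The sketch below shows the fan-out pattern with a stand-in fetch function so it runs without an API key; in real use, the body of `fetch` would be a `client.get(url, params=...)` call, and the URLs here are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch(url: str) -> str:
    # Stand-in for client.get(url, params={"antibot": True}).text
    return f"body of {url}"


urls = [f"https://example.com/page/{i}" for i in range(1, 11)]

# Threads work well here because the workload is I/O-bound:
# each worker spends most of its time waiting on the network.
with ThreadPoolExecutor(max_workers=5) as pool:
    pages = list(pool.map(fetch, urls))  # results come back in input order
```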
## 2. SearchHive ScrapeForge — Best Entry Price
PyPI: `pip install searchhive`
SearchHive's SDK covers both search (SwiftSearch) and scraping (ScrapeForge) with one package and one API key.
```python
from searchhive import ScrapeForge

client = ScrapeForge(api_key="sh_live_...")

# Scrape to markdown
result = client.scrape("https://blog.example.com/post")
print(result["markdown"])

# Scrape with JS rendering and custom selectors
result = client.scrape(
    "https://spa-example.com/products",
    render_js=True,
    extract={"title": "h1", "price": ".product-price"}
)

# Use with SwiftSearch for an end-to-end pipeline
from searchhive import SwiftSearch

search = SwiftSearch(api_key="sh_live_...")
results = search.search("best python web scraping tutorials", num_results=10)

for r in results["organic"]:
    page = client.scrape(r["url"])
    print(f"Title: {r['title']}")
    print(f"Content: {page['markdown'][:200]}")
```
The unified SDK is the differentiator — one pip install, one API key, search and scraping together. At $9/month for the starter tier, it's the cheapest way to get both capabilities.
Async support: Check the latest docs — async support was being added in early 2026.
## 3. ScrapingBee — Cleanest SDK Design
PyPI: `pip install scrapingbee` (v1.3.0)
ScrapingBee's SDK is the most Pythonic in the space. Clean method signatures, sensible defaults, good error messages.
```python
import base64
import json

from scrapingbee import ScrapingBeeClient

client = ScrapingBeeClient(api_key="YOUR_API_KEY")

# Basic request
response = client.get("https://example.com")
print(response.status_code, response.text)

# With JS rendering and extraction rules
response = client.get(
    "https://store.example.com/product/123",
    params={
        "render_js": "True",
        "extract_rules": json.dumps({
            "name": "h1.product-title",
            "price": ".price-value",
            "rating": ".star-rating"
        })
    }
)
data = response.json()
print(f"{data['name']}: ${data['price']}")

# Screenshot capture
response = client.get(
    "https://example.com",
    params={"screenshot": "True", "screenshot_full_page": "True"}
)
# Returns base64-encoded PNG
with open("screenshot.png", "wb") as f:
    f.write(base64.b64decode(response.content))
```
The extract_rules parameter is particularly useful — it runs CSS selectors server-side and returns clean JSON, so you never handle raw HTML.
Downside: Credit-based pricing means JS rendering (5 credits) and premium proxies (10-25 credits) drain your allocation fast. With only 1,000 credits on the free tier, you'll burn through your allocation quickly if you use any JS rendering.
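To make the credit math concrete, here is a quick back-of-envelope calculation using the per-request costs quoted above (1 credit for a plain request, 5 for JS rendering, up to 25 for premium proxies). The numbers are illustrative, not a pricing quote.

```python
CREDITS = 1_000  # free-tier allocation

# Credits consumed per request, by mode (rates from the text above)
COST = {"plain": 1, "js_render": 5, "premium_proxy": 25}

# How many requests each mode buys out of the allocation
budget = {mode: CREDITS // cost for mode, cost in COST.items()}
# -> 1,000 plain requests, but only 200 JS-rendered
#    or 40 premium-proxy requests
```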
## 4. ScraperAPI — Best for Batch Jobs
PyPI: `pip install scraperapi` (v0.1.7)
ScraperAPI's strength is its async batch endpoint for queuing thousands of URLs. The Python SDK is minimal — most developers use requests directly.
```python
import time

import requests

API_KEY = "YOUR_API_KEY"

# Synchronous request — ScraperAPI acts as a proxy
def scrape(url, render=False):
    response = requests.get(
        "https://api.scraperapi.com",
        params={
            "api_key": API_KEY,
            "url": url,
            "render": render,
            "autoparse": True  # Returns structured JSON for known page types
        }
    )
    return response.json()

# Parse product page
data = scrape("https://store.example.com/product/123", render=True)
print(data)

# Async batch — submit up to 10,000 URLs
batch_response = requests.post(
    "https://async.scraperapi.com/batch",
    json={
        "apiKey": API_KEY,
        "requests": [
            {"url": f"https://example.com/page/{i}"}
            for i in range(1, 101)
        ]
    }
)
batch_id = batch_response.json()["batchId"]

# Poll for results
while True:
    status = requests.get(
        f"https://async.scraperapi.com/batch/{batch_id}",
        params={"apiKey": API_KEY}
    ).json()
    if status["status"] == "finished":
        break
    time.sleep(10)
```
The async batch endpoint handles retries, proxy rotation, and parallelism across their infrastructure. Submit 10,000 URLs, get notified when they're all done.
Downside: The official Python SDK (scraperapi v0.1.7) is barely maintained. The requests-based approach works fine but lacks the ergonomics of a proper client library.
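If you want slightly better ergonomics without waiting on the official package, a thin wrapper over the raw HTTP calls is easy to sketch. The endpoint and parameter names below follow the requests-based examples above; the class itself is hypothetical, not an official client.

```python
from __future__ import annotations

from typing import Any


class ScraperAPIClient:
    """Minimal illustrative wrapper over ScraperAPI's proxy endpoint."""

    BASE = "https://api.scraperapi.com"

    def __init__(self, api_key: str, session: Any = None) -> None:
        if session is None:
            # Deferred import keeps the class testable without requests
            import requests
            session = requests.Session()  # reuse connections across calls
        self.api_key = api_key
        self.session = session

    def _params(self, url: str, **options: Any) -> dict[str, Any]:
        # Every call carries the key and target URL; options pass through
        return {"api_key": self.api_key, "url": url, **options}

    def get(self, url: str, render: bool = False, autoparse: bool = False):
        return self.session.get(
            self.BASE,
            params=self._params(url, render=render, autoparse=autoparse),
            timeout=60,
        )
```

Accepting an injected session also makes the wrapper trivial to unit-test with a fake.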
## 5. Apify — Largest Scraper Marketplace
PyPI: `pip install apify-client` (v1.9.0) or `pip install apify` (v2.4.0)
Apify takes a different approach. Instead of a generic scraping API, they offer an actor marketplace — pre-built scrapers for specific sites (Amazon, LinkedIn, Google Maps, etc.) that you can run on their infrastructure.
```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

# Run a pre-built scraper (e.g., Google Search Results)
run = client.actor("apify/google-search-scraper").call(run_input={
    "queries": ["python web scraping tutorial"],
    "maxResults": 10
})

# Fetch results from the dataset
dataset = client.dataset(run["defaultDatasetId"])
for item in dataset.iterate_items():
    print(item)
```
The actor marketplace has hundreds of pre-built scrapers for popular sites. You can also write custom actors in Node.js or Python.
Pricing uses compute units ($0.25/CU), not per-request pricing. One CU ≈ 1 GB RAM-hour. This makes cost prediction harder but gives you flexibility.
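To estimate a bill, multiply RAM by runtime: a run's compute units are roughly RAM in GB times hours. A small illustrative helper using the $0.25/CU figure above (your actual rate may differ by plan):

```python
def run_cost(ram_gb: float, hours: float, usd_per_cu: float = 0.25) -> float:
    """Estimate the cost of one actor run under the 1 CU = 1 GB RAM-hour model."""
    compute_units = ram_gb * hours
    return compute_units * usd_per_cu


# e.g. an actor allocated 4 GB RAM that runs for 30 minutes:
cost = run_cost(ram_gb=4, hours=0.5)  # 2 CUs at $0.25 each
```

Note the counterintuitive consequence: a fast actor with lots of RAM can cost the same as a slow one with little, so tuning memory allocation matters as much as tuning speed.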
Downside: Compute-unit pricing is confusing to estimate. The platform has a learning curve. Actor quality varies — some community actors are unmaintained.
## 6. ScrapeGraphAI — LLM-Powered Extraction
PyPI: `pip install scrapegraphai` (v1.18.0)
ScrapeGraphAI uses LLMs to extract structured data based on natural language prompts. It's an open-source Python library (MIT license), not a hosted API.
```python
from scrapegraphai.graphs import SmartScraperGraph

config = {
    "llm": {
        "model": "gpt-4o-mini",
        "api_key": "your-openai-key",
        "temperature": 0
    },
    "verbose": False
}

scraper = SmartScraperGraph(
    prompt="Extract product names, prices, ratings, and availability from this page",
    source="https://store.example.com/collection",
    config=config
)
result = scraper.run()
# Returns: [{"product": "...", "price": "...", "rating": "...", "available": True}, ...]
```
For cost-conscious usage, swap the LLM for a local model:
```python
config = {
    "llm": {
        "model": "ollama/llama3.1",
        "base_url": "http://localhost:11434"
    }
}
```
Multiple graph types are available: SmartScraperGraph (single page), SearchScraperGraph (search + scrape), SpeechGraph (audio extraction), and DeepSearchGraph (multi-page research).
Downside: Each extraction requires an LLM call (costs tokens). Slower than traditional scraping. Works best when you need structured extraction from complex, inconsistent page layouts.
## Python SDK Comparison
| Service | Package | Version | Type Hints | Async | Context Manager | Install Size |
|---|---|---|---|---|---|---|
| ZenRows | zenrows | 1.4.0 | Yes | No | No | ~20KB |
| SearchHive | searchhive | Latest | Yes | In progress | Yes | ~30KB |
| ScrapingBee | scrapingbee | 1.3.0 | Partial | No | No | ~15KB |
| ScraperAPI | scraperapi | 0.1.7 | No | No | No | ~10KB |
| Apify | apify-client | 1.9.0 | Yes | Yes | Yes | ~2MB |
| ScrapeGraphAI | scrapegraphai | 1.18.0 | Partial | No | No | ~50MB (with deps) |
## Recommendation
For most Python developers starting a new project: SearchHive ScrapeForge. One SDK, one API key, search + scraping, $9/month entry. The markdown output works out of the box for LLM/RAG pipelines.
For raw scraping volume at the best price: ZenRows. $49 for 250,000 requests is unmatched. The SDK is clean and well-documented.
For structured extraction from known page types: ScrapingBee. The extract_rules feature eliminates HTML parsing from your codebase.
For batch jobs (10K+ URLs): ScraperAPI. The async batch endpoint handles queueing, parallelism, and retry logic.
For no-code scraping with pre-built actors: Apify. The marketplace approach means someone may have already built the scraper you need.
For intelligent extraction from messy pages: ScrapeGraphAI. LLM-powered extraction handles pages that would require fragile CSS selectors otherwise.