Every web scraper hits the same wall: rate limits. Send too many requests too fast and you get blocked, throttled, or IP-banned. Web scraping rate limiting tools solve this by controlling request frequency, distributing load across proxies, and handling retries intelligently.
This guide compares the top 5 tools for managing web scraping rate limits, from dedicated middleware to full scraping platforms with built-in throttling. Whether you need a simple delay function or enterprise-grade adaptive rate limiting, one of these will fit.
Key Takeaways
- Rate limiting is non-negotiable for any production scraper -- even "polite" scrapers need backoff logic
- Dedicated scraping platforms (Firecrawl, SearchHive) handle rate limiting internally, saving you setup time
- Open-source tools like `tenacity` and `scrapy` give you fine-grained control but require more engineering
- SearchHive ScrapeForge offers built-in rate limiting with automatic proxy rotation, starting at just $9/month
- Adaptive rate limiting (reading response headers and adjusting dynamically) outperforms fixed delays
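The adaptive approach in that last takeaway is simple to sketch: on a 429, honor the server's `Retry-After` header if present, otherwise double the current delay up to a cap. This is an illustrative outline, not any specific tool's implementation; the function names and defaults are made up for the example, and `get` stands in for a real HTTP client call like `requests.get`.

```python
import time

def next_delay(retry_after, current, max_delay=60.0):
    """Next backoff delay: honor Retry-After if present, else double."""
    if retry_after is not None:
        return min(float(retry_after), max_delay)
    return min(current * 2, max_delay)

def polite_get(url, get, base_delay=1.0, max_delay=60.0, max_attempts=5):
    """Fetch url via `get` (e.g. requests.get), adapting to 429 responses.

    `get` must return an object with .status_code and .headers.
    """
    delay = base_delay
    for _ in range(max_attempts):
        resp = get(url)
        if resp.status_code != 429:
            return resp
        delay = next_delay(resp.headers.get("Retry-After"), delay, max_delay)
        time.sleep(delay)
    raise RuntimeError(f"Still rate limited after {max_attempts} attempts")
```

Fixed-delay scrapers ignore what the server is telling them; this pattern uses the server's own hint first and falls back to exponential growth only when no hint is given.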
1. SearchHive ScrapeForge -- Built-In Rate Limiting
SearchHive bundles rate limiting directly into its ScrapeForge API. You send requests and the platform handles throttling, proxy rotation, retries, and backoff -- no configuration needed.
ScrapeForge uses adaptive rate limiting based on the target site's response headers and historical patterns. If a site starts returning 429s, it automatically slows down and retries. Proxy rotation is built in, so your requests come from different IPs without managing a proxy pool yourself.
Pricing: Free tier with 500 credits, Starter at $9/month (5K credits), Builder at $49/month (100K credits). Each scrape costs 1 credit.
```python
import requests

API_KEY = "your_searchhive_key"
BASE = "https://api.searchhive.dev/v1"

# Scrape multiple URLs -- rate limiting handled automatically
urls = [
    "https://example.com/products",
    "https://example.com/about",
    "https://example.com/pricing"
]

results = []
for url in urls:
    resp = requests.post(
        f"{BASE}/scrape",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"url": url, "format": "markdown"}
    )
    results.append(resp.json())

# ScrapeForge distributes requests, respects rate limits,
# and rotates proxies -- you just collect results
for r in results:
    print(r.get("data", {}).get("content", "")[:200])
```
Best for: Developers who want zero-config rate limiting with a managed scraping API. No proxy pool to maintain, no retry logic to write.
2. Firecrawl -- Managed Rate Limiting with Crawl Scaling
Firecrawl is a popular scraping API that handles rate limiting internally. Their /crawl endpoint manages concurrency, respects robots.txt, and automatically throttles requests.
Firecrawl limits concurrency per plan: Free (2 concurrent), Hobby (5), Standard (50), Growth (100), Scale (150). Exceeding these limits queues requests rather than failing them.
Pricing: Free (500 credits one-time), Hobby $16/month (3K), Standard $83/month (100K), Growth $333/month (500K), Scale $599/month (1M). Each scrape costs 1 credit.
```python
from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key="your_firecrawl_key")

# Firecrawl handles rate limiting internally
result = app.crawl_url(
    "https://example.com",
    params={
        "limit": 50,
        "concurrency": 5,
        "allowBackwardCrawling": False
    }
)
```
Best for: Teams that want a scraping API with built-in crawl management and decent concurrency limits.
3. Scrapy with AutoThrottle Extension
Scrapy is the most popular open-source web scraping framework in Python. Its AutoThrottle extension provides adaptive rate limiting by dynamically adjusting crawl speed based on server load.
AutoThrottle monitors response latency and adjusts concurrent requests accordingly. It starts conservative and ramps up as it detects the server can handle more load.
Pricing: Free and open-source (BSD license).
```python
# scrapy project settings.py
BOT_NAME = "mybot"
SPIDER_MODULES = ["mybot.spiders"]

# Enable AutoThrottle for adaptive rate limiting
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 1.0         # Initial download delay (seconds)
AUTOTHROTTLE_MAX_DELAY = 10.0          # Max delay when server is slow
AUTOTHROTTLE_TARGET_CONCURRENCY = 2.0  # Target concurrent requests
AUTOTHROTTLE_DEBUG = True

# Optional: configure retry middleware
RETRY_ENABLED = True
RETRY_TIMES = 3
RETRY_HTTP_CODES = [429, 500, 502, 503, 504]

# Scrapy handles retries, backoff, and concurrency,
# but you still need your own proxy rotation
```
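The missing piece, proxy rotation, is usually added as a small downloader middleware. Here is a minimal illustrative sketch: the class name and the `PROXY_LIST` setting are placeholders of our own, not Scrapy built-ins, and you would register the middleware in `DOWNLOADER_MIDDLEWARES` yourself.

```python
# mybot/middlewares.py -- minimal rotating-proxy sketch (illustrative)
import itertools

class RotatingProxyMiddleware:
    """Assigns each outgoing request the next proxy in a fixed list."""

    def __init__(self, proxies):
        self.proxies = itertools.cycle(proxies)

    @classmethod
    def from_crawler(cls, crawler):
        # Reads a custom PROXY_LIST setting (an assumption, not built in)
        return cls(crawler.settings.getlist("PROXY_LIST"))

    def process_request(self, request, spider):
        # Scrapy's HTTP downloader honors request.meta["proxy"]
        request.meta["proxy"] = next(self.proxies)
```

Round-robin is the simplest policy; production setups often also evict proxies that return repeated 429s or connection errors.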
Best for: Developers who need full control over scraping behavior and want to self-host everything. Requires more setup but gives maximum flexibility.
4. Tenacity -- Retry Logic Library
Tenacity is a general-purpose, Apache-licensed retry library that works with any Python callable. While not a scraping-specific tool, it's the go-to choice for adding retry and backoff logic to HTTP requests.
Tenacity supports exponential backoff, jitter, custom stop conditions, and retrying on specific exception types or return values.
Pricing: Free and open-source (Apache 2.0).
```python
import time

import requests
from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception_type

class RateLimitedError(Exception):
    """Raised on HTTP 429 so tenacity knows to retry."""

@retry(
    stop=stop_after_attempt(5),
    wait=wait_exponential(multiplier=1, min=1, max=30),
    retry=retry_if_exception_type(RateLimitedError),
)
def fetch_with_retry(url, headers=None):
    resp = requests.get(url, headers=headers, timeout=30)
    if resp.status_code == 429:
        retry_after = resp.headers.get("Retry-After", "5")
        print(f"Rate limited. Server suggests waiting {retry_after}s")
        raise RateLimitedError("Rate limited")
    resp.raise_for_status()
    return resp

# Use with a delay between calls
urls = ["https://example.com/page/1", "https://example.com/page/2"]
for url in urls:
    try:
        resp = fetch_with_retry(url)
        print(f"OK: {url}")
    except Exception as e:
        print(f"Failed after retries: {url} - {e}")
    time.sleep(2)  # Fixed delay between requests
```
Tenacity gives you retry logic but NOT rate limiting. You still need to manage request timing yourself. It pairs well with asyncio.Semaphore for concurrent scraping.
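That pairing can be sketched in a few lines: a semaphore caps how many requests are in flight at once, while each task still gets its own retry logic. The `fetch` coroutine below is a stand-in for a real async HTTP call (e.g. via `aiohttp` or `httpx`), and the limit of 3 is arbitrary.

```python
import asyncio

async def fetch(url):
    # Placeholder for a real async HTTP call (aiohttp/httpx in practice)
    await asyncio.sleep(0.01)
    return f"fetched {url}"

async def bounded_crawl(urls, max_concurrent=3):
    """Run fetches concurrently, but never more than max_concurrent at once."""
    sem = asyncio.Semaphore(max_concurrent)

    async def worker(url):
        async with sem:
            return await fetch(url)

    return await asyncio.gather(*(worker(u) for u in urls))

results = asyncio.run(
    bounded_crawl([f"https://example.com/page/{i}" for i in range(5)])
)
```

The semaphore is the rate-limiting half of the picture; wrapping `fetch` with a tenacity `@retry` decorator (as in the previous example) supplies the backoff half.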
Best for: Adding retry/backoff logic to existing HTTP-based scrapers. Lightweight, no framework lock-in.
5. ScrapingBee -- Proxy + Rate Limiting as a Service
ScrapingBee is a managed scraping API that combines headless browser rendering, proxy rotation, and rate limiting. Their API handles request queuing and automatically retries failed requests.
ScrapingBee charges per request with different costs for different rendering modes. JavaScript rendering costs 5 credits vs 1 for simple HTTP requests. Premium proxies cost 10-25 credits extra per request.
Pricing: Freelancer $49/month (250K credits), Startup $99/month (1M), Business $249/month (3M).
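Because costs vary by rendering mode, it's worth estimating credit burn before picking a plan. This helper is our own illustration built from the per-request costs quoted above, assuming the low end (10 credits) of the premium-proxy surcharge.

```python
def scrapingbee_credits(n_requests, render_js=False, premium_proxy=False):
    """Estimate ScrapingBee credit usage from the costs quoted above.

    1 credit per plain HTTP request, 5 with JS rendering; premium
    proxies add a surcharge (10-25 credits -- low end assumed here).
    """
    per_request = 5 if render_js else 1
    if premium_proxy:
        per_request += 10  # assumption: low end of the 10-25 range
    return n_requests * per_request

# 10,000 JS-rendered scrapes through premium proxies:
print(scrapingbee_credits(10_000, render_js=True, premium_proxy=True))
```

At 15 credits per JS-rendered premium-proxy request, the 250K-credit Freelancer plan covers roughly 16K such scrapes, versus 250K plain HTTP scrapes.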
```python
import requests

API_KEY = "your_scrapingbee_key"
url = "https://example.com"

# ScrapingBee handles rate limiting and proxy rotation
response = requests.get(
    "https://app.scrapingbee.com/api/v1/",
    params={
        "api_key": API_KEY,
        "url": url,
        "render_js": "false",
        "premium_proxy": "true"
    }
)
print(response.text)
```
Best for: Scrapers that need headless browser rendering combined with proxy rotation and managed rate limiting.
Comparison Table
| Tool | Type | Rate Limiting | Starting Price | Concurrency | Proxy Rotation |
|---|---|---|---|---|---|
| SearchHive ScrapeForge | Managed API | Adaptive, built-in | $9/mo (5K) | Managed | Built-in |
| Firecrawl | Managed API | Queuing, concurrency limits | $16/mo (3K) | 2-150 per plan | Built-in |
| Scrapy AutoThrottle | Open-source framework | Adaptive, self-tuned | Free | Unlimited (self-hosted) | Manual |
| Tenacity | Retry library | Exponential backoff | Free | Manual | Manual |
| ScrapingBee | Managed API | Request queuing | $49/mo (250K) | Managed | Built-in |
Recommendation
For most developers building production scrapers in 2025, SearchHive ScrapeForge offers the best combination of built-in rate limiting, proxy rotation, and price. At $49/month for 100K credits, it undercuts Firecrawl by nearly 2x for comparable volume, and the gap with ScrapingBee widens further once JS-rendering and premium-proxy surcharges (5 and 10-25 credits per request) are factored in.
Scrapy with AutoThrottle remains the best choice if you need full control and want to self-host. Pair it with tenacity for retry logic and your own proxy provider for rotation.
If you need headless browser rendering specifically (JavaScript-heavy SPAs), Firecrawl or ScrapingBee are solid options, though both come at a significant price premium.
Get started with SearchHive's free tier -- 500 credits, no credit card required. Test the rate limiting on your target sites before committing to a paid plan.
Related: /compare/firecrawl | /compare/scrapingbee | /compare/serpapi