How to Handle Rate Limiting in Web Scraping
Rate limiting is the single biggest reason web scrapers fail at scale. You write a script that works beautifully on ten pages, then the target site starts returning 429 errors, CAPTCHAs, or blank responses. If you're building a production scraper — whether for price monitoring, lead generation, or market research — handling rate limits isn't optional. It's the difference between a pipeline that runs reliably and one that falls apart overnight.
This guide covers every technique you need: HTTP status codes, exponential backoff, proxy rotation, distributed scraping, and why using a managed API like SearchHive eliminates most of this complexity.
Key Takeaways
- 429 Too Many Requests is the standard rate-limit HTTP status code — always check for it in your response handling
- Exponential backoff with jitter is the gold-standard retry strategy (start at 1s, double each attempt, add randomness)
- Respect `Retry-After` headers — when a server tells you how long to wait, listen
- Rotating proxies spread requests across IP addresses, but residential proxies cost $4–15/GB
- SearchHive handles rate limiting transparently — no backoff logic, no proxy pools, no CAPTCHAs to worry about
What HTTP status codes indicate rate limiting?
Most developers know about 429, but rate limiting shows up in multiple ways:
| Status Code | Meaning | What to Do |
|---|---|---|
| 429 Too Many Requests | Explicit rate limit hit | Back off using Retry-After header or exponential delay |
| 503 Service Unavailable | Server overloaded | Retry with backoff; check Retry-After |
| 403 Forbidden | Silent block or IP ban | Rotate proxy, change user agent, slow down |
| 200 + CAPTCHA page | Stealth rate limiting | Detect via response content, switch proxy or pause |
| 408 Request Timeout | Throttled into timeout | Back off, reduce request rate |
The 429 status code was formalized in RFC 6585 and is now the universal signal for rate limiting. When you receive it, the server typically includes headers that tell you exactly how to recover.
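As a sketch, the table above can be folded into a small response classifier. The function and constant names here are illustrative, not from any library, and treating every 403 as a rate-limit signal is a simplification:

```python
# Illustrative classifier based on the table above.
RATE_LIMIT_STATUSES = {429, 503, 408}

def looks_rate_limited(status_code, body=""):
    """Return True when a response signals rate limiting,
    including 200 responses that are really CAPTCHA pages."""
    if status_code in RATE_LIMIT_STATUSES:
        return True
    if status_code == 403:
        # Could be a silent block or IP ban — treat as rate limiting
        return True
    if status_code == 200 and "captcha" in body.lower():
        # Stealth rate limiting: a 200 that is actually a challenge page
        return True
    return False
```

A real classifier would match site-specific CAPTCHA markers rather than the bare string "captcha".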
How does exponential backoff work?
Exponential backoff is a retry strategy where each failed attempt waits longer than the last. The formula is simple: delay = base × 2^attempt. So if your base delay is 1 second, you wait 1s, then 2s, then 4s, then 8s, then 16s.
The critical addition is jitter — random variation added to each delay. Without jitter, multiple clients hitting the same rate limit all retry at the same moment (the "thundering herd" problem), creating a new spike.
Here's a production-ready implementation:
```python
import time
import random
import requests

def fetch_with_backoff(url, max_retries=5, base_delay=1.0, max_delay=60.0, **kwargs):
    """Fetch a URL with exponential backoff and jitter.

    Extra keyword arguments (e.g. proxies=...) are passed through to requests.get.
    """
    for attempt in range(max_retries):
        try:
            response = requests.get(url, timeout=10, **kwargs)
            if response.status_code == 429:
                # Honor Retry-After when it's the delay-seconds form
                retry_after = response.headers.get('Retry-After')
                if retry_after and retry_after.isdigit():
                    delay = int(retry_after)
                else:
                    # Exponential backoff with jitter
                    delay = min(base_delay * (2 ** attempt), max_delay)
                    delay *= random.uniform(0.5, 1.5)
                print(f"Rate limited (attempt {attempt + 1}), waiting {delay:.1f}s")
                time.sleep(delay)
                continue
            return response
        except requests.RequestException as e:
            delay = min(base_delay * (2 ** attempt), max_delay)
            delay *= random.uniform(0.5, 1.5)
            print(f"Request failed: {e}, retrying in {delay:.1f}s")
            time.sleep(delay)
    raise Exception(f"Max retries ({max_retries}) exceeded for {url}")
```
Best practices for backoff:
- Cap maximum delay at 30–60 seconds
- Set max retries to 3–5 attempts
- Always check `Retry-After` before calculating your own delay
- Log every retry for monitoring
What is the Retry-After header and how do I use it?
The Retry-After header appears on 429 and 503 responses. It comes in two formats:
- Delay-seconds: `Retry-After: 120` — wait 120 seconds
- HTTP-date: `Retry-After: Fri, 31 Dec 2025 23:59:59 GMT` — wait until that time
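Both formats can be handled with a few lines of standard-library Python. This is a sketch; `parse_retry_after` is a hypothetical helper name:

```python
import time
from email.utils import parsedate_to_datetime

def parse_retry_after(value, now=None):
    """Return seconds to wait for a Retry-After header value,
    handling both delay-seconds and HTTP-date formats."""
    if value is None:
        return None
    value = value.strip()
    if value.isdigit():
        return int(value)                    # delay-seconds, e.g. "120"
    target = parsedate_to_datetime(value)    # HTTP-date form
    now = time.time() if now is None else now
    return max(0.0, target.timestamp() - now)
```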
Some APIs use non-standard variants:
- `X-RateLimit-Remaining` — how many requests you have left
- `X-RateLimit-Limit` — your total allowance
- `X-RateLimit-Reset` — Unix timestamp when the limit resets
The smartest approach is proactive throttling. Don't wait for 429s. Parse X-RateLimit-Remaining and slow down when it drops below 20% of your limit:
```python
import time
import requests

def smart_request(url, prev_headers, request_headers=None):
    """Throttle proactively using rate-limit headers from the previous response."""
    remaining = int(prev_headers.get('X-RateLimit-Remaining', 1))
    limit = int(prev_headers.get('X-RateLimit-Limit', 1))
    if remaining < limit * 0.2:
        # Wait until the window resets (Unix timestamp)
        reset_time = int(prev_headers.get('X-RateLimit-Reset', 0))
        wait = max(0, reset_time - time.time())
        print(f"Approaching rate limit ({remaining} left), waiting {wait:.0f}s")
        time.sleep(wait + 1)
    return requests.get(url, headers=request_headers, timeout=10)
```
How do rotating proxies help with rate limiting?
Most rate limits are tied to IP addresses. Rotate your IP, and you get a fresh rate limit allowance. There are four proxy types:
| Proxy Type | Cost | Success Rate | Speed | Best For |
|---|---|---|---|---|
| Datacenter | $0.50–2/GB | Low | Fast | High-volume, low-security sites |
| Residential | $4–15/GB | High | Medium | Most scraping use cases |
| ISP | $2–5/GB | Medium-High | Fast | Session-sensitive sites |
| Mobile | $15–30/GB | Very High | Slow | Sites with strict bot detection |
Smart rotation is the most IP-efficient strategy — rotate your proxy only when you hit a rate limit or get blocked. Per-request rotation works too, but burns through residential proxy bandwidth quickly.
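A minimal sketch of that smart-rotation strategy — the `SmartProxyPool` class name is illustrative:

```python
import itertools

class SmartProxyPool:
    """Advance to the next proxy only when the current one is blocked."""

    def __init__(self, proxies):
        self._cycle = itertools.cycle(proxies)
        self.current = next(self._cycle)

    def rotate(self):
        # Call after a 429/403 or CAPTCHA page; per-request rotation
        # would burn residential bandwidth for no benefit.
        self.current = next(self._cycle)
        return self.current
```

Reuse `pool.current` for every request and call `pool.rotate()` only from your block-detection path.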
How does distributed scraping work?
For serious scale, you need multiple workers hitting different targets simultaneously:
```python
# Simplified distributed scraping with a queue
import redis
import threading

def worker(worker_id, proxy_pool):
    """Each worker pulls URLs from a shared queue."""
    r = redis.Redis()
    while True:
        item = r.blpop('scrape_queue', timeout=30)
        if not item:
            break  # queue drained
        url = item[1].decode()
        proxy = proxy_pool.next()  # get next proxy from rotation
        try:
            response = fetch_with_backoff(url, proxies={'http': proxy, 'https': proxy})
            process_result(url, response)  # your downstream handler
        except Exception as e:
            print(f"Worker {worker_id} failed on {url}: {e}")

# Launch 10 workers
threads = []
for i in range(10):
    t = threading.Thread(target=worker, args=(i, proxy_pool))
    t.start()
    threads.append(t)
```
Key principles for distributed scraping:
- Use a central queue (Redis, SQS, RabbitMQ) to avoid duplicate work
- Track per-domain rate limits globally so combined worker traffic stays under limits
- Isolate failures — one worker hitting a rate limit shouldn't stop others
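The second principle — tracking per-domain limits globally — can be sketched in-memory like this. `DomainThrottle` is a hypothetical name, and a production version would keep the timestamps in Redis so every worker shares them:

```python
import threading
import time
from urllib.parse import urlparse

class DomainThrottle:
    """Enforce a minimum interval between requests to the same domain.
    In-memory sketch; distributed workers would share state via Redis."""

    def __init__(self, min_interval=1.0):
        self.min_interval = min_interval
        self._last = {}               # domain -> timestamp of last reserved slot
        self._lock = threading.Lock()

    def wait(self, url):
        domain = urlparse(url).netloc
        with self._lock:
            now = time.monotonic()
            delay = max(0.0, self.min_interval - (now - self._last.get(domain, 0.0)))
            self._last[domain] = now + delay   # reserve the next slot
        if delay:
            time.sleep(delay)
```

Call `throttle.wait(url)` in each worker just before the request; different domains never block each other.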
Should I respect robots.txt when scraping?
Yes. The robots.txt file at a domain's root declares crawling policies, including Crawl-delay directives that specify the minimum seconds between requests.
```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url('https://example.com/robots.txt')
rp.read()

can_fetch = rp.can_fetch('MyBot/1.0', 'https://example.com/page')
crawl_delay = rp.crawl_delay('MyBot/1.0')  # minimum seconds between requests
```
robots.txt isn't legally binding on its own, but ignoring it has been used as evidence of bad faith in web scraping legal cases. Always parse it and respect Crawl-delay.
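If you fetch robots.txt yourself (for instance through a proxy), you can feed the body straight to the parser instead of calling `read()`. A sketch with a hypothetical `build_policy` helper:

```python
from urllib import robotparser

def build_policy(robots_txt, user_agent='MyBot/1.0'):
    """Parse an already-fetched robots.txt body; return the parser's
    can_fetch callable plus the effective crawl delay in seconds."""
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    delay = rp.crawl_delay(user_agent) or 1.0  # fall back to 1s if unset
    return rp.can_fetch, delay

can_fetch, delay = build_policy(
    "User-agent: *\nCrawl-delay: 5\nDisallow: /private/\n"
)
```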
How does SearchHive handle rate limiting?
If you're tired of managing backoff logic, proxy pools, and CAPTCHA solving, SearchHive handles all of this transparently. Its web search and scraping APIs manage rate limiting upstream — your code never sees a 429.
```python
from searchhive import SwiftSearch, ScrapeForge

# Search — no rate limit handling needed
search = SwiftSearch(api_key='your-key')
results = search.query('machine learning frameworks 2026')
for r in results:
    print(f"{r['title']}: {r['url']}")

# Scrape — automatic retry, proxy rotation, CAPTCHA handling
scraper = ScrapeForge(api_key='your-key')
content = scraper.scrape('https://example.com/article')
print(content['markdown'])  # clean markdown output
```
No exponential backoff. No proxy pool management. No CAPTCHA solving. SearchHive absorbs all the complexity and returns clean, structured results. Compare this to self-managed scraping where you're paying $4–15/GB for residential proxies on top of engineering time.
SearchHive pricing starts with a free tier — see the full pricing comparison against competitors like Bright Data and ScraperAPI.
What are the most common rate limiting mistakes?
- Ignoring Retry-After — using fixed delays instead of server-specified wait times
- No jitter — synchronized retries that re-trigger the rate limit
- Only datacenter proxies — easily detected and blocked
- No proactive throttling — flying blind until 429s start firing
- Scraping one domain from many IPs at once — looks like a DDoS attack
- Not checking response content — 200 responses can still be CAPTCHA pages
Summary
Rate limiting is a multi-layered problem. At minimum, implement exponential backoff with jitter and respect Retry-After headers. For production scraping, add rotating proxies and proactive throttling. For teams that don't want to manage this infrastructure, SearchHive provides managed search and scraping APIs that handle rate limiting, CAPTCHAs, and proxy rotation out of the box.
Get started with SearchHive's free tier and stop worrying about 429s.
Related reading: What Is the Best Proxy for Web Scraping | How to Use SearchHive with Python | Is Web Scraping Legal