Scraping Google search results is one of the most common data extraction tasks. You need it for SEO monitoring, competitor analysis, keyword research, and building search-based tools. But Google aggressively blocks scrapers -- CAPTCHAs, rate limiting, and DOM changes make raw HTTP scraping unreliable within hours.
This guide covers the three realistic approaches: using a dedicated SERP API (recommended), building your own with SearchHive as infrastructure, and what to avoid.
Key Takeaways
- Raw scraping Google directly is a losing battle. Google detects and blocks scrapers within minutes, and the HTML structure changes frequently.
- SERP APIs are the standard approach. They handle proxy rotation, CAPTCHAs, and HTML parsing, returning structured JSON results.
- SearchHive's ScrapeForge + SwiftSearch offers a cost-effective alternative to premium SERP APIs for most use cases.
- For light use, free-tier SERP APIs give you 100-2,500 searches/month. For production workloads, budget $25-50/month.
Prerequisites
- Python 3.8+
- `requests` library (`pip install requests`)
- A SearchHive API key (free signup with 500 credits)
Step 1: Understand Why Direct Scraping Fails
Before choosing an approach, understand what you're up against:
- CAPTCHAs -- After a handful of requests from the same IP, Google serves a CAPTCHA page instead of results.
- Rate limiting -- Google throttles requests per IP, then blocks entirely.
- DOM instability -- Google frequently updates its result page structure. CSS selectors that work today break next week.
- Personalization -- Results vary by location, device, and search history. Controlling these variables requires specific parameters.
A naive approach using requests and BeautifulSoup:
```python
# DO NOT DO THIS IN PRODUCTION
import requests
from bs4 import BeautifulSoup

url = "https://www.google.com/search?q=web+scraping+api"
resp = requests.get(url, headers={"User-Agent": "Mozilla/5.0"})
soup = BeautifulSoup(resp.text, "html.parser")

# This will break. Google obfuscates class names and changes structure constantly.
for g in soup.select(".g"):
    title = g.select_one("h3").text
    link = g.select_one("a")["href"]
    print(f"{title}: {link}")
```
This might work for 5-10 requests, then Google serves a CAPTCHA. Not viable for anything beyond testing.
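If you do experiment with direct requests, it helps to detect when Google has stopped serving real results. A minimal heuristic sketch, assuming the block page contains phrasing like "unusual traffic" or redirects through `/sorry/` (both are observed behavior and can change without notice):

```python
def looks_blocked(resp):
    """Heuristic: guess whether Google served a block page instead of results.

    The status code and phrases checked here are assumptions based on
    commonly observed block pages -- Google may change both at any time.
    """
    if resp.status_code == 429:  # explicit rate-limit response
        return True
    body = resp.text.lower()
    # Google's CAPTCHA interstitial typically mentions "unusual traffic"
    # and lives under a /sorry/ path.
    return "unusual traffic" in body or "/sorry/" in resp.url
```

Treat a `True` result as a signal to stop entirely, not to retry faster.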
Step 2: Use SearchHive SwiftSearch API (Recommended)
SearchHive's SwiftSearch API provides Google-like search results with structured output. It handles the proxy and parsing complexity internally:
```python
import requests

API_KEY = "your-api-key"
BASE_URL = "https://api.searchhive.dev/v1"

def search_google(query, num_results=10, country="us", language="en"):
    """Get Google-style search results via SwiftSearch API.

    Args:
        query: the search query string
        num_results: how many results to request
        country: two-letter country code
        language: two-letter language code

    Returns:
        A list of search result dictionaries.
    """
    response = requests.post(
        f"{BASE_URL}/search",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "query": query,
            "num_results": num_results,
            "country": country,
            "language": language,
        },
    )
    if response.status_code == 200:
        data = response.json()
        return data.get("results", data.get("organic_results", []))
    raise Exception(f"Search failed: {response.status_code} - {response.text}")

# Basic search
results = search_google("best web scraping API 2026")
for r in results:
    print(f"{r.get('title', 'No title')}")
    print(f"  URL: {r.get('url', r.get('link', 'N/A'))}")
    print(f"  Snippet: {r.get('snippet', r.get('description', ''))[:100]}...")
    print()
```
Advanced: Tracking Rankings Over Time
```python
import datetime
import json

def track_keyword_rankings(keywords, target_domain, api_key):
    """Track where a domain ranks for specific keywords."""
    rankings = []
    for keyword in keywords:
        try:
            results = requests.post(
                f"{BASE_URL}/search",
                headers={"Authorization": f"Bearer {api_key}"},
                json={"query": keyword, "num_results": 20},
            ).json()
            organic = results.get("results", results.get("organic_results", []))
            position = None
            for i, r in enumerate(organic, 1):
                url = r.get("url", r.get("link", ""))
                if target_domain in url:
                    position = i
                    break
            rankings.append({
                "keyword": keyword,
                "position": position,
                "found": position is not None,
                "checked_at": datetime.datetime.now().isoformat(),
            })
        except Exception as e:
            rankings.append({
                "keyword": keyword,
                "position": None,
                "error": str(e),
            })
    return rankings

# Example: track your site's rankings
data = track_keyword_rankings(
    keywords=["web scraping api", "serp api python", "google search scraper"],
    target_domain="searchhive.dev",
    api_key=API_KEY,
)
for r in data:
    if r.get("found"):
        print(f"Keyword '{r['keyword']}' ranks #{r['position']}")
    else:
        print(f"Keyword '{r['keyword']}' not found in top 20")
```
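A single check is only a snapshot; to see movement you need to persist each run. A minimal sketch using a JSON-lines file -- the filename and file layout are arbitrary illustrative choices, not part of the SearchHive API:

```python
import json
from pathlib import Path

def append_rankings(rankings, path="rankings_history.jsonl"):
    """Append one JSON line per ranking row so positions can be charted
    over time. Each call adds to the file rather than overwriting it."""
    with Path(path).open("a", encoding="utf-8") as f:
        for row in rankings:
            f.write(json.dumps(row) + "\n")

def load_history(path="rankings_history.jsonl"):
    """Read every recorded check back into a list of dicts."""
    p = Path(path)
    if not p.exists():
        return []
    return [json.loads(line) for line in p.read_text(encoding="utf-8").splitlines() if line]
```

Run `append_rankings(data)` after each `track_keyword_rankings` call (daily via cron, for example), then plot `position` by `checked_at` per keyword.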
Step 3: Use SearchHive DeepDive for SERP Analysis
When you need more than just organic results -- featured snippets, People Also Ask, knowledge panels -- use DeepDive:
```python
def analyze_serp_features(query):
    """Extract all SERP features from a search results page."""
    response = requests.post(
        f"{BASE_URL}/deepdive",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "url": f"https://www.google.com/search?q={query}",
            "prompt": (
                "Extract all visible SERP features: featured snippet, "
                "People Also Ask questions, knowledge panel info, "
                "top 10 organic results with title, URL, and snippet, "
                "and any ads shown. Return as structured JSON."
            ),
        },
    )
    return response.json()

serp_data = analyze_serp_features("searchhive web scraping api")
print(json.dumps(serp_data, indent=2))
```
Step 4: Comparing SERP API Providers
If you specifically need Google SERP data (with exact ranking positions, local pack data, etc.), here's how the main providers compare:
| Provider | Free Tier | Base Price | Per-Search Cost | Rate Limits |
|---|---|---|---|---|
| SearchHive | 500 credits | $9/mo | ~$0.002/credit | Per-plan limits |
| SerpAPI | 100 searches/mo | $50/mo (5K) | $0.005-$0.01 | 5-50/sec |
| Serper.dev | 2,500 searches | $50/mo (50K) | $0.001-$0.01 | Varies by plan |
| Brave Search API | $5 free/mo | $5/1K searches | $0.005/search | 15/sec |
| Google Custom Search | 100 queries/day | $5/1K queries | $0.005/query | 100/sec |
| Tavily | 1,000 searches/mo | $0.008/credit | $0.008/search | Varies |
When to use which:
- SearchHive -- Best for general-purpose search + scraping combined. One API for search, scraping, and deep extraction.
- Serper.dev -- Cheapest for high-volume Google SERP specifically. Good if you only need organic results.
- SerpAPI -- Most comprehensive SERP features (local, images, news, shopping). Highest price but most complete.
- Brave Search API -- Independent search index. Not Google results, but good for privacy-focused tools.
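To sanity-check a budget, you can plug the per-search figures from the table above into a quick estimate. Real pricing is tiered and changes often, so treat these numbers as illustrative rather than quotes:

```python
# Approximate per-search costs taken from the comparison table above.
PER_SEARCH_COST = {
    "SearchHive": 0.002,
    "SerpAPI": 0.01,
    "Serper.dev": 0.001,
    "Brave Search API": 0.005,
    "Google Custom Search": 0.005,
    "Tavily": 0.008,
}

def estimate_monthly_cost(searches_per_month):
    """Return {provider: estimated USD/month}, cheapest first."""
    costs = {p: round(c * searches_per_month, 2) for p, c in PER_SEARCH_COST.items()}
    return dict(sorted(costs.items(), key=lambda kv: kv[1]))
```

For example, `estimate_monthly_cost(10_000)` shows Serper.dev cheapest at this volume for pure SERP data, with SearchHive close behind while also covering scraping.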
Step 5: Build a Keyword Research Tool
Here's a practical example combining search and scraping:
```python
class KeywordResearcher:
    def __init__(self, api_key):
        self.api_key = api_key
        self.base_url = "https://api.searchhive.dev/v1"
        self.headers = {"Authorization": f"Bearer {api_key}"}

    def get_related_keywords(self, seed_keyword):
        """Search for a keyword and extract People Also Ask questions
        and related searches from the SERP."""
        response = requests.post(
            f"{self.base_url}/deepdive",
            headers=self.headers,
            json={
                "url": f"https://www.google.com/search?q={seed_keyword}",
                "prompt": (
                    "Extract all 'People Also Ask' questions, "
                    "'Related searches' at the bottom, and "
                    "the top 5 organic result titles and URLs."
                ),
            },
        )
        return response.json()

    def analyze_competitor_content(self, url, focus_keyword):
        """Analyze a competitor's page for keyword optimization."""
        response = requests.post(
            f"{self.base_url}/deepdive",
            headers=self.headers,
            json={
                "url": url,
                "prompt": (
                    f"Analyze this page for the keyword '{focus_keyword}'. "
                    "Extract: title tag, H1, meta description, "
                    "word count estimate, number of H2 headings, "
                    "and whether the keyword appears in the first paragraph."
                ),
            },
        )
        return response.json()

    def research(self, keyword, analyze_top_n=3):
        """Full keyword research workflow."""
        serp = self.get_related_keywords(keyword)
        top_urls = []
        results = serp.get("organic_results", serp.get("top_results", []))
        for r in results[:analyze_top_n]:
            url = r.get("url", r.get("link", ""))
            if url:
                analysis = self.analyze_competitor_content(url, keyword)
                top_urls.append({"url": url, "analysis": analysis})
        return {
            "keyword": keyword,
            "related_questions": serp.get("people_also_ask", []),
            "related_searches": serp.get("related_searches", []),
            "competitor_analysis": top_urls,
        }

researcher = KeywordResearcher(API_KEY)
report = researcher.research("web scraping api python")
print(json.dumps(report, indent=2))
```
Common Issues
Rate limiting
Start with small batches and add delays between requests. Most APIs enforce rate limits per second, not per minute.
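A simple way to stay under a per-second limit is exponential backoff on HTTP 429 responses. A sketch that works with any `requests`-style session; the retry count and delay values are illustrative defaults, not documented SearchHive limits:

```python
import random
import time

def post_with_backoff(session, url, max_retries=5, base_delay=1.0, **kwargs):
    """POST with exponential backoff on HTTP 429 (rate-limited) responses.

    Accepts anything with a .post() method, e.g. a requests.Session.
    Returns the first non-429 response, or the last response if all
    retries were rate-limited.
    """
    resp = None
    for attempt in range(max_retries):
        resp = session.post(url, **kwargs)
        if resp.status_code != 429:
            return resp
        # Wait 1s, 2s, 4s, ... plus a little jitter so parallel
        # workers don't all retry at the same instant.
        time.sleep(base_delay * (2 ** attempt) + random.random() * 0.1)
    return resp
```

Usage: `post_with_backoff(requests.Session(), f"{BASE_URL}/search", headers=..., json=...)` in place of a bare `requests.post`.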
Inconsistent result formats
Different search engines and APIs return results in different formats. Always normalize your parsing code to handle variations:
```python
def normalize_result(raw):
    """Normalize search results from any API format."""
    return {
        "title": raw.get("title") or raw.get("name") or "",
        "url": raw.get("url") or raw.get("link") or "",
        "snippet": raw.get("snippet") or raw.get("description") or "",
    }
```
Google changing result structure
This is why you use an API instead of scraping directly. APIs handle the parsing layer and update their extraction logic when Google changes the DOM. With SearchHive's DeepDive, the AI-based extraction adapts to structural changes automatically.
Next Steps
- Start with 500 free credits: Sign up at searchhive.dev and test SwiftSearch for your SERP monitoring needs.
- Build your first keyword tracker: Use the code above to track rankings for your target keywords.
- Explore the full API: Check searchhive.dev/docs for SwiftSearch, ScrapeForge, and DeepDive documentation.
Related: /blog/best-serp-api-alternatives-for-developers | /compare/serpapi | /compare/serper