How to Use a Metasearch API: Step-by-Step Tutorial
A metasearch API aggregates results from multiple search engines into a single response. Instead of calling Google, Bing, and DuckDuckGo separately, you make one request and get combined, deduplicated results. This tutorial shows you how to build and use metasearch APIs with Python, including a production implementation using SearchHive.
Prerequisites
- Python 3.8+ with pip
- HTTP client and retry libraries: pip install httpx backoff (asyncio ships with Python and does not need to be installed)
- A SearchHive API key (free tier: 500 credits, no card) from searchhive.dev
- Basic understanding of REST APIs
Step 1: Understand How Metasearch Works
Traditional search queries one engine. Metasearch queries multiple engines in parallel and merges the results:
```
Your App
   |
   v
Metasearch API --> Google     --> Results A
               --> Bing       --> Results B
               --> DuckDuckGo --> Results C
   |
   v
Merged, deduplicated, ranked results
```
Benefits of metasearch:
- Broader coverage -- different engines index different parts of the web
- No single point of failure -- if one engine is down, others still deliver
- Less bias -- results aren't filtered by one company's algorithm
- Better research -- cross-referencing multiple sources improves accuracy
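The fan-out in the diagram maps directly onto Python's asyncio: one coroutine per engine, run concurrently with asyncio.gather, then merged. A minimal sketch with stub engines (the real API calls come in Step 2):

```python
import asyncio

# Stub engines standing in for real per-engine API calls
async def query_google(q: str) -> list:
    return [{"url": "https://example.com/a", "source": "google"}]

async def query_bing(q: str) -> list:
    return [{"url": "https://example.com/b", "source": "bing"}]

async def metasearch(q: str) -> list:
    # Fan out: every engine is queried concurrently, then the batches are merged
    batches = await asyncio.gather(query_google(q), query_bing(q))
    return [result for batch in batches for result in batch]

merged = asyncio.run(metasearch("web scraping"))
```

The key point is that total latency is roughly the slowest single engine, not the sum of all of them.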
Step 2: Build a Basic Multi-Engine Search
Here's how to query multiple search engines and merge results:
```python
import asyncio

import httpx

async def search_google(query: str, limit: int = 10) -> list:
    # Use the async client so the coroutine doesn't block the event loop
    async with httpx.AsyncClient() as client:
        response = await client.get(
            "https://api.searchhive.dev/v1/search",
            params={"q": query, "limit": limit},
            headers={"Authorization": "Bearer YOUR_KEY"},
        )
    data = response.json()
    return [
        {"title": r["title"], "url": r["url"], "snippet": r.get("snippet", ""), "source": "google"}
        for r in data.get("results", [])
    ]

async def search_bing(query: str, limit: int = 10) -> list:
    # SearchHive can also return Bing-style results via parameters
    async with httpx.AsyncClient() as client:
        response = await client.get(
            "https://api.searchhive.dev/v1/search",
            params={"q": query, "limit": limit},
            headers={"Authorization": "Bearer YOUR_KEY"},
        )
    data = response.json()
    return [
        {"title": r["title"], "url": r["url"], "snippet": r.get("snippet", ""), "source": "bing"}
        for r in data.get("results", [])
    ]
```
Step 3: Merge and Deduplicate Results
Different engines may return the same URLs. Deduplicate by normalizing the URL:
```python
from urllib.parse import urlparse

def normalize_url(url: str) -> str:
    parsed = urlparse(url)
    return f"{parsed.netloc}{parsed.path}".rstrip("/").lower()

def deduplicate_results(results: list) -> list:
    seen = set()
    unique = []
    for result in results:
        key = normalize_url(result["url"])
        if key not in seen:
            seen.add(key)
            unique.append(result)
    return unique

def rank_results(results: list) -> list:
    # Simple ranking: count how many engines returned each URL
    url_counts = {}
    for r in results:
        key = normalize_url(r["url"])
        url_counts[key] = url_counts.get(key, 0) + 1
    return sorted(results, key=lambda r: url_counts[normalize_url(r["url"])], reverse=True)
```
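To see the helpers working together, here is a standalone check (the definitions above are repeated so the snippet runs on its own): two engines return the same article under slightly different URLs, and it ends up collapsed to one entry and ranked first. Note that ranking must happen before deduplication, while the duplicates still carry their per-engine counts.

```python
from urllib.parse import urlparse

def normalize_url(url: str) -> str:
    parsed = urlparse(url)
    return f"{parsed.netloc}{parsed.path}".rstrip("/").lower()

def deduplicate_results(results: list) -> list:
    seen, unique = set(), []
    for result in results:
        key = normalize_url(result["url"])
        if key not in seen:
            seen.add(key)
            unique.append(result)
    return unique

def rank_results(results: list) -> list:
    url_counts = {}
    for r in results:
        key = normalize_url(r["url"])
        url_counts[key] = url_counts.get(key, 0) + 1
    return sorted(results, key=lambda r: url_counts[normalize_url(r["url"])], reverse=True)

# "https://example.com/docs/" and "https://example.com/docs" normalize to the same key
merged = [
    {"title": "Docs", "url": "https://example.com/docs/", "source": "google"},
    {"title": "Blog", "url": "https://example.com/blog", "source": "google"},
    {"title": "Docs", "url": "https://example.com/docs", "source": "bing"},
]

# Rank first (counts use the duplicates), then collapse duplicates
final = deduplicate_results(rank_results(merged))
```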
Step 4: Build the Metasearch Orchestrator
Combine everything into a clean, reusable class:
```python
import asyncio
from typing import Dict, List

import httpx

SEARCHHIVE_API_KEY = "your-api-key-here"
SEARCHHIVE_BASE = "https://api.searchhive.dev/v1"

class MetasearchEngine:
    def __init__(self, api_key: str):
        self.api_key = api_key
        self.client = httpx.AsyncClient(timeout=30)

    async def search_single(self, query: str, limit: int = 10) -> List[Dict]:
        response = await self.client.get(
            f"{SEARCHHIVE_BASE}/search",
            params={"q": query, "limit": limit},
            headers={"Authorization": f"Bearer {self.api_key}"},
        )
        response.raise_for_status()
        data = response.json()
        return data.get("results", [])

    async def metasearch(self, query: str, limit_per_engine: int = 10, top_n: int = 20) -> List[Dict]:
        # One SearchHive query here; add more search_single calls (e.g. query
        # variants) to this list to simulate multiple engines
        tasks = [
            self.search_single(query, limit_per_engine),
        ]
        results_lists = await asyncio.gather(*tasks, return_exceptions=True)

        # Flatten results, skipping any task that raised
        all_results = []
        for results in results_lists:
            if isinstance(results, list):
                all_results.extend(results)

        # Rank first, while duplicates still carry their engine counts, then deduplicate
        ranked = rank_results(all_results)
        unique = deduplicate_results(ranked)
        return unique[:top_n]

    async def close(self):
        await self.client.aclose()
```
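The orchestrator passes return_exceptions=True to asyncio.gather so one failing engine degrades the result set instead of crashing the whole request. That failure-isolation pattern in isolation, with stub engines:

```python
import asyncio

async def healthy_engine() -> list:
    return [{"url": "https://example.com/a"}]

async def broken_engine() -> list:
    raise ConnectionError("engine down")

async def fan_out() -> list:
    # return_exceptions=True turns failures into returned values instead of raising
    outcomes = await asyncio.gather(healthy_engine(), broken_engine(), return_exceptions=True)
    # Keep the successful batches, drop the exceptions
    return [r for batch in outcomes if isinstance(batch, list) for r in batch]

results = asyncio.run(fan_out())
```

Without return_exceptions=True, the ConnectionError would propagate out of gather and the healthy engine's results would be lost.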
Step 5: Add Result Enrichment with Scraping
Raw search results give you titles and snippets. For deeper analysis, scrape the top results:
```python
class MetasearchEngine:
    # ... (previous methods)

    async def search_and_extract(self, query: str, scrape_top: int = 3) -> Dict:
        results = await self.metasearch(query, top_n=scrape_top)
        enriched = []
        for result in results[:scrape_top]:
            try:
                scrape_resp = await self.client.post(
                    f"{SEARCHHIVE_BASE}/scrape",
                    json={"url": result["url"], "format": "markdown"},
                    headers={"Authorization": f"Bearer {self.api_key}"},
                )
                if scrape_resp.status_code == 200:
                    scrape_data = scrape_resp.json()
                    enriched.append({
                        "url": result["url"],
                        "title": result["title"],
                        "snippet": result.get("snippet", ""),
                        "full_content": scrape_data.get("content", "")[:1000],
                    })
            except Exception as e:
                enriched.append({
                    "url": result["url"],
                    "title": result["title"],
                    "error": str(e),
                })
        return {"query": query, "results": enriched}

    async def deep_research(self, query: str, depth: int = 3) -> Dict:
        response = await self.client.post(
            f"{SEARCHHIVE_BASE}/deepdive",
            json={"query": query, "depth": depth},
            headers={"Authorization": f"Bearer {self.api_key}"},
        )
        return response.json()
```
Step 6: Use Cases and Examples
AI Agent Research Pipeline
```python
async def agent_research(topic: str):
    engine = MetasearchEngine(SEARCHHIVE_API_KEY)

    # Step 1: Quick search for overview
    results = await engine.metasearch(topic, top_n=5)
    print(f"Found {len(results)} relevant results")

    # Step 2: Deep research for comprehensive analysis
    deep = await engine.deep_research(topic, depth=3)
    print(f"Summary: {deep.get('summary', '')[:200]}")

    # Step 3: Extract full content from top 2 sources
    enriched = await engine.search_and_extract(topic, scrape_top=2)
    for e in enriched["results"]:
        if "full_content" in e:
            print(f"\n[{e['title']}]\n{e['full_content'][:200]}...")

    await engine.close()
    return deep

# Run it
result = asyncio.run(agent_research("best vector databases for AI applications 2026"))
```
Competitive Monitoring
```python
async def monitor_competitors(brand: str, competitors: list):
    engine = MetasearchEngine(SEARCHHIVE_API_KEY)
    all_results = {}
    for target in [brand] + competitors:
        results = await engine.metasearch(f"{target} pricing review", top_n=5)
        all_results[target] = results
    await engine.close()
    return all_results

data = asyncio.run(monitor_competitors("SearchHive", ["SerpApi", "DataForSEO"]))
```
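The loop above queries one target at a time. Since each metasearch is independent, the requests can also be fanned out concurrently; a sketch of that variant with a stubbed search function (fake_metasearch stands in for engine.metasearch, so this runs without a network call):

```python
import asyncio

async def fake_metasearch(query: str) -> list:
    # Stand-in for engine.metasearch(query, top_n=5)
    return [{"url": f"https://example.com/{query.split()[0].lower()}"}]

async def monitor(brand: str, competitors: list) -> dict:
    targets = [brand] + competitors
    # All targets are searched concurrently instead of sequentially
    batches = await asyncio.gather(*(fake_metasearch(f"{t} pricing review") for t in targets))
    return dict(zip(targets, batches))

data = asyncio.run(monitor("SearchHive", ["SerpApi", "DataForSEO"]))
```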
Content Aggregation
```python
async def aggregate_trending_topics(niche: str):
    engine = MetasearchEngine(SEARCHHIVE_API_KEY)
    queries = [
        f"{niche} best practices 2026",
        f"{niche} tutorials for beginners",
        f"{niche} tools comparison",
    ]
    tasks = [engine.metasearch(q, top_n=5) for q in queries]
    results = await asyncio.gather(*tasks)

    # Combine and deduplicate across queries
    all_results = []
    for query_results in results:
        all_results.extend(query_results)
    unique = deduplicate_results(all_results)

    await engine.close()
    return unique

topics = asyncio.run(aggregate_trending_topics("web scraping"))
```
Step 7: Error Handling and Rate Limiting
Production metasearch needs robust error handling:
```python
import backoff

class MetasearchEngine:
    # ... (previous methods)

    @backoff.on_exception(backoff.expo, (httpx.TimeoutException, httpx.ConnectError), max_tries=3)
    async def search_single(self, query: str, limit: int = 10) -> List[Dict]:
        # Same as before, but with automatic retry on network errors
        ...

    async def safe_metasearch(self, query: str, **kwargs) -> Dict:
        try:
            results = await self.metasearch(query, **kwargs)
            return {"status": "ok", "query": query, "results": results, "count": len(results)}
        except httpx.HTTPStatusError as e:
            return {"status": "error", "query": query, "error": f"HTTP {e.response.status_code}"}
        except Exception as e:
            return {"status": "error", "query": query, "error": str(e)}
```
Complete Working Example
```python
import asyncio

import httpx

SEARCHHIVE_API_KEY = "your-api-key-here"

async def main():
    engine = MetasearchEngine(SEARCHHIVE_API_KEY)

    # Multi-engine search with extraction
    results = await engine.search_and_extract(
        "python web scraping frameworks comparison",
        scrape_top=3,
    )

    for r in results["results"]:
        print(f"\n{'=' * 60}")
        print(f"Title: {r['title']}")
        print(f"URL: {r['url']}")
        if "full_content" in r:
            print(f"Content preview: {r['full_content'][:150]}...")

    await engine.close()

if __name__ == "__main__":
    asyncio.run(main())
```
Common Issues
Problem: Rate limited by search engines
Solution: SearchHive handles rate limiting on their end, and paid plans come with higher limits. If you still hit limits, add delays between requests.
Problem: Duplicate results across engines
Solution: The deduplicate_results() function handles this. Use URL normalization to catch near-duplicates (trailing slashes, query params).
Problem: Results vary between requests
Solution: Search engine rankings change constantly. Cache results with a TTL if you need consistency.
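A minimal in-process TTL cache is enough for many cases before reaching for Redis; this sketch stores each value alongside an expiry timestamp and treats expired entries as misses:

```python
import time

class TTLCache:
    def __init__(self, ttl_seconds: float = 300):
        self.ttl = ttl_seconds
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires = entry
        if time.monotonic() > expires:
            # Expired: evict and report a miss
            del self._store[key]
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)

cache = TTLCache(ttl_seconds=300)
cache.set("q:python", [{"url": "https://example.com"}])
hit = cache.get("q:python")
```

Key the cache on the normalized query string, and check it before calling metasearch.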
Problem: Some results are irrelevant
Solution: Use SearchHive's DeepDive API for higher-quality results. It performs multi-step research and synthesizes findings, returning more relevant results than a single search query.
Next Steps
- Add relevance scoring based on snippet similarity to query
- Implement result caching with Redis for repeated queries
- Set up webhooks for long-running research jobs
- Build a simple web UI to browse metasearch results
SearchHive's free tier gives you 500 credits to experiment with search, scraping, and deep research. No credit card required. Sign up at searchhive.dev and check the API docs for complete reference.
Related: /tutorials/data-extraction-python | /compare/serpapi | /compare/brave-search-api