How to Use a Metasearch API: Step-by-Step Tutorial
A metasearch API aggregates results from multiple search engines into a single response. Instead of calling Google, Bing, and DuckDuckGo separately, you make one request and get combined, deduplicated results. This tutorial shows you how to build and use metasearch APIs with Python, including a production implementation using SearchHive.
Prerequisites
- Python 3.8+ with pip
- HTTP client and retry libraries: pip install httpx backoff (asyncio ships with Python and does not need to be installed)
- A SearchHive API key (free tier: 500 credits, no card) from searchhive.dev
- Basic understanding of REST APIs
Step 1: Understand How Metasearch Works
Traditional search queries one engine. Metasearch queries multiple engines in parallel and merges the results:
```
Your App
   |
   v
Metasearch API --> Google     --> Results A
               --> Bing       --> Results B
               --> DuckDuckGo --> Results C
   |
   v
Merged, deduplicated, ranked results
```
Benefits of metasearch:
- Broader coverage -- different engines index different parts of the web
- No single point of failure -- if one engine is down, others still deliver
- Less bias -- results aren't filtered by one company's algorithm
- Better research -- cross-referencing multiple sources improves accuracy
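The fan-out in the diagram maps directly onto Python's asyncio: one coroutine per engine, run concurrently with asyncio.gather, then merged. A minimal sketch with stub engines (the real API calls come in Step 2):

```python
import asyncio

# Stub engines standing in for real per-engine API calls
async def query_google(q: str) -> list:
    return [{"url": "https://example.com/a", "source": "google"}]

async def query_bing(q: str) -> list:
    return [{"url": "https://example.com/b", "source": "bing"}]

async def metasearch(q: str) -> list:
    # Fan out: every engine is queried concurrently, then the batches are merged
    batches = await asyncio.gather(query_google(q), query_bing(q))
    return [result for batch in batches for result in batch]

merged = asyncio.run(metasearch("web scraping"))
```

The key point is that total latency is roughly the slowest single engine, not the sum of all of them.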
Step 2: Build a Basic Multi-Engine Search
Here's how to query multiple search engines and merge results:
```python
import asyncio

import httpx

async def search_google(query: str, limit: int = 10) -> list:
    # Use the async client so the coroutine doesn't block the event loop
    async with httpx.AsyncClient() as client:
        response = await client.get(
            "https://api.searchhive.dev/v1/search",
            params={"q": query, "limit": limit},
            headers={"Authorization": "Bearer YOUR_KEY"},
        )
    data = response.json()
    return [
        {"title": r["title"], "url": r["url"], "snippet": r.get("snippet", ""), "source": "google"}
        for r in data.get("results", [])
    ]

async def search_bing(query: str, limit: int = 10) -> list:
    # SearchHive can also return Bing-style results via parameters
    async with httpx.AsyncClient() as client:
        response = await client.get(
            "https://api.searchhive.dev/v1/search",
            params={"q": query, "limit": limit},
            headers={"Authorization": "Bearer YOUR_KEY"},
        )
    data = response.json()
    return [
        {"title": r["title"], "url": r["url"], "snippet": r.get("snippet", ""), "source": "bing"}
        for r in data.get("results", [])
    ]
```
Step 3: Merge and Deduplicate Results
Different engines may return the same URLs. Deduplicate by normalizing the URL:
```python
from urllib.parse import urlparse

def normalize_url(url: str) -> str:
    parsed = urlparse(url)
    return f"{parsed.netloc}{parsed.path}".rstrip("/").lower()

def deduplicate_results(results: list) -> list:
    seen = set()
    unique = []
    for result in results:
        key = normalize_url(result["url"])
        if key not in seen:
            seen.add(key)
            unique.append(result)
    return unique

def rank_results(results: list) -> list:
    # Simple ranking: count how many engines returned each URL
    url_counts = {}
    for r in results:
        key = normalize_url(r["url"])
        url_counts[key] = url_counts.get(key, 0) + 1
    return sorted(results, key=lambda r: url_counts[normalize_url(r["url"])], reverse=True)
```
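To see the helpers working together, here is a standalone check (the definitions above are repeated so the snippet runs on its own): two engines return the same article under slightly different URLs, and it ends up collapsed to one entry and ranked first. Note that ranking must happen before deduplication, while the duplicates still carry their per-engine counts.

```python
from urllib.parse import urlparse

def normalize_url(url: str) -> str:
    parsed = urlparse(url)
    return f"{parsed.netloc}{parsed.path}".rstrip("/").lower()

def deduplicate_results(results: list) -> list:
    seen, unique = set(), []
    for result in results:
        key = normalize_url(result["url"])
        if key not in seen:
            seen.add(key)
            unique.append(result)
    return unique

def rank_results(results: list) -> list:
    url_counts = {}
    for r in results:
        key = normalize_url(r["url"])
        url_counts[key] = url_counts.get(key, 0) + 1
    return sorted(results, key=lambda r: url_counts[normalize_url(r["url"])], reverse=True)

# "https://example.com/docs/" and "https://example.com/docs" normalize to the same key
merged = [
    {"title": "Docs", "url": "https://example.com/docs/", "source": "google"},
    {"title": "Blog", "url": "https://example.com/blog", "source": "google"},
    {"title": "Docs", "url": "https://example.com/docs", "source": "bing"},
]

# Rank first (counts use the duplicates), then collapse duplicates
final = deduplicate_results(rank_results(merged))
```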
Step 4: Build the Metasearch Orchestrator
Combine everything into a clean, reusable class:
```python
import asyncio
from typing import Dict, List

import httpx

SEARCHHIVE_API_KEY = "your-api-key-here"
SEARCHHIVE_BASE = "https://api.searchhive.dev/v1"

class MetasearchEngine:
    def __init__(self, api_key: str):
        self.api_key = api_key
        self.client = httpx.AsyncClient(timeout=30)

    async def search_single(self, query: str, limit: int = 10) -> List[Dict]:
        response = await self.client.get(
            f"{SEARCHHIVE_BASE}/search",
            params={"q": query, "limit": limit},
            headers={"Authorization": f"Bearer {self.api_key}"},
        )
        response.raise_for_status()
        data = response.json()
        return data.get("results", [])

    async def metasearch(self, query: str, limit_per_engine: int = 10, top_n: int = 20) -> List[Dict]:
        # One SearchHive query here; add more search_single calls (e.g. query
        # variants) to this list to simulate multiple engines
        tasks = [
            self.search_single(query, limit_per_engine),
        ]
        results_lists = await asyncio.gather(*tasks, return_exceptions=True)

        # Flatten results, skipping any task that raised
        all_results = []
        for results in results_lists:
            if isinstance(results, list):
                all_results.extend(results)

        # Rank first, while duplicates still carry their engine counts, then deduplicate
        ranked = rank_results(all_results)
        unique = deduplicate_results(ranked)
        return unique[:top_n]

    async def close(self):
        await self.client.aclose()
```
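The orchestrator passes return_exceptions=True to asyncio.gather so one failing engine degrades the result set instead of crashing the whole request. That failure-isolation pattern in isolation, with stub engines:

```python
import asyncio

async def healthy_engine() -> list:
    return [{"url": "https://example.com/a"}]

async def broken_engine() -> list:
    raise ConnectionError("engine down")

async def fan_out() -> list:
    # return_exceptions=True turns failures into returned values instead of raising
    outcomes = await asyncio.gather(healthy_engine(), broken_engine(), return_exceptions=True)
    # Keep the successful batches, drop the exceptions
    return [r for batch in outcomes if isinstance(batch, list) for r in batch]

results = asyncio.run(fan_out())
```

Without return_exceptions=True, the ConnectionError would propagate out of gather and the healthy engine's results would be lost.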
Step 5: Add Result Enrichment with Scraping
Raw search results give you titles and snippets. For deeper analysis, scrape the top results:
```python
class MetasearchEngine:
    # ... (previous methods)

    async def search_and_extract(self, query: str, scrape_top: int = 3) -> Dict:
        results = await self.metasearch(query, top_n=scrape_top)
        enriched = []
        for result in results[:scrape_top]:
            try:
                scrape_resp = await self.client.post(
                    f"{SEARCHHIVE_BASE}/scrape",
                    json={"url": result["url"], "format": "markdown"},
                    headers={"Authorization": f"Bearer {self.api_key}"},
                )
                if scrape_resp.status_code == 200:
                    scrape_data = scrape_resp.json()
                    enriched.append({
                        "url": result["url"],
                        "title": result["title"],
                        "snippet": result.get("snippet", ""),
                        "full_content": scrape_data.get("content", "")[:1000],
                    })
            except Exception as e:
                enriched.append({
                    "url": result["url"],
                    "title": result["title"],
                    "error": str(e),
                })
        return {"query": query, "results": enriched}

    async def deep_research(self, query: str, depth: int = 3) -> Dict:
        response = await self.client.post(
            f"{SEARCHHIVE_BASE}/deepdive",
            json={"query": query, "depth": depth},
            headers={"Authorization": f"Bearer {self.api_key}"},
        )
        return response.json()
```
Step 6: Use Cases and Examples
AI Agent Research Pipeline
```python
async def agent_research(topic: str):
    engine = MetasearchEngine(SEARCHHIVE_API_KEY)

    # Step 1: Quick search for overview
    results = await engine.metasearch(topic, top_n=5)
    print(f"Found {len(results)} relevant results")

    # Step 2: Deep research for comprehensive analysis
    deep = await engine.deep_research(topic, depth=3)
    print(f"Summary: {deep.get('summary', '')[:200]}")

    # Step 3: Extract full content from top 2 sources
    enriched = await engine.search_and_extract(topic, scrape_top=2)
    for e in enriched["results"]:
        if "full_content" in e:
            print(f"\n[{e['title']}]\n{e['full_content'][:200]}...")

    await engine.close()
    return deep

# Run it
result = asyncio.run(agent_research("best vector databases for AI applications 2026"))
```
Competitive Monitoring
```python
async def monitor_competitors(brand: str, competitors: list):
    engine = MetasearchEngine(SEARCHHIVE_API_KEY)
    all_results = {}
    for target in [brand] + competitors:
        results = await engine.metasearch(f"{target} pricing review", top_n=5)
        all_results[target] = results
    await engine.close()
    return all_results

data = asyncio.run(monitor_competitors("SearchHive", ["SerpApi", "DataForSEO"]))
```
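The loop above queries one target at a time. Since each metasearch is independent, the requests can also be fanned out concurrently; a sketch of that variant with a stubbed search function (fake_metasearch stands in for engine.metasearch, so this runs without a network call):

```python
import asyncio

async def fake_metasearch(query: str) -> list:
    # Stand-in for engine.metasearch(query, top_n=5)
    return [{"url": f"https://example.com/{query.split()[0].lower()}"}]

async def monitor(brand: str, competitors: list) -> dict:
    targets = [brand] + competitors
    # All targets are searched concurrently instead of sequentially
    batches = await asyncio.gather(*(fake_metasearch(f"{t} pricing review") for t in targets))
    return dict(zip(targets, batches))

data = asyncio.run(monitor("SearchHive", ["SerpApi", "DataForSEO"]))
```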
Content Aggregation
```python
async def aggregate_trending_topics(niche: str):
    engine = MetasearchEngine(SEARCHHIVE_API_KEY)
    queries = [
        f"{niche} best practices 2026",
        f"{niche} tutorials for beginners",
        f"{niche} tools comparison",
    ]
    tasks = [engine.metasearch(q, top_n=5) for q in queries]
    results = await asyncio.gather(*tasks)

    # Combine and deduplicate across queries
    all_results = []
    for query_results in results:
        all_results.extend(query_results)
    unique = deduplicate_results(all_results)

    await engine.close()
    return unique

topics = asyncio.run(aggregate_trending_topics("web scraping"))
```
Step 7: Error Handling and Rate Limiting
Production metasearch needs robust error handling:
```python
import backoff

class MetasearchEngine:
    # ... (previous methods)

    @backoff.on_exception(backoff.expo, (httpx.TimeoutException, httpx.ConnectError), max_tries=3)
    async def search_single(self, query: str, limit: int = 10) -> List[Dict]:
        # Same as before, but with automatic retry on network errors
        ...

    async def safe_metasearch(self, query: str, **kwargs) -> Dict:
        try:
            results = await self.metasearch(query, **kwargs)
            return {"status": "ok", "query": query, "results": results, "count": len(results)}
        except httpx.HTTPStatusError as e:
            return {"status": "error", "query": query, "error": f"HTTP {e.response.status_code}"}
        except Exception as e:
            return {"status": "error", "query": query, "error": str(e)}
```
Complete Working Example
```python
import asyncio

import httpx

SEARCHHIVE_API_KEY = "your-api-key-here"

async def main():
    engine = MetasearchEngine(SEARCHHIVE_API_KEY)

    # Multi-engine search with extraction
    results = await engine.search_and_extract(
        "python web scraping frameworks comparison",
        scrape_top=3,
    )

    for r in results["results"]:
        print(f"\n{'=' * 60}")
        print(f"Title: {r['title']}")
        print(f"URL: {r['url']}")
        if "full_content" in r:
            print(f"Content preview: {r['full_content'][:150]}...")

    await engine.close()

if __name__ == "__main__":
    asyncio.run(main())
```
Common Issues
Problem: Rate limited by search engines
Solution: SearchHive handles rate limiting on their end, and paid plans come with higher limits. If you still hit limits, add delays between requests.
Problem: Duplicate results across engines
Solution: The deduplicate_results() function handles this. Use URL normalization to catch near-duplicates (trailing slashes, query params).
Problem: Results vary between requests
Solution: Search engine rankings change constantly. Cache results with a TTL if you need consistency.
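A minimal in-process TTL cache is enough for many cases before reaching for Redis; this sketch stores each value alongside an expiry timestamp and treats expired entries as misses:

```python
import time

class TTLCache:
    def __init__(self, ttl_seconds: float = 300):
        self.ttl = ttl_seconds
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires = entry
        if time.monotonic() > expires:
            # Expired: evict and report a miss
            del self._store[key]
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)

cache = TTLCache(ttl_seconds=300)
cache.set("q:python", [{"url": "https://example.com"}])
hit = cache.get("q:python")
```

Key the cache on the normalized query string, and check it before calling metasearch.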
Problem: Some results are irrelevant
Solution: Use SearchHive's DeepDive API for higher-quality results. It performs multi-step research and synthesizes findings, returning more relevant results than a single search query.
Next Steps
- Add relevance scoring based on snippet similarity to query
- Implement result caching with Redis for repeated queries
- Set up webhooks for long-running research jobs
- Build a simple web UI to browse metasearch results
SearchHive's free tier gives you 500 credits to experiment with search, scraping, and deep research. No credit card required. Sign up at searchhive.dev and check the API docs for complete reference.
Related: /tutorials/data-extraction-python | /compare/serpapi | /compare/brave-search-api