How to Use a Search API for RAG -- Step-by-Step Tutorial
RAG (Retrieval-Augmented Generation) combines an LLM with real-time data retrieval to produce accurate, grounded answers. A search API is the retrieval engine that feeds relevant documents into the LLM's context.
This tutorial walks through building a production RAG pipeline using SearchHive's SwiftSearch API, from basic setup to advanced optimization.
Prerequisites
- Python 3.9+
- A SearchHive API key (free, 500 credits)
- An LLM API key (OpenAI, Anthropic, or local model)
- Basic familiarity with Python and REST APIs
Key Takeaways
- RAG reduces hallucinations by 40-60% compared to bare LLMs
- Search quality matters more than LLM quality for factual accuracy
- Token-efficient retrieval saves money -- clean snippets beat raw web pages
- SearchHive provides the cheapest unified search + scrape + research API for RAG
Step 1: Understand the RAG Architecture
A RAG system has four components:
- Query processing: Transform the user question into an effective search query
- Retrieval: Search for relevant documents using a search API
- Context assembly: Format and rank the retrieved documents
- Generation: Pass the context to an LLM for answer generation
The search API is the most critical component. Garbage in, garbage out -- if retrieval returns irrelevant results, the LLM can't generate a good answer.
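The four stages above can be sketched as plain function composition. The stubs below are placeholders to show the data flow, not real API or LLM calls:

```python
def process_query(question: str) -> str:
    # Strip filler words to form a tighter search query (toy example)
    return " ".join(w for w in question.split() if w.lower() not in {"what", "is", "the"})

def retrieve(query: str) -> list[dict]:
    # Stand-in for a real search API call
    return [{"title": "Doc A", "snippet": "short snippet text", "url": "https://example.com"}]

def assemble_context(results: list[dict]) -> str:
    # Number each result so the LLM can cite it
    return "\n".join(f"[{i}] {r['title']}: {r['snippet']}" for i, r in enumerate(results, 1))

def generate(context: str, question: str) -> str:
    # Stand-in for an LLM call
    return f"Answer to '{question}' based on:\n{context}"

def rag(question: str) -> str:
    query = process_query(question)
    results = retrieve(query)
    context = assemble_context(results)
    return generate(context, question)
```

Each stage is independently swappable, which is why the rest of this tutorial can upgrade retrieval and query processing without touching the others.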
Step 2: Set Up the Search Client
Create a search client that wraps SearchHive's SwiftSearch API:
```python
import requests
from typing import Optional

SEARCHHIVE_API_KEY = "your_key"
SEARCHHIVE_BASE = "https://api.searchhive.dev/v1/swift-search"

def search_web(
    query: str,
    limit: int = 5,
    format: str = "markdown",
    recency: Optional[str] = None,
) -> list[dict]:
    """Search the web and return structured results."""
    payload = {
        "query": query,
        "limit": limit,
        "format": format,
    }
    if recency:
        payload["recency"] = recency
    resp = requests.post(
        SEARCHHIVE_BASE,
        headers={"Authorization": f"Bearer {SEARCHHIVE_API_KEY}"},
        json=payload,
        timeout=15,
    )
    resp.raise_for_status()
    data = resp.json()
    return data.get("results", [])
```
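Network calls to any search API can fail transiently (timeouts, rate limits). A thin retry wrapper with exponential backoff keeps the pipeline resilient; this is a generic sketch, not part of any SDK:

```python
import time
from typing import Callable

def search_with_retry(search_fn: Callable[[str], list], query: str,
                      retries: int = 3, backoff: float = 0.5) -> list:
    """Retry transient search failures with exponential backoff."""
    for attempt in range(retries):
        try:
            return search_fn(query)
        except Exception:
            if attempt == retries - 1:
                raise  # Out of retries: surface the error
            time.sleep(backoff * (2 ** attempt))
    return []
```

In production you would catch a narrower exception type (e.g. `requests.RequestException`) and skip retries on 4xx client errors.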
Step 3: Build the Context Window
Format search results into a context string that the LLM can use. The key is to include source attribution and keep the context token-efficient:
```python
def build_context(results: list[dict], max_chars: int = 8000) -> str:
    """Build a token-efficient context from search results."""
    parts = []
    total_chars = 0
    for i, result in enumerate(results, 1):
        title = result.get("title", "Untitled")
        snippet = result.get("snippet", "")
        url = result.get("url", "")
        header = f"[{i}] {title}\nSource: {url}\n"
        entry = f"{header}{snippet}\n"
        entry_len = len(entry)
        if total_chars + entry_len > max_chars:
            # Truncate the last entry's snippet to fit the remaining budget
            remaining = max_chars - total_chars - len(header) - 1
            if remaining > 100:
                parts.append(f"{header}{snippet[:remaining]}\n")
            break
        parts.append(entry)
        total_chars += entry_len
    return "\n".join(parts)
```
Step 4: Implement the RAG Pipeline
Connect the search retrieval to your LLM for answer generation:
```python
OPENAI_API_KEY = "your_openai_key"

def rag_query(question: str) -> str:
    """Full RAG pipeline: search -> context -> generate."""
    # Step 1: Retrieve relevant documents
    results = search_web(question, limit=5)
    if not results:
        return "I couldn't find relevant information for your question. Try rephrasing or check the topic."

    # Step 2: Build context
    context = build_context(results)

    # Step 3: Generate answer using LLM
    system_prompt = """You are a research assistant. Answer questions using ONLY the provided search results.
- Cite sources using [1], [2], etc. referencing the numbered results
- If the search results don't contain enough information, say so clearly
- Be concise and factual
- Do not make up information not present in the search results"""

    user_prompt = f"""SEARCH RESULTS:
{context}

QUESTION: {question}

ANSWER (with source citations):"""

    resp = requests.post(
        "https://api.openai.com/v1/chat/completions",
        headers={"Authorization": f"Bearer {OPENAI_API_KEY}"},
        json={
            "model": "gpt-4o-mini",
            "messages": [
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_prompt},
            ],
            "temperature": 0,  # Factual, not creative
            "max_tokens": 1000,
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

# Usage
answer = rag_query("What is the pricing for SearchHive API?")
print(answer)
```
Step 5: Add Query Expansion for Better Retrieval
User questions are often vague. Query expansion reformulates the question into more effective search queries:
```python
def expand_query(question: str) -> list[str]:
    """Generate multiple search queries from a single question."""
    # Strategy 1: Direct query
    queries = [question]

    # Strategy 2: Extract key terms (simple approach)
    # Remove question words and common filler
    stop_words = {"what", "is", "the", "how", "does", "a", "an", "to", "for", "of", "and", "in", "on", "can", "best"}
    terms = [w for w in question.lower().split() if w not in stop_words and len(w) > 2]
    if terms:
        queries.append(" ".join(terms))

    # Strategy 3: Add context qualifiers
    if any(w in question.lower() for w in ["price", "cost", "pricing", "cheap", "expensive"]):
        queries.append(f"{question} pricing comparison 2026")
    if any(w in question.lower() for w in ["best", "top", "recommend"]):
        queries.append(f"{question} review comparison")

    return queries[:3]  # Limit to avoid excessive API calls
```
```python
def rag_with_expansion(question: str) -> str:
    """RAG pipeline with query expansion."""
    all_results = []
    seen_urls = set()
    for query in expand_query(question):
        results = search_web(query, limit=5)
        for r in results:
            url = r.get("url", "")
            if url not in seen_urls:
                all_results.append(r)
                seen_urls.add(url)

    # Deduplicate and take top results
    context = build_context(all_results[:7])
    # ... proceed with LLM generation (same as Step 4)
```
Step 6: Implement Multi-Source Retrieval
For complex questions, combine search snippets with full page content for deeper context:
```python
def deep_rag(question: str) -> str:
    """RAG with fallback to full page content for complex queries."""
    # Start with search snippets
    results = search_web(question, limit=5)

    # For complex questions, also scrape the top 2 results for full content
    full_contents = []
    for result in results[:2]:
        url = result.get("url")
        if url:
            try:
                scrape_resp = requests.post(
                    "https://api.searchhive.dev/v1/scrapeforge",
                    headers={"Authorization": f"Bearer {SEARCHHIVE_API_KEY}"},
                    json={
                        "url": url,
                        "format": "markdown",
                        "render_js": True,
                    },
                    timeout=30,
                )
                scrape_resp.raise_for_status()
                content = scrape_resp.json().get("content", "")
                if content:
                    full_contents.append(f"FULL CONTENT from {url}:\n{content[:3000]}\n")
            except requests.RequestException:
                continue  # Skip pages that fail to scrape

    # Combine snippets and full content
    snippet_context = build_context(results)
    full_context = "\n\n".join(full_contents)
    combined_context = f"SNIPPETS:\n{snippet_context}\n\nFULL PAGES:\n{full_context}"

    # Pass to LLM with combined context
    # ... (same LLM call as Step 4, but with combined_context)
    return combined_context  # In practice, pass to LLM
```
Step 7: Add Response Quality Evaluation
Evaluate whether your RAG system is actually improving answers:
```python
def evaluate_rag(question: str, answer: str, sources: list[dict]) -> dict:
    """Simple self-evaluation of RAG response quality."""
    # Check if answer cites sources
    has_citations = any(f"[{i}]" in answer for i in range(1, len(sources) + 1))

    # Check if answer acknowledges insufficient info (honesty)
    honest_disclaimer = any(phrase in answer.lower() for phrase in [
        "couldn't find", "not enough information", "unclear", "not specified",
    ])

    # Token efficiency (rough estimate: 1 token ~ 4 chars)
    context_chars = sum(len(s.get("snippet", "")) for s in sources)
    answer_chars = len(answer)
    efficiency = answer_chars / max(context_chars, 1)

    return {
        "has_citations": has_citations,
        "honest_when_uncertain": honest_disclaimer,
        "context_used_chars": context_chars,
        "answer_length_chars": answer_chars,
        "context_efficiency": round(efficiency, 2),
        "num_sources": len(sources),
    }
```
Complete Code Example
Here's the full pipeline in one file:
```python
import requests

SEARCHHIVE_KEY = "your_key"
OPENAI_KEY = "your_openai_key"

def search_web(query: str, limit: int = 5) -> list:
    resp = requests.post(
        "https://api.searchhive.dev/v1/swift-search",
        headers={"Authorization": f"Bearer {SEARCHHIVE_KEY}"},
        json={"query": query, "limit": limit, "format": "markdown"},
        timeout=15,
    )
    resp.raise_for_status()
    return resp.json().get("results", [])

def build_context(results: list, max_chars: int = 8000) -> str:
    parts, total = [], 0
    for i, r in enumerate(results, 1):
        entry = f"[{i}] {r.get('title', '')}\n{r.get('url', '')}\n{r.get('snippet', '')}\n"
        if total + len(entry) > max_chars:
            break
        parts.append(entry)
        total += len(entry)
    return "\n".join(parts)

def ask(question: str) -> str:
    results = search_web(question, limit=5)
    if not results:
        return "No relevant results found."
    context = build_context(results)
    resp = requests.post(
        "https://api.openai.com/v1/chat/completions",
        headers={"Authorization": f"Bearer {OPENAI_KEY}"},
        json={
            "model": "gpt-4o-mini",
            "messages": [
                {"role": "system", "content": "Answer using ONLY the provided search results. Cite sources as [1], [2], etc."},
                {"role": "user", "content": f"RESULTS:\n{context}\n\nQ: {question}\nA:"},
            ],
            "temperature": 0,
            "max_tokens": 800,
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

# Test it
print(ask("What is SearchHive and how much does it cost?"))
```
Common Issues
Irrelevant search results: Improve your query formulation. Add site-specific operators (e.g., site:docs.openai.com) for technical questions. Use the recency parameter for time-sensitive queries.
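As one illustration of tighter query formulation, a small helper can attach a site restriction and the recency parameter before calling the API. The `site:` operator syntax is an assumption about what the search backend honors; verify against the docs:

```python
from typing import Optional

def refine_query(question: str, site: Optional[str] = None,
                 recency: Optional[str] = None) -> dict:
    """Build a search payload with an optional site restriction and recency filter."""
    query = question if site is None else f"{question} site:{site}"
    payload = {"query": query, "limit": 5}
    if recency:
        payload["recency"] = recency
    return payload
```

The returned dict can be passed directly as the JSON body of the search request.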
Context window overflow: Large search results can exceed the LLM's context limit. The build_context function with max_chars prevents this. GPT-4o supports 128K tokens (~500K chars), but keeping context under 8K chars improves accuracy and reduces cost.
Slow responses: SearchHive SwiftSearch responds in ~200ms. The bottleneck is usually the LLM. Use gpt-4o-mini for speed, gpt-4o for quality, or local models for cost.
Cost optimization: At GPT-4o-mini pricing ($0.15/1M input tokens), a typical RAG query with 5 search results costs less than $0.001 in LLM tokens. SearchHive credits at $0.0001 each make the search layer equally cheap. Total per-query cost: under $0.002.
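The arithmetic above can be packaged as a rough per-query estimator. The rates are hard-coded from the figures quoted here and output-token cost is omitted, so treat the result as a lower bound and verify current pricing:

```python
def per_query_cost(input_tokens: int, searches: int = 1) -> float:
    """Estimate per-query cost in dollars: gpt-4o-mini input at $0.15/1M tokens,
    one search credit per search at $0.0001 each. Output tokens not included."""
    llm = input_tokens / 1_000_000 * 0.15
    search = searches * 0.0001
    return llm + search
```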
Next Steps
- Add caching: Cache search results for common questions to reduce API calls
- Implement DeepDive: Use SearchHive's DeepDive for research-heavy questions that need multi-page synthesis
- Build evaluation: Track answer accuracy over time with a test set
- Add filtering: Restrict search to trusted sources for domain-specific applications
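For the caching suggestion above, a minimal in-memory sketch might look like this; the `search_fn` parameter stands in for whatever search client you use, and a production system would likely use Redis or similar instead of a module-level dict:

```python
import hashlib
import json
import time
from typing import Callable

_CACHE: dict = {}
CACHE_TTL_SECONDS = 3600  # Serve cached results for up to an hour

def cached_search(query: str, search_fn: Callable[..., list], limit: int = 5) -> list:
    """Serve repeated queries from an in-memory cache within the TTL window."""
    key = hashlib.sha256(json.dumps([query, limit]).encode()).hexdigest()
    hit = _CACHE.get(key)
    if hit and time.time() - hit[0] < CACHE_TTL_SECONDS:
        return hit[1]  # Cache hit: skip the API call entirely
    results = search_fn(query, limit=limit)
    _CACHE[key] = (time.time(), results)
    return results
```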
Summary
A search API is the foundation of any effective RAG system. SearchHive's SwiftSearch provides fast, token-efficient retrieval that grounds LLM answers in real-time web data. Combined with ScrapeForge for deep content extraction, you have everything needed for production-grade RAG.
Start with 500 free credits -- no credit card required. Build your first RAG pipeline in under 50 lines of Python. Check out the docs for complete API references and integration guides.
For more on LLM integration patterns, see /blog/api-for-llm-integration-common-questions-answered and /compare/tavily.