How to Build a RAG Pipeline with Web Search -- Step-by-Step
Retrieval-Augmented Generation (RAG) combines an LLM's reasoning ability with external knowledge retrieval. Instead of relying only on training data, a RAG pipeline fetches relevant documents, web pages, or search results and includes them in the LLM's context. This tutorial shows you how to build a production RAG pipeline using SearchHive for web search and retrieval, with practical Python code you can deploy today.
Key Takeaways
- A RAG pipeline has four stages: Query, Retrieve, Augment, Generate
- SwiftSearch finds relevant web sources for any user question
- ScrapeForge fetches full page content from those sources
- DeepDive extracts structured data when you need specific facts
- SearchHive starts at $0 (500 free credits) -- enough to build and test a complete pipeline
Prerequisites
- Python 3.9+
- An OpenAI API key (or Anthropic, Gemini)
- A SearchHive API key (free at searchhive.dev)
- Basic understanding of how LLMs work
```bash
pip install openai httpx tiktoken
```
Step 1: Set Up the Web Retriever
The retriever finds relevant web content for a user's query. We use SwiftSearch to find sources and ScrapeForge to fetch full content.
```python
# rag/retriever.py
import httpx


class WebRetriever:
    """Retrieve relevant web content for RAG pipelines."""

    BASE_URL = "https://api.searchhive.dev/v1"

    def __init__(self, api_key: str, max_sources: int = 5):
        self.api_key = api_key
        self.headers = {"Authorization": f"Bearer {api_key}"}
        self.max_sources = max_sources

    def search_and_retrieve(self, query: str) -> list[dict]:
        """Search the web and fetch full content from top results.

        Returns a list of dicts with 'title', 'url', 'content', 'snippet'.
        """
        # Step 1: Find relevant sources via SwiftSearch
        search_resp = httpx.post(
            f"{self.BASE_URL}/swiftsearch",
            headers=self.headers,
            json={"query": query, "num_results": self.max_sources * 2},
            timeout=30.0,
        )
        search_resp.raise_for_status()
        results = search_resp.json().get("results", [])

        # Step 2: Fetch full content from top results via ScrapeForge
        documents = []
        for result in results[: self.max_sources]:
            try:
                scrape_resp = httpx.post(
                    f"{self.BASE_URL}/scrapeforge",
                    headers=self.headers,
                    json={
                        "url": result["url"],
                        "render_js": True,
                        "format": "markdown",
                    },
                    timeout=60.0,
                )
                scrape_resp.raise_for_status()
                content = scrape_resp.json().get("content", "")
                documents.append({
                    "title": result.get("title", ""),
                    "url": result.get("url", ""),
                    "snippet": result.get("snippet", ""),
                    "content": content[:4000],  # Truncate to control context size
                })
            except Exception as e:
                print(f"  Warning: could not scrape {result.get('url')}: {e}")
                # Fall back to the snippet from the search results
                documents.append({
                    "title": result.get("title", ""),
                    "url": result.get("url", ""),
                    "snippet": result.get("snippet", ""),
                    "content": result.get("snippet", ""),
                })
        return documents
```
Step 2: Build the Context Builder
The context builder formats retrieved documents into a prompt that the LLM can use. This is the "Augment" step.
```python
# rag/context.py

def build_context(documents: list[dict], max_tokens: int = 6000) -> str:
    """Build RAG context from retrieved documents.

    Args:
        documents: List of retrieved document dicts.
        max_tokens: Approximate token limit for the context
            (1 token is roughly 4 characters of English text).
    """
    max_chars = max_tokens * 4
    context_parts = []
    total_chars = 0
    for i, doc in enumerate(documents, 1):
        source_line = f"[Source {i}: {doc['title']}]({doc['url']})\n"
        content = doc.get("content", doc.get("snippet", ""))
        section = f"{source_line}{content}\n\n"
        if total_chars + len(section) > max_chars:
            # Truncate the final section to fit the remaining space,
            # then stop -- the budget is spent either way
            remaining = max_chars - total_chars - len(source_line)
            if remaining > 200:
                context_parts.append(f"{source_line}{content[:remaining]}...\n\n")
            break
        context_parts.append(section)
        total_chars += len(section)
    return "".join(context_parts)


def build_rag_prompt(query: str, context: str) -> list[dict]:
    """Build the full RAG prompt with a system message."""
    return [
        {
            "role": "system",
            "content": (
                "You are a helpful research assistant. Answer the user's question "
                "using the provided web sources. Cite sources by number [1], [2], etc. "
                "If the sources do not contain enough information, say so clearly "
                "rather than making up answers."
            ),
        },
        {
            "role": "user",
            "content": (
                f"## Sources\n{context}\n\n"
                f"## Question\n{query}\n\n"
                "Provide a comprehensive answer based on the sources above. "
                "Include specific facts and cite your sources."
            ),
        },
    ]
```
Step 3: Implement the Generator
The generator calls the LLM with the augmented context to produce a grounded answer.
```python
# rag/generator.py
import openai

from context import build_rag_prompt


class RAGGenerator:
    """LLM generator for the RAG pipeline."""

    def __init__(self, api_key: str, model: str = "gpt-4o-mini"):
        self.client = openai.OpenAI(api_key=api_key)
        self.model = model

    def generate(self, query: str, context: str,
                 temperature: float = 0.3) -> dict:
        """Generate a RAG response.

        Returns a dict with 'answer', 'model', 'prompt_tokens',
        'completion_tokens', and 'total_tokens'.
        """
        messages = build_rag_prompt(query, context)
        response = self.client.chat.completions.create(
            model=self.model,
            messages=messages,
            temperature=temperature,
            max_tokens=1500,
        )
        return {
            "answer": response.choices[0].message.content,
            "model": self.model,
            "prompt_tokens": response.usage.prompt_tokens,
            "completion_tokens": response.usage.completion_tokens,
            "total_tokens": response.usage.total_tokens,
        }
```
Step 4: Build the Complete Pipeline
Wire everything together into a single RAG pipeline class.
```python
# rag/pipeline.py
from retriever import WebRetriever
from context import build_context
from generator import RAGGenerator


class RAGPipeline:
    """Complete RAG pipeline: Query -> Retrieve -> Augment -> Generate."""

    def __init__(
        self,
        searchhive_key: str,
        openai_key: str,
        max_sources: int = 5,
        max_context_tokens: int = 6000,
        model: str = "gpt-4o-mini",
    ):
        self.retriever = WebRetriever(searchhive_key, max_sources)
        self.generator = RAGGenerator(openai_key, model)
        self.max_context_tokens = max_context_tokens

    def query(self, question: str, verbose: bool = True) -> dict:
        """Run a full RAG query.

        Returns a dict with 'answer', 'sources', and 'token_usage'.
        """
        # Step 1: Retrieve relevant documents
        if verbose:
            print(f"Query: {question}")
            print("Retrieving sources...")
        documents = self.retriever.search_and_retrieve(question)
        if verbose:
            print(f"  Found {len(documents)} sources")
            for i, doc in enumerate(documents, 1):
                print(f"  [{i}] {doc['title']} ({len(doc['content'])} chars)")

        # Step 2: Build the context
        context = build_context(documents, self.max_context_tokens)

        # Step 3: Generate the answer
        if verbose:
            print("Generating answer...")
        result = self.generator.generate(question, context)
        if verbose:
            print(f"  Tokens: {result['prompt_tokens']} prompt + "
                  f"{result['completion_tokens']} completion")
            print(f"\n{result['answer']}")

        return {
            "answer": result["answer"],
            "sources": [
                {"title": d["title"], "url": d["url"]}
                for d in documents
            ],
            "token_usage": {
                "prompt": result["prompt_tokens"],
                "completion": result["completion_tokens"],
                "total": result["total_tokens"],
            },
        }


# Usage
if __name__ == "__main__":
    import os

    pipeline = RAGPipeline(
        searchhive_key=os.environ.get("SEARCHHIVE_API_KEY", "sh_live_..."),
        openai_key=os.environ.get("OPENAI_API_KEY", "sk-..."),
        max_sources=4,
    )
    result = pipeline.query(
        "What are the best practices for building RAG systems in 2026?"
    )
    print("\n\nSources used:")
    for src in result["sources"]:
        print(f"  - {src['title']}: {src['url']}")
```
Step 5: Add Structured Data Retrieval
For factual queries (pricing, specs, statistics), use DeepDive to extract structured data instead of full page content:
```python
# rag/structured_retriever.py
import httpx


def retrieve_structured(query: str, urls: list[str],
                        api_key: str) -> list[dict]:
    """Extract structured facts from specific URLs using DeepDive."""
    facts = []
    for url in urls:
        response = httpx.post(
            "https://api.searchhive.dev/v1/deepdive",
            headers={"Authorization": f"Bearer {api_key}"},
            json={
                "url": url,
                "extract": {
                    "key_facts": {
                        "type": "array",
                        "description": f"Key facts relevant to: {query}",
                        "items": {"type": "string"},
                    },
                    "statistics": {
                        "type": "array",
                        "description": "Numbers, percentages, or measurements mentioned",
                        "items": {"type": "string"},
                    },
                },
            },
            timeout=60.0,
        )
        response.raise_for_status()
        data = response.json().get("data", {})
        facts.append({"url": url, **data})
    return facts
```
Step 6: Cost Analysis
Here is what a RAG pipeline costs per query:
| Component | Cost per Query | Notes |
|---|---|---|
| SwiftSearch (5 results) | $0.0025 | ~5 credits |
| ScrapeForge (4 pages) | $0.02 | ~40 credits (JS rendering) |
| DeepDive (optional) | $0.01 | ~10 credits |
| GPT-4o-mini (6K prompt + 1K output) | ~$0.005 | |
| Total per query | ~$0.03-0.04 | |
At $0.03-0.04 per query, you can run 1,000 RAG queries for $30-40 -- well within the Builder plan ($49/month for 100K credits).
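As a quick sanity check, the per-query total is just the sum of the table's line items (the dollar figures below are the table's own estimates, with the optional DeepDive step added separately):

```python
# Per-query cost estimate, using the dollar figures from the table above
costs = {
    "swiftsearch": 0.0025,   # 5 results (~5 credits)
    "scrapeforge": 0.02,     # 4 pages with JS rendering (~40 credits)
    "gpt-4o-mini": 0.005,    # ~6K prompt + 1K completion tokens
}
per_query = sum(costs.values())               # ~$0.0275
per_thousand = per_query * 1000               # ~$27.50 for 1,000 queries

# The optional DeepDive step pushes the total toward the upper bound
per_query_with_deepdive = per_query + 0.01    # ~$0.0375
```

With DeepDive included on every query, 1,000 queries land near the top of the $30-40 range.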
Common Issues
Slow responses: The pipeline makes multiple sequential HTTP calls (search + scrape + generate). Use asyncio with httpx.AsyncClient to parallelize search and scraping. Expect 5-15 seconds per query with the synchronous version.
Context too large: LLMs have context limits. The build_context function truncates content, but if you need more sources, use a map-reduce approach: summarize each document individually, then combine summaries.
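The map-reduce approach can be sketched as follows. The `summarize` callable is a stand-in: in a real pipeline it would wrap a cheap LLM call (for example, a second generator with a "summarize this page" prompt), which is why it is injected rather than hard-coded:

```python
from typing import Callable


def map_reduce_context(documents: list[dict],
                       summarize: Callable[[str], str],
                       max_chars_per_doc: int = 800) -> str:
    """Map: summarize each document independently.
    Reduce: concatenate the per-document summaries into one compact context."""
    parts = []
    for i, doc in enumerate(documents, 1):
        summary = summarize(doc.get("content", ""))[:max_chars_per_doc]
        parts.append(f"[Source {i}: {doc.get('title', '')}]\n{summary}")
    return "\n\n".join(parts)


# Trivial stand-in summarizer for illustration: keep the first 200 characters
docs = [
    {"title": "A", "content": "x" * 5000},
    {"title": "B", "content": "y" * 5000},
]
context = map_reduce_context(docs, summarize=lambda text: text[:200])
```

Because each document is summarized independently, the map step also parallelizes well with the asyncio approach above.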
Irrelevant sources: SwiftSearch returns relevant results for most queries. For ambiguous queries, reformulate with specific terms. You can also use SwiftSearch with different search engines (Google vs Bing) and merge results.
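Merging results from multiple engines comes down to a URL-keyed dedupe that preserves the ranking of the first list (the function name and result shape below are illustrative):

```python
def merge_results(*result_lists: list[dict]) -> list[dict]:
    """Merge several search-result lists, keeping the first occurrence of
    each URL so the first list's ranking wins on duplicates."""
    seen: set = set()
    merged = []
    for results in result_lists:
        for result in results:
            url = result.get("url", "")
            if url and url not in seen:
                seen.add(url)
                merged.append(result)
    return merged


google = [{"url": "https://a.com", "title": "A"}, {"url": "https://b.com", "title": "B"}]
bing = [{"url": "https://b.com", "title": "B dup"}, {"url": "https://c.com", "title": "C"}]
combined = merge_results(google, bing)  # A, B, C -- the duplicate B is dropped
```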
Stale data: Web content changes. For applications where freshness matters, add a caching layer with TTL and re-scrape when the cache expires.
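A minimal in-memory TTL cache for scraped pages might look like this (a sketch only; a production system would want persistence, size limits, and eviction):

```python
import time
from typing import Optional


class TTLCache:
    """In-memory cache that expires entries after `ttl` seconds."""

    def __init__(self, ttl: float = 3600.0):
        self.ttl = ttl
        self._store: dict = {}  # url -> (fetched_at, content)

    def get(self, url: str) -> Optional[str]:
        entry = self._store.get(url)
        if entry is None:
            return None
        fetched_at, content = entry
        if time.time() - fetched_at > self.ttl:
            del self._store[url]  # Expired: force a re-scrape
            return None
        return content

    def put(self, url: str, content: str) -> None:
        self._store[url] = (time.time(), content)
```

In the retriever, check `cache.get(result["url"])` before calling ScrapeForge and `cache.put(...)` after a successful scrape; expired entries simply fall through to a fresh fetch.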
Complete Code Example
The full pipeline structure:
```
rag_pipeline/
    retriever.py              # Web search + page scraping
    context.py                # Context formatting
    generator.py              # LLM generation
    pipeline.py               # Orchestration
    structured_retriever.py   # Structured data extraction
    requirements.txt          # openai, httpx, tiktoken
```
Next Steps
- Add vector embeddings for local document retrieval (combine with web retrieval)
- Implement query reformulation for better search results
- Add source quality scoring to prioritize authoritative sources
- Build a caching layer to avoid re-scraping unchanged pages
- Add streaming responses for better user experience
Start building your RAG pipeline with 500 free SearchHive credits -- no credit card required.
See also: /blog/complete-guide-to-ai-agent-frameworks for connecting RAG to agent frameworks, or /blog/top-7-llm-function-calling-tools for tool calling integrations.