How to Build a RAG Pipeline with Web Search -- Step-by-Step
Retrieval-Augmented Generation (RAG) combines an LLM's reasoning ability with external knowledge retrieval. Instead of relying only on training data, a RAG pipeline fetches relevant documents, web pages, or search results and includes them in the LLM's context. This tutorial shows you how to build a production RAG pipeline using SearchHive for web search and retrieval, with practical Python code you can deploy today.
Key Takeaways
- A RAG pipeline has four stages: Query, Retrieve, Augment, Generate
- SwiftSearch finds relevant web sources for any user question
- ScrapeForge fetches full page content from those sources
- DeepDive extracts structured data when you need specific facts
- SearchHive starts at $0 (500 free credits) -- enough to build and test a complete pipeline
Prerequisites
- Python 3.9+
- An OpenAI API key (or Anthropic, Gemini)
- A SearchHive API key (free at searchhive.dev)
- Basic understanding of how LLMs work
```bash
pip install openai httpx tiktoken
```
Step 1: Set Up the Web Retriever
The retriever finds relevant web content for a user's query. We use SwiftSearch to find sources and ScrapeForge to fetch full content.
```python
# rag/retriever.py
import httpx


class WebRetriever:
    """Retrieve relevant web content for RAG pipelines."""

    BASE_URL = "https://api.searchhive.dev/v1"

    def __init__(self, api_key: str, max_sources: int = 5):
        self.api_key = api_key
        self.headers = {"Authorization": f"Bearer {api_key}"}
        self.max_sources = max_sources

    def search_and_retrieve(self, query: str) -> list[dict]:
        """Search the web and fetch full content from top results.

        Returns a list of dicts with 'title', 'url', 'content', 'snippet'.
        """
        # Step 1: Find relevant sources via SwiftSearch
        search_resp = httpx.post(
            f"{self.BASE_URL}/swiftsearch",
            headers=self.headers,
            json={"query": query, "num_results": self.max_sources * 2},
            timeout=30.0,
        )
        search_resp.raise_for_status()
        results = search_resp.json().get("results", [])

        # Step 2: Fetch full content from top results via ScrapeForge
        documents = []
        for result in results[: self.max_sources]:
            try:
                scrape_resp = httpx.post(
                    f"{self.BASE_URL}/scrapeforge",
                    headers=self.headers,
                    json={
                        "url": result["url"],
                        "render_js": True,
                        "format": "markdown",
                    },
                    timeout=60.0,
                )
                scrape_resp.raise_for_status()
                content = scrape_resp.json().get("content", "")
                documents.append({
                    "title": result.get("title", ""),
                    "url": result.get("url", ""),
                    "snippet": result.get("snippet", ""),
                    "content": content[:4000],  # Truncate to control context size
                })
            except Exception as e:
                print(f"  Warning: could not scrape {result.get('url')}: {e}")
                # Fall back to the snippet from the search results
                documents.append({
                    "title": result.get("title", ""),
                    "url": result.get("url", ""),
                    "snippet": result.get("snippet", ""),
                    "content": result.get("snippet", ""),
                })
        return documents
```
Step 2: Build the Context Builder
The context builder formats retrieved documents into a prompt that the LLM can use. This is the "Augment" step.
```python
# rag/context.py

def build_context(documents: list[dict], max_tokens: int = 6000) -> str:
    """Build RAG context from retrieved documents.

    Args:
        documents: List of retrieved document dicts.
        max_tokens: Approximate token limit for the context
            (1 token is roughly 4 characters of English text).
    """
    max_chars = max_tokens * 4
    context_parts = []
    total_chars = 0
    for i, doc in enumerate(documents, 1):
        source_line = f"[Source {i}: {doc['title']}]({doc['url']})\n"
        content = doc.get("content", doc.get("snippet", ""))
        section = f"{source_line}{content}\n\n"
        if total_chars + len(section) > max_chars:
            # Truncate the final section to fit the remaining space,
            # then stop -- the budget is spent either way
            remaining = max_chars - total_chars - len(source_line)
            if remaining > 200:
                context_parts.append(f"{source_line}{content[:remaining]}...\n\n")
            break
        context_parts.append(section)
        total_chars += len(section)
    return "".join(context_parts)


def build_rag_prompt(query: str, context: str) -> list[dict]:
    """Build the full RAG prompt with a system message."""
    return [
        {
            "role": "system",
            "content": (
                "You are a helpful research assistant. Answer the user's question "
                "using the provided web sources. Cite sources by number [1], [2], etc. "
                "If the sources do not contain enough information, say so clearly "
                "rather than making up answers."
            ),
        },
        {
            "role": "user",
            "content": (
                f"## Sources\n{context}\n\n"
                f"## Question\n{query}\n\n"
                "Provide a comprehensive answer based on the sources above. "
                "Include specific facts and cite your sources."
            ),
        },
    ]
```
Step 3: Implement the Generator
The generator calls the LLM with the augmented context to produce a grounded answer.
```python
# rag/generator.py
import openai

from context import build_rag_prompt


class RAGGenerator:
    """LLM generator for the RAG pipeline."""

    def __init__(self, api_key: str, model: str = "gpt-4o-mini"):
        self.client = openai.OpenAI(api_key=api_key)
        self.model = model

    def generate(self, query: str, context: str,
                 temperature: float = 0.3) -> dict:
        """Generate a RAG response.

        Returns a dict with 'answer', 'model', 'prompt_tokens',
        'completion_tokens', and 'total_tokens'.
        """
        messages = build_rag_prompt(query, context)
        response = self.client.chat.completions.create(
            model=self.model,
            messages=messages,
            temperature=temperature,
            max_tokens=1500,
        )
        return {
            "answer": response.choices[0].message.content,
            "model": self.model,
            "prompt_tokens": response.usage.prompt_tokens,
            "completion_tokens": response.usage.completion_tokens,
            "total_tokens": response.usage.total_tokens,
        }
```
Step 4: Build the Complete Pipeline
Wire everything together into a single RAG pipeline class.
```python
# rag/pipeline.py
from retriever import WebRetriever
from context import build_context
from generator import RAGGenerator


class RAGPipeline:
    """Complete RAG pipeline: Query -> Retrieve -> Augment -> Generate."""

    def __init__(
        self,
        searchhive_key: str,
        openai_key: str,
        max_sources: int = 5,
        max_context_tokens: int = 6000,
        model: str = "gpt-4o-mini",
    ):
        self.retriever = WebRetriever(searchhive_key, max_sources)
        self.generator = RAGGenerator(openai_key, model)
        self.max_context_tokens = max_context_tokens

    def query(self, question: str, verbose: bool = True) -> dict:
        """Run a full RAG query.

        Returns a dict with 'answer', 'sources', and 'token_usage'.
        """
        # Step 1: Retrieve relevant documents
        if verbose:
            print(f"Query: {question}")
            print("Retrieving sources...")
        documents = self.retriever.search_and_retrieve(question)
        if verbose:
            print(f"  Found {len(documents)} sources")
            for i, doc in enumerate(documents, 1):
                print(f"  [{i}] {doc['title']} ({len(doc['content'])} chars)")

        # Step 2: Build the context
        context = build_context(documents, self.max_context_tokens)

        # Step 3: Generate the answer
        if verbose:
            print("Generating answer...")
        result = self.generator.generate(question, context)
        if verbose:
            print(f"  Tokens: {result['prompt_tokens']} prompt + "
                  f"{result['completion_tokens']} completion")
            print(f"\n{result['answer']}")

        return {
            "answer": result["answer"],
            "sources": [
                {"title": d["title"], "url": d["url"]}
                for d in documents
            ],
            "token_usage": {
                "prompt": result["prompt_tokens"],
                "completion": result["completion_tokens"],
                "total": result["total_tokens"],
            },
        }


# Usage
if __name__ == "__main__":
    import os

    pipeline = RAGPipeline(
        searchhive_key=os.environ.get("SEARCHHIVE_API_KEY", "sh_live_..."),
        openai_key=os.environ.get("OPENAI_API_KEY", "sk-..."),
        max_sources=4,
    )
    result = pipeline.query(
        "What are the best practices for building RAG systems in 2026?"
    )
    print("\n\nSources used:")
    for src in result["sources"]:
        print(f"  - {src['title']}: {src['url']}")
```
Step 5: Add Structured Data Retrieval
For factual queries (pricing, specs, statistics), use DeepDive to extract structured data instead of full page content:
```python
# rag/structured_retriever.py
import httpx


def retrieve_structured(query: str, urls: list[str],
                        api_key: str) -> list[dict]:
    """Extract structured facts from specific URLs using DeepDive."""
    facts = []
    for url in urls:
        response = httpx.post(
            "https://api.searchhive.dev/v1/deepdive",
            headers={"Authorization": f"Bearer {api_key}"},
            json={
                "url": url,
                "extract": {
                    "key_facts": {
                        "type": "array",
                        "description": f"Key facts relevant to: {query}",
                        "items": {"type": "string"},
                    },
                    "statistics": {
                        "type": "array",
                        "description": "Numbers, percentages, or measurements mentioned",
                        "items": {"type": "string"},
                    },
                },
            },
            timeout=60.0,
        )
        response.raise_for_status()
        data = response.json().get("data", {})
        facts.append({"url": url, **data})
    return facts
```
Step 6: Cost Analysis
Here is what a RAG pipeline costs per query:
| Component | Cost per Query | Notes |
|---|---|---|
| SwiftSearch (5 results) | $0.0025 | ~5 credits |
| ScrapeForge (4 pages) | $0.02 | ~40 credits (JS rendering) |
| DeepDive (optional) | $0.01 | ~10 credits |
| GPT-4o-mini (6K prompt + 1K output) | ~$0.005 | |
| Total per query | ~$0.03-0.04 | |
At $0.03-0.04 per query, you can run 1,000 RAG queries for $30-40 -- well within the Builder plan ($49/month for 100K credits).
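As a quick sanity check, the per-query total is just the sum of the table's line items (the dollar figures below are the table's own estimates, with the optional DeepDive step added separately):

```python
# Per-query cost estimate, using the dollar figures from the table above
costs = {
    "swiftsearch": 0.0025,   # 5 results (~5 credits)
    "scrapeforge": 0.02,     # 4 pages with JS rendering (~40 credits)
    "gpt-4o-mini": 0.005,    # ~6K prompt + 1K completion tokens
}
per_query = sum(costs.values())               # ~$0.0275
per_thousand = per_query * 1000               # ~$27.50 for 1,000 queries

# The optional DeepDive step pushes the total toward the upper bound
per_query_with_deepdive = per_query + 0.01    # ~$0.0375
```

With DeepDive included on every query, 1,000 queries land near the top of the $30-40 range.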
Common Issues
Slow responses: The pipeline makes multiple sequential HTTP calls (search + scrape + generate). Use asyncio with httpx.AsyncClient to parallelize search and scraping. Expect 5-15 seconds per query with the synchronous version.
Context too large: LLMs have context limits. The build_context function truncates content, but if you need more sources, use a map-reduce approach: summarize each document individually, then combine summaries.
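The map-reduce approach can be sketched as follows. The `summarize` callable is a stand-in: in a real pipeline it would wrap a cheap LLM call (for example, a second generator with a "summarize this page" prompt), which is why it is injected rather than hard-coded:

```python
from typing import Callable


def map_reduce_context(documents: list[dict],
                       summarize: Callable[[str], str],
                       max_chars_per_doc: int = 800) -> str:
    """Map: summarize each document independently.
    Reduce: concatenate the per-document summaries into one compact context."""
    parts = []
    for i, doc in enumerate(documents, 1):
        summary = summarize(doc.get("content", ""))[:max_chars_per_doc]
        parts.append(f"[Source {i}: {doc.get('title', '')}]\n{summary}")
    return "\n\n".join(parts)


# Trivial stand-in summarizer for illustration: keep the first 200 characters
docs = [
    {"title": "A", "content": "x" * 5000},
    {"title": "B", "content": "y" * 5000},
]
context = map_reduce_context(docs, summarize=lambda text: text[:200])
```

Because each document is summarized independently, the map step also parallelizes well with the asyncio approach above.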
Irrelevant sources: SwiftSearch returns relevant results for most queries. For ambiguous queries, reformulate with specific terms. You can also use SwiftSearch with different search engines (Google vs Bing) and merge results.
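Merging results from multiple engines comes down to a URL-keyed dedupe that preserves the ranking of the first list (the function name and result shape below are illustrative):

```python
def merge_results(*result_lists: list[dict]) -> list[dict]:
    """Merge several search-result lists, keeping the first occurrence of
    each URL so the first list's ranking wins on duplicates."""
    seen: set = set()
    merged = []
    for results in result_lists:
        for result in results:
            url = result.get("url", "")
            if url and url not in seen:
                seen.add(url)
                merged.append(result)
    return merged


google = [{"url": "https://a.com", "title": "A"}, {"url": "https://b.com", "title": "B"}]
bing = [{"url": "https://b.com", "title": "B dup"}, {"url": "https://c.com", "title": "C"}]
combined = merge_results(google, bing)  # A, B, C -- the duplicate B is dropped
```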
Stale data: Web content changes. For applications where freshness matters, add a caching layer with TTL and re-scrape when the cache expires.
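A minimal in-memory TTL cache for scraped pages might look like this (a sketch only; a production system would want persistence, size limits, and eviction):

```python
import time
from typing import Optional


class TTLCache:
    """In-memory cache that expires entries after `ttl` seconds."""

    def __init__(self, ttl: float = 3600.0):
        self.ttl = ttl
        self._store: dict = {}  # url -> (fetched_at, content)

    def get(self, url: str) -> Optional[str]:
        entry = self._store.get(url)
        if entry is None:
            return None
        fetched_at, content = entry
        if time.time() - fetched_at > self.ttl:
            del self._store[url]  # Expired: force a re-scrape
            return None
        return content

    def put(self, url: str, content: str) -> None:
        self._store[url] = (time.time(), content)
```

In the retriever, check `cache.get(result["url"])` before calling ScrapeForge and `cache.put(...)` after a successful scrape; expired entries simply fall through to a fresh fetch.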
Complete Code Example
The full pipeline structure:
```
rag_pipeline/
    retriever.py              # Web search + page scraping
    context.py                # Context formatting
    generator.py              # LLM generation
    pipeline.py               # Orchestration
    structured_retriever.py   # Structured data extraction
    requirements.txt          # openai, httpx, tiktoken
```
Next Steps
- Add vector embeddings for local document retrieval (combine with web retrieval)
- Implement query reformulation for better search results
- Add source quality scoring to prioritize authoritative sources
- Build a caching layer to avoid re-scraping unchanged pages
- Add streaming responses for better user experience
Start building your RAG pipeline with 500 free SearchHive credits -- no credit card required.
See also: /blog/complete-guide-to-ai-agent-frameworks for connecting RAG to agent frameworks, or /blog/top-7-llm-function-calling-tools for tool calling integrations.