# LlamaIndex Web Search Integration — Best APIs Compared
LlamaIndex is the go-to framework for building RAG (Retrieval-Augmented Generation) pipelines. But static document indexes become stale fast. Integrating a web search API lets your LlamaIndex applications query the live internet, keeping responses current and relevant.
This article compares the best search APIs for LlamaIndex, with code examples and pricing analysis for each.
## Key Takeaways

- LlamaIndex provides `FunctionTool` for wrapping any REST API as a tool, plus `ToolSpec` from LlamaHub for pre-built integrations
- SearchHive offers the best value for LlamaIndex — search, scraping, and deep research from one API at $9/mo for 5,000 credits
- For simple RAG augmentation, a custom retriever using any search API is the lightest-weight approach
- SerpApi and Serper.dev are strong for Google SERP data, but significantly more expensive
## Search APIs for LlamaIndex Compared
| API | Integration Method | Free Tier | Pricing per 1K | Best For |
|---|---|---|---|---|
| SearchHive | FunctionTool / custom retriever | 500 credits | $0.18 (Starter) | All-in-one search + scrape + research |
| SerpApi | LlamaHub ToolSpec | 250/mo | $25 | Google SERP data |
| Serper.dev | FunctionTool | 2,500 queries | $50 | Fast Google results |
| Tavily | LlamaHub ToolSpec | 1K credits/mo | $8 | AI-optimized search |
| Brave Search | FunctionTool | $5/mo credit | $5 | Privacy-focused results |
| Exa.ai | LlamaHub ToolSpec | 1K/mo | $7 | Neural/semantic search |
| DuckDuckGo | Built-in (free) | Unlimited | Free | Quick prototyping |
## Approach 1: Custom Retriever (Lightest Weight)

For RAG pipelines, the cleanest integration is a custom retriever that queries a search API and returns results as LlamaIndex `Document` objects:
```python
import requests
from llama_index.core.retrievers import BaseRetriever
from llama_index.core import QueryBundle
from llama_index.core.schema import Document, NodeWithScore
from typing import List

SEARCHHIVE_KEY = "your-searchhive-key"

class WebSearchRetriever(BaseRetriever):
    """Retrieve documents from the live web using SearchHive SwiftSearch."""

    def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        query_str = query_bundle.query_str
        resp = requests.get(
            "https://api.searchhive.dev/v1/swift-search",
            headers={"Authorization": f"Bearer {SEARCHHIVE_KEY}"},
            params={"query": query_str, "limit": 5},
        )
        nodes = []
        for r in resp.json().get("results", []):
            doc = Document(
                text=r["snippet"],
                metadata={"title": r["title"], "url": r["url"], "source": "web_search"},
            )
            nodes.append(NodeWithScore(node=doc, score=1.0))
        return nodes

retriever = WebSearchRetriever()

# Use as a retriever in your RAG pipeline
from llama_index.core.query_engine import RetrieverQueryEngine

query_engine = RetrieverQueryEngine(retriever=retriever)
response = query_engine.query("What are the latest features in LlamaIndex 0.12?")
print(response)
```
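The retriever above hands the LLM only search snippets, each tagged with its source URL in the node metadata. To see what the synthesized context roughly looks like before it reaches the LLM, here is a small standalone sketch — no network calls, and `build_context` plus the sample results are illustrative inventions, not part of the SearchHive response or the LlamaIndex API — that flattens title/url/snippet dicts into a citation-style context string:

```python
# Hypothetical helper: turn SearchHive-style result dicts into one context
# string with numbered source citations, ready to prepend to an LLM prompt.
def build_context(results: list[dict], max_chars: int = 2000) -> str:
    blocks = []
    for i, r in enumerate(results, start=1):
        blocks.append(f"[{i}] {r['title']} ({r['url']})\n{r['snippet']}")
    context = "\n\n".join(blocks)
    # Trim to a rough character budget so the prompt stays bounded
    return context[:max_chars]

# Illustrative sample data in the shape the retriever consumes
results = [
    {"title": "LlamaIndex 0.12 release notes", "url": "https://example.com/a",
     "snippet": "Workflow improvements and new agent APIs."},
    {"title": "LlamaIndex docs", "url": "https://example.com/b",
     "snippet": "Custom retrievers subclass BaseRetriever."},
]
print(build_context(results))
```

Keeping the URL next to each snippet makes it easy to prompt the LLM to cite `[1]`, `[2]`, etc. in its answer.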
## Approach 2: FunctionTool (Agent Integration)

For LlamaIndex agent workflows, wrap the search API as a `FunctionTool`:
```python
import requests
from llama_index.core.tools import FunctionTool
from llama_index.core.agent import ReActAgent
from llama_index.llms.openai import OpenAI

SEARCHHIVE_KEY = "your-searchhive-key"

def swift_search(query: str) -> str:
    """Search the web for current information."""
    resp = requests.get(
        "https://api.searchhive.dev/v1/swift-search",
        headers={"Authorization": f"Bearer {SEARCHHIVE_KEY}"},
        params={"query": query, "limit": 5},
    )
    results = []
    for r in resp.json().get("results", []):
        results.append(f"{r['title']}\n  {r['url']}\n  {r['snippet']}")
    return "\n\n".join(results)

def scrape_forge(url: str) -> str:
    """Extract clean content from a web page as markdown."""
    resp = requests.post(
        "https://api.searchhive.dev/v1/scrape-forge",
        headers={"Authorization": f"Bearer {SEARCHHIVE_KEY}", "Content-Type": "application/json"},
        json={"url": url, "format": "markdown"},
    )
    return resp.json().get("content", "Failed to scrape")[:4000]

def deep_dive(query: str) -> str:
    """Run comprehensive research with multi-source synthesis."""
    resp = requests.get(
        "https://api.searchhive.dev/v1/deep-dive",
        headers={"Authorization": f"Bearer {SEARCHHIVE_KEY}"},
        params={"query": query},
    )
    return resp.json().get("summary", "No results")[:3000]

# Create FunctionTool instances
search_tool = FunctionTool.from_defaults(
    fn=swift_search,
    name="web_search",
    description="Search the web for current information. Use for factual lookups, pricing, news.",
)
scrape_tool = FunctionTool.from_defaults(
    fn=scrape_forge,
    name="page_scraper",
    description="Scrape a web page's content as markdown. Provide the full URL.",
)
research_tool = FunctionTool.from_defaults(
    fn=deep_dive,
    name="deep_research",
    description="Conduct in-depth research on a topic with synthesized multi-source analysis.",
)

# Build an agent with all three tools
agent = ReActAgent.from_tools(
    [search_tool, scrape_tool, research_tool],
    llm=OpenAI(model="gpt-4o"),
    verbose=True,
)

response = agent.chat("Compare the pricing of SearchHive vs SerpApi vs Tavily for 50K searches/month")
print(response)
```
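The tool functions above make live HTTP calls with no error handling, so a transient API failure surfaces to the agent as a raw exception mid-reasoning-loop. A common hardening step is to wrap each tool so failures are retried with backoff and, as a last resort, returned as a readable error string the agent can reason about. This is a generic sketch, not a SearchHive or LlamaIndex API — `with_retries` is a name invented here:

```python
import time

def with_retries(fn, attempts: int = 3, base_delay: float = 0.5):
    """Wrap a tool function so transient failures are retried with
    exponential backoff; the final failure becomes an error string
    instead of an exception that crashes the agent loop."""
    def wrapped(*args, **kwargs):
        for attempt in range(attempts):
            try:
                return fn(*args, **kwargs)
            except Exception as exc:
                if attempt == attempts - 1:
                    return f"Tool failed after {attempts} attempts: {exc}"
                time.sleep(base_delay * (2 ** attempt))
    return wrapped

# Demo with a flaky function standing in for a real API call:
# it raises twice, then succeeds on the third attempt.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return "ok"

print(with_retries(flaky)())  # succeeds on the third attempt
```

In practice you would pass `with_retries(swift_search)` (and the other two functions) to `FunctionTool.from_defaults` instead of the bare functions.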
## Approach 3: Using LlamaHub ToolSpecs

LlamaHub provides pre-built integrations for some search APIs:
```python
# Tavily via LlamaHub (package: llama-index-tools-tavily-research)
from llama_index.tools.tavily_research import TavilyToolSpec

tavily_spec = TavilyToolSpec(api_key="your-tavily-key")
tavily_tools = tavily_spec.to_tool_list()
# Returns: tavily_search, tavily_extract, tavily_crawl

# DuckDuckGo via LlamaHub (package: llama-index-tools-duckduckgo)
from llama_index.tools.duckduckgo import DuckDuckGoSearchToolSpec

ddg_spec = DuckDuckGoSearchToolSpec()
ddg_tools = ddg_spec.to_tool_list()
```
Note: LlamaHub doesn't have a built-in SearchHive ToolSpec yet, but the `FunctionTool` approach above takes just a few lines of code and gives you more control over parameters and response formatting.
## Pricing Comparison for LlamaIndex Projects

Real-world costs for common LlamaIndex use cases:
| Use Case | Monthly Queries | Serper.dev | Tavily | Brave | SearchHive |
|---|---|---|---|---|---|
| Personal RAG | 1,000 | Free tier | Free tier | Free tier | Free tier |
| Startup prototype | 5,000 | $50 | $40 | $25 | $9 |
| Production app | 25,000 | $250 | $200 | $125 | $49 |
| Enterprise RAG | 100,000 | $1,000+ | $800+ | $500 | $199 |
SearchHive's credit system is especially efficient for RAG workflows where you mix search queries with page scraping. Each SwiftSearch costs 1 credit, each ScrapeForge costs 1 credit — the $49/month Builder plan gives you 100K credits, enough for a mix of searches and scrapes that would cost $200+ with separate providers.
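The credit math from the paragraph above is easy to sanity-check for your own workload. This is a minimal sketch assuming the stated rates (1 credit per SwiftSearch and 1 credit per ScrapeForge call, 100K credits on the $49/month Builder plan); the function names are invented for illustration:

```python
# Credit rates as stated in the article: 1 credit per search, 1 per scrape
CREDITS_PER_SEARCH = 1
CREDITS_PER_SCRAPE = 1

def credits_needed(searches: int, scrapes: int) -> int:
    """Total monthly credits for a mixed search + scrape workload."""
    return searches * CREDITS_PER_SEARCH + scrapes * CREDITS_PER_SCRAPE

def fits_builder_plan(searches: int, scrapes: int, plan_credits: int = 100_000) -> bool:
    """Does the workload fit within the Builder plan's credit pool?"""
    return credits_needed(searches, scrapes) <= plan_credits

# e.g. a RAG pipeline doing 25K searches and 60K page scrapes a month
monthly = credits_needed(25_000, 60_000)
print(monthly, fits_builder_plan(25_000, 60_000))  # 85000 True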
## Feature-by-Feature Comparison
| Feature | SearchHive | SerpApi | Tavily | Serper |
|---|---|---|---|---|
| Web search | SwiftSearch (multi-engine) | Google, Bing, etc. | AI-optimized | Google |
| Page scraping | ScrapeForge | No | Extract tool | No |
| Deep research | DeepDive | No | No | No |
| Custom retriever | Yes | Yes | Yes | Yes |
| LlamaHub integration | FunctionTool | ToolSpec | ToolSpec | FunctionTool |
| Monthly quota (starter) | 5K/mo | 1K/mo | 1K/mo | 50K (one-time) |
| Response format | JSON + markdown | JSON | JSON | JSON |
## Verdict

For LlamaIndex web search integration:
- Quick prototyping: DuckDuckGo via LlamaHub — zero cost, zero setup
- Google SERP data: SerpApi — most comprehensive search engine coverage
- Production RAG with mixed workloads: SearchHive — search + scraping + deep research from one API key, at a fraction of the cost of running multiple providers
The biggest advantage of SearchHive for LlamaIndex is the ability to search, scrape pages for full content, and run deep research all from the same credit pool. Most LlamaIndex RAG pipelines need both search (to find relevant pages) and scraping (to extract content for the LLM context). With other providers, that requires two separate API subscriptions.
Get started with SearchHive's free tier — 500 credits, no credit card. See the API docs for complete integration guides.
Related: /compare/serpapi | /compare/tavily | /blog/langchain-web-search-integration