Complete Guide to LlamaIndex Web Search in 2025
LlamaIndex web search transforms how developers build retrieval-augmented generation (RAG) pipelines by connecting language models to live web data. Whether you need up-to-date information for your AI application or want to ground responses in real sources, LlamaIndex's web search capabilities provide the bridge between static models and the dynamic internet.
This guide covers everything you need to know about using web search with LlamaIndex — from basic setup to production-ready patterns — including why SearchHive is the strongest backend option for live web data retrieval.
Key Takeaways
- LlamaIndex provides a WebSearchTool that integrates live web search into LLM workflows
- It works with multiple search providers, including Tavily, Google, and Bing
- Combining LlamaIndex web search with a dedicated data extraction API like SearchHive delivers more complete results
- The framework supports both synchronous and asynchronous search patterns
- Production deployments need proper rate limiting, caching, and error handling
What Is LlamaIndex Web Search?
LlamaIndex is a data framework for building LLM applications. Its web search module allows your AI agents and RAG pipelines to query the internet in real time, retrieve relevant pages, and use that content as context for generating responses.
The core component is WebSearchTool, a wrapper that turns any search engine API into a LlamaIndex-compatible tool. This means your LlamaIndex agents can perform web lookups the same way they query vector stores or databases.
Why Web Search Matters for LLM Applications
Language models have a training cutoff. They don't know about events that happened last week, yesterday, or ten minutes ago. Web search closes that gap:
- Real-time data: Get current prices, news, and statistics
- Source attribution: Ground AI responses in verifiable web sources
- Reduced hallucination: When the model cites actual web pages, factual accuracy improves
- Broad coverage: Access information beyond what's in your training data or knowledge base
How LlamaIndex Web Search Works
The architecture is straightforward:
- Your agent decides it needs web information
- The WebSearchTool sends a query to a configured search provider
- Search results (titles, snippets, URLs) are returned
- Optionally, full page content is extracted using a WebPageLoader or reader tool
- The retrieved content becomes context for the LLM response
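The loop above can be sketched in a few lines. This is provider-agnostic pseudocode in Python: `search`, `extract`, and `llm` are placeholder callables, not LlamaIndex APIs.

```python
def answer_with_web(query, search, extract, llm, top_n=2):
    # 1) query the search provider
    results = search(query)  # e.g. [{"title": ..., "url": ..., "snippet": ...}]
    # 2) optionally pull full page content for the top hits
    pages = [extract(r["url"]) for r in results[:top_n]]
    # 3) hand the retrieved text to the LLM as context
    context = "\n\n".join(pages)
    return llm(f"Context:\n{context}\n\nQuestion: {query}")
```

In a real agent, step 1 is the search tool call and step 2 is the reader tool call; the framework handles the orchestration for you.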
Supported Search Providers
LlamaIndex works with several search backends:
| Provider | Type | Free Tier | Notes |
|---|---|---|---|
| Tavily | API | 1,000 searches/month | Purpose-built for AI agents |
| Google Custom Search | API | 100 queries/day | Requires GCP project setup |
| Bing Web Search | API | 1,000 transactions/month | Good general-purpose coverage |
| DuckDuckGo | Free | Unlimited | Rate-limited, no API key needed |
| SearXNG | Self-hosted | Free | Full control over search backend |
Setting Up LlamaIndex Web Search
Installation
pip install llama-index llama-index-tools-tavily-research llama-index-tools-duckduckgo llama-index-readers-web
Basic Tavily Integration
from llama_index.core.agent import ReActAgent
from llama_index.tools.tavily_research import TavilyToolSpec

# Configure Tavily search
tavily_tool = TavilyToolSpec(
    api_key="your-tavily-api-key",
    max_results=5,
).to_tool_list()

# Create an agent with web search
agent = ReActAgent.from_tools(
    tavily_tool,
    llm=your_llm,
    verbose=True,
)

# Use it
response = agent.chat("What are the latest developments in quantum computing?")
print(response)
Using DuckDuckGo (No API Key Required)
from llama_index.tools.duckduckgo import DuckDuckGoSearchToolSpec

search_tool = DuckDuckGoSearchToolSpec().to_tool_list()

agent = ReActAgent.from_tools(
    search_tool,
    llm=your_llm,
    system_prompt="You are a research assistant with access to live web search.",
)
This is the fastest way to get started, but DuckDuckGo has rate limits and less structured output than paid providers.
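Because keyless providers throttle aggressive clients, it's worth spacing out calls on your side. A minimal client-side throttle (illustrative only, not part of LlamaIndex):

```python
import time

class Throttle:
    """Sleep as needed so successive calls are at least
    min_interval seconds apart."""

    def __init__(self, min_interval: float = 1.0):
        self.min_interval = min_interval
        self.last = 0.0

    def wait(self):
        delay = self.min_interval - (time.monotonic() - self.last)
        if delay > 0:
            time.sleep(delay)
        self.last = time.monotonic()
```

Call `throttle.wait()` before each search; one shared instance per process is usually enough.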
Combining LlamaIndex Web Search with Full Page Extraction
Search snippets alone often aren't enough. You need the full page content to give your LLM comprehensive context. This is where LlamaIndex's web readers come in:
from llama_index.core import VectorStoreIndex
from llama_index.readers.web import SimpleWebPageReader

# After getting search results, load full pages
reader = SimpleWebPageReader()
documents = reader.load_data(urls=["https://example.com/article"])

# Add to your query engine
query_engine = VectorStoreIndex.from_documents(documents).as_query_engine()
The Problem with Simple Page Readers
SimpleWebPageReader works for basic HTML pages but fails on:
- JavaScript-rendered content (React, Angular, Vue sites)
- Bot-protected pages (Cloudflare challenges, CAPTCHAs)
- Paywalled content (many news sites, research papers)
- Complex layouts (infinite scroll, SPA navigation)
This is where SearchHive's ScrapeForge fills the gap.
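A cheap way to decide whether a page needs a heavier fetcher is a visible-text heuristic: if the raw HTML contains almost no readable text, the content is probably rendered client-side. This is a rough sketch; the threshold is arbitrary.

```python
import re

def looks_js_rendered(html: str, min_text_chars: int = 200) -> bool:
    """Heuristic: strip scripts, styles, and tags; if little visible
    text remains, the page likely needs JavaScript rendering."""
    html = re.sub(r"(?s)<(script|style)[^>]*>.*?</\1>", " ", html)
    text = re.sub(r"(?s)<[^>]+>", " ", html)
    text = re.sub(r"\s+", " ", text).strip()
    return len(text) < min_text_chars
```

You can route pages that fail this check to a rendering-capable backend and serve the rest with a plain HTTP fetch.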
Using SearchHive as the Backend for LlamaIndex Web Search
SearchHive provides a more complete web data pipeline for LlamaIndex. Instead of relying on basic HTTP requests that fail on modern websites, SearchHive handles JavaScript rendering, bot detection bypass, and structured data extraction.
SearchHive + LlamaIndex Integration
import requests
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import FunctionTool

def searchhive_search(query: str, max_results: int = 5) -> str:
    """Search the web using SearchHive SwiftSearch and return structured results."""
    response = requests.post(
        "https://api.searchhive.dev/v1/swiftsearch",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        json={"query": query, "max_results": max_results},
    )
    results = response.json()
    return str(results)

def searchhive_extract(url: str) -> str:
    """Extract full page content from a URL using SearchHive ScrapeForge."""
    response = requests.post(
        "https://api.searchhive.dev/v1/scrapeforge",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        json={"url": url, "render_js": True},
    )
    return response.json().get("content", "")

# Create LlamaIndex tools
search_tool = FunctionTool.from_defaults(
    fn=searchhive_search,
    name="web_search",
    description="Search the web for current information",
)
extract_tool = FunctionTool.from_defaults(
    fn=searchhive_extract,
    name="extract_page",
    description="Extract full content from a web page URL",
)

# Build agent
agent = ReActAgent.from_tools(
    [search_tool, extract_tool],
    llm=your_llm,
    system_prompt="Use web_search to find relevant pages, then extract_page to read full content.",
)
Why SearchHive Over Other Providers?
| Feature | Tavily | Google CSE | SearchHive |
|---|---|---|---|
| JS rendering | No | No | Yes |
| Bot bypass | No | No | Yes |
| Structured extraction | Limited | No | Yes |
| Search + scrape combined | Separate | Separate | Unified API |
| Free tier | 1K/month | 100/day | 1K requests/month |
| Pricing (beyond free) | $40/mo | $5/1K queries | $15/mo |
SearchHive's unified API means your LlamaIndex agent can search and scrape through a single interface, reducing complexity and cost.
See /compare/tavily for a detailed pricing breakdown.
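The combined flow can be expressed as one helper. Here `search` and `extract` are injected callables standing in for the `searchhive_search`/`searchhive_extract` functions above; the wrapper itself is a sketch, not a SearchHive API.

```python
def search_and_read(query, search, extract, top_n=3):
    """Search, then fetch full page content for the top hits.

    `search` must return a dict with a "results" list of {"url": ...}
    entries; `extract` takes a URL and returns page text.
    """
    hits = search(query).get("results", [])[:top_n]
    return [{"url": h["url"], "content": extract(h["url"])} for h in hits]
```

Injecting the callables keeps the helper testable and lets you swap providers without touching the pipeline code.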
Building a RAG Pipeline with Live Web Search
Here's a production pattern that combines vector search with live web lookup:
from llama_index.core import VectorStoreIndex, Document
from llama_index.core.query_engine import SubQuestionQueryEngine

# Your existing knowledge base
existing_index = VectorStoreIndex.from_documents(your_documents)

# Live web search fallback
def web_augmented_query(query: str):
    # First, check existing knowledge
    existing_response = existing_index.as_query_engine().query(query)
    # If confidence is low, augment with web search
    if (existing_response.metadata or {}).get("confidence", 1.0) < 0.7:
        web_content = searchhive_search(query)
        web_doc = Document(text=web_content, metadata={"source": "web"})
        augmented_index = VectorStoreIndex.from_documents(
            [web_doc] + list(existing_index.docstore.docs.values())
        )
        return augmented_index.as_query_engine().query(query)
    return existing_response

query_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=[...]
)
This hybrid approach gives you the best of both worlds — your curated knowledge base for domain-specific questions and live web search for current information.
Production Best Practices
1. Implement Caching
Don't hit the search API for every query. Cache results with a TTL:
import time

class CachedSearch:
    def __init__(self, ttl_seconds=3600):
        self.cache = {}
        self.ttl = ttl_seconds

    def search(self, query: str) -> dict:
        if query in self.cache:
            cached_result, timestamp = self.cache[query]
            if time.time() - timestamp < self.ttl:
                return cached_result
        result = searchhive_search(query)
        self.cache[query] = (result, time.time())
        return result
2. Rate Limit Handling
from tenacity import retry, wait_exponential, stop_after_attempt

@retry(wait=wait_exponential(min=1, max=10), stop=stop_after_attempt(3))
def safe_web_search(query: str) -> str:
    return searchhive_search(query)
3. Source Citation
Always track which web pages informed each response:
def search_with_sources(query: str) -> dict:
    # Assumes searchhive_search is adapted to return the parsed JSON dict
    # rather than the str() form shown earlier.
    results = searchhive_search(query)
    sources = [r["url"] for r in results.get("results", [])]
    return {"content": str(results), "sources": sources}
4. Cost Control
Set per-query budgets to prevent runaway API costs:
import time

class BudgetAwareSearch:
    def __init__(self, max_queries_per_minute=30):
        self.queries = []
        self.max_qpm = max_queries_per_minute

    def search(self, query):
        now = time.time()
        # Keep only timestamps from the last 60 seconds
        self.queries = [t for t in self.queries if now - t < 60]
        if len(self.queries) >= self.max_qpm:
            raise RuntimeError(f"Rate limit: {self.max_qpm} queries/minute")
        self.queries.append(now)
        return searchhive_search(query)
Common Use Cases
AI Research Assistants
Automated research tools that gather information from multiple web sources and synthesize findings. LlamaIndex web search handles the retrieval; the LLM handles synthesis.
Real-Time Competitive Intelligence
Monitor competitor websites, pricing pages, and product launches automatically. SearchHive's ScrapeForge handles the JavaScript-heavy product pages that basic fetchers can't render.
Customer Support Automation
Ground support chatbot responses in current documentation by searching your own site plus relevant community resources.
Financial Data Pipeline
Pull current market data, news, and analyst reports into your LLM pipeline. Web search keeps financial AI tools up-to-date without manual data feeds.
Limitations to Watch
- Latency: Web search adds 1-3 seconds per query. Cache aggressively for frequently asked questions.
- Quality variance: Search results vary in quality. Use relevance scoring to filter low-quality results.
- API costs: At scale, search API costs add up. SearchHive's $15/month plan with 1K included requests is more economical than Tavily's $40/month equivalent.
- Rate limits: Search providers enforce rate limits. Implement queuing for high-throughput applications.
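For the quality-variance point, even a naive keyword-overlap score can screen out obviously off-topic hits. This is a stand-in for a real reranker or embedding-based similarity, not a production scorer.

```python
def relevance_score(query: str, snippet: str) -> float:
    """Fraction of query terms that appear in the snippet, in [0, 1]."""
    q = set(query.lower().split())
    s = set(snippet.lower().split())
    return len(q & s) / len(q) if q else 0.0

def filter_results(query, results, threshold=0.3):
    """Keep only results whose snippet shares enough terms with the query."""
    return [r for r in results if relevance_score(query, r["snippet"]) >= threshold]
```

Tune the threshold against a handful of known-good and known-bad queries before relying on it.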
Conclusion
LlamaIndex web search is a powerful way to ground your LLM applications in real-time data. The framework makes integration straightforward, but the quality of your web data backend matters enormously.
SearchHive offers the most complete solution — combining fast search with JavaScript rendering, bot detection bypass, and structured extraction in a single API. At $15/month for the Pro tier (compared to Tavily's $40), it delivers more features at a lower cost.
Ready to power your LlamaIndex pipeline with better web data? Get started with SearchHive's free tier — 1,000 requests/month, no credit card required. Check out the API docs for integration examples.