Complete Guide to LangChain Web Search: Building AI Agents with Real-Time Data
LangChain is the most popular framework for building LLM applications, and web search is the most important tool for grounding those applications in reality. Without search, your LangChain app is limited to its training data. With search, it can answer questions about current events, look up documentation, verify facts, and research topics on demand.
This guide shows you how to integrate web search into LangChain applications using SearchHive, with practical code examples and a real-world case study.
Key Takeaways
- LangChain's tool system makes it straightforward to add web search to any LLM chain or agent
- SearchHive is the most cost-effective search tool for LangChain -- $0.0001 per credit vs roughly $0.008 per search (Tavily) or $25/mo for 1K searches (SerpApi)
- The `@tool` decorator pattern is the simplest way to add search capabilities
- ScrapeForge integration lets agents read full web pages, not just snippets
- DeepDive adds AI-powered research synthesis for complex queries
The Challenge: Grounding LangChain in Real-Time Data
A common pattern: a team builds a LangChain chatbot that answers questions about their product. It works great -- until a user asks about pricing that changed last week, or a feature that launched yesterday. The LLM doesn't know because its training data is frozen.
The solution is web search integration. When the LLM encounters a question it can't confidently answer, it searches the web and uses the results to formulate a current, accurate response.
Setting Up SearchHive with LangChain
SearchHive provides three APIs that map perfectly to LangChain's tool system:
| SearchHive API | LangChain Tool Use | Use Case |
|---|---|---|
| SwiftSearch | @tool -- web search | Find current information |
| ScrapeForge | @tool -- page content | Read full web pages |
| DeepDive | @tool -- research | Synthesize complex topics |
Basic Web Search Tool
```python
import httpx
from langchain_core.tools import tool

SEARCHHIVE_API_KEY = "sh_live_xxxxx"  # replace with your key (or load from an env var)

@tool
def web_search(query: str) -> str:
    """Search the web for current information. Returns titles, URLs, and snippets.

    Use this tool when you need up-to-date information, current events, or
    data that may have changed since your training cutoff.
    """
    resp = httpx.get(
        "https://api.searchhive.dev/v1/swiftsearch",
        params={"q": query, "num": 5},
        headers={"Authorization": f"Bearer {SEARCHHIVE_API_KEY}"},
    )
    resp.raise_for_status()
    results = resp.json().get("results", [])
    if not results:
        return f"No results found for: {query}"
    formatted = []
    for r in results:
        formatted.append(f"Title: {r['title']}\nURL: {r['url']}\nSnippet: {r['snippet']}")
    return "\n\n".join(formatted)
```
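In production you will also want timeouts and retries around the HTTP call, since search APIs occasionally return transient errors. A minimal retry wrapper (a sketch, not part of any SearchHive SDK; the attempt count and backoff values are arbitrary assumptions):

```python
import time

def with_retries(fn, attempts=3, base_delay=0.5):
    """Call fn(), retrying on exceptions with exponential backoff.

    Sleeps base_delay * 2**i between attempts and re-raises the last
    exception if every attempt fails.
    """
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** i))

# Hypothetical usage inside web_search:
#     resp = with_retries(lambda: httpx.get(..., timeout=10.0))
```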
Adding Page Scraping
Search snippets are useful, but sometimes the LLM needs full page content. ScrapeForge fills that gap:
```python
@tool
def scrape_webpage(url: str) -> str:
    """Extract full content from a web page as clean markdown.

    Use this after web_search when you need more detail from a specific page.
    Strips navigation, ads, and boilerplate. Returns content optimized for LLM consumption.
    """
    resp = httpx.post(
        "https://api.searchhive.dev/v1/scrapeforge",
        json={"url": url, "format": "markdown"},
        headers={"Authorization": f"Bearer {SEARCHHIVE_API_KEY}"},
    )
    resp.raise_for_status()
    return resp.json()["content"][:4000]  # truncate to keep the context window manageable
```
Adding Deep Research
For complex questions that need synthesis across multiple sources:
```python
@tool
def deep_research(query: str) -> str:
    """Conduct comprehensive AI-powered research on a topic.

    Searches multiple sources, reads and synthesizes content, and returns
    a structured research summary with cited sources. Use for complex,
    multi-faceted questions that require in-depth analysis.
    """
    resp = httpx.post(
        "https://api.searchhive.dev/v1/deepdive",
        json={"query": query, "depth": "detailed"},
        headers={"Authorization": f"Bearer {SEARCHHIVE_API_KEY}"},
    )
    resp.raise_for_status()
    data = resp.json()
    output = f"Research Summary:\n{data.get('summary', 'No summary available')}\n\n"
    sources = data.get("sources", [])
    if sources:
        output += "Sources:\n" + "\n".join(f"- {s}" for s in sources[:5])
    return output
```
Building a Search-Enabled LangChain Agent
With the tools defined, building the agent is straightforward:
```python
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent

# Initialize LLM
llm = ChatOpenAI(model="gpt-4o", temperature=0)

# Define available tools
tools = [web_search, scrape_webpage, deep_research]

# Create ReAct agent
agent = create_react_agent(llm, tools)

# Use it
result = agent.invoke({
    "messages": [{"role": "user", "content": "What are the latest features in LangChain v0.3?"}]
})

for msg in result["messages"]:
    print(f"{msg.type}: {msg.content[:200] if msg.content else ''}")
```
The agent automatically decides when to search, when to scrape a specific page, and when to use deep research. You don't need to hardcode any of that logic.
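If you want to audit those decisions, each AI message in the result records the tool calls the model made. A small helper to pull out the tool names, sketched against the message shape shown above (assuming AI messages carry a `tool_calls` list of dicts with a `name` key, as in `langchain_core`):

```python
def tools_used(messages):
    """Return the tool names an agent invoked, in order, from a message list."""
    names = []
    for msg in messages:
        for call in getattr(msg, "tool_calls", None) or []:
            names.append(call["name"])
    return names

# e.g. tools_used(result["messages"]) might return
# ["web_search", "scrape_webpage"]
```

Logging this per run is also the easiest way to spot agents that search more than they need to.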
Real-World Case Study: Product Research Agent
A SaaS company built a LangChain agent that helps their sales team research prospects. The agent:
- Searches the web for the prospect's company information
- Scrapes their website to understand their product
- Uses DeepDive to research their market and competitors
- Synthesizes everything into a pre-meeting briefing
```python
def prospect_research(company_name: str) -> str:
    """Research a prospect company using SearchHive + LangChain."""
    briefing_prompt = f"""Research {company_name} and create a prospect briefing:

    1. Company overview and recent news
    2. Main products/services
    3. Key competitors
    4. Market position

    Use web_search to find information, scrape_webpage to read their website,
    and deep_research for market analysis.
    """
    result = agent.invoke({
        "messages": [{"role": "user", "content": briefing_prompt}]
    })
    return result["messages"][-1].content

briefing = prospect_research("Acme Corp")
print(briefing)
```
Results:
- Briefing generation time: 30-45 seconds (down from 4+ hours manual research)
- Accuracy: 92% factual correctness on evaluation set
- Cost: ~$0.03 per briefing (SearchHive credits + LLM tokens)
- Sales team adoption: 85% of reps use it for every prospect meeting
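The per-briefing cost figure breaks down roughly like this (a back-of-envelope using the $0.0001/credit figure above; the token counts and LLM rates are illustrative assumptions, not measurements from the case study):

```python
CREDIT_PRICE = 0.0001          # USD per SearchHive credit
credits_per_briefing = 8       # upper end of the typical 3-8 credit range
search_cost = credits_per_briefing * CREDIT_PRICE  # $0.0008

# Assume ~8K input + ~1K output tokens at illustrative gpt-4o-class rates
# ($2.50/M input, $10/M output) -- check current model pricing.
llm_cost = 8_000 / 1e6 * 2.50 + 1_000 / 1e6 * 10.0  # $0.03

total = search_cost + llm_cost
print(f"~${total:.3f} per briefing")  # ~$0.031 per briefing
```

The point of the arithmetic: LLM tokens dominate the cost, and the search credits are a rounding error, which is why cheap search changes the economics of running these agents at scale.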
SearchHive vs Alternatives for LangChain
| Tool | LangChain Integration | Pricing for 10K searches | Search Quality |
|---|---|---|---|
| SearchHive | Custom @tool (3 lines) | ~$1 | Excellent |
| Tavily | Native LangChain tool | ~$80 | Good |
| SerpApi | Community LangChain tool | $25 | Good |
| DuckDuckGo | Native (free) | $0 | Inconsistent |
| Exa | Custom @tool | ~$70 | Excellent (semantic) |
SearchHive wins on cost by a wide margin because the credit system is extremely efficient ($0.0001/credit). The custom @tool wrapper takes a few lines of code and gives you full control over the response format.
Best Practices for LangChain Web Search
1. Use specific tool descriptions. The LLM decides which tool to call based on the description. "Search the web for current information" is better than "Search."
2. Limit search results. Don't return 20 results -- the LLM can't process that many. 5-10 results with clean snippets is optimal.
3. Cap agent iterations. Set a recursion limit when invoking -- e.g. agent.invoke(inputs, config={"recursion_limit": 8}) -- to prevent the agent from burning through credits in a loop.
4. Cache search results. If multiple users ask similar questions, cache the search results to save credits and reduce latency.
5. Monitor credit usage. Track how many SearchHive credits each agent run consumes. Typical runs use 3-8 credits (1-3 searches + 1-2 scrapes).
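Practice #4 can be as simple as an in-memory cache with a time-to-live, keyed on the query string. A sketch (the TTL is an arbitrary assumption, and for multi-process deployments you would reach for Redis or similar instead):

```python
import time

_cache = {}        # query -> (timestamp, result)
CACHE_TTL = 600    # seconds; tune to how stale results are allowed to be

def cached_search(query: str, search_fn) -> str:
    """Return a cached result for `query` if still fresh, else call search_fn and cache it."""
    now = time.time()
    hit = _cache.get(query)
    if hit and now - hit[0] < CACHE_TTL:
        return hit[1]
    result = search_fn(query)
    _cache[query] = (now, result)
    return result

# Hypothetical usage: route web_search's HTTP call through cached_search
# so repeated queries within the TTL window cost zero credits.
```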
Get Started
Adding web search to your LangChain app takes a few lines of code with SearchHive. Sign up free and get 500 credits to experiment. No credit card required.
Check the SearchHive docs for LangChain integration guides, async examples, and production deployment patterns.