How to Build an AI Agent with Web Access: The Complete Answer
AI agents without web access are trapped in their training data cutoff. Adding real-time web search and scraping capabilities is what transforms a static chatbot into an agent that can research, compare, and act on current information.
This guide covers the architecture, tools, and code patterns for building AI agents with web access in 2026.
Key Takeaways
- Modern AI agents combine an LLM with web search, scraping, and tool-calling frameworks
- The architecture follows: user query -> agent decides -> web tools execute -> LLM synthesizes
- Search APIs (SwiftSearch, Tavily, Brave) handle search; scraping APIs (ScrapeForge, Firecrawl) handle page extraction
- Frameworks like LangChain, Vercel AI SDK, and OpenAI Agents SDK simplify orchestration
- MCP (Model Context Protocol) is the emerging standard for connecting agents to web tools
- Cost optimization matters -- search API pricing varies by more than an order of magnitude between providers
What architecture does a web-connected AI agent use?
The standard architecture has four components:
- Orchestrator (LLM): Decides when to search, what to search, and how to combine results. Uses function/tool calling.
- Search layer: Executes web searches and returns structured results. Examples: SwiftSearch, Tavily, Brave Search API.
- Scraping layer: Extracts full content from specific URLs. Examples: ScrapeForge, Firecrawl, Jina AI Reader.
- Memory/state: Maintains conversation context, search history, and intermediate results.
The flow looks like this:
User Query -> LLM analyzes intent -> Agent calls search tool
-> Search results returned -> LLM decides if more context needed
-> Agent calls scrape tool on relevant URLs -> Content extracted
-> LLM synthesizes final answer
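The flow above can be sketched as a minimal agent loop. This is a toy sketch with stub functions standing in for the real LLM and search API (all names here are illustrative, not from any SDK) -- it only shows the control flow: the orchestrator keeps calling tools until it decides it has enough context to answer.

```python
def stub_llm(messages):
    # Pretend-LLM: requests a search on the first turn, answers once
    # a tool result is present in the conversation history.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "web_search", "args": {"query": messages[-1]["content"]}}
    return {"answer": "Synthesized answer based on tool results."}

def web_search(query):
    # Stub search layer; a real one would call a search API.
    return f"results for: {query}"

TOOLS = {"web_search": web_search}

def run_agent(user_query):
    messages = [{"role": "user", "content": user_query}]
    while True:
        decision = stub_llm(messages)
        if "answer" in decision:  # orchestrator is done: synthesize
            return decision["answer"]
        # Execute the requested tool and feed the result back in
        result = TOOLS[decision["tool"]](**decision["args"])
        messages.append({"role": "tool", "content": result})

print(run_agent("latest Python release"))
```

The working example later in this guide fills in the stubs with real OpenAI tool calling and SearchHive endpoints.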
What tools do I need to give an AI agent web access?
You need at least two API capabilities:
Search API: Returns relevant URLs and snippets for a query.
| Provider | Pricing | Free Tier | Best For |
|---|---|---|---|
| SearchHive SwiftSearch | $0.0001/credit | 500 credits | All-in-one search + scrape + deep dive |
| Tavily | $0.008/credit | 1,000/mo | AI-optimized search results |
| Brave Search API | $5/1K queries | $5 free credits/mo | Independent web index |
| SerpAPI | $25/mo (1K) | 250/mo | Google SERP data |
| Exa | $7/1K searches | None | Neural/semantic search |
Scraping API: Extracts structured content from URLs the search layer returns.
| Provider | Pricing | JS Rendering | Best For |
|---|---|---|---|
| SearchHive ScrapeForge | Included in credits | Yes | Integrated with search pipeline |
| Firecrawl | $0.83/1K (Standard) | Yes | Markdown output, AI-native |
| ScrapeGraphAI | $0.28/1K (Growth) | Yes | AI-powered extraction |
| Jina AI Reader | Free (1M tokens/day) | No | Simple single-page extraction |
How do I build a basic web search agent?
Here is a working example using SearchHive's APIs and OpenAI's tool calling:
```python
import json

import openai
import requests

client = openai.OpenAI()
SEARCHHIVE_KEY = "your_searchhive_api_key"  # better: load from an environment variable

def web_search(query: str, num_results: int = 5) -> str:
    """Search the web and return formatted results."""
    resp = requests.get(
        "https://api.searchhive.dev/v1/swiftsearch",
        headers={"Authorization": f"Bearer {SEARCHHIVE_KEY}"},
        params={"query": query, "limit": num_results},
    )
    resp.raise_for_status()
    results = resp.json()
    formatted = []
    for r in results.get("results", []):
        formatted.append(
            f"- {r.get('title', '')}: {r.get('url', '')}\n  {r.get('snippet', '')}"
        )
    return "\n".join(formatted)

def scrape_url(url: str) -> str:
    """Extract content from a specific URL."""
    resp = requests.post(
        "https://api.searchhive.dev/v1/scrapeforge",
        headers={"Authorization": f"Bearer {SEARCHHIVE_KEY}"},
        json={"url": url, "format": "markdown"},
    )
    resp.raise_for_status()
    data = resp.json()
    return data.get("content", "")[:3000]  # truncate to fit the context window

tools = [
    {
        "type": "function",
        "function": {
            "name": "web_search",
            "description": "Search the web for current information",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search query"},
                    "num_results": {
                        "type": "integer",
                        "description": "Number of results",
                        "default": 5,
                    },
                },
                "required": ["query"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "scrape_url",
            "description": "Extract full content from a URL",
            "parameters": {
                "type": "object",
                "properties": {
                    "url": {"type": "string", "description": "URL to scrape"}
                },
                "required": ["url"],
            },
        },
    },
]

tool_map = {"web_search": web_search, "scrape_url": scrape_url}

messages = [
    {"role": "user", "content": "What are the top 5 Python web scraping libraries in 2026?"}
]

while True:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=tools,
        tool_choice="auto",
    )
    msg = response.choices[0].message
    messages.append(msg)  # keep the assistant turn (including its tool calls) in history
    if not msg.tool_calls:
        print(msg.content)
        break
    for tool_call in msg.tool_calls:
        func_name = tool_call.function.name
        func_args = json.loads(tool_call.function.arguments)
        result = tool_map[func_name](**func_args)
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": str(result),
        })
```
This agent autonomously decides when to search and when to scrape, combining both to produce informed answers.
Should I use MCP for web access?
The Model Context Protocol (MCP) is an open standard for connecting AI models to external tools. It is rapidly becoming the preferred way to give agents web access because it provides a universal interface -- any MCP-compatible agent can use any MCP-compatible web tool.
Brave Search and Bright Data both ship MCP servers, and SearchHive can be wrapped as an MCP server with a thin adapter.
MCP is worth adopting if you are building agents that need to work across multiple tool providers. For simpler setups, direct API calls (as shown above) are faster to implement.
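As a concrete illustration, an MCP-compatible client such as Claude Desktop registers a server through a JSON config entry. The sketch below registers the Brave Search reference MCP server; the exact file location and key names depend on your client, so treat this as a representative shape rather than a universal format:

```json
{
  "mcpServers": {
    "brave-search": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-brave-search"],
      "env": { "BRAVE_API_KEY": "your_brave_api_key" }
    }
  }
}
```

Once registered, any agent running in that client can call the server's search tools without custom integration code.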
How do I optimize costs for a web-connected agent?
Web API costs add up fast. Here are practical strategies:
- Cache aggressively. Do not search for the same query twice. Use Redis or SQLite with a TTL of a few hours.
- Use cheaper search for initial queries, expensive scraping only for final URLs. SearchHive credits work across both, making cost tracking simple.
- Limit scraping depth. Most answers come from the first 2-3 pages. Do not scrape every search result.
- Use the free tiers. SearchHive gives 500 free credits, Brave gives $5/month free, Tavily gives 1,000 free credits/month. Combine free tiers for low-volume agents.
- Implement query deduplication. Normalize queries (lowercase, remove stop words) before checking your cache.
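The caching and deduplication strategies above combine naturally. Here is a minimal sketch using SQLite and a TTL; the stop-word list and table schema are illustrative choices, not a fixed recipe:

```python
import re
import sqlite3
import time

TTL_SECONDS = 4 * 3600  # cache entries expire after a few hours

def normalize(query: str) -> str:
    # Lowercase and strip a few common stop words before cache lookup
    # (illustrative stop-word list; extend as needed).
    words = re.findall(r"[a-z0-9]+", query.lower())
    stop = {"the", "a", "an", "of", "for", "in"}
    return " ".join(w for w in words if w not in stop)

class SearchCache:
    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cache (q TEXT PRIMARY KEY, result TEXT, ts REAL)"
        )

    def get(self, query):
        # Return a cached result if it exists and has not expired
        row = self.db.execute(
            "SELECT result, ts FROM cache WHERE q = ?", (normalize(query),)
        ).fetchone()
        if row and time.time() - row[1] < TTL_SECONDS:
            return row[0]
        return None

    def put(self, query, result):
        self.db.execute(
            "INSERT OR REPLACE INTO cache VALUES (?, ?, ?)",
            (normalize(query), result, time.time()),
        )
        self.db.commit()
```

Check the cache before every `web_search` call; normalization means "The Best Python Libraries" and "best python libraries" hit the same entry and cost nothing on the second lookup.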
How does SearchHive compare for AI agent web access?
SearchHive provides all three web capabilities in one API -- search, scrape, and deep research:
| Feature | SearchHive | Tavily | Brave |
|---|---|---|---|
| Web Search | SwiftSearch | Yes | Yes |
| Page Scraping | ScrapeForge | Yes | No |
| Deep Research | DeepDive | No | No |
| Free Tier | 500 credits | 1,000/mo | $5/mo credits |
| Price per search | ~$0.0001 | $0.008 | $0.005 |
| JS Rendering | Yes | N/A | N/A |
SearchHive's unified API means one authentication, one billing, and one integration for all your agent's web needs. The free tier gives you enough to build and test a working agent before committing.
What frameworks work best for web-connected agents?
- LangChain / LangGraph: Most mature ecosystem, large tool library, good for complex multi-step agents
- Vercel AI SDK: Best for production web apps, great streaming support, TypeScript-first
- OpenAI Agents SDK: Tightest integration with GPT models, simple tool definitions
- CrewAI: Multi-agent orchestration, good for teams of specialized agents
- LlamaIndex: Best for RAG-heavy agents with document retrieval
For a quick start, the OpenAI Agents SDK with direct API calls (shown above) has the least boilerplate. For production, LangGraph gives you more control over the agent loop.
Frequently Asked Questions
Can an AI agent browse the web like a human? Not natively. Agents use search and scraping APIs to get web data programmatically. Browser automation (Playwright, Puppeteer) can simulate human browsing but is slower, more fragile, and more expensive.
How do I prevent my agent from hallucinating web data? Ground every factual claim in actual search results. Pass the source URLs alongside the agent's answer. Use RAG techniques -- retrieve first, generate second.
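One way to enforce that grounding is to build the prompt so the model can only draw on retrieved sources. A small sketch (the prompt wording and result shape are illustrative assumptions, not a fixed API):

```python
def build_grounded_prompt(question, results):
    # results: list of {"url": ..., "snippet": ...} dicts from the search layer
    sources = "\n".join(
        f"[{i + 1}] {r['url']}\n{r['snippet']}" for i, r in enumerate(results)
    )
    return (
        "Answer using ONLY the sources below. Cite them as [1], [2], ...\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {question}"
    )
```

Numbered sources let you map each citation in the answer back to a URL you can display alongside it.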
What about rate limits? All search and scraping APIs have rate limits. Start with the free tier, implement exponential backoff, and scale up as your agent's usage grows.
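Exponential backoff with jitter can be wrapped around any tool call. A minimal sketch -- substitute your API client's actual rate-limit exception for the placeholder `RuntimeError`:

```python
import random
import time

def with_backoff(fn, max_retries=5, base_delay=1.0):
    # Retry fn on rate-limit errors, doubling the delay each attempt
    # and adding jitter so concurrent agents don't retry in lockstep.
    for attempt in range(max_retries):
        try:
            return fn()
        except RuntimeError:  # placeholder for your client's 429 exception
            if attempt == max_retries - 1:
                raise
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)
```

Usage: `with_backoff(lambda: web_search("query"))` retries transparently while keeping the call site clean.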
Can I build this without an API key? Technically yes, using libraries like requests and BeautifulSoup to scrape directly. But this is fragile, slow, and raises legal concerns. APIs handle proxy rotation, CAPTCHAs, and rate limiting for you.
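To make the fragility concrete, here is what DIY extraction looks like using only the standard library (`html.parser` standing in for BeautifulSoup). It works exactly as long as the page keeps its current markup -- one template change and the selector logic silently breaks:

```python
from html.parser import HTMLParser

class TitleGrabber(HTMLParser):
    """Pull the <title> text out of raw HTML -- brittle by design."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

parser = TitleGrabber()
parser.feed("<html><head><title>Example Page</title></head></html>")
print(parser.title)  # Example Page
```

A scraping API replaces this per-page parsing logic with a single call that also handles JS rendering, proxies, and blocks.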
Summary
Building an AI agent with web access requires a search API, a scraping API, and an orchestration framework. The simplest setup uses OpenAI's tool calling with SearchHive's SwiftSearch and ScrapeForge APIs -- two endpoints, one API key, and you have a fully functional research agent.
Get started with SearchHive's free tier -- 500 credits, no credit card required.