How to Give AI Agents Realtime Data Access -- Step-by-Step Guide
AI agents are powerful, but they're only as good as the data they can access. A chatbot answering from a static knowledge base will hallucinate the moment you ask about anything current -- stock prices, news, competitor pricing, weather, or live sports scores.
Giving your AI agent realtime data access turns it from a toy into a production tool. This guide walks you through exactly how to do it, from architecture to working code.
Key Takeaways
- Realtime data access means your agent can query live web data, APIs, and databases at inference time -- not just pre-indexed documents
- The three main patterns are: search-based RAG, direct API calls, and tool-use agents
- SearchHive provides a unified API for search, scraping, and research that plugs directly into agent frameworks
- Most agent frameworks (LangChain, LlamaIndex, CrewAI) support tool integration with just a few lines of code
- The hardest part isn't the code -- it's deciding which data sources your agent actually needs
Prerequisites
Before you start, you'll need:
- Python 3.10+ installed
- An OpenAI API key (or Anthropic, or any LLM provider)
- A SearchHive API key (free tier works -- 500 credits, no credit card)
- Basic familiarity with Python async/await
```bash
pip install openai requests httpx
```
Get your free SearchHive key at searchhive.dev -- signup takes 30 seconds.
Step 1: Understand the Architecture
There are three patterns for giving agents realtime data:
Pattern 1: Search RAG -- Agent queries a search API, retrieves results, includes them as context for the LLM. Good for answering questions about current events.
Pattern 2: Direct API Calls -- Agent calls specific APIs (weather, stocks, prices) with structured parameters. Good for well-defined data domains.
Pattern 3: Tool-Use Agents -- LLM decides which tools to call based on the user's question. The most flexible pattern -- the agent chooses between search, scraping, or specific APIs as needed.
We'll build Pattern 3, the most powerful approach.
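Before moving on, here is the core of Pattern 1 in miniature: retrieve snippets, format them into a context block, and prepend them to the question. The `build_rag_prompt` helper below is a hypothetical name for illustration, and it assumes search results arrive as `{"title", "snippet", "url"}` dicts; the LLM call itself is omitted:

```python
def build_rag_prompt(question: str, results: list) -> str:
    """Pattern 1 in one function: stuff retrieved snippets into a
    numbered context block ahead of the user's question."""
    context = "\n".join(
        f"[{i + 1}] {r['title']}: {r['snippet']} ({r['url']})"
        for i, r in enumerate(results)
    )
    return (
        "Answer using ONLY the sources below. Cite them by number.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )

# Example with a single fabricated search result
prompt = build_rag_prompt(
    "Who won the match?",
    [{"title": "ESPN", "snippet": "Final score 3-1", "url": "https://espn.com"}],
)
print(prompt)
```

Pattern 1 is a one-shot pipeline: search always runs, whether the question needs it or not. Pattern 3 lets the model make that call.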
Step 2: Define Your Data Tools
Create individual tool functions that your agent can call. Each function should have a clear name, description, and return structured data.
```python
import requests
import json
from datetime import datetime, timezone

SEARCHHIVE_API_KEY = "your-key-here"
SEARCHHIVE_BASE = "https://api.searchhive.dev/v1"

def search_web(query: str, num_results: int = 5) -> str:
    """Search the web for realtime information. Returns titles, URLs, and snippets."""
    resp = requests.get(
        f"{SEARCHHIVE_BASE}/swiftsearch",
        headers={"Authorization": f"Bearer {SEARCHHIVE_API_KEY}"},
        params={"q": query, "engine": "google", "num": num_results},
    )
    resp.raise_for_status()
    results = resp.json().get("organic", [])
    output = [f"- {r['title']}: {r['snippet']} ({r['url']})" for r in results]
    return "\n".join(output) if output else "No results found."

def scrape_page(url: str) -> str:
    """Extract content from a web page as clean markdown."""
    resp = requests.post(
        f"{SEARCHHIVE_BASE}/scrapeforge",
        headers={
            "Authorization": f"Bearer {SEARCHHIVE_API_KEY}",
            "Content-Type": "application/json",
        },
        json={"url": url, "format": "markdown"},
    )
    resp.raise_for_status()
    return resp.json().get("content", "Failed to scrape page.")[:3000]

def research_topic(query: str) -> str:
    """AI-powered deep research on a topic. Synthesizes from multiple sources."""
    resp = requests.post(
        f"{SEARCHHIVE_BASE}/deepdive",
        headers={
            "Authorization": f"Bearer {SEARCHHIVE_API_KEY}",
            "Content-Type": "application/json",
        },
        json={"query": query, "max_results": 8},
    )
    resp.raise_for_status()
    return resp.json().get("summary", "No summary available.")

def get_current_time() -> str:
    """Get the current date and time in UTC."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")
```
Step 3: Create the Tool Schema for the LLM
The LLM needs to know what tools are available and when to use them. Define tool schemas that the agent can reason over.
```python
tools = [
    {
        "type": "function",
        "function": {
            "name": "search_web",
            "description": "Search the web for current information. Use this for questions about recent events, current prices, news, or anything that changes frequently.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "Search query string",
                    },
                    "num_results": {
                        "type": "integer",
                        "description": "Number of results to return (default 5)",
                        "default": 5,
                    },
                },
                "required": ["query"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "scrape_page",
            "description": "Extract full content from a specific URL. Use this when you need detailed information from a particular page that was found in search results.",
            "parameters": {
                "type": "object",
                "properties": {
                    "url": {
                        "type": "string",
                        "description": "The URL to scrape",
                    },
                },
                "required": ["url"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "research_topic",
            "description": "Deep research on a complex topic. AI synthesizes information from multiple sources. Use for analysis, comparisons, or when a simple search isn't enough.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "Research topic or question",
                    },
                },
                "required": ["query"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "get_current_time",
            "description": "Get the current date and time. Use this when the user asks about the current date, time, or day of week.",
            "parameters": {"type": "object", "properties": {}},
        },
    },
]

# Map function names to actual Python functions
tool_dispatch = {
    "search_web": search_web,
    "scrape_page": scrape_page,
    "research_topic": research_topic,
    "get_current_time": get_current_time,
}
```
Step 4: Build the Agent Loop
The agent loop sends the user's message to the LLM, checks if it wants to call any tools, executes those tools, and feeds results back to the LLM. This repeats until the LLM has enough information to answer.
```python
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = """You are a helpful AI assistant with realtime data access.
When you need current information, use the available tools.
Think step-by-step: decide which tools to call, call them, then synthesize the results.
Always cite your sources when providing factual information."""

def run_agent(user_message: str, max_iterations: int = 5) -> str:
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_message},
    ]
    for _ in range(max_iterations):
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=tools,
            tool_choice="auto",
        )
        msg = response.choices[0].message
        messages.append(msg)
        # If no tool calls, the LLM is ready to respond
        if not msg.tool_calls:
            return msg.content
        # Execute each tool call and feed the results back
        for tool_call in msg.tool_calls:
            func_name = tool_call.function.name
            func_args = json.loads(tool_call.function.arguments)
            print(f"  [Tool call] {func_name}({func_args})")
            try:
                result = tool_dispatch[func_name](**func_args)
            except Exception as e:
                result = f"Error: {e}"
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": result,
            })
    return "Agent reached maximum iterations without producing a final answer."
```
Step 5: Test It
```python
# Example 1: Current events question
print(run_agent("What are the top 3 tech news stories today?"))

# Example 2: Competitive analysis
print(run_agent("Compare the pricing of Vercel and Netlify for a team of 10 developers"))

# Example 3: Deep research
print(run_agent("What is the current state of the autonomous vehicle market?"))
```
When you run these, you'll see the agent call search_web first, then potentially scrape_page for detailed info from specific results, and finally synthesize everything into a coherent answer with citations.
Step 6: Add Caching and Rate Limiting
In production, you don't want to hammer the API or re-run identical queries. Start with simple per-tool rate limiting:

```python
import time
from functools import wraps

_last_call_time = {}
_rate_limit = 1.0  # seconds between calls per tool

def rate_limited_call(func):
    """Decorator to add rate limiting to tool functions."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        name = func.__name__
        now = time.time()
        if name in _last_call_time:
            elapsed = now - _last_call_time[name]
            if elapsed < _rate_limit:
                time.sleep(_rate_limit - elapsed)
        _last_call_time[name] = time.time()
        return func(*args, **kwargs)
    return wrapper

# Apply to all tools. Rebind the dispatch table too -- otherwise it
# still points at the unwrapped functions and the limiter never runs.
search_web = rate_limited_call(search_web)
scrape_page = rate_limited_call(scrape_page)
research_topic = rate_limited_call(research_topic)
tool_dispatch.update({
    "search_web": search_web,
    "scrape_page": scrape_page,
    "research_topic": research_topic,
})
```
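For the caching half, a minimal in-memory TTL cache is enough for a single process. The `ttl_cache` decorator below is a sketch, not a library API; multi-worker deployments would want Redis or similar instead of a per-process dict:

```python
import time
from functools import wraps

def ttl_cache(seconds: float = 300.0):
    """Cache a tool's result per-arguments for `seconds`.
    Single-process sketch: the store lives in the decorator's closure."""
    def decorator(func):
        store = {}  # (args, sorted kwargs) -> (timestamp, result)
        @wraps(func)
        def wrapper(*args, **kwargs):
            key = (args, tuple(sorted(kwargs.items())))
            hit = store.get(key)
            if hit and time.time() - hit[0] < seconds:
                return hit[1]  # fresh enough: skip the real call
            result = func(*args, **kwargs)
            store[key] = (time.time(), result)
            return result
        return wrapper
    return decorator

# Demo with a fake tool that counts how often it actually runs
calls = {"n": 0}

@ttl_cache(seconds=60)
def fake_search(query: str) -> str:
    calls["n"] += 1
    return f"results for {query}"

fake_search("python 3.13 release date")
fake_search("python 3.13 release date")  # served from cache
print(calls["n"])  # -> 1
```

Apply the cache as the outer decorator (`search_web = ttl_cache(300.0)(rate_limited_call(search_web))`) so cache hits return before the rate limiter ever sleeps.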
Complete Working Example
Here's the full agent in one file, ready to run:
```python
import requests
import json
from openai import OpenAI

API_KEY = "your-searchhive-key"
BASE = "https://api.searchhive.dev/v1"
client = OpenAI()

def search_web(query, num_results=5):
    resp = requests.get(
        f"{BASE}/swiftsearch",
        headers={"Authorization": f"Bearer {API_KEY}"},
        params={"q": query, "engine": "google", "num": num_results},
    )
    results = resp.json().get("organic", [])
    return "\n".join(f"- {r['title']}: {r['snippet']}" for r in results[:num_results])

def scrape_page(url):
    resp = requests.post(
        f"{BASE}/scrapeforge",
        headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
        json={"url": url, "format": "markdown"},
    )
    return resp.json().get("content", "Scrape failed.")[:3000]

def research_topic(query):
    resp = requests.post(
        f"{BASE}/deepdive",
        headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
        json={"query": query, "max_results": 8},
    )
    return resp.json().get("summary", "Research failed.")

tools = [
    {"type": "function", "function": {
        "name": "search_web", "description": "Search for current information",
        "parameters": {"type": "object", "properties": {
            "query": {"type": "string"}, "num_results": {"type": "integer", "default": 5}
        }, "required": ["query"]},
    }},
    {"type": "function", "function": {
        "name": "scrape_page", "description": "Extract content from a URL",
        "parameters": {"type": "object", "properties": {
            "url": {"type": "string"}
        }, "required": ["url"]},
    }},
    {"type": "function", "function": {
        "name": "research_topic", "description": "Deep research synthesizing multiple sources",
        "parameters": {"type": "object", "properties": {
            "query": {"type": "string"}
        }, "required": ["query"]},
    }},
]

dispatch = {"search_web": search_web, "scrape_page": scrape_page, "research_topic": research_topic}

def agent(message):
    msgs = [
        {"role": "system", "content": "You have realtime web data access. Use tools to find current information, then answer with citations."},
        {"role": "user", "content": message},
    ]
    for _ in range(5):
        resp = client.chat.completions.create(model="gpt-4o", messages=msgs, tools=tools)
        msg = resp.choices[0].message
        msgs.append(msg)
        if not msg.tool_calls:
            return msg.content
        for tc in msg.tool_calls:
            args = json.loads(tc.function.arguments)
            result = dispatch[tc.function.name](**args)
            msgs.append({"role": "tool", "tool_call_id": tc.id, "content": result})
    return "Max iterations reached."

if __name__ == "__main__":
    print(agent("What is the current pricing for Claude, GPT-4o, and Gemini Pro?"))
```
Common Issues
Agent loops endlessly calling the same tool. Add a deduplication check -- before calling a tool, check if it was already called with the same arguments in the current conversation.
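One way to implement that check, sketched with a hypothetical `dedupe_tool_call` helper. Serializing the arguments with `sort_keys=True` makes the key stable regardless of argument order (this assumes tool arguments are JSON-serializable, which they are when they come from the model):

```python
import json

def dedupe_tool_call(seen: set, name: str, args: dict) -> bool:
    """Return True if this (tool, arguments) pair is new and should run.
    Keep one `seen` set for the lifetime of a single conversation."""
    key = (name, json.dumps(args, sort_keys=True))
    if key in seen:
        return False
    seen.add(key)
    return True

seen = set()
print(dedupe_tool_call(seen, "search_web", {"query": "news"}))  # True
print(dedupe_tool_call(seen, "search_web", {"query": "news"}))  # False
```

In the agent loop, when it returns False, skip execution and append a tool message such as "Already called with these arguments; reuse the earlier result." so the model moves on instead of retrying.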
Search results are stale or irrelevant. Use research_topic instead of search_web for complex questions. DeepDive synthesizes from multiple sources and produces better summaries.
Rate limit errors. SearchHive's free tier has generous limits, but add exponential backoff in production. The Builder plan ($49/mo, 100K credits) removes this concern for most workloads.
Token costs are too high. Tool results can be verbose. Truncate scraped content to the most relevant sections, or use DeepDive which returns concise summaries instead of raw page content.
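A cheap way to do that truncation: rank paragraphs by keyword overlap with the query and keep the best ones under a character budget. The `trim_to_relevant` helper below is a crude heuristic for illustration; an embedding-based ranker would do better but costs an extra call:

```python
def trim_to_relevant(text: str, query: str, max_chars: int = 1500) -> str:
    """Keep the paragraphs sharing the most words with the query,
    up to max_chars. Crude keyword heuristic, no external calls."""
    terms = set(query.lower().split())
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    # Most query-word overlap first (sort is stable, so ties keep order)
    ranked = sorted(
        paragraphs,
        key=lambda p: len(terms & set(p.lower().split())),
        reverse=True,
    )
    out, total = [], 0
    for p in ranked:
        if total + len(p) > max_chars:
            break
        out.append(p)
        total += len(p)
    return "\n\n".join(out)

doc = ("Pricing starts at $20 per seat.\n\n"
       "Our company was founded in 1999.\n\n"
       "Enterprise pricing is custom.")
out = trim_to_relevant(doc, "pricing per seat", max_chars=40)
print(out)  # -> Pricing starts at $20 per seat.
```

Run scraped content through this before appending it as a tool message, and the off-topic boilerplate never reaches the model.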
Next Steps
Now that you have a working agent with realtime data access, consider:
- Adding more tools -- weather APIs, stock APIs, database queries
- Using SearchHive's DeepDive for complex research tasks instead of manual search + scrape
- Adding streaming -- stream both the LLM response and tool execution status to the user
- Deploying as a service -- wrap the agent in a FastAPI endpoint for production use
Ready to build? Start with 500 free credits at searchhive.dev -- no credit card required. See /blog/searchhive-vs-tavily-for-ai-search for a comparison of search APIs for AI agents, or /tutorials/build-ai-agent-realtime-data for more advanced patterns.