How to Give AI Agents Realtime Data Access -- Step-by-Step Guide
AI agents are powerful, but they're only as good as the data they can access. A chatbot answering from a static knowledge base will hallucinate the moment you ask about anything current -- stock prices, news, competitor pricing, weather, or live sports scores.
Giving your AI agent realtime data access turns it from a toy into a production tool. This guide walks you through exactly how to do it, from architecture to working code.
Key Takeaways
- Realtime data access means your agent can query live web data, APIs, and databases at inference time -- not just pre-indexed documents
- The three main patterns are: search-based RAG, direct API calls, and tool-use agents
- SearchHive provides a unified API for search, scraping, and research that plugs directly into agent frameworks
- Most agent frameworks (LangChain, LlamaIndex, CrewAI) support tool integration with just a few lines of code
- The hardest part isn't the code -- it's deciding which data sources your agent actually needs
Prerequisites
Before you start, you'll need:
- Python 3.10+ installed
- An OpenAI API key (or Anthropic, or any LLM provider)
- A SearchHive API key (free tier works -- 500 credits, no credit card)
- Basic familiarity with Python async/await
```bash
pip install openai requests httpx
```
Get your free SearchHive key at searchhive.dev -- signup takes 30 seconds.
Step 1: Understand the Architecture
There are three patterns for giving agents realtime data:
Pattern 1: Search RAG -- Agent queries a search API, retrieves results, includes them as context for the LLM. Good for answering questions about current events.
Pattern 2: Direct API Calls -- Agent calls specific APIs (weather, stocks, prices) with structured parameters. Good for well-defined data domains.
Pattern 3: Tool-Use Agents -- LLM decides which tools to call based on the user's question. The most flexible pattern -- the agent chooses between search, scraping, or specific APIs as needed.
We'll build Pattern 3, the most powerful approach.
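Before moving on, here is the core of Pattern 1 in miniature: retrieve snippets, format them into a context block, and prepend them to the question. The `build_rag_prompt` helper below is a hypothetical name for illustration, and it assumes search results arrive as `{"title", "snippet", "url"}` dicts; the LLM call itself is omitted:

```python
def build_rag_prompt(question: str, results: list) -> str:
    """Pattern 1 in one function: stuff retrieved snippets into a
    numbered context block ahead of the user's question."""
    context = "\n".join(
        f"[{i + 1}] {r['title']}: {r['snippet']} ({r['url']})"
        for i, r in enumerate(results)
    )
    return (
        "Answer using ONLY the sources below. Cite them by number.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )

# Example with a single fabricated search result
prompt = build_rag_prompt(
    "Who won the match?",
    [{"title": "ESPN", "snippet": "Final score 3-1", "url": "https://espn.com"}],
)
print(prompt)
```

Pattern 1 is a one-shot pipeline: search always runs, whether the question needs it or not. Pattern 3 lets the model make that call.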
Step 2: Define Your Data Tools
Create individual tool functions that your agent can call. Each function should have a clear name, description, and return structured data.
```python
import requests
import json
from datetime import datetime, timezone

SEARCHHIVE_API_KEY = "your-key-here"
SEARCHHIVE_BASE = "https://api.searchhive.dev/v1"

def search_web(query: str, num_results: int = 5) -> str:
    """Search the web for realtime information. Returns titles, URLs, and snippets."""
    resp = requests.get(
        f"{SEARCHHIVE_BASE}/swiftsearch",
        headers={"Authorization": f"Bearer {SEARCHHIVE_API_KEY}"},
        params={"q": query, "engine": "google", "num": num_results},
    )
    resp.raise_for_status()
    results = resp.json().get("organic", [])
    output = [f"- {r['title']}: {r['snippet']} ({r['url']})" for r in results]
    return "\n".join(output) if output else "No results found."

def scrape_page(url: str) -> str:
    """Extract content from a web page as clean markdown."""
    resp = requests.post(
        f"{SEARCHHIVE_BASE}/scrapeforge",
        headers={
            "Authorization": f"Bearer {SEARCHHIVE_API_KEY}",
            "Content-Type": "application/json",
        },
        json={"url": url, "format": "markdown"},
    )
    resp.raise_for_status()
    return resp.json().get("content", "Failed to scrape page.")[:3000]

def research_topic(query: str) -> str:
    """AI-powered deep research on a topic. Synthesizes from multiple sources."""
    resp = requests.post(
        f"{SEARCHHIVE_BASE}/deepdive",
        headers={
            "Authorization": f"Bearer {SEARCHHIVE_API_KEY}",
            "Content-Type": "application/json",
        },
        json={"query": query, "max_results": 8},
    )
    resp.raise_for_status()
    return resp.json().get("summary", "No summary available.")

def get_current_time() -> str:
    """Get the current date and time in UTC."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")
```
Step 3: Create the Tool Schema for the LLM
The LLM needs to know what tools are available and when to use them. Define tool schemas that the agent can reason over.
```python
tools = [
    {
        "type": "function",
        "function": {
            "name": "search_web",
            "description": "Search the web for current information. Use this for questions about recent events, current prices, news, or anything that changes frequently.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "Search query string",
                    },
                    "num_results": {
                        "type": "integer",
                        "description": "Number of results to return (default 5)",
                        "default": 5,
                    },
                },
                "required": ["query"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "scrape_page",
            "description": "Extract full content from a specific URL. Use this when you need detailed information from a particular page that was found in search results.",
            "parameters": {
                "type": "object",
                "properties": {
                    "url": {
                        "type": "string",
                        "description": "The URL to scrape",
                    },
                },
                "required": ["url"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "research_topic",
            "description": "Deep research on a complex topic. AI synthesizes information from multiple sources. Use for analysis, comparisons, or when a simple search isn't enough.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "Research topic or question",
                    },
                },
                "required": ["query"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "get_current_time",
            "description": "Get the current date and time. Use this when the user asks about the current date, time, or day of week.",
            "parameters": {"type": "object", "properties": {}},
        },
    },
]

# Map function names to actual Python functions
tool_dispatch = {
    "search_web": search_web,
    "scrape_page": scrape_page,
    "research_topic": research_topic,
    "get_current_time": get_current_time,
}
```
Step 4: Build the Agent Loop
The agent loop sends the user's message to the LLM, checks if it wants to call any tools, executes those tools, and feeds results back to the LLM. This repeats until the LLM has enough information to answer.
```python
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = """You are a helpful AI assistant with realtime data access.
When you need current information, use the available tools.
Think step-by-step: decide which tools to call, call them, then synthesize the results.
Always cite your sources when providing factual information."""

def run_agent(user_message: str, max_iterations: int = 5) -> str:
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_message},
    ]
    for _ in range(max_iterations):
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=tools,
            tool_choice="auto",
        )
        msg = response.choices[0].message
        messages.append(msg)
        # If no tool calls, the LLM is ready to respond
        if not msg.tool_calls:
            return msg.content
        # Execute each tool call and feed the results back
        for tool_call in msg.tool_calls:
            func_name = tool_call.function.name
            func_args = json.loads(tool_call.function.arguments)
            print(f"  [Tool call] {func_name}({func_args})")
            try:
                result = tool_dispatch[func_name](**func_args)
            except Exception as e:
                result = f"Error: {e}"
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": result,
            })
    return "Agent reached maximum iterations without producing a final answer."
```
Step 5: Test It
```python
# Example 1: Current events question
print(run_agent("What are the top 3 tech news stories today?"))

# Example 2: Competitive analysis
print(run_agent("Compare the pricing of Vercel and Netlify for a team of 10 developers"))

# Example 3: Deep research
print(run_agent("What is the current state of the autonomous vehicle market?"))
```
When you run these, you'll see the agent call search_web first, then potentially scrape_page for detailed info from specific results, and finally synthesize everything into a coherent answer with citations.
Step 6: Add Caching and Rate Limiting
In production, you don't want to hammer the API or re-run identical queries. Start with simple per-tool rate limiting:

```python
import time
from functools import wraps

_last_call_time = {}
_rate_limit = 1.0  # seconds between calls per tool

def rate_limited_call(func):
    """Decorator to add rate limiting to tool functions."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        name = func.__name__
        now = time.time()
        if name in _last_call_time:
            elapsed = now - _last_call_time[name]
            if elapsed < _rate_limit:
                time.sleep(_rate_limit - elapsed)
        _last_call_time[name] = time.time()
        return func(*args, **kwargs)
    return wrapper

# Apply to all tools. Rebind the dispatch table too -- otherwise it
# still points at the unwrapped functions and the limiter never runs.
search_web = rate_limited_call(search_web)
scrape_page = rate_limited_call(scrape_page)
research_topic = rate_limited_call(research_topic)
tool_dispatch.update({
    "search_web": search_web,
    "scrape_page": scrape_page,
    "research_topic": research_topic,
})
```
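For the caching half, a minimal in-memory TTL cache is enough for a single process. The `ttl_cache` decorator below is a sketch, not a library API; multi-worker deployments would want Redis or similar instead of a per-process dict:

```python
import time
from functools import wraps

def ttl_cache(seconds: float = 300.0):
    """Cache a tool's result per-arguments for `seconds`.
    Single-process sketch: the store lives in the decorator's closure."""
    def decorator(func):
        store = {}  # (args, sorted kwargs) -> (timestamp, result)
        @wraps(func)
        def wrapper(*args, **kwargs):
            key = (args, tuple(sorted(kwargs.items())))
            hit = store.get(key)
            if hit and time.time() - hit[0] < seconds:
                return hit[1]  # fresh enough: skip the real call
            result = func(*args, **kwargs)
            store[key] = (time.time(), result)
            return result
        return wrapper
    return decorator

# Demo with a fake tool that counts how often it actually runs
calls = {"n": 0}

@ttl_cache(seconds=60)
def fake_search(query: str) -> str:
    calls["n"] += 1
    return f"results for {query}"

fake_search("python 3.13 release date")
fake_search("python 3.13 release date")  # served from cache
print(calls["n"])  # -> 1
```

Apply the cache as the outer decorator (`search_web = ttl_cache(300.0)(rate_limited_call(search_web))`) so cache hits return before the rate limiter ever sleeps.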
Complete Working Example
Here's the full agent in one file, ready to run:
```python
import requests
import json
from openai import OpenAI

API_KEY = "your-searchhive-key"
BASE = "https://api.searchhive.dev/v1"
client = OpenAI()

def search_web(query, num_results=5):
    resp = requests.get(
        f"{BASE}/swiftsearch",
        headers={"Authorization": f"Bearer {API_KEY}"},
        params={"q": query, "engine": "google", "num": num_results},
    )
    results = resp.json().get("organic", [])
    return "\n".join(f"- {r['title']}: {r['snippet']}" for r in results[:num_results])

def scrape_page(url):
    resp = requests.post(
        f"{BASE}/scrapeforge",
        headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
        json={"url": url, "format": "markdown"},
    )
    return resp.json().get("content", "Scrape failed.")[:3000]

def research_topic(query):
    resp = requests.post(
        f"{BASE}/deepdive",
        headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
        json={"query": query, "max_results": 8},
    )
    return resp.json().get("summary", "Research failed.")

tools = [
    {"type": "function", "function": {
        "name": "search_web", "description": "Search for current information",
        "parameters": {"type": "object", "properties": {
            "query": {"type": "string"}, "num_results": {"type": "integer", "default": 5}
        }, "required": ["query"]},
    }},
    {"type": "function", "function": {
        "name": "scrape_page", "description": "Extract content from a URL",
        "parameters": {"type": "object", "properties": {
            "url": {"type": "string"}
        }, "required": ["url"]},
    }},
    {"type": "function", "function": {
        "name": "research_topic", "description": "Deep research synthesizing multiple sources",
        "parameters": {"type": "object", "properties": {
            "query": {"type": "string"}
        }, "required": ["query"]},
    }},
]

dispatch = {"search_web": search_web, "scrape_page": scrape_page, "research_topic": research_topic}

def agent(message):
    msgs = [
        {"role": "system", "content": "You have realtime web data access. Use tools to find current information, then answer with citations."},
        {"role": "user", "content": message},
    ]
    for _ in range(5):
        resp = client.chat.completions.create(model="gpt-4o", messages=msgs, tools=tools)
        msg = resp.choices[0].message
        msgs.append(msg)
        if not msg.tool_calls:
            return msg.content
        for tc in msg.tool_calls:
            args = json.loads(tc.function.arguments)
            result = dispatch[tc.function.name](**args)
            msgs.append({"role": "tool", "tool_call_id": tc.id, "content": result})
    return "Max iterations reached."

if __name__ == "__main__":
    print(agent("What is the current pricing for Claude, GPT-4o, and Gemini Pro?"))
```
Common Issues
Agent loops endlessly calling the same tool. Add a deduplication check -- before calling a tool, check if it was already called with the same arguments in the current conversation.
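One way to implement that check, sketched with a hypothetical `dedupe_tool_call` helper. Serializing the arguments with `sort_keys=True` makes the key stable regardless of argument order (this assumes tool arguments are JSON-serializable, which they are when they come from the model):

```python
import json

def dedupe_tool_call(seen: set, name: str, args: dict) -> bool:
    """Return True if this (tool, arguments) pair is new and should run.
    Keep one `seen` set for the lifetime of a single conversation."""
    key = (name, json.dumps(args, sort_keys=True))
    if key in seen:
        return False
    seen.add(key)
    return True

seen = set()
print(dedupe_tool_call(seen, "search_web", {"query": "news"}))  # True
print(dedupe_tool_call(seen, "search_web", {"query": "news"}))  # False
```

In the agent loop, when it returns False, skip execution and append a tool message such as "Already called with these arguments; reuse the earlier result." so the model moves on instead of retrying.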
Search results are stale or irrelevant. Use research_topic instead of search_web for complex questions. DeepDive synthesizes from multiple sources and produces better summaries.
Rate limit errors. SearchHive's free tier has generous limits, but add exponential backoff in production. The Builder plan ($49/mo, 100K credits) removes this concern for most workloads.
Token costs are too high. Tool results can be verbose. Truncate scraped content to the most relevant sections, or use DeepDive which returns concise summaries instead of raw page content.
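A cheap way to do that truncation: rank paragraphs by keyword overlap with the query and keep the best ones under a character budget. The `trim_to_relevant` helper below is a crude heuristic for illustration; an embedding-based ranker would do better but costs an extra call:

```python
def trim_to_relevant(text: str, query: str, max_chars: int = 1500) -> str:
    """Keep the paragraphs sharing the most words with the query,
    up to max_chars. Crude keyword heuristic, no external calls."""
    terms = set(query.lower().split())
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    # Most query-word overlap first (sort is stable, so ties keep order)
    ranked = sorted(
        paragraphs,
        key=lambda p: len(terms & set(p.lower().split())),
        reverse=True,
    )
    out, total = [], 0
    for p in ranked:
        if total + len(p) > max_chars:
            break
        out.append(p)
        total += len(p)
    return "\n\n".join(out)

doc = ("Pricing starts at $20 per seat.\n\n"
       "Our company was founded in 1999.\n\n"
       "Enterprise pricing is custom.")
out = trim_to_relevant(doc, "pricing per seat", max_chars=40)
print(out)  # -> Pricing starts at $20 per seat.
```

Run scraped content through this before appending it as a tool message, and the off-topic boilerplate never reaches the model.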
Next Steps
Now that you have a working agent with realtime data access, consider:
- Adding more tools -- weather APIs, stock APIs, database queries
- Using SearchHive's DeepDive for complex research tasks instead of manual search + scrape
- Adding streaming -- stream both the LLM response and tool execution status to the user
- Deploying as a service -- wrap the agent in a FastAPI endpoint for production use
Ready to build? Start with 500 free credits at searchhive.dev -- no credit card required. See /blog/searchhive-vs-tavily-for-ai-search for a comparison of search APIs for AI agents, or /tutorials/build-ai-agent-realtime-data for more advanced patterns.