AI agent API integration patterns define how autonomous agents connect to external services -- search engines, databases, SaaS tools, and internal APIs. Getting these patterns right determines whether your agent is useful or unreliable.
This FAQ covers the most common questions developers ask when building API integrations for AI agents, with practical examples using SearchHive's APIs.
Key Takeaways
- Function calling is the dominant pattern for LLM-to-API integration in 2026
- Search APIs give agents real-time knowledge beyond their training data
- Rate limiting and error handling are the two most common failure points
- SearchHive's SwiftSearch and ScrapeForge APIs are purpose-built for agent workflows
- The Model Context Protocol (MCP) standardizes tool discovery for agents
What are the main API integration patterns for AI agents?
There are four primary patterns:
1. Function calling / tool use -- The LLM decides when to call an API based on user intent. The host application executes the call and returns the results. This is the standard pattern used by OpenAI, Anthropic, Google, and open-source models.
2. Pre-fetching / retrieval augmentation -- The application fetches API data before sending it to the LLM as context. RAG (Retrieval-Augmented Generation) falls here. Good for predictable data needs, weak for dynamic multi-step tasks.
3. Code generation -- The LLM writes code (Python, JavaScript) that calls APIs directly. Used by tools like Claude Code, OpenAI Codex, and ChatGPT's Code Interpreter. Maximum flexibility, but it requires sandboxing.
4. Sidecar / middleware -- A proxy service sits between the agent and external APIs, handling auth, rate limiting, caching, and transformation. This is where tools like SearchHive fit.
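To make pattern 2 concrete, here is a minimal sketch of pre-fetching: retrieval happens up front, before the model sees the question. The in-memory corpus and keyword scoring are stand-ins for a real retrieval backend such as a vector index.

```python
# Toy in-memory corpus standing in for a real retrieval backend.
CORPUS = [
    "SearchHive's SwiftSearch endpoint returns ranked web results.",
    "Rate limits are enforced per API key.",
    "ScrapeForge extracts page content as structured JSON.",
]

def fetch_context(query, top_k=2):
    # Naive keyword-overlap ranking; a real system would use a vector index.
    scored = sorted(
        CORPUS,
        key=lambda d: -sum(w in d.lower() for w in query.lower().split()),
    )
    return "\n".join(scored[:top_k])

def build_prompt(question):
    # The LLM never decides to call an API here; retrieval is done in advance.
    context = fetch_context(question)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

The key contrast with function calling: the application, not the model, decides what data to fetch, which makes costs predictable but multi-step reasoning harder.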
How do I add web search capabilities to my AI agent?
Use a search API as a tool in your agent's function-calling loop. Here's a minimal example:
```python
import requests

SEARCHHIVE_KEY = "your_api_key"

def web_search(query, engine="google", limit=5):
    resp = requests.post(
        "https://api.searchhive.dev/v1/search",
        headers={"Authorization": f"Bearer {SEARCHHIVE_KEY}"},
        json={"query": query, "engine": engine, "limit": limit},
        timeout=10,  # never let a slow API hang the agent loop
    )
    resp.raise_for_status()
    data = resp.json()
    return "\n".join(
        f"- {r['title']}: {r['url']}" for r in data.get("results", [])
    )
```
```python
# Define as a tool for your LLM
tools = [{
    "type": "function",
    "function": {
        "name": "web_search",
        "description": "Search the web for current information",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search query"},
                "limit": {"type": "integer", "description": "Number of results", "default": 5}
            },
            "required": ["query"]
        }
    }
}]
```
The agent calls web_search whenever it needs information beyond its training data. SearchHive returns structured results with titles, URLs, and snippets that the LLM can parse directly.
How do I handle rate limits when an agent makes many API calls?
Three strategies, from simplest to most robust:
1. Simple sleep-based throttling:
```python
import time
from functools import wraps

def rate_limit(calls_per_minute=30):
    interval = 60.0 / calls_per_minute
    def decorator(func):
        last_call = [0.0]
        @wraps(func)
        def wrapper(*args, **kwargs):
            elapsed = time.time() - last_call[0]
            if elapsed < interval:
                time.sleep(interval - elapsed)
            last_call[0] = time.time()
            return func(*args, **kwargs)
        return wrapper
    return decorator

@rate_limit(calls_per_minute=20)
def agent_search(query):
    return web_search(query)
```
2. Token bucket with burst allowance:
```python
import threading
import time

class TokenBucket:
    def __init__(self, rate=20, burst=5):
        self.rate = rate          # tokens refilled per second
        self.tokens = burst
        self.max_tokens = burst
        self.last = time.time()
        self.lock = threading.Lock()

    def acquire(self):
        with self.lock:
            now = time.time()
            self.tokens = min(self.max_tokens, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False

bucket = TokenBucket(rate=20, burst=10)
```
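One way to use the bucket in the agent loop is a small helper that waits briefly for a token before making the call. This is a sketch: `bucket` is any object with the `acquire()` method above, and `max_wait` is an illustrative bound on how long one call may stall the loop.

```python
import time

def throttled(bucket, func, *args, max_wait=5.0, **kwargs):
    """Wait for a token from `bucket`, then call func.

    `bucket` is a TokenBucket as defined above; `max_wait` bounds how
    long a single call is allowed to block the agent loop.
    """
    deadline = time.time() + max_wait
    while not bucket.acquire():
        if time.time() > deadline:
            raise TimeoutError("rate limit wait exceeded max_wait")
        time.sleep(0.05)  # short poll rather than a long blind sleep
    return func(*args, **kwargs)
```

For example, `throttled(bucket, web_search, "latest AI news")` makes the search only once a token is available.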
3. Queue-based with retry: For production agents, use a task queue (e.g., Celery backed by Redis) with exponential backoff on 429 responses. SearchHive returns a retry-after header you can use directly.
See also: /tutorials/searchhive-api-error-handling
Should my agent cache API responses?
Yes, selectively. Cache these:
- Search results for identical queries (TTL: 1-4 hours for trending topics, 24h+ for stable ones)
- Scraped page content (TTL: 24h+ -- pages change infrequently)
- API reference data (TTL: days to weeks)
Do NOT cache these:
- User-specific data (profiles, accounts)
- Real-time data (prices, stock, weather)
- Authentication tokens (handle via the API client)
```python
import hashlib
import time

class TimedCache:
    def __init__(self, ttl_seconds=3600):
        self.cache = {}
        self.ttl = ttl_seconds

    def get(self, key):
        entry = self.cache.get(key)
        if entry and time.time() - entry["time"] < self.ttl:
            return entry["value"]
        return None

    def set(self, key, value):
        self.cache[key] = {"value": value, "time": time.time()}

search_cache = TimedCache(ttl_seconds=7200)

def cached_search(query, **kwargs):
    # Sort kwargs so the same arguments always produce the same cache key.
    cache_key = hashlib.md5(f"{query}:{sorted(kwargs.items())}".encode()).hexdigest()
    cached = search_cache.get(cache_key)
    if cached:
        return cached
    result = web_search(query, **kwargs)
    search_cache.set(cache_key, result)
    return result
```
How do I make my agent handle API errors gracefully?
Wrap every API call in structured error handling. Never let an API failure crash the agent loop.
```python
import time
import requests

def safe_api_call(func, *args, retries=3, **kwargs):
    for attempt in range(retries):
        try:
            return func(*args, **kwargs)
        except requests.exceptions.HTTPError as e:
            status = e.response.status_code
            if status == 429:
                # Honor the server's retry-after header when present.
                wait = int(e.response.headers.get("retry-after", 2 ** attempt))
                time.sleep(wait)
                continue
            elif status == 401:
                return {"error": "Authentication failed -- check API key"}
            elif status == 402:
                return {"error": "Credits exhausted -- upgrade your plan"}
            elif status >= 500:
                time.sleep(2 ** attempt)  # exponential backoff on server errors
                continue
            else:
                return {"error": f"API error: {status}"}
        except requests.exceptions.ConnectionError:
            time.sleep(2 ** attempt)
            continue
        except requests.exceptions.Timeout:
            return {"error": "API request timed out"}
    return {"error": "Max retries exceeded"}
```
The agent should always be able to tell the user what went wrong and suggest next steps, rather than silently failing or crashing.
What is the Model Context Protocol (MCP) and does it matter?
MCP is an open standard (by Anthropic) that standardizes how AI agents discover and use external tools. Instead of hardcoding every API integration, the agent connects to MCP servers that expose tools via a common protocol.
How it works:
- An MCP server hosts tool definitions (search, scrape, database query, etc.)
- The agent connects to the server and discovers available tools
- When the agent needs a tool, it calls it through the MCP protocol
SearchHive can be wrapped as an MCP server, exposing SwiftSearch, ScrapeForge, and DeepDive as standardized tools. This means any MCP-compatible agent can use SearchHive without custom integration code.
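As an illustration, a tool listing from such a server could look like the following. The tool names and schemas here are hypothetical, not a published SearchHive MCP schema; the `inputSchema` field follows MCP's tool-definition convention.

```json
{
  "tools": [
    {
      "name": "swiftsearch",
      "description": "Search the web and return ranked results",
      "inputSchema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"]
      }
    },
    {
      "name": "scrapeforge",
      "description": "Fetch a URL and return structured page content",
      "inputSchema": {
        "type": "object",
        "properties": {"url": {"type": "string"}},
        "required": ["url"]
      }
    }
  ]
}
```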
See also: /blog/complete-guide-to-mcp-tools-for-ai-agents
How does SearchHive compare to other APIs for agent integration?
SearchHive is built specifically for agent and automation use cases:
- Unified API -- Search, scrape, and deep-dive under one API key
- Structured output -- Clean JSON responses designed for LLM consumption
- Credits system -- Pay per request, not per seat. The Starter plan ($9/mo) gives you 5,000 credits
- Free tier -- 500 credits/month, enough for prototyping and light use
- No vendor lock-in -- Standard REST API, works with any framework
Compared to using multiple APIs (one for search, one for scraping, one for content extraction), SearchHive reduces integration complexity from three services to one.
What are common mistakes in AI agent API integration?
- No error handling -- Agents crash on the first API failure instead of degrading gracefully
- Ignoring rate limits -- Burst calls trigger 429 errors and can get your key temporarily blocked
- Over-fetching -- Requesting 100 results when the agent only needs 5 wastes credits and latency
- No caching -- Repeated identical queries multiply costs unnecessarily
- Hardcoded credentials -- API keys in source code instead of environment variables
- Synchronous blocking -- Long API calls block the agent loop; use async or background processing
- No timeout handling -- Without timeouts, a slow API can hang the agent indefinitely
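Two of these mistakes (hardcoded credentials, no timeout handling) are one-line fixes. A sketch, where the `SEARCHHIVE_KEY` environment variable name is just a convention:

```python
import os
import requests

# Read the key from the environment instead of committing it to source control.
SEARCHHIVE_KEY = os.environ.get("SEARCHHIVE_KEY", "")

def web_search_safe(query):
    resp = requests.post(
        "https://api.searchhive.dev/v1/search",
        headers={"Authorization": f"Bearer {SEARCHHIVE_KEY}"},
        json={"query": query},
        timeout=10,  # never let a slow API hang the agent loop indefinitely
    )
    resp.raise_for_status()
    return resp.json()
```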
Can I use multiple search APIs as fallbacks?
Yes, and it's a good pattern for reliability. Here's how to set up a fallback chain:
```python
def search_with_fallback(query, engines=("searchhive", "brave", "serper")):
    # Tuple default avoids Python's mutable-default-argument pitfall.
    for engine in engines:
        try:
            if engine == "searchhive":
                return web_search(query, engine="google", limit=5)
            elif engine == "brave":
                # Brave Search API fallback
                return brave_search(query)
            elif engine == "serper":
                # Serper.dev fallback
                return serper_search(query)
        except Exception as e:
            print(f"{engine} failed: {e}, trying next...")
            continue
    return {"error": "All search engines failed"}
```
With SearchHive's 99.9% uptime, fallbacks are rarely needed -- but they add resilience for production systems.
Summary
AI agent API integration comes down to four things: clean tool definitions, robust error handling, smart caching, and rate limit management. SearchHive's unified search and scraping API simplifies all four by giving you one service, one key, and one SDK for everything your agent needs to interact with the web.
Get Started Free
Add web search and scraping capabilities to your AI agent with SearchHive. Sign up free and get 500 credits. Check the API docs for integration guides in Python, Node.js, and more.