GPT-4 doesn't browse the web by default. It generates text based on its training data, which has a cutoff date. If your application needs up-to-date information, real-time data, or live web content, you need to bridge that gap with a search API integration.
This tutorial covers three proven patterns for giving GPT-4 web access, with working Python code for each approach. All examples use SearchHive's SwiftSearch API as the search backend, but the patterns work with any search API that returns structured JSON results.
## Key Takeaways
- Three patterns for GPT-4 web access: tool/function calling, RAG retrieval, and direct context injection
- OpenAI function calling is the cleanest integration -- define a search tool, let GPT-4 decide when to use it
- SearchHive SwiftSearch returns structured results that map directly to GPT-4 context windows
- RAG retrieval works best for knowledge base applications; direct injection works best for simple lookups
- All three patterns can be combined with ScrapeForge for full page content extraction
## Prerequisites

- Python 3.10+
- OpenAI API key (for GPT-4)
- SearchHive API key (free tier works)
- `openai` and `requests` packages installed:

```bash
pip install openai requests
```
## Pattern 1: Function Calling (Recommended)
OpenAI's function calling (now called "tool use") lets you define a search function that GPT-4 can invoke when it needs web data. The model decides when to search, what to search for, and how to use the results.
This is the best approach for conversational applications, AI agents, and chatbots where you don't want to pre-fetch data.
```python
import json
import requests
from openai import OpenAI

SEARCHHIVE_KEY = "your_searchhive_api_key"
SEARCHHIVE_BASE = "https://api.searchhive.dev/v1"

openai_client = OpenAI()

def search_web(query: str, num_results: int = 5) -> str:
    """Search the web using SearchHive SwiftSearch and return formatted results."""
    resp = requests.post(
        f"{SEARCHHIVE_BASE}/swiftsearch",
        json={
            "engine": "google",
            "query": query,
            "num_results": num_results,
        },
        headers={"Authorization": f"Bearer {SEARCHHIVE_KEY}"},
        timeout=30,
    )
    data = resp.json()
    results = data.get("results", [])
    if not results:
        return "No results found."
    formatted = []
    for r in results[:num_results]:
        formatted.append(f"- {r['title']}\n  URL: {r['url']}\n  {r.get('snippet', '')}")
    return "\n".join(formatted)

# Define the search tool for GPT-4
tools = [{
    "type": "function",
    "function": {
        "name": "search_web",
        "description": "Search the web for current information. Use this when you need data beyond your training cutoff or need real-time information.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search query"},
                "num_results": {"type": "integer", "description": "Number of results (1-10)", "default": 5},
            },
            "required": ["query"],
        },
    },
}]

def chat_with_web_access(user_message: str) -> str:
    """Send a message to GPT-4 with web search capability."""
    messages = [{"role": "user", "content": user_message}]
    while True:
        response = openai_client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=tools,
            tool_choice="auto",
        )
        msg = response.choices[0].message
        messages.append(msg)
        # If GPT-4 wants to call the search tool, execute it
        if msg.tool_calls:
            for tool_call in msg.tool_calls:
                if tool_call.function.name == "search_web":
                    args = json.loads(tool_call.function.arguments)
                    search_results = search_web(args["query"], args.get("num_results", 5))
                    messages.append({
                        "role": "tool",
                        "tool_call_id": tool_call.id,
                        "content": search_results,
                    })
            # Continue the loop so GPT-4 can process the results
            continue
        # No tool calls -- return the final response
        return msg.content

# Usage
result = chat_with_web_access("What are the latest Python web scraping best practices in 2026?")
print(result)
```
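One caveat with the `while True` loop above: if the model keeps requesting searches, it can burn API calls indefinitely. A common safeguard is to cap the number of tool-call rounds. The sketch below generalizes the loop with a round limit; `run_tool_loop` and its parameters are illustrative names, not part of the OpenAI SDK.

```python
import json

def run_tool_loop(client, messages, tools, tool_fns, model="gpt-4o", max_rounds=3):
    """Drive a tool-calling conversation with an upper bound on search rounds.

    tool_fns maps tool names to callables that accept the parsed argument dict
    as keyword arguments (e.g. {"search_web": search_web}).
    """
    for _ in range(max_rounds):
        response = client.chat.completions.create(
            model=model, messages=messages, tools=tools, tool_choice="auto"
        )
        msg = response.choices[0].message
        messages.append(msg)
        if not msg.tool_calls:
            return msg.content  # model answered without needing more searches
        for call in msg.tool_calls:
            fn = tool_fns.get(call.function.name)
            result = fn(**json.loads(call.function.arguments)) if fn else "Unknown tool."
            messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
    # Cap reached: request a final answer with no tools offered
    response = client.chat.completions.create(model=model, messages=messages)
    return response.choices[0].message.content
```

Three rounds is usually enough for a search-and-refine exchange; raise the cap for agentic workloads that chain many lookups.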
## Pattern 2: RAG Retrieval
For applications where you need to ground GPT-4 responses in specific web content, use search results as a retrieval layer. Search first, format the results, then ask GPT-4 to answer based on those results.
This works well for question-answering systems, research assistants, and any application where grounding in sources matters.
```python
import requests
from openai import OpenAI

SEARCHHIVE_KEY = "your_searchhive_api_key"
openai_client = OpenAI()

def rag_search_and_answer(question: str, top_k: int = 5) -> dict:
    """Search the web and use GPT-4 to answer based on results."""
    # Step 1: Retrieve relevant web results
    resp = requests.post(
        "https://api.searchhive.dev/v1/swiftsearch",
        json={"engine": "google", "query": question, "num_results": top_k},
        headers={"Authorization": f"Bearer {SEARCHHIVE_KEY}"},
        timeout=30,
    )
    results = resp.json().get("results", [])
    if not results:
        return {"answer": "I couldn't find relevant information for that query.", "sources": []}

    # Step 2: Format results as numbered context
    context_parts = []
    sources = []
    for i, r in enumerate(results[:top_k]):
        context_parts.append(f"[{i+1}] {r['title']}\n{r.get('snippet', '')}")
        sources.append({"title": r["title"], "url": r["url"]})
    context = "\n\n".join(context_parts)

    # Step 3: Ask GPT-4 to answer using the context
    response = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Answer the question using ONLY the provided search results. Cite sources by number. If the results don't contain enough information, say so."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return {"answer": response.choices[0].message.content, "sources": sources}

# Usage
result = rag_search_and_answer("What is the current state of web scraping legislation in the EU?")
print(result["answer"])
print("\nSources:")
for s in result["sources"]:
    print(f"  - {s['title']}: {s['url']}")
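Because the system prompt asks GPT-4 to cite sources by number, you can post-process the answer to show only the sources the model actually used. This is a sketch; `extract_citations` is a hypothetical helper name, and it assumes citations appear as bracketed numbers like `[1]`.

```python
import re

def extract_citations(answer: str, sources: list) -> list:
    """Return only the sources actually cited as [n] in the answer text."""
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", answer)}
    # Citations are 1-indexed; ignore any number outside the source list
    return [sources[n - 1] for n in sorted(cited) if 1 <= n <= len(sources)]
```

Displaying only cited sources keeps the footer honest when the model draws on two of five retrieved results.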
## Pattern 3: Direct Context Injection
The simplest approach: search, format the results, and prepend them to the user message. No tool calling, no multi-turn loop. Best for one-shot queries where you know the user will need web data.
```python
import requests
from openai import OpenAI

SEARCHHIVE_KEY = "your_searchhive_api_key"
openai_client = OpenAI()

def search_augmented_query(user_query: str) -> str:
    """Prepend search results to a user query before sending to GPT-4."""
    # Build a search query from the user's input
    resp = requests.post(
        "https://api.searchhive.dev/v1/swiftsearch",
        json={"engine": "google", "query": user_query, "num_results": 5},
        headers={"Authorization": f"Bearer {SEARCHHIVE_KEY}"},
        timeout=30,
    )
    results = resp.json().get("results", [])
    context = "Recent web search results:\n"
    for r in results:
        context += f"- {r['title']}: {r.get('snippet', '')}\n"
    response = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "user", "content": f"{context}\n\nBased on the above information, answer: {user_query}"},
        ],
    )
    return response.choices[0].message.content

print(search_augmented_query("What are the best new Python libraries released in 2026?"))
```
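Direct injection puts every snippet into the prompt whether or not it fits your token budget. A small budget helper keeps the injected context bounded; this is a sketch under the assumption that results arrive ranked by relevance, and `fit_snippets` is an illustrative name.

```python
def fit_snippets(snippets: list, max_chars: int = 6000) -> list:
    """Keep the highest-ranked snippets that fit within a character budget.

    Since search results arrive ranked, dropping from the tail discards
    the least relevant snippets first.
    """
    kept, used = [], 0
    for s in snippets:
        if used + len(s) > max_chars:
            break
        kept.append(s)
        used += len(s)
    return kept
```

Call it on the snippet list before building `context` so an unusually verbose result set can't crowd out the user's question.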
## Adding Full Page Content with ScrapeForge
Search snippets are useful, but sometimes you need the full page content. SearchHive's ScrapeForge API extracts complete page content in markdown format, which is ideal for feeding to GPT-4.
```python
import requests

# Reuses SEARCHHIVE_KEY from the earlier examples
def search_and_scrape(query: str) -> str:
    """Search for pages, then scrape the top result for full content."""
    # Search
    resp = requests.post(
        "https://api.searchhive.dev/v1/swiftsearch",
        json={"engine": "google", "query": query, "num_results": 3},
        headers={"Authorization": f"Bearer {SEARCHHIVE_KEY}"},
        timeout=30,
    )
    results = resp.json().get("results", [])
    if not results:
        return "No results found."
    # Scrape the top result
    scrape_resp = requests.post(
        "https://api.searchhive.dev/v1/scrapeforge/scrape",
        json={"url": results[0]["url"], "format": "markdown"},
        headers={"Authorization": f"Bearer {SEARCHHIVE_KEY}"},
        timeout=30,
    )
    markdown = scrape_resp.json().get("markdown", "")
    # Truncate to fit within GPT-4's context window
    return markdown[:15000]
```
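The `markdown[:15000]` slice is a blunt cut that can split a sentence mid-word. A rough token-budget truncation that prefers paragraph boundaries reads better in the prompt; this sketch assumes the common heuristic of roughly four characters per English token, and `truncate_to_tokens` is an illustrative name.

```python
def truncate_to_tokens(text: str, max_tokens: int = 4000, chars_per_token: int = 4) -> str:
    """Approximate token-based truncation using a characters-per-token heuristic."""
    limit = max_tokens * chars_per_token
    if len(text) <= limit:
        return text
    # Prefer cutting at the last paragraph break before the limit
    cut = text.rfind("\n\n", 0, limit)
    return text[:cut] if cut > 0 else text[:limit]
```

For exact counts, a tokenizer library such as tiktoken gives per-model token totals, at the cost of an extra dependency.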
## Which Pattern Should You Use?
| Pattern | Best For | Complexity | Latency |
|---|---|---|---|
| Function Calling | Chatbots, agents, conversations | Medium | Variable (multi-turn) |
| RAG Retrieval | Q&A, research, knowledge apps | Medium | One search + one GPT call |
| Direct Injection | Simple lookups, one-shot queries | Low | One search + one GPT call |
| Search + Scrape | Deep analysis, content extraction | Medium | Higher (scrape latency) |
## Common Issues
- Rate limiting: SearchHive's free tier has request limits. If you hit them, either upgrade or cache results aggressively.
- Context window overflow: GPT-4 has a 128K token limit. Don't stuff it with too many search results. 5-10 results with snippets is usually optimal.
- Stale results: Search engines cache results. If you need real-time data (stock prices, live events), check the publication date in results and filter accordingly.
- GPT-4 hallucinating tool calls: If the model makes up search queries, add more constraints to the function description, or set `tool_choice="none"` for queries that don't need web access.
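For the rate-limiting issue above, the usual fix is to retry with exponential backoff when the API returns HTTP 429. The sketch below is illustrative: `retry_post`, the retry count, and the delay parameters are assumptions, not part of any SearchHive SDK.

```python
import random
import time

import requests

def backoff_delays(retries: int = 4, base: float = 1.0, cap: float = 30.0):
    """Yield exponentially growing delays (~1s, 2s, 4s, ...) with jitter, capped."""
    for attempt in range(retries):
        yield min(cap, base * (2 ** attempt)) * (0.5 + random.random() / 2)

def retry_post(url: str, retries: int = 4, **kwargs) -> requests.Response:
    """POST with retry on 429; raises on other HTTP errors."""
    resp = requests.post(url, **kwargs)
    for delay in backoff_delays(retries):
        if resp.status_code != 429:
            break
        time.sleep(delay)
        resp = requests.post(url, **kwargs)
    resp.raise_for_status()
    return resp
```

Dropping `retry_post` in place of the bare `requests.post` calls in the patterns above makes transient rate limits invisible to callers.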
## Next Steps
Once you have basic web access working, consider adding:
- DeepDive for extracting structured insights from scraped content
- Multi-engine search (Google + Bing) for broader coverage
- Response caching to avoid redundant search API calls for similar queries
- Streaming for better UX in conversational applications
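The caching idea above can be sketched as a small in-memory TTL cache keyed on a normalized query. Names like `cached_search` and the five-minute TTL are illustrative choices, not library APIs; production systems would typically reach for Redis or similar instead of a module-level dict.

```python
import hashlib
import json
import time

_CACHE = {}
CACHE_TTL = 300  # seconds

def cache_key(query: str, num_results: int = 5) -> str:
    """Normalize the query so trivially different strings share a cache entry."""
    payload = json.dumps({"q": query.strip().lower(), "n": num_results}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def cached_search(query: str, num_results: int, search_fn):
    """Wrap any search function (e.g. search_web above) with a TTL cache."""
    key = cache_key(query, num_results)
    hit = _CACHE.get(key)
    if hit and time.time() - hit[0] < CACHE_TTL:
        return hit[1]  # fresh cached result; skip the API call
    result = search_fn(query, num_results)
    _CACHE[key] = (time.time(), result)
    return result
```

Normalizing case and whitespace in the key means "Python news" and "  python NEWS " hit the same entry, which matters for conversational inputs.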
Related: Best Search APIs for Python Developers | Best Web Scraping APIs for LLMs and RAG Pipelines | Building an MCP Search Server
Give your GPT-4 applications web access with SearchHive's free tier -- SwiftSearch, ScrapeForge, and DeepDive, one API key.