How to Build a Price Comparison API — Step-by-Step Tutorial
Building a price comparison API lets you aggregate product prices from multiple retailers, track price changes over time, and deliver real-time pricing data to your applications. Whether you're building a shopping comparison site, a price alert system, or a competitive intelligence dashboard, this tutorial walks you through the complete process.
We'll use SearchHive's ScrapeForge API for web data extraction — it handles JavaScript rendering and bot detection automatically, which is critical when scraping e-commerce sites.
Key Takeaways
- A price comparison API needs three components: data extraction, normalization, and a serving layer
- E-commerce sites are heavily protected — you need anti-bot bypass capabilities
- SearchHive's ScrapeForge handles JS rendering and bot detection in a single API call
- Price normalization (currency conversion, unit standardization) is essential for accurate comparisons
- Caching and rate limiting keep your API fast and cost-effective
Prerequisites
Before starting, you'll need:
- Python 3.9+ installed
- A SearchHive API key (free tier available — 1,000 requests/month)
- Basic familiarity with REST APIs and Python
- A code editor or IDE
```shell
pip install requests fastapi uvicorn redis python-dotenv
```
Step 1: Define Your Data Model
Start by defining the structure of your price comparison data:
```python
# models.py
from dataclasses import dataclass, asdict
from datetime import datetime
from typing import Optional


@dataclass
class ProductPrice:
    product_name: str
    retailer: str
    price: float
    currency: str
    url: str
    in_stock: bool
    timestamp: str
    shipping_cost: Optional[float] = None
    condition: str = "new"

    def to_dict(self):
        return asdict(self)


@dataclass
class PriceComparison:
    query: str
    results: list[ProductPrice]
    lowest_price: Optional[ProductPrice] = None
    highest_price: Optional[ProductPrice] = None
    scraped_at: str = ""

    def __post_init__(self):
        self.scraped_at = datetime.utcnow().isoformat()
        if self.results:
            prices = [r for r in self.results if r.in_stock]
            if prices:
                self.lowest_price = min(prices, key=lambda r: r.price)
                self.highest_price = max(prices, key=lambda r: r.price)
```
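To confirm the summary fields behave as intended, here's a quick standalone check — the dataclasses are repeated from above so the snippet runs on its own, and the product data is made up:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class ProductPrice:  # repeated from models.py (fields trimmed to the essentials)
    product_name: str
    retailer: str
    price: float
    currency: str
    url: str
    in_stock: bool
    timestamp: str


@dataclass
class PriceComparison:  # repeated from models.py
    query: str
    results: list
    lowest_price: Optional[ProductPrice] = None
    highest_price: Optional[ProductPrice] = None
    scraped_at: str = ""

    def __post_init__(self):
        self.scraped_at = datetime.utcnow().isoformat()
        if self.results:
            prices = [r for r in self.results if r.in_stock]
            if prices:
                self.lowest_price = min(prices, key=lambda r: r.price)
                self.highest_price = max(prices, key=lambda r: r.price)


now = datetime.utcnow().isoformat()
comparison = PriceComparison(
    query="wireless headphones",
    results=[
        ProductPrice("Sony WH-1000XM5", "amazon", 248.00, "USD", "https://example.com/a", True, now),
        ProductPrice("Sony WH-1000XM5", "walmart", 279.00, "USD", "https://example.com/b", True, now),
        # Out of stock: cheaper, but excluded from the min/max summary
        ProductPrice("Sony WH-1000XM5", "target", 199.00, "USD", "https://example.com/c", False, now),
    ],
)
print(comparison.lowest_price.retailer)  # amazon
print(comparison.highest_price.price)    # 279.0
```

Note that the out-of-stock item is skipped: a $199 listing you can't buy shouldn't win the comparison.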
Step 2: Set Up SearchHive ScrapeForge for E-Commerce
Each retailer needs its own extraction configuration. Here's how to extract prices from a typical product page:
```python
# scrapers.py
import os
from urllib.parse import quote_plus

import requests

SEARCHHIVE_API_KEY = os.getenv("SEARCHHIVE_API_KEY")
SEARCHHIVE_BASE = "https://api.searchhive.dev/v1/scrapeforge"

headers = {"Authorization": f"Bearer {SEARCHHIVE_API_KEY}"}

RETAILER_CONFIGS = {
    "amazon": {
        "url_template": "https://www.amazon.com/s?k={query}",
        "extraction": {
            "type": "structured",
            "fields": {
                "products": {
                    "selector": "[data-component-type='s-search-result']",
                    "multiple": True,
                    "fields": {
                        "name": "h2 a span",
                        "price": ".a-price .a-offscreen::text",
                        "url": "h2 a::attr(href)",
                        "rating": ".a-icon-star-small .a-icon-alt::text",
                        "reviews": ".a-size-small .a-link-normal::text"
                    }
                }
            }
        }
    },
    "walmart": {
        "url_template": "https://www.walmart.com/search?q={query}",
        "extraction": {
            "type": "structured",
            "fields": {
                "products": {
                    "selector": "[data-item-id]",
                    "multiple": True,
                    "fields": {
                        "name": "[data-automation-id='product-title']::text",
                        "price": "[data-automation-id='product-price']::text",
                        "url": "a::attr(href)",
                        "rating": ".rating-number::text"
                    }
                }
            }
        }
    }
}


def scrape_retailer(retailer: str, query: str) -> dict:
    """Scrape product prices from a specific retailer."""
    if retailer not in RETAILER_CONFIGS:
        raise ValueError(f"Unsupported retailer: {retailer}")
    config = RETAILER_CONFIGS[retailer]
    # quote_plus encodes spaces and special characters safely
    url = config["url_template"].format(query=quote_plus(query))
    response = requests.post(
        SEARCHHIVE_BASE,
        headers=headers,
        json={
            "url": url,
            "render_js": True,
            "anti_bot": True,
            "extraction": config["extraction"]
        },
        timeout=30
    )
    response.raise_for_status()
    return response.json()
```
Step 3: Normalize and Clean Price Data
Raw scraped data is messy. Prices come in different formats and currencies, and may include text like "Free shipping" or "Was $99.99". Here's how to normalize:
```python
# normalizer.py
import re
from datetime import datetime
from decimal import Decimal, InvalidOperation

from models import ProductPrice


def parse_price(raw_price: str) -> float:
    """Extract the first numeric price from messy scraped text."""
    if not raw_price:
        return 0.0
    # Match the first price-like number so "$30.00 (was $45.00)" yields 30.0
    match = re.search(r'\d[\d,]*(?:\.\d+)?', raw_price)
    if not match:
        return 0.0
    cleaned = match.group().replace(',', '')
    try:
        return float(Decimal(cleaned))
    except (InvalidOperation, ValueError):
        return 0.0


def normalize_retailer_results(raw_data: dict, retailer: str, query: str) -> list[ProductPrice]:
    """Convert raw scrape results into ProductPrice objects."""
    products = []
    raw_products = raw_data.get("data", {}).get("products", [])
    for item in raw_products[:10]:  # Top 10 results
        price = parse_price(item.get("price", ""))
        if price <= 0:
            continue
        url = item.get("url", "")
        if url.startswith("/"):
            # Relative links need the retailer's base domain prepended
            base = {
                "amazon": "https://www.amazon.com",
                "walmart": "https://www.walmart.com"
            }.get(retailer, "")
            url = base + url
        products.append(ProductPrice(
            product_name=item.get("name", "").strip()[:200],
            retailer=retailer,
            price=price,
            currency="USD",
            url=url,
            in_stock=price > 0,
            timestamp=datetime.utcnow().isoformat()
        ))
    return products
```
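A few quick checks show what price parsing can and can't recover. This standalone version grabs the first price-like number in the string, so a strike-through "was" price doesn't pollute the result:

```python
import re
from decimal import Decimal, InvalidOperation


def parse_price(raw_price: str) -> float:
    """Extract the first numeric price from messy scraped text."""
    if not raw_price:
        return 0.0
    # First price-like number wins; commas are treated as thousands separators
    match = re.search(r'\d[\d,]*(?:\.\d+)?', raw_price)
    if not match:
        return 0.0
    cleaned = match.group().replace(',', '')
    try:
        return float(Decimal(cleaned))
    except (InvalidOperation, ValueError):
        return 0.0


print(parse_price("$29.99"))                      # 29.99
print(parse_price("1,299.00"))                    # 1299.0
print(parse_price("Price: $30.00 (was $45.00)"))  # 30.0
print(parse_price("Out of stock"))                # 0.0
```

European formats with comma decimal separators ("29,99 €") would come out wrong here — that case is covered under Common Issues below.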
Step 4: Build the Comparison Engine
Aggregate results from multiple retailers and sort by price:
```python
# engine.py
import concurrent.futures
from typing import Optional

from models import PriceComparison
from normalizer import normalize_retailer_results
from scrapers import RETAILER_CONFIGS, scrape_retailer


def compare_prices(query: str, retailers: Optional[list[str]] = None) -> PriceComparison:
    """Compare prices across multiple retailers."""
    if retailers is None:
        retailers = list(RETAILER_CONFIGS.keys())
    all_products = []
    # Scrape retailers in parallel for speed
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        futures = {
            executor.submit(scrape_retailer, r, query): r
            for r in retailers
        }
        for future in concurrent.futures.as_completed(futures):
            retailer = futures[future]
            try:
                raw_data = future.result()
                products = normalize_retailer_results(raw_data, retailer, query)
                all_products.extend(products)
            except Exception as e:
                print(f"Failed to scrape {retailer}: {e}")
    # Sort by price (lowest first)
    all_products.sort(key=lambda p: p.price)
    return PriceComparison(query=query, results=all_products)
```
Step 5: Create the FastAPI Endpoint
Wrap everything in a REST API:
```python
# main.py
import time

from fastapi import FastAPI, HTTPException, Query
from fastapi.middleware.cors import CORSMiddleware

from engine import compare_prices

app = FastAPI(
    title="Price Comparison API",
    description="Compare product prices across multiple retailers",
    version="1.0.0"
)

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_methods=["GET"],
    allow_headers=["*"],
)

# Simple in-memory cache: {cache_key: (response, cached_at)}
cache = {}


@app.get("/compare")
def compare(
    q: str = Query(..., description="Product to search for"),
    retailers: str = Query("amazon,walmart", description="Comma-separated retailer list"),
    cache_ttl: int = Query(3600, description="Cache TTL in seconds")
):
    cache_key = f"{q}:{retailers}"
    # Check cache first
    if cache_key in cache:
        cached_data, cached_time = cache[cache_key]
        if time.time() - cached_time < cache_ttl:
            return cached_data
    # Run comparison
    retailer_list = [r.strip() for r in retailers.split(",")]
    try:
        result = compare_prices(q, retailer_list)
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
    response = {
        "query": result.query,
        "total_results": len(result.results),
        "lowest_price": result.lowest_price.to_dict() if result.lowest_price else None,
        "highest_price": result.highest_price.to_dict() if result.highest_price else None,
        "results": [r.to_dict() for r in result.results],
        "scraped_at": result.scraped_at
    }
    # Cache the result
    cache[cache_key] = (response, time.time())
    return response


@app.get("/health")
def health():
    return {"status": "ok"}
```
Step 6: Run and Test
```shell
# Start the server
uvicorn main:app --host 0.0.0.0 --port 8000

# Test it
curl "http://localhost:8000/compare?q=wireless+headphones&retailers=amazon,walmart"
```
Expected response:
```json
{
  "query": "wireless headphones",
  "total_results": 18,
  "lowest_price": {
    "product_name": "Sony WH-1000XM5",
    "retailer": "amazon",
    "price": 248.00,
    "currency": "USD",
    "url": "https://www.amazon.com/dp/B09XS7JWHH",
    "in_stock": true
  },
  "highest_price": {
    "product_name": "Apple AirPods Max",
    "retailer": "walmart",
    "price": 549.00
  },
  "results": [...]
}
```
Step 7: Add Persistence and Price History
Track price changes over time with Redis:
```python
# history.py
import json
from datetime import datetime

import redis

r = redis.Redis(host='localhost', port=6379, db=0)


def save_price_history(comparison_result: dict):
    """Save price data for historical tracking."""
    query = comparison_result["query"]
    for product in comparison_result["results"]:
        key = f"price:{query}:{product['retailer']}:{product['product_name']}"
        # Score each snapshot by its Unix timestamp so we can range-query later
        r.zadd(key, {json.dumps(product): datetime.utcnow().timestamp()})


def get_price_trend(product_name: str, retailer: str, days: int = 30) -> list:
    """Get price history for a product across all stored queries."""
    pattern = f"price:*:{retailer}:{product_name}"
    cutoff = datetime.utcnow().timestamp() - days * 86400
    history = []
    for matching_key in r.scan_iter(pattern):
        for entry in r.zrangebyscore(matching_key, cutoff, "+inf"):
            history.append(json.loads(entry))
    return sorted(history, key=lambda x: x["timestamp"])
```
Common Issues and Solutions
1. Bot Detection Blocking Your Scrapes
Problem: E-commerce sites return CAPTCHAs or 403 errors.
Solution: SearchHive's anti-bot bypass is enabled by default on paid plans. Ensure anti_bot: True is set in your ScrapeForge request.
2. Inconsistent Price Formats
Problem: Prices appear as "$29.99", "29,99 €", "Price: $30.00 (was $45.00)".
Solution: The parse_price() function in Step 3 handles most formats. Add retailer-specific parsers for edge cases.
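As one example of a retailer-specific parser, here's a sketch for European-style prices, where the comma is the decimal separator and dots mark thousands. Treat the format detection as a heuristic, not a production-ready locale handler:

```python
import re


def parse_price_eu(raw_price: str) -> float:
    """Parse prices like '29,99 €' or '1.299,00 EUR' (comma as decimal separator)."""
    match = re.search(r'\d[\d.]*(?:,\d+)?', raw_price)
    if not match:
        return 0.0
    # Dots are thousands separators; the comma is the decimal point
    cleaned = match.group().replace('.', '').replace(',', '.')
    try:
        return float(cleaned)
    except ValueError:
        return 0.0


print(parse_price_eu("29,99 €"))       # 29.99
print(parse_price_eu("1.299,00 EUR"))  # 1299.0
```

You could then dispatch on retailer (or on the scraped currency symbol) to pick the right parser before constructing `ProductPrice`.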
3. Rate Limiting
Problem: Scraping too fast triggers rate limits.
Solution: Add delays between requests and use concurrent futures with limited workers:
```python
import time

# Add a small delay between retailer scrapes
for retailer in retailers:
    result = scrape_retailer(retailer, query)
    time.sleep(2)  # 2-second delay
```
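If a retailer still throttles you, retrying with exponential backoff is a common complement to fixed delays. A minimal sketch (the usage line assumes the `scrape_retailer` function from Step 2):

```python
import random
import time


def with_backoff(fn, *args, max_retries=3, base_delay=1.0):
    """Call fn(*args); on failure, wait base_delay * 2**attempt plus jitter, then retry."""
    for attempt in range(max_retries):
        try:
            return fn(*args)
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the original error
            # Exponential backoff with jitter avoids synchronized retry bursts
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))


# Usage (hypothetical):
# result = with_backoff(scrape_retailer, "amazon", "wireless headphones")
```

Keep `max_retries` low — each retry is a billable ScrapeForge request.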
4. Product Matching Across Retailers
Problem: The same product has different names on different sites.
Solution: Use fuzzy string matching:
```python
from difflib import SequenceMatcher


def match_products(products_a: list, products_b: list, threshold: float = 0.7) -> list:
    """Pair up likely-identical products from two retailers by name similarity."""
    matches = []
    for pa in products_a:
        for pb in products_b:
            ratio = SequenceMatcher(
                None, pa.product_name.lower(), pb.product_name.lower()
            ).ratio()
            if ratio >= threshold:
                matches.append((pa, pb, ratio))
    return matches
```
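A standalone demonstration, with a minimal stand-in for `ProductPrice` and made-up listing names:

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class P:  # stand-in for ProductPrice; only the name matters here
    product_name: str


def match_products(products_a: list, products_b: list, threshold: float = 0.7) -> list:
    """Pair up likely-identical products from two retailers by name similarity."""
    matches = []
    for pa in products_a:
        for pb in products_b:
            ratio = SequenceMatcher(
                None, pa.product_name.lower(), pb.product_name.lower()
            ).ratio()
            if ratio >= threshold:
                matches.append((pa, pb, ratio))
    return matches


amazon = [P("Sony WH-1000XM5 Wireless Headphones"), P("Apple AirPods Max")]
walmart = [P("Sony WH1000XM5 Noise Canceling Wireless Headphones")]

matches = match_products(amazon, walmart)
print(len(matches))  # only the Sony pair clears the threshold
```

Tune `threshold` per category: consumer electronics names are verbose enough that 0.7 works, while short generic names ("HDMI cable") need stricter matching or extra signals like model numbers.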
Next Steps
- Add more retailers: Extend `RETAILER_CONFIGS` with eBay, Target, and Best Buy
- Add alerts: Use webhooks to notify users when prices drop below a threshold
- Deploy: Containerize with Docker and deploy to your preferred cloud
- Scale: Add Redis caching, rate limiting, and a task queue for high-volume use
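As a starting point for the alerts idea, here's a sketch of a threshold check you could run after each comparison. The alert shape (`query`, `max_price`) is hypothetical — adapt it to however you store user subscriptions, and wire the triggered list into your webhook sender:

```python
def check_price_alerts(comparison: dict, alerts: list[dict]) -> list[dict]:
    """Return the alerts triggered by this comparison result.

    Each alert looks like {"query": ..., "max_price": ...} -- a made-up
    shape for illustration.
    """
    triggered = []
    for alert in alerts:
        if alert["query"] != comparison["query"]:
            continue
        lowest = comparison.get("lowest_price")
        if lowest and lowest["price"] <= alert["max_price"]:
            # Attach the current price so the notification can show it
            triggered.append({**alert, "current_price": lowest["price"]})
    return triggered


alerts = [{"query": "wireless headphones", "max_price": 250.00}]
comparison = {
    "query": "wireless headphones",
    "lowest_price": {"price": 248.00, "retailer": "amazon"},
}
print(check_price_alerts(comparison, alerts))
```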
Ready to build? Get your free SearchHive API key — 1,000 requests/month included, no credit card required. Check the ScrapeForge docs for advanced extraction options.