Amazon product data is the backbone of price comparison tools, competitor monitoring, and market research platforms. The challenge: Amazon's anti-bot systems are aggressive, their HTML structure shifts regularly, and scraping at scale requires proxy rotation.
This tutorial walks through building a production-ready Amazon product scraper using Python and SearchHive's ScrapeForge API. No browser automation, no proxy management, no CAPTCHA solving infrastructure — the API handles all of that.
Key Takeaways
- Direct requests to Amazon fail fast — you need proxy rotation and JS rendering to get reliable data
- SearchHive's ScrapeForge API handles anti-bot bypassing, proxy rotation, and CAPTCHA solving automatically
- DeepDive (AI extraction) converts raw Amazon pages into structured JSON without fragile CSS selectors
- SearchHive SwiftSearch finds Amazon product URLs when you don't know the exact ASIN
- Rate limiting and respectful crawling are non-negotiable for sustained access
Prerequisites
- Python 3.8+
- `requests` library (`pip install requests`)
- A SearchHive API key (free tier available)
Step 1: Scrape a Single Amazon Product Page
The simplest case — you have a product URL (or ASIN) and want structured data back.
```python
import requests
import json

SEARCHHIVE_API_KEY = "your_api_key_here"
BASE_URL = "https://api.searchhive.dev/v1"
HEADERS = {"Authorization": f"Bearer {SEARCHHIVE_API_KEY}"}

def scrape_amazon_product(asin: str) -> dict:
    """Scrape a single Amazon product page and return structured data."""
    url = f"https://www.amazon.com/dp/{asin}"
    resp = requests.post(f"{BASE_URL}/extract", json={
        "url": url,
        "prompt": """Extract the following product information:
        - Product title
        - Price (current sale price)
        - Original/list price
        - Rating (star count)
        - Number of reviews
        - Availability status
        - Product description (first paragraph)
        - Main category
        - All bullet points (feature list)"""
    }, headers=HEADERS)
    if resp.status_code == 200:
        return resp.json()["data"]
    else:
        raise Exception(f"Scrape failed: {resp.status_code} - {resp.text}")

# Example usage
product = scrape_amazon_product("B09V3KXJPB")
print(json.dumps(product, indent=2))
```
This uses DeepDive, SearchHive's AI-powered extraction endpoint. Instead of writing CSS selectors that break when Amazon updates their DOM, you describe what you want in natural language and get structured JSON back.
Sample output:
```json
{
  "product_title": "Apple AirPods Pro (2nd Generation)",
  "price": "$189.99",
  "original_price": "$249.00",
  "rating": 4.7,
  "review_count": 112453,
  "availability": "In Stock",
  "description": "Active Noise Cancellation...",
  "category": "Electronics > Earbuds",
  "bullet_points": [
    "Active Noise Cancellation removes background noise",
    "Transparency mode lets outside sound in",
    "Customizable fit with silicone ear tips"
  ]
}
```
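Note that `price` and `original_price` come back as display strings like `"$189.99"`, not numbers. If you plan to compare or chart prices, normalize them to floats first. A minimal sketch (the `parse_price` helper is ours, not part of the API, and assumes US-style `$1,299.99` formatting):

```python
import re

def parse_price(price_str):
    """Convert a display price like '$189.99' or '$1,299.00' to a float.
    Returns None if no number is found (e.g. 'Currently unavailable')."""
    if not price_str:
        return None
    match = re.search(r"[\d,]+(?:\.\d+)?", price_str)
    if not match:
        return None
    return float(match.group(0).replace(",", ""))
```

Adjust the regex if you scrape marketplaces that use comma decimal separators (amazon.de, amazon.fr).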
Step 2: Find Amazon Products with Search
If you don't know the exact ASIN, use SwiftSearch to find products matching a query.
```python
import requests

def search_amazon_products(query: str, num_results: int = 10) -> list:
    """Search Amazon for products using SearchHive SwiftSearch."""
    resp = requests.get(f"{BASE_URL}/search", params={
        "q": f"{query} site:amazon.com",
        "engine": "google",
        "num": num_results
    }, headers=HEADERS)
    products = []
    for result in resp.json().get("results", []):
        url = result["url"]
        # Filter to product pages only
        if "/dp/" in url or "/gp/product/" in url:
            products.append({
                "title": result["title"],
                "url": url,
                "snippet": result.get("snippet", "")
            })
    return products

# Find wireless earbuds on Amazon
results = search_amazon_products("wireless earbuds bestseller 2025")
for p in results:
    print(f"{p['title']}\n  {p['url']}\n")
```
This searches Google for Amazon product pages matching your query. It's more reliable than scraping Amazon's internal search, which is heavily gated.
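Each result URL embeds the ASIN, which `scrape_amazon_product` from Step 1 expects. A small regex helper can pull it out (ASINs are conventionally 10 uppercase alphanumerics after `/dp/` or `/gp/product/`; this is an observed pattern, not a documented guarantee):

```python
import re
from typing import Optional

def extract_asin(url: str) -> Optional[str]:
    """Extract the 10-character ASIN from an Amazon product URL.
    Returns None for non-product URLs (search pages, category pages)."""
    match = re.search(r"/(?:dp|gp/product)/([A-Z0-9]{10})", url)
    return match.group(1) if match else None
```

With this, the output of `search_amazon_products` feeds directly into the scraper: `extract_asin(p["url"])` for each result.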
Step 3: Bulk Scrape with Rate Limiting
Scraping multiple products requires rate limiting to avoid triggering anti-bot systems (even with proxy rotation, respect the platform).
```python
import time
import json
from pathlib import Path

def scrape_product_list(asins: list, delay: float = 2.0, output_file: str = "products.json"):
    """Scrape multiple Amazon products with rate limiting."""
    results = []
    seen_asins = set()

    # Resume from existing results if file exists
    if Path(output_file).exists():
        with open(output_file) as f:
            existing = json.load(f)
        for item in existing:
            seen_asins.add(item.get("asin"))
        results = existing
        print(f"Resuming — {len(seen_asins)} products already scraped")

    for i, asin in enumerate(asins):
        if asin in seen_asins:
            continue
        try:
            product = scrape_amazon_product(asin)
            product["asin"] = asin
            # Use gmtime so the trailing "Z" (UTC) is accurate
            product["scraped_at"] = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
            results.append(product)
            # Save incrementally
            with open(output_file, "w") as f:
                json.dump(results, f, indent=2)
            print(f"[{i+1}/{len(asins)}] {asin} — {product.get('price', 'N/A')}")
        except Exception as e:
            print(f"[{i+1}/{len(asins)}] {asin} — FAILED: {e}")
            results.append({"asin": asin, "error": str(e)})
        time.sleep(delay)

    successful = [r for r in results if "error" not in r]
    failed = [r for r in results if "error" in r]
    print(f"Done. {len(successful)} scraped, {len(failed)} failed.")
    return results

# Scrape a list of ASINs
asins = ["B09V3KXJPB", "B0CHWRXH8B", "B0C2P3F5T7", "B0BSHF7WHW", "B0D1XD1ZV3"]
scrape_product_list(asins, delay=2.0)
Key design decisions in this code:
- Incremental saves — if the script crashes at product 47 of 100, you resume at 48, not from scratch
- 2-second delay — conservative rate limit that works reliably for most volumes
- Error capture — failed scrapes are logged with the error, not silently dropped
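For long runs you may also want transient failures to retry instead of going straight to the error list. A generic retry wrapper with exponential backoff is one way to do that (a sketch; wrap `scrape_amazon_product` or any other function in it):

```python
import time

def with_retries(fn, *args, max_retries=3, base_delay=2.0):
    """Call fn(*args); on failure, retry with exponential backoff
    (2s, 4s, 8s, ...). Re-raises the last error if all attempts fail."""
    for attempt in range(max_retries):
        try:
            return fn(*args)
        except Exception as e:
            if attempt == max_retries - 1:
                raise
            delay = base_delay * (2 ** attempt)
            print(f"Attempt {attempt + 1} failed ({e}); retrying in {delay:.0f}s")
            time.sleep(delay)
```

Inside the loop, `product = with_retries(scrape_amazon_product, asin)` replaces the direct call; permanent failures still land in the error list after the final attempt.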
Step 4: Extract Reviews from Product Pages
Product reviews require scraping a different URL pattern. SearchHive's DeepDive can extract reviews alongside product data.
```python
def scrape_amazon_reviews(asin: str, num_pages: int = 3) -> list:
    """Scrape reviews from an Amazon product page."""
    reviews = []
    for page in range(1, num_pages + 1):
        url = f"https://www.amazon.com/product-reviews/{asin}?pageNumber={page}"
        resp = requests.post(f"{BASE_URL}/extract", json={
            "url": url,
            "prompt": """Extract all reviews on this page. For each review:
            - Rating (1-5 stars)
            - Review title
            - Review body text
            - Review date
            - Verified purchase status
            - Helpful votes count"""
        }, headers=HEADERS)
        if resp.status_code == 200:
            data = resp.json()["data"]
            if isinstance(data, list):
                reviews.extend(data)
            else:
                reviews.append(data)
        time.sleep(2)
    return reviews

reviews = scrape_amazon_reviews("B09V3KXJPB", num_pages=2)
for r in reviews[:3]:
    print(f"{'★' * r.get('rating', 0)} {r.get('title', 'No title')}")
    print(f"  {r.get('body', 'No body')[:120]}...\n")
```
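With the review list in hand, you can compute quick aggregates without any extra requests. A sketch, assuming each review dict carries `rating` and `verified_purchase` keys (the exact key names depend on how DeepDive structures the response, so inspect one response first):

```python
def summarize_reviews(reviews):
    """Return count, average rating, and verified-purchase share
    for a list of review dicts. Entries without a numeric rating are skipped."""
    rated = [r for r in reviews if isinstance(r.get("rating"), (int, float))]
    if not rated:
        return {"count": 0, "avg_rating": None, "verified_share": None}
    avg = sum(r["rating"] for r in rated) / len(rated)
    verified = sum(1 for r in rated if r.get("verified_purchase"))
    return {
        "count": len(rated),
        "avg_rating": round(avg, 2),
        "verified_share": round(verified / len(rated), 2),
    }
```

Comparing your computed average against the page's displayed star rating is also a cheap sanity check that extraction worked.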
Step 5: Build a Price Tracker
Combine search, scraping, and persistence into a recurring price tracker.
```python
import json
import time
from datetime import datetime
from pathlib import Path

class AmazonPriceTracker:
    def __init__(self, api_key: str, data_file: str = "price_history.json"):
        self.headers = {"Authorization": f"Bearer {api_key}"}
        self.base = "https://api.searchhive.dev/v1"
        self.data_file = data_file
        self._load_data()

    def _load_data(self):
        self.data = {}
        if Path(self.data_file).exists():
            with open(self.data_file) as f:
                self.data = json.load(f)

    def _save_data(self):
        with open(self.data_file, "w") as f:
            json.dump(self.data, f, indent=2)

    def track_product(self, asin: str) -> dict:
        """Scrape current price and append to history."""
        product = scrape_amazon_product(asin)
        now = datetime.utcnow().isoformat()
        if asin not in self.data:
            self.data[asin] = {
                "title": product.get("product_title", ""),
                "url": f"https://www.amazon.com/dp/{asin}",
                "history": []
            }
        entry = {
            "timestamp": now,
            "price": product.get("price"),
            "original_price": product.get("original_price"),
            "rating": product.get("rating"),
            "availability": product.get("availability")
        }
        self.data[asin]["history"].append(entry)
        self._save_data()

        # Check for price drop — compare as numbers, not strings
        # ("$189.99" < "$249.00" as strings is lexicographic, not numeric)
        history = self.data[asin]["history"]
        if len(history) >= 2:
            try:
                prev = float(str(history[-2]["price"]).replace("$", "").replace(",", ""))
                curr = float(str(entry["price"]).replace("$", "").replace(",", ""))
            except (TypeError, ValueError):
                prev = curr = None
            if prev and curr and curr < prev:
                print(f"PRICE DROP: {asin} — {prev} -> {curr}")
        return entry

    def get_price_history(self, asin: str) -> list:
        return self.data.get(asin, {}).get("history", [])

# Usage
tracker = AmazonPriceTracker("your_api_key")
watchlist = ["B09V3KXJPB", "B0CHWRXH8B", "B0C2P3F5T7"]
for asin in watchlist:
    tracker.track_product(asin)
    time.sleep(2)

# Check history
for asin in watchlist:
    history = tracker.get_price_history(asin)
    print(f"{asin}: {len(history)} data points")
```
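Once the tracker has accumulated data, you can flatten `price_history.json` into one CSV row per snapshot for pandas or a spreadsheet. A sketch against the file structure the tracker writes:

```python
import csv
import json

def export_history_to_csv(data_file="price_history.json", out_file="price_history.csv"):
    """Flatten the tracker's per-ASIN history into one CSV row per snapshot."""
    with open(data_file) as f:
        data = json.load(f)
    with open(out_file, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["asin", "title", "timestamp", "price",
                         "original_price", "rating", "availability"])
        for asin, record in data.items():
            for entry in record.get("history", []):
                writer.writerow([
                    asin,
                    record.get("title", ""),
                    entry.get("timestamp"),
                    entry.get("price"),
                    entry.get("original_price"),
                    entry.get("rating"),
                    entry.get("availability"),
                ])
```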
Common Issues and Fixes
Amazon returns CAPTCHA pages. The ScrapeForge API handles this with automatic proxy rotation and CAPTCHA solving. If you see consistent failures, increase the delay between requests to 3-5 seconds.
Prices returned as "N/A" or missing. Some Amazon pages show prices only when items are in stock. Check the availability field — if it says "Out of Stock," the price field may be empty.
Different Amazon marketplace needed? Change the URL from amazon.com to amazon.co.uk, amazon.de, etc. Set the country parameter in your scrape request to match.
Rate limiting (429 errors). If SearchHive returns rate limit errors, you've hit your plan's throughput limit. Either reduce your scrape frequency or upgrade your plan.
Best Practices for Amazon Scraping
- Respect robots.txt and terms of service. Only scrape data you have a legitimate use for.
- Use the minimum delay necessary. 2 seconds between requests is a reasonable default. Don't hammer the site.
- Cache aggressively. Product titles and descriptions don't change often — cache them and only refresh prices daily.
- Handle partial failures gracefully. Some products will fail to scrape. Log and retry, don't crash.
- Store raw responses. Save the raw API response alongside your parsed data for debugging.
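The caching advice can be as simple as a JSON file with per-entry timestamps. A minimal sketch (the file name and one-day TTL are arbitrary choices, not SearchHive features):

```python
import json
import time
from pathlib import Path

class SimpleCache:
    """File-backed cache with a time-to-live, for fields that rarely change
    (titles, descriptions). Fetch prices fresh; cache the rest."""
    def __init__(self, path="product_cache.json", ttl_seconds=86400):
        self.path = Path(path)
        self.ttl = ttl_seconds
        self.data = json.loads(self.path.read_text()) if self.path.exists() else {}

    def get(self, key):
        entry = self.data.get(key)
        if entry and time.time() - entry["stored_at"] < self.ttl:
            return entry["value"]
        return None  # missing or expired

    def set(self, key, value):
        self.data[key] = {"value": value, "stored_at": time.time()}
        self.path.write_text(json.dumps(self.data, indent=2))
```

Check the cache before scraping; on a miss, scrape and `set` the result. Every hit is an API credit and a request to Amazon you didn't spend.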
What's Next
- Combine with SearchHive's SwiftSearch to automatically discover new products in your niche
- Use DeepDive's structured extraction to build competitor comparison dashboards
- Schedule the price tracker with `cron` for daily automated monitoring
- Export price history to CSV/JSON for analysis in pandas or your BI tool
Start Building
All the code above works with SearchHive's free tier. Sign up at searchhive.dev, grab your API key, and start scraping Amazon product data in under 5 minutes.