E-commerce pricing data drives competitive strategy, dynamic pricing engines, and market intelligence. Whether you're monitoring competitor prices, building a price comparison tool, or tracking product trends, you need reliable access to pricing data from multiple online stores.
This tutorial shows how to scrape e-commerce pricing data at scale using Python and the SearchHive API, with code for single products, entire categories, and multi-store comparisons.
Key Takeaways
- SearchHive's ScrapeForge handles JavaScript-rendered product pages, AJAX pricing, and anti-bot measures
- DeepDive API crawls entire category pages automatically with pagination handling
- Structured extraction returns clean JSON with prices, names, ratings, and availability
- Multi-store scraping requires proxy rotation (built into SearchHive) to avoid blocks
- The free tier (500 credits) supports scraping ~100-500 product pages depending on complexity
Prerequisites
- Python 3.8+
- SearchHive API key (free at searchhive.dev)
- Basic understanding of HTML/CSS selectors
pip install requests searchhive pandas
Step 1: Scrape a Single Product Page
Start with the fundamentals -- extracting pricing data from a single product page:
import requests

API_KEY = "***"
BASE_URL = "https://api.searchhive.dev/v1"

def scrape_product_price(url):
    # Extract pricing data from a single product page
    response = requests.post(
        f"{BASE_URL}/scrape",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "url": url,
            "format": "json",
            "render_js": True,
            "wait_for": ".price, [data-price], #price",
            "extract": {
                "product_name": "h1.product-title, h1[itemprop='name']",
                "price": ".price, [data-price], span.price-value",
                "original_price": ".price-old, .was-price, del.price",
                "currency": ".price::attr(data-currency), [itemprop='priceCurrency']::attr(content)",
                "availability": ".stock, .availability, [itemprop='availability']::attr(content)",
                "rating": ".rating, [itemprop='ratingValue']::attr(content)",
                "review_count": ".review-count, [itemprop='reviewCount']::attr(content)",
                "image": "img.main-product::attr(src), [itemprop='image']::attr(content)",
                "sku": ".sku, [itemprop='sku']::attr(content)"
            }
        }
    )
    if response.status_code == 200:
        return response.json().get("data", {})
    else:
        print(f"Error {response.status_code}: {response.text}")
        return {}
# Example usage
product = scrape_product_price("https://example-store.com/product/wireless-keyboard")
print(f"Product: {product.get('product_name')}")
print(f"Price: {product.get('price')}")
print(f"Original: {product.get('original_price')}")
print(f"Available: {product.get('availability')}")
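Scrapes fail intermittently in practice (timeouts, rate limits), so it helps to wrap the call in a small retry loop. Here's a minimal sketch with exponential backoff; the attempt count and delays are arbitrary starting points, not SearchHive requirements:

```python
import time

def scrape_with_retries(url, fetch, max_attempts=3, base_delay=2.0):
    # Call fetch(url) until it returns data, backing off between attempts.
    # fetch is any function that returns {} on failure, e.g. scrape_product_price.
    for attempt in range(max_attempts):
        data = fetch(url)
        if data:
            return data
        # Exponential backoff: base_delay, then 2x, 4x, ...
        time.sleep(base_delay * (2 ** attempt))
    return {}
```

Usage: `product = scrape_with_retries(url, scrape_product_price)`.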
Step 2: Scrape an Entire Product Category
Crawling category pages is where scraping becomes powerful. SearchHive's DeepDive API handles pagination automatically:
import time

def scrape_category(store_url, category_path, extract_fields, max_pages=10):
    # Crawl an entire product category with pagination
    response = requests.post(
        f"{BASE_URL}/deepdive",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "start_url": f"{store_url}{category_path}",
            "max_pages": max_pages,
            "follow_patterns": [category_path],
            "extract": extract_fields,
            "output_format": "json",
            "render_js": True,
            "wait_for": ".product-item, .product-card, [data-product]"
        }
    )
    crawl = response.json()
    print(f"Crawl ID: {crawl.get('crawl_id')}")
    print(f"Status: {crawl.get('status')}")

    # Poll until the crawl completes
    crawl_id = crawl["crawl_id"]
    for _ in range(60):  # max 5 minutes
        result = requests.get(
            f"{BASE_URL}/deepdive/{crawl_id}",
            headers={"Authorization": f"Bearer {API_KEY}"}
        ).json()
        if result["status"] == "completed":
            products = []
            for page in result.get("pages", []):
                data = page.get("data", [])
                if isinstance(data, list):
                    products.extend(data)
                elif isinstance(data, dict):
                    products.append(data)
            return products
        print(f"  Progress: {result.get('pages_crawled', 0)} pages...")
        time.sleep(5)
    return []

# Example: scrape all products in a category
products = scrape_category(
    store_url="https://example-store.com",
    category_path="/category/electronics",
    extract_fields={
        "_container": ".product-item, .product-card",
        "name": "h3.name, .product-title a",
        "price": ".price, span.price-value",
        "url": "a.product-link::attr(href)",
        "image": "img::attr(src)",
        "rating": ".rating::text"
    },
    max_pages=5
)

print(f"Scraped {len(products)} products")
for p in products[:5]:
    print(f"  {p.get('name')}: {p.get('price')}")
Step 3: Compare Prices Across Multiple Stores
The real value of pricing data comes from comparing the same product across competitors:
import time

def compare_prices(product_name, store_configs):
    # Compare a product's price across multiple stores
    results = []
    for store_name, config in store_configs.items():
        # Search for the product on each store
        search_response = requests.post(
            f"{BASE_URL}/search",
            headers={"Authorization": f"Bearer {API_KEY}"},
            json={
                "query": f"{product_name} site:{config['domain']}",
                "num_results": 5
            }
        )
        search_results = search_response.json().get("results", [])

        # Scrape the top result
        if search_results:
            product_url = search_results[0]["url"]
            product_data = scrape_product_price(product_url)
            product_data["store"] = store_name
            product_data["url"] = product_url
            results.append(product_data)
            print(f"  {store_name}: {product_data.get('price')}")
        else:
            print(f"  {store_name}: not found")
        time.sleep(2)  # be polite between stores

    # Sort by price (cheapest first)
    results.sort(key=lambda x: parse_price(x.get("price", "0")))
    return results

def parse_price(price_str):
    # Convert a price string to a float for comparison
    if not price_str:
        return float("inf")
    cleaned = "".join(c for c in str(price_str) if c.isdigit() or c == ".")
    try:
        return float(cleaned)
    except ValueError:
        return float("inf")

# Example: compare prices across stores
stores = {
    "Store A": {"domain": "store-a.com"},
    "Store B": {"domain": "store-b.com"},
    "Store C": {"domain": "store-c.com"},
}

comparison = compare_prices("Logitech MX Master 3S", stores)
print("\nPrice comparison (lowest first):")
for i, item in enumerate(comparison):
    print(f"  {i+1}. {item['store']}: {item.get('price')} ({item.get('url')})")
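Once the comparison list is sorted, a quick summary of the spread shows how much the cheapest store saves (parse_price is repeated here so the sketch runs standalone):

```python
def parse_price(price_str):
    # Convert a price string to a float (same helper as in Step 3)
    if not price_str:
        return float("inf")
    cleaned = "".join(c for c in str(price_str) if c.isdigit() or c == ".")
    try:
        return float(cleaned)
    except ValueError:
        return float("inf")

def price_spread(comparison):
    # Return (lowest, highest, savings_pct) across stores, or None if
    # fewer than two stores produced a usable price.
    prices = [parse_price(item.get("price")) for item in comparison]
    prices = [p for p in prices if p != float("inf")]
    if len(prices) < 2:
        return None
    lowest, highest = min(prices), max(prices)
    return lowest, highest, round((1 - lowest / highest) * 100, 1)
```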
Step 4: Build a Price Tracking Database
Store historical pricing data to track trends and detect price changes:
import sqlite3
from datetime import datetime

def init_price_db(db_path="prices.db"):
    # Initialize the database for price tracking
    conn = sqlite3.connect(db_path)
    c = conn.cursor()
    c.execute("""
        CREATE TABLE IF NOT EXISTS prices (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            product_name TEXT,
            store TEXT,
            price REAL,
            original_price REAL,
            url TEXT,
            in_stock INTEGER DEFAULT 1,
            checked_at TEXT DEFAULT CURRENT_TIMESTAMP
        )
    """)
    c.execute("""
        CREATE INDEX IF NOT EXISTS idx_prices_product
        ON prices(product_name, checked_at)
    """)
    conn.commit()
    conn.close()

init_price_db()
def save_price_data(product_data, db_path="prices.db"):
    # Save one price observation to the database
    conn = sqlite3.connect(db_path)
    c = conn.cursor()
    price = parse_price(product_data.get("price"))
    original = parse_price(product_data.get("original_price"))
    availability = str(product_data.get("availability", "")).lower()
    in_stock = 0 if "out of stock" in availability else 1
    c.execute("""
        INSERT INTO prices
            (product_name, store, price, original_price, url, in_stock)
        VALUES (?, ?, ?, ?, ?, ?)
    """, (
        product_data.get("product_name", ""),
        product_data.get("store", "unknown"),
        price, original,
        product_data.get("url", ""),
        in_stock
    ))
    conn.commit()
    conn.close()
def get_price_history(product_name, days=30, db_path="prices.db"):
    # Get the price history for a product
    conn = sqlite3.connect(db_path)
    c = conn.cursor()
    c.execute("""
        SELECT store, price, checked_at
        FROM prices
        WHERE product_name = ?
          AND checked_at >= date('now', ?)
        ORDER BY checked_at
    """, (product_name, f"-{days} days"))
    rows = c.fetchall()
    conn.close()
    return [
        {"store": r[0], "price": r[1], "date": r[2]}
        for r in rows
    ]
def find_price_drops(db_path="prices.db", threshold_pct=10):
    # Find products whose latest price dropped vs. the previous check
    conn = sqlite3.connect(db_path)
    c = conn.cursor()
    c.execute("""
        WITH ranked AS (
            SELECT product_name, store, price,
                   ROW_NUMBER() OVER (
                       PARTITION BY product_name, store
                       ORDER BY checked_at DESC
                   ) AS rn
            FROM prices
        )
        SELECT
            l.product_name, l.store,
            l.price AS current_price,
            p.price AS previous_price,
            ROUND((1.0 - l.price / p.price) * 100, 1) AS drop_pct
        FROM ranked l
        JOIN ranked p ON l.product_name = p.product_name
            AND l.store = p.store AND p.rn = 2
        WHERE l.rn = 1
          AND l.price < p.price
          AND (1.0 - l.price / p.price) >= ?
        ORDER BY drop_pct DESC
        LIMIT 20
    """, (threshold_pct / 100.0,))
    drops = []
    for row in c.fetchall():
        drops.append({
            "product": row[0],
            "store": row[1],
            "current": row[2],
            "previous": row[3],
            "drop_pct": row[4]
        })
    conn.close()
    return drops

# Example: find price drops
drops = find_price_drops(threshold_pct=15)
print("Significant price drops:")
for d in drops:
    print(f"  {d['product']} ({d['store']}): "
          f"${d['previous']} -> ${d['current']} (-{d['drop_pct']}%)")
Step 5: Export and Analyze with Pandas
Export the accumulated data to CSV for spreadsheets, or summarize it directly with pandas:
import sqlite3
import pandas as pd

def export_to_csv(db_path="prices.db", output="pricing_data.csv"):
    # Export price data to CSV for analysis in Excel/Sheets
    conn = sqlite3.connect(db_path)
    df = pd.read_sql_query("""
        SELECT product_name, store, price, original_price, in_stock, checked_at
        FROM prices
        ORDER BY checked_at DESC
    """, conn)
    conn.close()
    df.to_csv(output, index=False)
    print(f"Exported {len(df)} records to {output}")
    return df

def daily_price_summary(db_path="prices.db"):
    # Get today's pricing snapshot
    conn = sqlite3.connect(db_path)
    df = pd.read_sql_query("""
        SELECT product_name, store,
               MIN(price) AS lowest_price,
               MAX(price) AS highest_price,
               AVG(price) AS avg_price
        FROM prices
        WHERE date(checked_at) = date('now')
        GROUP BY product_name, store
        ORDER BY product_name, avg_price
    """, conn)
    conn.close()
    print(f"Today's pricing summary ({len(df)} products):")
    print(df.to_string(index=False))
    return df

# Run analysis
df = daily_price_summary()
Step 6: Schedule Daily Price Checks
Tie the pieces together in a single script that runs on a schedule:
from datetime import datetime

PRODUCTS = [
    ("Logitech MX Master 3S", ["store-a.com", "store-b.com"]),
    ("Sony WH-1000XM5", ["store-a.com", "store-c.com"]),
    ("Apple AirPods Pro 2", ["store-a.com", "store-b.com", "store-c.com"]),
]

def daily_price_check():
    # Run daily price checks for all tracked products
    print(f"Price Check - {datetime.now().strftime('%Y-%m-%d')}")
    for product_name, stores in PRODUCTS:
        store_configs = {s: {"domain": s} for s in stores}
        results = compare_prices(product_name, store_configs)
        for item in results:
            save_price_data(item)

    # Check for price drops
    drops = find_price_drops(threshold_pct=10)
    if drops:
        print(f"\nALERT: {len(drops)} price drops detected!")
        for d in drops:
            print(f"  {d['product']}: -{d['drop_pct']}%")

    export_to_csv()

if __name__ == "__main__":
    daily_price_check()
Schedule the script with cron to run daily at 7 AM:
0 7 * * * /usr/bin/python3 /path/to/price_monitor.py >> /var/log/prices.log 2>&1
Common Issues
Dynamic pricing: Some stores show different prices based on location, device, or cookies. SearchHive's proxy rotation helps, but for critical accuracy, specify a consistent location.
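One way to keep results consistent is to pin every request to the same location. The country field below is a hypothetical geotargeting parameter, used only to illustrate the idea -- check SearchHive's docs for the actual name before relying on it:

```python
def build_scrape_payload(url, country="US"):
    # Build a scrape request payload pinned to one proxy location.
    # NOTE: "country" is a hypothetical parameter name, not confirmed
    # against the SearchHive API reference.
    return {
        "url": url,
        "format": "json",
        "render_js": True,
        "country": country,
    }
```

You would pass the result as the json= body of the /scrape request.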
Anti-bot protection: Major retailers (Amazon, Walmart) have aggressive protections. Space requests widely and use the Stealth parameter. For Amazon specifically, consider their Product Advertising API instead.
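Fixed intervals between requests are easy to fingerprint; adding random jitter makes the traffic pattern less mechanical. A small helper (the base and jitter values are arbitrary starting points):

```python
import random
import time

def polite_sleep(base=5.0, jitter=3.0):
    # Sleep for base seconds plus a random 0..jitter offset, and return
    # the actual delay so callers can log it.
    delay = base + random.uniform(0, jitter)
    time.sleep(delay)
    return delay
```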
Inconsistent selectors: Each store uses different HTML structures. The flexible selector syntax in ScrapeForge (comma-separated alternatives) helps, but you may need per-store configuration.
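In practice that per-store configuration can be a simple dict keyed by domain, with a generic fallback. The selectors below are placeholders -- inspect each store's HTML and fill in real ones:

```python
from urllib.parse import urlparse

# Placeholder selectors -- replace after inspecting each store's HTML.
STORE_SELECTORS = {
    "store-a.com": {"product_name": "h1.product-title", "price": "span.price-value"},
    "store-b.com": {"product_name": "h1[itemprop='name']", "price": "[data-price]"},
}

# Generic fallback for stores without a dedicated entry
DEFAULT_SELECTORS = {"product_name": "h1", "price": ".price, [data-price]"}

def selectors_for(url):
    # Pick the extraction config for a URL's store, falling back to defaults.
    host = urlparse(url).netloc
    if host.startswith("www."):
        host = host[4:]
    return STORE_SELECTORS.get(host, DEFAULT_SELECTORS)
```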
Price formatting: Prices appear in different formats ($19.99, 19,99 EUR, $1,299.00). The parse_price function handles common formats but may need adjustment for your specific stores.
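If your stores use European formats, the digit-stripping approach in parse_price misreads "19,99 EUR" as 1999.0. A variant that treats the rightmost separator as the decimal point handles both conventions; it is still a heuristic (a bare "1,299" with a thousands comma will be misread), so test it against your actual data:

```python
import re

def parse_price_robust(price_str):
    # Parse prices like '$1,299.00', '19,99 EUR', or '1.299,00' into floats.
    if not price_str:
        return float("inf")
    s = re.sub(r"[^\d.,]", "", str(price_str))
    if not s:
        return float("inf")
    if s.rfind(",") > s.rfind("."):
        # Comma is the decimal separator: '1.299,00' -> '1299.00'
        s = s.replace(".", "").replace(",", ".")
    else:
        # Dot is the decimal separator (or none): '$1,299.00' -> '1299.00'
        s = s.replace(",", "")
    try:
        return float(s)
    except ValueError:
        return float("inf")
```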
Cost Comparison
| Approach | Cost per 1K pages | Notes |
|---|---|---|
| Manual browser | $0 (your time) | Doesn't scale |
| ScraperAPI | $30+/mo | Basic JS rendering |
| Bright Data | $12.75+/mo | Requires proxy setup |
| SearchHive Free | $0 (500 credits) | ~100-500 pages |
| SearchHive Starter | $9/mo | ~1K-5K pages |
| SearchHive Builder | $49/mo | ~20K-100K pages |
Next Steps
- Add price drop alerts via email or Slack
- Build a competitor pricing dashboard with Streamlit
- Integrate with your e-commerce platform to update your own prices dynamically
- Track product availability and stockout patterns alongside pricing
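As a starting point for the first item, a Slack incoming webhook only needs a small JSON POST. The message formatting is split out so it can be tested without hitting the network; SLACK_WEBHOOK_URL is a placeholder you'd create in your Slack workspace:

```python
import json
import urllib.request

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"  # placeholder

def format_drop_alert(drops):
    # Render drops (as returned by find_price_drops) into a Slack message.
    lines = [f"{len(drops)} price drop(s) detected:"]
    for d in drops:
        lines.append(
            f"- {d['product']} ({d['store']}): "
            f"${d['previous']} -> ${d['current']} (-{d['drop_pct']}%)"
        )
    return "\n".join(lines)

def send_slack_alert(drops):
    # POST the formatted message to the webhook (stdlib only, no extra deps).
    payload = json.dumps({"text": format_drop_alert(drops)}).encode()
    req = urllib.request.Request(
        SLACK_WEBHOOK_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)
```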
Get started free with 500 credits at searchhive.dev -- no credit card required. Check the docs for the full API reference and Python SDK.
Related tutorials: /tutorials/how-to-build-an-seo-monitoring-tool-with-search-apis | /tutorials/how-to-monitor-brand-mentions-across-the-web-with-python
Compare: /compare/firecrawl | /compare/scrapingbee