E-commerce pricing data drives competitive strategy, dynamic pricing engines, and market intelligence. Whether you're monitoring competitor prices, building a price comparison tool, or tracking product trends, you need reliable access to pricing data from multiple online stores.
This tutorial shows how to scrape e-commerce pricing data at scale using Python and the SearchHive API, with code for single products, entire categories, and multi-store comparisons.
Key Takeaways
- SearchHive's ScrapeForge handles JavaScript-rendered product pages, AJAX pricing, and anti-bot measures
- DeepDive API crawls entire category pages automatically with pagination handling
- Structured extraction returns clean JSON with prices, names, ratings, and availability
- Multi-store scraping requires proxy rotation (built into SearchHive) to avoid blocks
- The free tier (500 credits) supports scraping ~100-500 product pages depending on complexity
Prerequisites
- Python 3.8+
- SearchHive API key (free at searchhive.dev)
- Basic understanding of HTML/CSS selectors
pip install requests searchhive pandas
Step 1: Scrape a Single Product Page
Start with the fundamentals -- extracting pricing data from a single product page:
import requests

API_KEY = "***"
BASE_URL = "https://api.searchhive.dev/v1"

def scrape_product_price(url):
    # Extract pricing data from a single product page
    response = requests.post(
        f"{BASE_URL}/scrape",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "url": url,
            "format": "json",
            "render_js": True,
            "wait_for": ".price, [data-price], #price",
            "extract": {
                "product_name": "h1.product-title, h1[itemprop='name']",
                "price": ".price, [data-price], span.price-value",
                "original_price": ".price-old, .was-price, del.price",
                "currency": ".price::attr(data-currency), [itemprop='priceCurrency']::attr(content)",
                "availability": ".stock, .availability, [itemprop='availability']::attr(content)",
                "rating": ".rating, [itemprop='ratingValue']::attr(content)",
                "review_count": ".review-count, [itemprop='reviewCount']::attr(content)",
                "image": "img.main-product::attr(src), [itemprop='image']::attr(content)",
                "sku": ".sku, [itemprop='sku']::attr(content)"
            }
        }
    )
    if response.status_code == 200:
        return response.json().get("data", {})
    else:
        print(f"Error {response.status_code}: {response.text}")
        return {}
# Example usage
product = scrape_product_price("https://example-store.com/product/wireless-keyboard")
print(f"Product: {product.get('product_name')}")
print(f"Price: {product.get('price')}")
print(f"Original: {product.get('original_price')}")
print(f"Available: {product.get('availability')}")
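Scrapes fail intermittently in practice (timeouts, rate limits), so it helps to wrap the call in a small retry loop. Here's a minimal sketch with exponential backoff; the attempt count and delays are arbitrary starting points, not SearchHive requirements:

```python
import time

def scrape_with_retries(url, fetch, max_attempts=3, base_delay=2.0):
    # Call fetch(url) until it returns data, backing off between attempts.
    # fetch is any function that returns {} on failure, e.g. scrape_product_price.
    for attempt in range(max_attempts):
        data = fetch(url)
        if data:
            return data
        # Exponential backoff: base_delay, then 2x, 4x, ...
        time.sleep(base_delay * (2 ** attempt))
    return {}
```

Usage: `product = scrape_with_retries(url, scrape_product_price)`.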
Step 2: Scrape an Entire Product Category
Crawling category pages is where scraping becomes powerful. SearchHive's DeepDive API handles pagination automatically:
import time

def scrape_category(store_url, category_path, extract_fields, max_pages=10):
    # Crawl an entire product category with pagination
    response = requests.post(
        f"{BASE_URL}/deepdive",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "start_url": f"{store_url}{category_path}",
            "max_pages": max_pages,
            "follow_patterns": [category_path],
            "extract": extract_fields,
            "output_format": "json",
            "render_js": True,
            "wait_for": ".product-item, .product-card, [data-product]"
        }
    )
    crawl = response.json()
    print(f"Crawl ID: {crawl.get('crawl_id')}")
    print(f"Status: {crawl.get('status')}")

    # Poll until the crawl completes
    crawl_id = crawl["crawl_id"]
    for _ in range(60):  # max 5 minutes
        result = requests.get(
            f"{BASE_URL}/deepdive/{crawl_id}",
            headers={"Authorization": f"Bearer {API_KEY}"}
        ).json()
        if result["status"] == "completed":
            products = []
            for page in result.get("pages", []):
                data = page.get("data", [])
                if isinstance(data, list):
                    products.extend(data)
                elif isinstance(data, dict):
                    products.append(data)
            return products
        print(f"  Progress: {result.get('pages_crawled', 0)} pages...")
        time.sleep(5)
    return []

# Example: scrape all products in a category
products = scrape_category(
    store_url="https://example-store.com",
    category_path="/category/electronics",
    extract_fields={
        "_container": ".product-item, .product-card",
        "name": "h3.name, .product-title a",
        "price": ".price, span.price-value",
        "url": "a.product-link::attr(href)",
        "image": "img::attr(src)",
        "rating": ".rating::text"
    },
    max_pages=5
)

print(f"Scraped {len(products)} products")
for p in products[:5]:
    print(f"  {p.get('name')}: {p.get('price')}")
Step 3: Compare Prices Across Multiple Stores
The real value of pricing data comes from comparing the same product across competitors:
import time

def compare_prices(product_name, store_configs):
    # Compare a product's price across multiple stores
    results = []
    for store_name, config in store_configs.items():
        # Search for the product on each store
        search_response = requests.post(
            f"{BASE_URL}/search",
            headers={"Authorization": f"Bearer {API_KEY}"},
            json={
                "query": f"{product_name} site:{config['domain']}",
                "num_results": 5
            }
        )
        search_results = search_response.json().get("results", [])

        # Scrape the top result
        if search_results:
            product_url = search_results[0]["url"]
            product_data = scrape_product_price(product_url)
            product_data["store"] = store_name
            product_data["url"] = product_url
            results.append(product_data)
            print(f"  {store_name}: {product_data.get('price')}")
        else:
            print(f"  {store_name}: not found")
        time.sleep(2)  # be polite between stores

    # Sort by price (cheapest first)
    results.sort(key=lambda x: parse_price(x.get("price", "0")))
    return results

def parse_price(price_str):
    # Convert a price string to a float for comparison
    if not price_str:
        return float("inf")
    cleaned = "".join(c for c in str(price_str) if c.isdigit() or c == ".")
    try:
        return float(cleaned)
    except ValueError:
        return float("inf")

# Example: compare prices across stores
stores = {
    "Store A": {"domain": "store-a.com"},
    "Store B": {"domain": "store-b.com"},
    "Store C": {"domain": "store-c.com"},
}

comparison = compare_prices("Logitech MX Master 3S", stores)
print("\nPrice comparison (lowest first):")
for i, item in enumerate(comparison):
    print(f"  {i+1}. {item['store']}: {item.get('price')} ({item.get('url')})")
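Once the comparison list is sorted, a quick summary of the spread shows how much the cheapest store saves (parse_price is repeated here so the sketch runs standalone):

```python
def parse_price(price_str):
    # Convert a price string to a float (same helper as in Step 3)
    if not price_str:
        return float("inf")
    cleaned = "".join(c for c in str(price_str) if c.isdigit() or c == ".")
    try:
        return float(cleaned)
    except ValueError:
        return float("inf")

def price_spread(comparison):
    # Return (lowest, highest, savings_pct) across stores, or None if
    # fewer than two stores produced a usable price.
    prices = [parse_price(item.get("price")) for item in comparison]
    prices = [p for p in prices if p != float("inf")]
    if len(prices) < 2:
        return None
    lowest, highest = min(prices), max(prices)
    return lowest, highest, round((1 - lowest / highest) * 100, 1)
```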
Step 4: Build a Price Tracking Database
Store historical pricing data to track trends and detect price changes:
import sqlite3
from datetime import datetime

def init_price_db(db_path="prices.db"):
    # Initialize the database for price tracking
    conn = sqlite3.connect(db_path)
    c = conn.cursor()
    c.execute("""
        CREATE TABLE IF NOT EXISTS prices (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            product_name TEXT,
            store TEXT,
            price REAL,
            original_price REAL,
            url TEXT,
            in_stock INTEGER DEFAULT 1,
            checked_at TEXT DEFAULT CURRENT_TIMESTAMP
        )
    """)
    c.execute("""
        CREATE INDEX IF NOT EXISTS idx_prices_product
        ON prices(product_name, checked_at)
    """)
    conn.commit()
    conn.close()

init_price_db()
def save_price_data(product_data, db_path="prices.db"):
    # Save one price observation to the database
    conn = sqlite3.connect(db_path)
    c = conn.cursor()
    price = parse_price(product_data.get("price"))
    original = parse_price(product_data.get("original_price"))
    availability = str(product_data.get("availability", "")).lower()
    in_stock = 0 if "out of stock" in availability else 1
    c.execute("""
        INSERT INTO prices
            (product_name, store, price, original_price, url, in_stock)
        VALUES (?, ?, ?, ?, ?, ?)
    """, (
        product_data.get("product_name", ""),
        product_data.get("store", "unknown"),
        price, original,
        product_data.get("url", ""),
        in_stock
    ))
    conn.commit()
    conn.close()
def get_price_history(product_name, days=30, db_path="prices.db"):
    # Get the price history for a product
    conn = sqlite3.connect(db_path)
    c = conn.cursor()
    c.execute("""
        SELECT store, price, checked_at
        FROM prices
        WHERE product_name = ?
          AND checked_at >= date('now', ?)
        ORDER BY checked_at
    """, (product_name, f"-{days} days"))
    rows = c.fetchall()
    conn.close()
    return [
        {"store": r[0], "price": r[1], "date": r[2]}
        for r in rows
    ]
def find_price_drops(db_path="prices.db", threshold_pct=10):
    # Find products whose latest price dropped vs. the previous check
    conn = sqlite3.connect(db_path)
    c = conn.cursor()
    c.execute("""
        WITH ranked AS (
            SELECT product_name, store, price,
                   ROW_NUMBER() OVER (
                       PARTITION BY product_name, store
                       ORDER BY checked_at DESC
                   ) AS rn
            FROM prices
        )
        SELECT
            l.product_name, l.store,
            l.price AS current_price,
            p.price AS previous_price,
            ROUND((1.0 - l.price / p.price) * 100, 1) AS drop_pct
        FROM ranked l
        JOIN ranked p ON l.product_name = p.product_name
            AND l.store = p.store AND p.rn = 2
        WHERE l.rn = 1
          AND l.price < p.price
          AND (1.0 - l.price / p.price) >= ?
        ORDER BY drop_pct DESC
        LIMIT 20
    """, (threshold_pct / 100.0,))
    drops = []
    for row in c.fetchall():
        drops.append({
            "product": row[0],
            "store": row[1],
            "current": row[2],
            "previous": row[3],
            "drop_pct": row[4]
        })
    conn.close()
    return drops

# Example: find price drops
drops = find_price_drops(threshold_pct=15)
print("Significant price drops:")
for d in drops:
    print(f"  {d['product']} ({d['store']}): "
          f"${d['previous']} -> ${d['current']} (-{d['drop_pct']}%)")
Step 5: Export and Analyze with Pandas
Export the accumulated data to CSV for spreadsheets, or summarize it directly with pandas:
import sqlite3
import pandas as pd

def export_to_csv(db_path="prices.db", output="pricing_data.csv"):
    # Export price data to CSV for analysis in Excel/Sheets
    conn = sqlite3.connect(db_path)
    df = pd.read_sql_query("""
        SELECT product_name, store, price, original_price, in_stock, checked_at
        FROM prices
        ORDER BY checked_at DESC
    """, conn)
    conn.close()
    df.to_csv(output, index=False)
    print(f"Exported {len(df)} records to {output}")
    return df

def daily_price_summary(db_path="prices.db"):
    # Get today's pricing snapshot
    conn = sqlite3.connect(db_path)
    df = pd.read_sql_query("""
        SELECT product_name, store,
               MIN(price) AS lowest_price,
               MAX(price) AS highest_price,
               AVG(price) AS avg_price
        FROM prices
        WHERE date(checked_at) = date('now')
        GROUP BY product_name, store
        ORDER BY product_name, avg_price
    """, conn)
    conn.close()
    print(f"Today's pricing summary ({len(df)} products):")
    print(df.to_string(index=False))
    return df

# Run analysis
df = daily_price_summary()
Step 6: Schedule Daily Price Checks
Tie the pieces together in a single script that runs on a schedule:
from datetime import datetime

PRODUCTS = [
    ("Logitech MX Master 3S", ["store-a.com", "store-b.com"]),
    ("Sony WH-1000XM5", ["store-a.com", "store-c.com"]),
    ("Apple AirPods Pro 2", ["store-a.com", "store-b.com", "store-c.com"]),
]

def daily_price_check():
    # Run daily price checks for all tracked products
    print(f"Price Check - {datetime.now().strftime('%Y-%m-%d')}")
    for product_name, stores in PRODUCTS:
        store_configs = {s: {"domain": s} for s in stores}
        results = compare_prices(product_name, store_configs)
        for item in results:
            save_price_data(item)

    # Check for price drops
    drops = find_price_drops(threshold_pct=10)
    if drops:
        print(f"\nALERT: {len(drops)} price drops detected!")
        for d in drops:
            print(f"  {d['product']}: -{d['drop_pct']}%")

    export_to_csv()

if __name__ == "__main__":
    daily_price_check()
Schedule the script with cron to run daily at 7 AM:
0 7 * * * /usr/bin/python3 /path/to/price_monitor.py >> /var/log/prices.log 2>&1
Common Issues
Dynamic pricing: Some stores show different prices based on location, device, or cookies. SearchHive's proxy rotation helps, but for critical accuracy, specify a consistent location.
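One way to keep results consistent is to pin every request to the same location. The country field below is a hypothetical geotargeting parameter, used only to illustrate the idea -- check SearchHive's docs for the actual name before relying on it:

```python
def build_scrape_payload(url, country="US"):
    # Build a scrape request payload pinned to one proxy location.
    # NOTE: "country" is a hypothetical parameter name, not confirmed
    # against the SearchHive API reference.
    return {
        "url": url,
        "format": "json",
        "render_js": True,
        "country": country,
    }
```

You would pass the result as the json= body of the /scrape request.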
Anti-bot protection: Major retailers (Amazon, Walmart) have aggressive protections. Space requests widely and use the Stealth parameter. For Amazon specifically, consider their Product Advertising API instead.
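Fixed intervals between requests are easy to fingerprint; adding random jitter makes the traffic pattern less mechanical. A small helper (the base and jitter values are arbitrary starting points):

```python
import random
import time

def polite_sleep(base=5.0, jitter=3.0):
    # Sleep for base seconds plus a random 0..jitter offset, and return
    # the actual delay so callers can log it.
    delay = base + random.uniform(0, jitter)
    time.sleep(delay)
    return delay
```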
Inconsistent selectors: Each store uses different HTML structures. The flexible selector syntax in ScrapeForge (comma-separated alternatives) helps, but you may need per-store configuration.
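In practice that per-store configuration can be a simple dict keyed by domain, with a generic fallback. The selectors below are placeholders -- inspect each store's HTML and fill in real ones:

```python
from urllib.parse import urlparse

# Placeholder selectors -- replace after inspecting each store's HTML.
STORE_SELECTORS = {
    "store-a.com": {"product_name": "h1.product-title", "price": "span.price-value"},
    "store-b.com": {"product_name": "h1[itemprop='name']", "price": "[data-price]"},
}

# Generic fallback for stores without a dedicated entry
DEFAULT_SELECTORS = {"product_name": "h1", "price": ".price, [data-price]"}

def selectors_for(url):
    # Pick the extraction config for a URL's store, falling back to defaults.
    host = urlparse(url).netloc
    if host.startswith("www."):
        host = host[4:]
    return STORE_SELECTORS.get(host, DEFAULT_SELECTORS)
```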
Price formatting: Prices appear in different formats ($19.99, 19,99 EUR, $1,299.00). The parse_price function handles common formats but may need adjustment for your specific stores.
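If your stores use European formats, the digit-stripping approach in parse_price misreads "19,99 EUR" as 1999.0. A variant that treats the rightmost separator as the decimal point handles both conventions; it is still a heuristic (a bare "1,299" with a thousands comma will be misread), so test it against your actual data:

```python
import re

def parse_price_robust(price_str):
    # Parse prices like '$1,299.00', '19,99 EUR', or '1.299,00' into floats.
    if not price_str:
        return float("inf")
    s = re.sub(r"[^\d.,]", "", str(price_str))
    if not s:
        return float("inf")
    if s.rfind(",") > s.rfind("."):
        # Comma is the decimal separator: '1.299,00' -> '1299.00'
        s = s.replace(".", "").replace(",", ".")
    else:
        # Dot is the decimal separator (or none): '$1,299.00' -> '1299.00'
        s = s.replace(",", "")
    try:
        return float(s)
    except ValueError:
        return float("inf")
```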
Cost Comparison
| Approach | Cost per 1K pages | Notes |
|---|---|---|
| Manual browser | $0 (your time) | Doesn't scale |
| ScraperAPI | $30+/mo | Basic JS rendering |
| Bright Data | $12.75+/mo | Requires proxy setup |
| SearchHive Free | $0 (500 credits) | ~100-500 pages |
| SearchHive Starter | $9/mo | ~1K-5K pages |
| SearchHive Builder | $49/mo | ~20K-100K pages |
Next Steps
- Add price drop alerts via email or Slack
- Build a competitor pricing dashboard with Streamlit
- Integrate with your e-commerce platform to update your own prices dynamically
- Track product availability and stockout patterns alongside pricing
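As a starting point for the first item, a Slack incoming webhook only needs a small JSON POST. The message formatting is split out so it can be tested without hitting the network; SLACK_WEBHOOK_URL is a placeholder you'd create in your Slack workspace:

```python
import json
import urllib.request

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"  # placeholder

def format_drop_alert(drops):
    # Render drops (as returned by find_price_drops) into a Slack message.
    lines = [f"{len(drops)} price drop(s) detected:"]
    for d in drops:
        lines.append(
            f"- {d['product']} ({d['store']}): "
            f"${d['previous']} -> ${d['current']} (-{d['drop_pct']}%)"
        )
    return "\n".join(lines)

def send_slack_alert(drops):
    # POST the formatted message to the webhook (stdlib only, no extra deps).
    payload = json.dumps({"text": format_drop_alert(drops)}).encode()
    req = urllib.request.Request(
        SLACK_WEBHOOK_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)
```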
Get started free with 500 credits at searchhive.dev -- no credit card required. Check the docs for the full API reference and Python SDK.
Related tutorials: /tutorials/how-to-build-an-seo-monitoring-tool-with-search-apis | /tutorials/how-to-monitor-brand-mentions-across-the-web-with-python
Compare: /compare/firecrawl | /compare/scrapingbee