Ecommerce automation saves hundreds of hours per year on repetitive tasks like price monitoring, inventory tracking, competitor analysis, and product catalog updates. This tutorial walks you through building a complete ecommerce automation pipeline using Python and SearchHive's web scraping APIs -- no enterprise tools required.
Key Takeaways
- You can build a production ecommerce automation system with Python and a single SearchHive API key
- ScrapeForge handles JavaScript-rendered product pages that regular HTTP clients cannot access
- SwiftSearch monitors competitor SERP positions for product keywords
- DeepDive extracts structured product data from any product page URL
- Total cost starts at $0 (500 free credits) and scales to $49/month for serious volume
Prerequisites
Before you start, you will need:
- Python 3.9+ installed
- A SearchHive account (free at searchhive.dev)
- Your API key from the dashboard
- Basic familiarity with Python, the httpx library, and JSON
```bash
pip install httpx pandas
```
Step 1: Set Up Your SearchHive Client
First, create a reusable client that wraps SearchHive's three APIs: SwiftSearch for search, ScrapeForge for scraping, and DeepDive for deep extraction.
```python
# ecommerce_automation/client.py
import httpx
from typing import Optional


class SearchHiveClient:
    """Unified client for SearchHive APIs."""

    BASE_URL = "https://api.searchhive.dev/v1"

    def __init__(self, api_key: str):
        self.api_key = api_key
        self.headers = {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        }

    def swift_search(self, query: str, engine: str = "google",
                     num_results: int = 10) -> dict:
        """Search using SwiftSearch -- returns organic results."""
        response = httpx.post(
            f"{self.BASE_URL}/swiftsearch",
            headers=self.headers,
            json={
                "query": query,
                "engine": engine,
                "num_results": num_results,
            },
            timeout=30.0,
        )
        response.raise_for_status()
        return response.json()

    def scrape_forge(self, url: str, render_js: bool = True,
                     format: str = "markdown") -> dict:
        """Scrape any page with ScrapeForge -- handles JS rendering."""
        response = httpx.post(
            f"{self.BASE_URL}/scrapeforge",
            headers=self.headers,
            json={
                "url": url,
                "render_js": render_js,
                "format": format,
            },
            timeout=60.0,
        )
        response.raise_for_status()
        return response.json()

    def deep_dive(self, url: str, extract: Optional[dict] = None) -> dict:
        """Extract structured data from a page with DeepDive."""
        response = httpx.post(
            f"{self.BASE_URL}/deepdive",
            headers=self.headers,
            json={
                "url": url,
                "extract": extract,
            },
            timeout=60.0,
        )
        response.raise_for_status()
        return response.json()
```
Step 2: Monitor Competitor Pricing
Track what competitors charge for the same products by scraping their product pages at regular intervals.
```python
# ecommerce_automation/price_monitor.py
import os
import time
from datetime import datetime

import pandas as pd

from client import SearchHiveClient


def monitor_prices(client: SearchHiveClient, product_urls: list[dict]) -> pd.DataFrame:
    """Scrape pricing from competitor product pages.

    Args:
        client: SearchHiveClient instance
        product_urls: List of dicts with 'url', 'competitor', 'product_name'
    """
    results = []
    for item in product_urls:
        try:
            # Use DeepDive to extract structured pricing data
            response = client.deep_dive(
                item["url"],
                extract={
                    "price": {"type": "number", "description": "Product price"},
                    "title": {"type": "string", "description": "Product title"},
                    "availability": {"type": "string", "description": "In stock or out of stock"},
                    "rating": {"type": "string", "description": "Customer rating if available"},
                },
            )
            data = response.get("data", {})
            results.append({
                "timestamp": datetime.now().isoformat(),
                "competitor": item["competitor"],
                "product": item["product_name"],
                "price": data.get("price"),
                "availability": data.get("availability"),
                "rating": data.get("rating"),
                "url": item["url"],
            })
            print(f"  {item['competitor']}: ${data.get('price', 'N/A')}")
            # Respect rate limits
            time.sleep(1)
        except Exception as e:
            print(f"  Error scraping {item['url']}: {e}")
            results.append({
                "timestamp": datetime.now().isoformat(),
                "competitor": item["competitor"],
                "product": item["product_name"],
                "price": None,
                "url": item["url"],
                "error": str(e),
            })
    return pd.DataFrame(results)


# Usage example
client = SearchHiveClient("sh_live_your_api_key")
products = [
    {"url": "https://example.com/product/wireless-earbuds", "competitor": "CompetitorA", "product_name": "Wireless Earbuds Pro"},
    {"url": "https://shop.example.org/earbuds-wireless", "competitor": "CompetitorB", "product_name": "Wireless Earbuds Pro"},
]
df = monitor_prices(client, products)
print(df.to_string())

# Append to CSV for historical tracking (write the header only on the first run)
df.to_csv("price_history.csv", index=False, mode="a",
          header=not os.path.exists("price_history.csv"))
```
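Once price_history.csv has accumulated a few runs, you can flag meaningful price drops automatically. The helper below is a sketch (find_price_drops is a name introduced here, not part of SearchHive) that compares each competitor's two most recent recorded prices:

```python
import pandas as pd

def find_price_drops(history: pd.DataFrame, threshold_pct: float = 5.0) -> pd.DataFrame:
    """Flag competitors whose latest recorded price is at least
    threshold_pct percent below their previous recorded price."""
    history = history.dropna(subset=["price"]).sort_values("timestamp")
    drops = []
    for (competitor, product), group in history.groupby(["competitor", "product"]):
        if len(group) < 2:
            continue  # need at least two observations to compare
        prev, latest = group["price"].iloc[-2], group["price"].iloc[-1]
        change_pct = (latest - prev) / prev * 100
        if change_pct <= -threshold_pct:
            drops.append({
                "competitor": competitor,
                "product": product,
                "previous": prev,
                "latest": latest,
                "change_pct": round(change_pct, 1),
            })
    return pd.DataFrame(drops)
```

Feed it the accumulated history, for example find_price_drops(pd.read_csv("price_history.csv")), and trigger an alert whenever the result is non-empty.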
Step 3: Track Your SERP Rankings
Monitor where your products rank for target keywords. This helps you understand which product listings need SEO optimization.
```python
# ecommerce_automation/rank_tracker.py
import time

from client import SearchHiveClient


def track_rankings(client: SearchHiveClient, keywords: list[str],
                   your_domain: str) -> dict:
    """Check SERP positions for your product keywords.

    Args:
        client: SearchHiveClient instance
        keywords: List of product keywords to check
        your_domain: Your store domain to find rankings for
    """
    rankings = {}
    for keyword in keywords:
        response = client.swift_search(keyword, num_results=20)
        results = response.get("results", [])
        position = None
        for i, result in enumerate(results):
            if your_domain in result.get("url", ""):
                position = i + 1  # SERP positions are 1-indexed
                break
        rankings[keyword] = position
        status = f"Position #{position}" if position else "Not in top 20"
        print(f"  '{keyword}': {status}")
        time.sleep(0.5)
    return rankings


# Usage
rankings = track_rankings(
    client,
    keywords=["wireless earbuds 2026", "best noise cancelling earbuds", "earbuds under $50"],
    your_domain="yourstore.com",
)

# Identify keywords that need improvement
missing = [kw for kw, pos in rankings.items() if pos is None]
if missing:
    print(f"\nKeywords not ranking in top 20: {missing}")
```
Step 4: Build a Product Catalog Updater
Automatically pull product information from supplier or competitor pages to keep your catalog fresh.
```python
# ecommerce_automation/catalog_updater.py
import json

from client import SearchHiveClient


def extract_product_catalog(client: SearchHiveClient,
                            category_url: str) -> list[dict]:
    """Extract all products from a category listing page.

    Args:
        client: SearchHiveClient instance
        category_url: URL of a category or collection page
    """
    # Scrape the listing page first (often JS-rendered on modern stores);
    # the raw markdown is useful for debugging or caching
    page = client.scrape_forge(category_url, render_js=True, format="markdown")
    # Then deep dive to extract structured product data
    structured = client.deep_dive(
        category_url,
        extract={
            "products": {
                "type": "array",
                "description": "All products listed on this page",
                "items": {
                    "name": {"type": "string"},
                    "price": {"type": "number"},
                    "url": {"type": "string"},
                    "image_url": {"type": "string"},
                    "description": {"type": "string"},
                },
            }
        },
    )
    products = structured.get("data", {}).get("products", [])
    print(f"Extracted {len(products)} products from {category_url}")
    return products


# Usage
products = extract_product_catalog(
    client,
    "https://example.com/category/wireless-audio",
)

# Save catalog
with open("product_catalog.json", "w") as f:
    json.dump(products, f, indent=2)
print(f"Saved {len(products)} products to product_catalog.json")
```
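If you re-run the extraction on a schedule, you will usually want to merge fresh results into the catalog you already saved rather than overwrite it. A minimal sketch, keyed on product URL (merge_catalogs is a helper introduced here, not part of the tutorial modules):

```python
def merge_catalogs(existing: list[dict], incoming: list[dict]) -> list[dict]:
    """Merge freshly extracted products into an existing catalog.
    Products are keyed by URL; incoming entries overwrite stale ones."""
    by_url = {p["url"]: p for p in existing}
    for product in incoming:
        by_url[product["url"]] = product
    return list(by_url.values())
```

Load product_catalog.json, pass it in as existing alongside the new extraction, and write the merged list back out.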
Step 5: Set Up Automated Scheduling
Run your monitoring pipeline on a schedule. You can use cron, GitHub Actions, or any task scheduler.
```python
# ecommerce_automation/pipeline.py
import os
import json
from datetime import datetime

import pandas as pd

from client import SearchHiveClient
from price_monitor import monitor_prices
from rank_tracker import track_rankings

API_KEY = os.environ.get("SEARCHHIVE_API_KEY", "sh_live_...")

# Configuration
PRODUCTS = [
    {"url": "https://example.com/product/1", "competitor": "CompA", "product_name": "Widget X"},
    {"url": "https://example.org/product/1", "competitor": "CompB", "product_name": "Widget X"},
]
KEYWORDS = [
    "buy widget x online",
    "widget x best price 2026",
    "widget x review",
]
YOUR_DOMAIN = "yourstore.com"


def run_pipeline():
    """Run the full ecommerce automation pipeline."""
    client = SearchHiveClient(API_KEY)
    print(f"=== Pipeline run: {datetime.now().isoformat()} ===")

    # 1. Price monitoring
    print("\n[1/2] Price monitoring...")
    price_df = monitor_prices(client, PRODUCTS)

    # Flag price drops
    if not price_df.empty and "price" in price_df.columns:
        avg_price = pd.to_numeric(price_df["price"], errors="coerce").mean()
        print(f"  Average competitor price: ${avg_price:.2f}")

    # 2. Rank tracking
    print("\n[2/2] Rank tracking...")
    rankings = track_rankings(client, KEYWORDS, YOUR_DOMAIN)

    # Save results
    run_data = {
        "timestamp": datetime.now().isoformat(),
        "prices": price_df.to_dict(orient="records"),
        "rankings": rankings,
    }
    filename = f"pipeline_run_{datetime.now().strftime('%Y%m%d_%H%M')}.json"
    with open(filename, "w") as f:
        json.dump(run_data, f, indent=2)
    print(f"\nResults saved to {filename}")
    return run_data


if __name__ == "__main__":
    run_pipeline()
```
Schedule with cron (Linux):
```bash
# Run every 6 hours
0 */6 * * * cd /path/to/ecommerce_automation && python3 pipeline.py >> pipeline.log 2>&1
```
Step 6: Analyze Trends Over Time
After collecting data for a few days, analyze pricing trends and ranking changes.
```python
import pandas as pd

# Load historical price data
df = pd.read_csv("price_history.csv", parse_dates=["timestamp"])

# Average price by competitor
print(df.groupby("competitor")["price"].agg(["mean", "min", "max"]))

# Price trend over time
daily_avg = df.groupby(df["timestamp"].dt.date)["price"].mean()
print(f"\nPrice trend:\n{daily_avg.tail(7)}")
```
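Building on those daily averages, a small helper can surface day-over-day movement (daily_change is a name introduced here for illustration):

```python
import pandas as pd

def daily_change(daily_avg: pd.Series) -> pd.Series:
    """Day-over-day percent change in the daily average price."""
    return daily_avg.pct_change().mul(100).round(1)
```

Large negative values in the output are the days a competitor cut prices; feed those into your alerting step.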
Common Issues
JavaScript-rendered pages returning empty content: Set render_js=True in ScrapeForge calls. Most modern ecommerce platforms (Shopify, WooCommerce, BigCommerce) render product data client-side; with JS rendering enabled, ScrapeForge handles this for you.
Rate limiting: SearchHive enforces rate limits per plan. On the free tier, you get up to 2 concurrent requests. The Builder plan ($49/month) supports significantly higher throughput. Add time.sleep() between requests to stay within limits.
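A simple way to stay within those limits is to wrap API calls in an exponential backoff helper. The sketch below retries any zero-argument callable; in real code you would catch httpx.HTTPStatusError and check for a 429 status rather than a bare Exception:

```python
import time

def with_backoff(call, max_retries: int = 4, base_delay: float = 1.0):
    """Retry a zero-argument callable with exponential backoff.
    Delays are base_delay, 2*base_delay, 4*base_delay, ..."""
    for attempt in range(max_retries):
        try:
            return call()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries -- surface the error
            time.sleep(base_delay * (2 ** attempt))
```

Usage looks like with_backoff(lambda: client.deep_dive(url, extract=schema)), which keeps the retry logic out of your pipeline code.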
CAPTCHAs and bot detection: ScrapeForge includes proxy rotation and browser fingerprint management. If you still hit blocks, upgrade to the Unicorn plan for residential proxies.
Inconsistent product data: Use DeepDive with explicit extraction schemas instead of regex-based parsing. DeepDive uses AI to understand page structure and extract data reliably even when layouts change.
Complete Code Example
All the modules above work together. Clone this structure:
```
ecommerce_automation/
    client.py            # SearchHive API wrapper
    price_monitor.py     # Competitor price tracking
    rank_tracker.py      # SERP position monitoring
    catalog_updater.py   # Product catalog extraction
    pipeline.py          # Orchestration script
    requirements.txt     # httpx, pandas
```
Install and run:
```bash
pip install httpx pandas
export SEARCHHIVE_API_KEY="sh_live_your_key"
python3 pipeline.py
```
Next Steps
Once your basic pipeline is running, consider these enhancements:
- Alerting: Send Slack/Telegram notifications when competitors drop prices
- Historical dashboards: Visualize trends with Grafana or a simple matplotlib chart
- API integration: Push extracted data directly to your store's admin API (Shopify, WooCommerce)
- Competitor discovery: Use SwiftSearch to find new competitors ranking for your keywords
SearchHive starts with 500 free credits so you can build and test your entire pipeline before spending a dime. When you are ready to scale, the Builder plan at $49/month gives you 100K credits -- enough for thousands of product scrapes and searches daily.
See also: /blog/complete-guide-to-automation-retry-strategies for making your pipeline more resilient, or /compare/firecrawl for a comparison with other scraping APIs.