Knowing when competitors update their pricing, launch new features, or change their messaging is a significant competitive advantage. But manually checking competitor websites is tedious and unreliable -- changes slip through the cracks constantly.
Automated website monitoring solves this by checking pages on a schedule and alerting you when content changes. This guide shows you how to build a competitor monitoring system using SearchHive APIs, with JavaScript and Python implementations.
Key Takeaways
- Automated monitoring catches changes within minutes, not days or weeks of manual checking.
- SearchHive ScrapeForge handles JavaScript-rendered pages that simple fetch() calls miss entirely.
- Hash-based change detection is fast but only tells you that something changed; text-diff shows exactly what changed but generates noisier alerts.
- A complete monitoring system needs scheduling, change detection, alerting, and data storage.
Prerequisites
- Node.js 18+ or Python 3.8+
- A SearchHive API key (free signup with 500 credits)
- (Optional) A Slack webhook or email service for alerts
Step 1: Choose Your Monitoring Strategy
There are two main approaches to detecting changes:
Hash-based detection (fast, simple)
Store a hash of the page content. If the hash changes, the page changed. This catches any modification but doesn't tell you what changed.
Text-diff detection (detailed)
Store the full text and compare against previous versions. This shows exactly what was added, removed, or modified, but generates more alerts for minor changes (timestamps, cookie banners, etc.).
For most teams, hash-based detection with selective text-diff on detected changes is the sweet spot.
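That hybrid can be sketched in a few lines of standard-library Python: hash for the cheap comparison, and compute a diff only when the hash says something changed. The helper names here are illustrative, not part of SearchHive:

```python
# Hybrid detection sketch (not SearchHive functionality): hash for the
# cheap comparison, diff only when the hash says something changed.
import difflib
import hashlib

def content_hash(text):
    # Cheap fingerprint for the fast path.
    return hashlib.sha256(text.encode()).hexdigest()

def check_for_changes(previous, current):
    # Fast path: identical hashes mean nothing changed.
    if content_hash(previous) == content_hash(current):
        return None
    # Slow path: line-level diff only for pages that actually changed.
    return list(difflib.unified_diff(
        previous.splitlines(), current.splitlines(), lineterm=""
    ))

old = "Pro plan: $29/mo\nTeam plan: $99/mo"
new = "Pro plan: $39/mo\nTeam plan: $99/mo"
print(check_for_changes(old, old))  # None -- nothing to alert on
print(check_for_changes(old, new))  # unified-diff lines showing the price change
```

This keeps the hot loop cheap: most checks stop at the hash comparison, and the diff cost is paid only for the small fraction of pages that actually changed.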
Step 2: Fetch Page Content with SearchHive
Many competitor websites use JavaScript frameworks, so a simple fetch() or requests.get() returns an empty HTML shell instead of the rendered content. SearchHive's ScrapeForge renders JavaScript:
// monitor.js -- Fetch a competitor page with JS rendering
const API_KEY = process.env.SEARCHHIVE_API_KEY;
const BASE_URL = "https://api.searchhive.dev/v1";
async function fetchPage(url) {
const response = await fetch(BASE_URL + "/scrape", {
method: "POST",
headers: {
"Authorization": "Bearer " + API_KEY,
"Content-Type": "application/json"
},
body: JSON.stringify({
url: url,
format: "text",
render_js: true
})
});
if (!response.ok) {
throw new Error("Fetch failed: " + response.status);
}
const data = await response.json();
return data.content || data.markdown || data.text || "";
}
// Test it
fetchPage("https://competitor.com/pricing")
  .then(function (content) {
    console.log("Fetched " + content.length + " characters");
  })
  .catch(function (err) {
    console.error("Fetch error:", err);
  });
# monitor.py -- Python equivalent
import os
import requests

API_KEY = os.environ["SEARCHHIVE_API_KEY"]
BASE_URL = "https://api.searchhive.dev/v1"
def fetch_page(url):
# Fetch a page with JavaScript rendering enabled.
response = requests.post(
BASE_URL + "/scrape",
headers={"Authorization": "Bearer " + API_KEY},
json={"url": url, "format": "text", "render_js": True}
)
response.raise_for_status()
data = response.json()
return data.get("content", data.get("markdown", ""))
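Scrape requests occasionally fail on slow pages or transient network errors, and a single exception shouldn't abort a whole monitoring run. A small retry wrapper helps; this is a sketch, not a SearchHive feature, and the attempt count and backoff base are arbitrary defaults:

```python
# Retry sketch for transient scrape failures. Not a SearchHive feature;
# the attempt count and backoff base are illustrative defaults.
import time

def fetch_with_retries(fetch, url, attempts=3, base_delay=1.0):
    # Call fetch(url), backing off 1s, 2s, 4s, ... between failures.
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch(url)
        except Exception as e:
            last_error = e
            if attempt < attempts - 1:
                time.sleep(base_delay * (2 ** attempt))
    raise last_error
```

Usage: `content = fetch_with_retries(fetch_page, "https://competitor.com/pricing")`.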
Step 3: Implement Change Detection
// change-detector.js
const crypto = require("crypto");
function computeHash(content) {
return crypto.createHash("sha256").update(content).digest("hex");
}
// In-memory store (replace with database in production)
const snapshots = new Map();
function detectChange(url, content) {
const hash = computeHash(content);
const previous = snapshots.get(url);
snapshots.set(url, { hash: hash, content: content, timestamp: new Date().toISOString() });
if (!previous) {
return { changed: false, firstSeen: true };
}
if (previous.hash !== hash) {
return {
changed: true,
previousHash: previous.hash,
newHash: hash,
previousTimestamp: previous.timestamp
};
}
return { changed: false };
}
// Usage
async function checkPage(url) {
const content = await fetchPage(url);
const result = detectChange(url, content);
if (result.firstSeen) {
console.log("[NEW] Baseline captured for " + url);
} else if (result.changed) {
console.log("[CHANGE] " + url + " was modified since " + result.previousTimestamp);
} else {
console.log("[OK] " + url + " unchanged");
}
return result;
}
# change_detector.py
import hashlib
import json
from pathlib import Path
from datetime import datetime
SNAPSHOTS_DIR = Path("./snapshots")
def compute_hash(content):
# Compute SHA-256 hash of page content.
return hashlib.sha256(content.encode()).hexdigest()
def detect_change(url, content):
# Detect if page content has changed.
# Uses file-based storage for persistence across runs.
SNAPSHOTS_DIR.mkdir(exist_ok=True)
# Create a safe filename from URL
safe_name = hashlib.md5(url.encode()).hexdigest()
snapshot_file = SNAPSHOTS_DIR / (safe_name + ".json")
current_hash = compute_hash(content)
if snapshot_file.exists():
with open(snapshot_file) as f:
previous = json.load(f)
if previous["hash"] != current_hash:
# Update snapshot
with open(snapshot_file, "w") as f:
json.dump({
"hash": current_hash,
"content": content,
"url": url,
"timestamp": datetime.now().isoformat()
}, f)
return {
"changed": True,
"url": url,
"previous_hash": previous["hash"],
"new_hash": current_hash,
"previous_timestamp": previous["timestamp"]
}
return {"changed": False, "url": url}
# First time seeing this URL
with open(snapshot_file, "w") as f:
json.dump({
"hash": current_hash,
"content": content,
"url": url,
"timestamp": datetime.now().isoformat()
}, f)
return {"changed": False, "url": url, "first_seen": True}
Step 4: Focus on Specific Content Sections
Most page changes are irrelevant -- cookie banners, analytics scripts, footer timestamps. Focus your monitoring on the sections that matter:
def detect_section_change(url, sections_to_watch):
# Monitor specific sections of a page instead of the whole thing.
# Args: url, sections_to_watch as dict {name: description}
response = requests.post(
BASE_URL + "/deepdive",
headers={"Authorization": "Bearer " + API_KEY},
json={
"url": url,
"prompt": (
"Extract the following sections from this page and return "
"each section's text content: " +
", ".join(name + " (" + desc + ")" for name, desc in sections_to_watch.items())
)
}
)
data = response.json()
changes = []
for section_name in sections_to_watch:
section_content = data.get(section_name, "")
result = detect_change(url + "#" + section_name, section_content)
        if result.get("changed"):
            changes.append({
                "section": section_name,
                "url": url,
                "previous_timestamp": result.get("previous_timestamp", "")
            })
return changes
# Example: monitor only the parts that matter
changes = detect_section_change(
url="https://competitor.com",
sections_to_watch={
"pricing": "the pricing table section with plan names and prices",
"hero": "the main hero headline and subheadline text",
"features": "the features list or grid section"
}
)
if changes:
print("Changes detected:")
for c in changes:
print(" Section '" + c["section"] + "' on " + c["url"])
Step 5: Set Up Scheduled Monitoring
Use cron (Linux/macOS) or Task Scheduler (Windows) to run checks automatically:
# Run every 6 hours
0 */6 * * * cd /path/to/monitor && python3 monitor.py >> monitor.log 2>&1
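Where cron isn't available (Windows without Task Scheduler configured, or quick local testing), a minimal in-process loop can stand in. This is a sketch; the max_runs parameter exists only so the loop is testable, and you would omit it in real use:

```python
# In-process scheduler sketch for machines without cron. max_runs exists
# so the loop can terminate in tests; omit it to run indefinitely.
import time

def run_on_schedule(task, interval_seconds, max_runs=None):
    # Run task, sleep, and repeat until max_runs is reached (if set).
    runs = 0
    while max_runs is None or runs < max_runs:
        task()
        runs += 1
        if max_runs is None or runs < max_runs:
            time.sleep(interval_seconds)
    return runs
```

For example, `run_on_schedule(run_monitoring, 6 * 60 * 60)` mirrors the cron entry above.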
# monitor.py -- Complete monitoring script
import os
import json
import requests
from datetime import datetime
from change_detector import detect_change

API_KEY = os.environ["SEARCHHIVE_API_KEY"]
BASE_URL = "https://api.searchhive.dev/v1"
# Configure what to monitor
WATCHLIST = [
{"url": "https://competitor-a.com/pricing", "name": "Competitor A Pricing"},
{"url": "https://competitor-b.com/features", "name": "Competitor B Features"},
{"url": "https://competitor-c.com", "name": "Competitor C Homepage"}
]
def fetch_page(url):
response = requests.post(
BASE_URL + "/scrape",
headers={"Authorization": "Bearer " + API_KEY},
json={"url": url, "format": "text", "render_js": True}
)
response.raise_for_status()
data = response.json()
return data.get("content", data.get("markdown", ""))
def send_alert(name, url, result):
# Send an alert when a change is detected.
# Replace with your notification system.
message = (
"CHANGE DETECTED: " + name + "\n"
"URL: " + url + "\n"
"Previous: " + result.get("previous_timestamp", "N/A")
)
print(message)
def run_monitoring():
# Check all URLs in the watchlist.
results = []
for target in WATCHLIST:
try:
content = fetch_page(target["url"])
result = detect_change(target["url"], content)
result["name"] = target["name"]
results.append(result)
if result.get("changed"):
send_alert(target["name"], target["url"], result)
except Exception as e:
results.append({"name": target["name"], "url": target["url"], "error": str(e)})
timestamp = datetime.now().isoformat()
print(json.dumps({"timestamp": timestamp, "results": results}, indent=2))
return results
if __name__ == "__main__":
run_monitoring()
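The send_alert function above just prints; a common replacement is a Slack incoming webhook. Here is a sketch, where SLACK_WEBHOOK_URL is an assumed environment variable pointing at a webhook you create in your Slack workspace (nothing SearchHive-specific):

```python
# Slack alert sketch. SLACK_WEBHOOK_URL is a placeholder you create via
# a Slack incoming webhook; nothing here is SearchHive-specific.
import os
import requests

SLACK_WEBHOOK_URL = os.environ.get("SLACK_WEBHOOK_URL", "")

def build_alert_payload(name, url, previous_timestamp):
    # Kept separate from the HTTP call so it is easy to unit test.
    return {
        "text": (
            "CHANGE DETECTED: " + name + "\n"
            "URL: " + url + "\n"
            "Previous snapshot: " + previous_timestamp
        )
    }

def send_slack_alert(name, url, previous_timestamp):
    # Post the message to the configured webhook.
    resp = requests.post(
        SLACK_WEBHOOK_URL,
        json=build_alert_payload(name, url, previous_timestamp),
        timeout=10,
    )
    resp.raise_for_status()
```

Swap send_slack_alert in for send_alert in run_monitoring to get notifications in a channel instead of a log file.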
Step 6: Track Specific Competitor Signals
For competitive intelligence, you want to track specific signals:
def track_pricing_changes(url):
# Track pricing page changes with structured extraction.
response = requests.post(
BASE_URL + "/deepdive",
headers={"Authorization": "Bearer " + API_KEY},
json={
"url": url,
"prompt": (
"Extract all pricing plans with their: name, monthly price, "
"annual price (if shown), and list of included features. "
"Return as a JSON array of objects."
)
}
)
return response.json()
def track_new_features(url):
# Detect new feature announcements on a product page.
response = requests.post(
BASE_URL + "/deepdive",
headers={"Authorization": "Bearer " + API_KEY},
json={
"url": url,
"prompt": (
"Extract all feature names mentioned on this page. "
"Return as a JSON array of strings."
)
}
)
return response.json()
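Once pricing is extracted as structured JSON, comparing two snapshots is plain dictionary work. A sketch, assuming the extraction returns objects with "name" and "monthly_price" keys (the actual field names depend on your prompt and are not guaranteed):

```python
# Sketch: compare two structured pricing snapshots. The "name" and
# "monthly_price" keys are assumptions about the extracted JSON shape.
def diff_pricing(previous_plans, current_plans):
    old = {p["name"]: p for p in previous_plans}
    new = {p["name"]: p for p in current_plans}
    changes = []
    for name, plan in new.items():
        before = old.get(name)
        if before is None:
            changes.append({"plan": name, "change": "new plan"})
        elif before.get("monthly_price") != plan.get("monthly_price"):
            changes.append({
                "plan": name,
                "change": "price",
                "from": before.get("monthly_price"),
                "to": plan.get("monthly_price"),
            })
    for name in old:
        if name not in new:
            changes.append({"plan": name, "change": "removed"})
    return changes
```

Feeding each run's extraction through diff_pricing against the stored previous run turns "the page changed" into "the Pro plan went from $29 to $39", which is the alert you actually want.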
Common Issues
False positives from dynamic content
Cookie banners, live chat widgets, and analytics scripts change on every load. Solutions:
- Monitor specific sections only (Step 4) instead of full pages
- Strip known dynamic elements before hashing
- Use SearchHive's DeepDive to extract just the content sections you care about
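The "strip known dynamic elements" option can be as simple as a normalization pass before computing the hash. The regex patterns below are examples only; tune them to the noise you actually see on each site:

```python
# Normalization sketch: remove known dynamic noise before hashing so it
# cannot trigger false positives. Patterns are examples, not exhaustive.
import re

def normalize_for_hashing(text):
    # Drop ISO-style timestamps that change on every page load.
    text = re.sub(r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}(:\d{2})?", "", text)
    # Drop common cookie-banner phrasing.
    text = re.sub(r"(?i)we use cookies.*?(?:\n|$)", "", text)
    # Collapse whitespace so layout-only changes are ignored.
    return re.sub(r"\s+", " ", text).strip()
```

Hash normalize_for_hashing(content) instead of the raw content in detect_change and the timestamp/banner churn disappears from your alerts.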
Sites blocking your requests
SearchHive handles proxy rotation and anti-bot measures, but if you're hitting the same site hundreds of times daily, consider:
- Reducing check frequency
- Using different endpoints or user paths
- Contacting SearchHive for custom proxy configuration on higher plans
Storage growing unbounded
Archive snapshots with a retention policy:
from datetime import datetime

def cleanup_old_snapshots(days=90):
    # Remove snapshots older than the retention period (based on file mtime).
    cutoff = datetime.now().timestamp() - (days * 86400)
for f in SNAPSHOTS_DIR.glob("*.json"):
if f.stat().st_mtime < cutoff:
f.unlink()
Next Steps
- Start monitoring today: Sign up at searchhive.dev with 500 free credits and set up your first competitor watchlist.
- Scale to dozens of competitors: The Builder plan ($49/mo) gives you 100K credits -- enough to monitor hundreds of pages every hour.
- Read the docs: Visit searchhive.dev/docs for the complete ScrapeForge and DeepDive API reference.
Related: /blog/building-ai-agents-with-web-scraping-apis | /compare/firecrawl | /compare/scrapingbee