How to Build Market Intelligence Platforms — Step-by-Step
Market intelligence platforms aggregate data from multiple sources -- competitors, industry reports, social media, news, and customer feedback -- to give businesses actionable insights. With the right APIs and a bit of Python, you can build one yourself in a weekend.
This tutorial walks through building a market intelligence pipeline that collects competitor data, analyzes trends, and delivers reports -- all using Python and SearchHive's APIs.
Prerequisites
- Python 3.9+ installed
- A SearchHive account (free tier works -- 500 credits)
- Basic familiarity with Python and REST APIs
- pip install requests searchhive
Step 1: Define Your Intelligence Requirements
Before writing code, clarify what data you need. Common market intelligence use cases include:
- Competitor monitoring: Track competitor pricing, product launches, and positioning
- Industry trends: Identify emerging topics, technologies, and customer needs
- Sentiment analysis: Gauge public perception of brands or products
- News monitoring: Stay on top of relevant industry developments
For this tutorial, we'll build a competitor monitoring system that tracks mentions, extracts key data points, and generates weekly reports.
# config.py -- Define your monitoring targets
MONITORING_CONFIG = {
    "competitors": [
        {"name": "Competitor A", "keywords": ["Competitor A pricing", "Competitor A features", "Competitor A review"]},
        {"name": "Competitor B", "keywords": ["Competitor B launch", "Competitor B update", "Competitor B vs"]},
    ],
    "industry_terms": ["market intelligence API", "web scraping trends 2026", "SERP API comparison"],
    "report_frequency": "weekly",
}
Step 2: Set Up the SearchHive Client
SearchHive provides three APIs that map directly to market intelligence needs:
- SwiftSearch: Discover mentions across search engines
- ScrapeForge: Extract data from specific pages
- DeepDive: Analyze content structure and entities
import os
from searchhive import SwiftSearch, ScrapeForge, DeepDive
API_KEY = os.environ.get("SEARCHHIVE_API_KEY", "your-api-key")
# Initialize all three APIs
search_client = SwiftSearch(api_key=API_KEY)
scraper = ScrapeForge(api_key=API_KEY)
analyzer = DeepDive(api_key=API_KEY)
Step 3: Collect Competitor Mentions via Search
Use SwiftSearch to find competitor mentions across the web:
def collect_mentions(keyword, num_results=20):
    """Search for mentions of a competitor keyword."""
    results = search_client.search(
        query=keyword,
        engine="google",
        num=num_results,
        country="us",
        recency="week",  # Only results from the past week
    )
    mentions = []
    for r in results:
        mentions.append({
            "title": r.title,
            "url": r.url,
            "snippet": r.snippet,
            "date": getattr(r, "date", None),
        })
    return mentions
# Collect mentions for all competitors
all_mentions = []
for comp in MONITORING_CONFIG["competitors"]:
    for keyword in comp["keywords"]:
        mentions = collect_mentions(keyword)
        for m in mentions:
            m["competitor"] = comp["name"]
            m["keyword"] = keyword
        all_mentions.extend(mentions)

print(f"Collected {len(all_mentions)} mentions")
Step 4: Scrape Relevant Pages for Detailed Data
Search snippets give you a summary. For deeper analysis, scrape the top pages:
def scrape_top_pages(mentions, max_pages=10):
    """Scrape the top-ranked pages for full content."""
    seen_urls = set()
    scraped = []
    for mention in mentions[:max_pages]:
        url = mention["url"]
        if url in seen_urls:
            continue
        seen_urls.add(url)
        try:
            page = scraper.scrape(url, format="markdown")
            scraped.append({
                "url": url,
                "title": mention["title"],
                "content": page.content[:5000],  # First 5K chars
                "competitor": mention["competitor"],
            })
        except Exception as e:
            print(f"Failed to scrape {url}: {e}")
    return scraped

scraped_pages = scrape_top_pages(all_mentions)
Step 5: Analyze Content with DeepDive
Extract structured intelligence from scraped pages:
def analyze_page(url):
    """Use DeepDive to extract key entities and summaries."""
    analysis = analyzer.analyze(
        url=url,
        extract=["title", "summary", "key_entities", "sentiment"],
    )
    return {
        "title": analysis.title,
        "summary": analysis.summary,
        "entities": analysis.key_entities,
        "sentiment": getattr(analysis, "sentiment", "neutral"),
    }

# Analyze top scraped pages
intel_reports = []
for page in scraped_pages[:5]:  # Limit to conserve credits
    try:
        intel = analyze_page(page["url"])
        intel["url"] = page["url"]
        intel["competitor"] = page["competitor"]
        intel_reports.append(intel)
    except Exception as e:
        print(f"Analysis failed for {page['url']}: {e}")
Step 6: Aggregate and Deduplicate Results
Raw data needs cleaning before it's useful. Remove duplicates, normalize formats, and rank by relevance:
from collections import Counter

def aggregate_intelligence(mentions, intel_reports):
    """Combine search mentions and deep analysis into a report."""
    # Count mentions per competitor
    mention_counts = Counter(m["competitor"] for m in mentions)
    # Identify trending entities
    all_entities = []
    for report in intel_reports:
        all_entities.extend(report.get("entities", []))
    trending_entities = Counter(all_entities).most_common(10)
    return {
        "total_mentions": len(mentions),
        "competitor_breakdown": dict(mention_counts),
        "trending_entities": trending_entities,
        "detailed_reports": intel_reports,
    }

report = aggregate_intelligence(all_mentions, intel_reports)
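Because each competitor has several keywords, the same page often shows up in `all_mentions` more than once, which inflates the mention counts. A minimal dedupe pass (plain Python, no SearchHive calls, keeping the first occurrence of each URL) could run before aggregation:

```python
def dedupe_mentions(mentions):
    """Keep only the first mention seen for each URL."""
    seen = set()
    unique = []
    for m in mentions:
        url = m.get("url")
        if url in seen:
            continue
        seen.add(url)
        unique.append(m)
    return unique

# Example: two keywords surfaced the same page
raw = [
    {"url": "https://example.com/a", "competitor": "Competitor A"},
    {"url": "https://example.com/a", "competitor": "Competitor A"},
    {"url": "https://example.com/b", "competitor": "Competitor B"},
]
print(len(dedupe_mentions(raw)))  # 2
```

Calling `aggregate_intelligence(dedupe_mentions(all_mentions), intel_reports)` then counts each page once per competitor.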
Step 7: Generate a Weekly Report
Export the aggregated intelligence to a readable format:
import json
from datetime import datetime, timezone

def generate_weekly_report(aggregate_data, filename="market-intel-report.json"):
    """Save the weekly market intelligence report."""
    report = {
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "period": "weekly",
        "summary": {
            "total_mentions": aggregate_data["total_mentions"],
            "competitors_tracked": list(aggregate_data["competitor_breakdown"].keys()),
            "top_entities": [
                {"entity": e, "count": c}
                for e, c in aggregate_data["trending_entities"]
            ],
        },
        "competitor_mentions": aggregate_data["competitor_breakdown"],
        "detailed_analysis": aggregate_data["detailed_reports"],
    }
    with open(filename, "w") as f:
        json.dump(report, f, indent=2, default=str)
    print(f"Report saved to {filename}")
    return report

weekly_report = generate_weekly_report(report)
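JSON is convenient for downstream tooling, but stakeholders usually want something readable. As a sketch using only the standard library (field names match the report structure built above), the same data can be rendered to Markdown:

```python
def render_markdown(report):
    """Render the weekly report dict to a short Markdown summary."""
    lines = [f"# Market Intelligence Report ({report['generated_at'][:10]})", ""]
    lines.append(f"Total mentions: {report['summary']['total_mentions']}")
    lines.append("")
    lines.append("## Mentions by competitor")
    for name, count in report["competitor_mentions"].items():
        lines.append(f"- {name}: {count}")
    lines.append("")
    lines.append("## Top entities")
    for item in report["summary"]["top_entities"]:
        lines.append(f"- {item['entity']} ({item['count']})")
    return "\n".join(lines)

# Example with the same shape generate_weekly_report produces
sample = {
    "generated_at": "2026-01-05T09:00:00",
    "summary": {"total_mentions": 42, "top_entities": [{"entity": "pricing", "count": 7}]},
    "competitor_mentions": {"Competitor A": 30, "Competitor B": 12},
}
print(render_markdown(sample))
```

Writing the result to a `.md` file alongside the JSON makes the report easy to drop into email or Slack.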
Step 8: Schedule Automatic Runs
Automate the pipeline to run weekly using cron or a task scheduler:
# Add to crontab for weekly runs (every Monday at 9 AM)
0 9 * * 1 cd /path/to/project && python3 market_intel.py >> logs/intel.log 2>&1
Or use Python's schedule library for more control:
import schedule
import time

def weekly_job():
    mentions = []
    for comp in MONITORING_CONFIG["competitors"]:
        for kw in comp["keywords"]:
            mentions.extend(collect_mentions(kw))
    scraped = scrape_top_pages(mentions)
    intel = []
    for page in scraped[:5]:
        try:
            intel.append(analyze_page(page["url"]))
        except Exception:
            continue
    aggregate = aggregate_intelligence(mentions, intel)
    generate_weekly_report(aggregate)

schedule.every().monday.at("09:00").do(weekly_job)

while True:
    schedule.run_pending()
    time.sleep(60)  # Check every minute so the 09:00 slot isn't missed
Common Issues and Solutions
Rate limiting: SearchHive returns 429 when you exceed plan limits. Implement exponential backoff (see API security best practices).
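A simple backoff wrapper might look like the following; the `fetch` callable and retry parameters are illustrative, and you should narrow the `except` to whatever error the SearchHive client actually raises on a 429:

```python
import random
import time

def with_backoff(fetch, max_retries=5, base_delay=1.0):
    """Call fetch(), retrying with exponential backoff plus jitter on failure."""
    for attempt in range(max_retries):
        try:
            return fetch()
        except Exception as exc:  # narrow this to the client's rate-limit error
            if attempt == max_retries - 1:
                raise
            # Double the wait each attempt; jitter avoids synchronized retries
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            print(f"Attempt {attempt + 1} failed ({exc}); retrying in {delay:.1f}s")
            time.sleep(delay)
```

You could then wrap each search as `with_backoff(lambda: collect_mentions(keyword))` without changing the rest of the pipeline.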
Blocking: Some sites block automated scrapers. Use ScrapeForge's proxy rotation and headless browser rendering to improve success rates.
Data quality: Not all search results are relevant. Filter by recency, add negative keywords, and use DeepDive's sentiment scoring to prioritize actionable intelligence.
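For the negative-keyword idea, a basic filter over mention titles and snippets might look like this (the keyword list is purely illustrative; tune it to the noise you actually see):

```python
NEGATIVE_KEYWORDS = ["job opening", "hiring", "coupon code"]  # illustrative

def filter_mentions(mentions, negative_keywords=NEGATIVE_KEYWORDS):
    """Drop mentions whose title or snippet contains a negative keyword."""
    kept = []
    for m in mentions:
        text = f"{m.get('title', '')} {m.get('snippet', '')}".lower()
        if any(kw in text for kw in negative_keywords):
            continue
        kept.append(m)
    return kept
```

Running this right after `collect_mentions` keeps irrelevant results from ever consuming scraping or analysis credits.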
Next Steps
Once your basic pipeline is running, consider extending it:
- Add social media monitoring: Scrape LinkedIn, Twitter, and Reddit for competitor mentions
- Pricing tracking: Scrape competitor pricing pages weekly and build a price history database
- Alert system: Send notifications when unusual activity is detected (spikes in mentions, negative sentiment)
- Dashboard: Build a simple Streamlit or Flask dashboard to visualize trends
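As a starting point for the alert idea, you could compare this week's mention count against a trailing average of past weeks and flag large jumps; the 2x threshold here is an arbitrary assumption to tune against your own data:

```python
def detect_spike(history, current, threshold=2.0):
    """Flag a spike when current count exceeds threshold x the trailing average."""
    if not history:
        return False  # No baseline yet
    avg = sum(history) / len(history)
    return avg > 0 and current > threshold * avg

# Example: 40 mentions this week vs. a trailing average of 12
print(detect_spike([10, 12, 14], 40))  # True
print(detect_spike([10, 12, 14], 15))  # False
```

Persisting each week's `total_mentions` per competitor (even as a JSON file) gives you the `history` list this check needs.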
Get Started with SearchHive
SearchHive's free tier gives you 500 credits to start building your market intelligence pipeline immediately. The $9/month Starter plan includes 5K credits -- enough for weekly competitor monitoring across multiple targets.
Sign up for free and explore the API documentation to get started.
Related: /blog/best-web-scraping-api-for-competitive-intelligence | /blog/how-to-track-competitor-pricing-with-python