How to Build Market Intelligence Platforms — Step-by-Step
Market intelligence platforms aggregate data from multiple sources -- competitors, industry reports, social media, news, and customer feedback -- to give businesses actionable insights. With the right APIs and a bit of Python, you can build one yourself in a weekend.
This tutorial walks through building a market intelligence pipeline that collects competitor data, analyzes trends, and delivers reports -- all using Python and SearchHive's APIs.
Prerequisites
- Python 3.9+ installed
- A SearchHive account (free tier works -- 500 credits)
- Basic familiarity with Python and REST APIs
- pip install requests searchhive
Step 1: Define Your Intelligence Requirements
Before writing code, clarify what data you need. Common market intelligence use cases include:
- Competitor monitoring: Track competitor pricing, product launches, and positioning
- Industry trends: Identify emerging topics, technologies, and customer needs
- Sentiment analysis: Gauge public perception of brands or products
- News monitoring: Stay on top of relevant industry developments
For this tutorial, we'll build a competitor monitoring system that tracks mentions, extracts key data points, and generates weekly reports.
# config.py -- Define your monitoring targets
MONITORING_CONFIG = {
    "competitors": [
        {"name": "Competitor A", "keywords": ["Competitor A pricing", "Competitor A features", "Competitor A review"]},
        {"name": "Competitor B", "keywords": ["Competitor B launch", "Competitor B update", "Competitor B vs"]},
    ],
    "industry_terms": ["market intelligence API", "web scraping trends 2026", "SERP API comparison"],
    "report_frequency": "weekly",
}
Step 2: Set Up the SearchHive Client
SearchHive provides three APIs that map directly to market intelligence needs:
- SwiftSearch: Discover mentions across search engines
- ScrapeForge: Extract data from specific pages
- DeepDive: Analyze content structure and entities
import os
from searchhive import SwiftSearch, ScrapeForge, DeepDive
API_KEY = os.environ.get("SEARCHHIVE_API_KEY", "your-api-key")
# Initialize all three APIs
search_client = SwiftSearch(api_key=API_KEY)
scraper = ScrapeForge(api_key=API_KEY)
analyzer = DeepDive(api_key=API_KEY)
Step 3: Collect Competitor Mentions via Search
Use SwiftSearch to find competitor mentions across the web:
def collect_mentions(keyword, num_results=20):
    """Search for mentions of a competitor keyword."""
    results = search_client.search(
        query=keyword,
        engine="google",
        num=num_results,
        country="us",
        recency="week",  # Only results from the past week
    )
    mentions = []
    for r in results:
        mentions.append({
            "title": r.title,
            "url": r.url,
            "snippet": r.snippet,
            "date": getattr(r, "date", None),
        })
    return mentions
# Collect mentions for all competitors
all_mentions = []
for comp in MONITORING_CONFIG["competitors"]:
    for keyword in comp["keywords"]:
        mentions = collect_mentions(keyword)
        for m in mentions:
            m["competitor"] = comp["name"]
            m["keyword"] = keyword
        all_mentions.extend(mentions)

print(f"Collected {len(all_mentions)} mentions")
Step 4: Scrape Relevant Pages for Detailed Data
Search snippets give you a summary. For deeper analysis, scrape the top pages:
def scrape_top_pages(mentions, max_pages=10):
    """Scrape the top-ranked pages for full content."""
    seen_urls = set()
    scraped = []
    for mention in mentions[:max_pages]:
        url = mention["url"]
        if url in seen_urls:
            continue
        seen_urls.add(url)
        try:
            page = scraper.scrape(url, format="markdown")
            scraped.append({
                "url": url,
                "title": mention["title"],
                "content": page.content[:5000],  # First 5K chars
                "competitor": mention["competitor"],
            })
        except Exception as e:
            print(f"Failed to scrape {url}: {e}")
    return scraped

scraped_pages = scrape_top_pages(all_mentions)
Step 5: Analyze Content with DeepDive
Extract structured intelligence from scraped pages:
def analyze_page(url):
    """Use DeepDive to extract key entities and summaries."""
    analysis = analyzer.analyze(
        url=url,
        extract=["title", "summary", "key_entities", "sentiment"],
    )
    return {
        "title": analysis.title,
        "summary": analysis.summary,
        "entities": analysis.key_entities,
        "sentiment": getattr(analysis, "sentiment", "neutral"),
    }

# Analyze top scraped pages
intel_reports = []
for page in scraped_pages[:5]:  # Limit to conserve credits
    try:
        intel = analyze_page(page["url"])
        intel["url"] = page["url"]
        intel["competitor"] = page["competitor"]
        intel_reports.append(intel)
    except Exception as e:
        print(f"Analysis failed for {page['url']}: {e}")
Step 6: Aggregate and Deduplicate Results
Raw data needs cleaning before it's useful. Remove duplicates, normalize formats, and rank by relevance:
from collections import Counter

def aggregate_intelligence(mentions, intel_reports):
    """Combine search mentions and deep analysis into a report."""
    # Count mentions per competitor
    mention_counts = Counter(m["competitor"] for m in mentions)
    # Identify trending entities
    all_entities = []
    for report in intel_reports:
        all_entities.extend(report.get("entities", []))
    trending_entities = Counter(all_entities).most_common(10)
    return {
        "total_mentions": len(mentions),
        "competitor_breakdown": dict(mention_counts),
        "trending_entities": trending_entities,
        "detailed_reports": intel_reports,
    }

report = aggregate_intelligence(all_mentions, intel_reports)
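Because each competitor has several keywords, the same page often shows up in `all_mentions` more than once, which inflates the mention counts. A minimal dedupe pass (plain Python, no SearchHive calls, keeping the first occurrence of each URL) could run before aggregation:

```python
def dedupe_mentions(mentions):
    """Keep only the first mention seen for each URL."""
    seen = set()
    unique = []
    for m in mentions:
        url = m.get("url")
        if url in seen:
            continue
        seen.add(url)
        unique.append(m)
    return unique

# Example: two keywords surfaced the same page
raw = [
    {"url": "https://example.com/a", "competitor": "Competitor A"},
    {"url": "https://example.com/a", "competitor": "Competitor A"},
    {"url": "https://example.com/b", "competitor": "Competitor B"},
]
print(len(dedupe_mentions(raw)))  # 2
```

Calling `aggregate_intelligence(dedupe_mentions(all_mentions), intel_reports)` then counts each page once per competitor.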
Step 7: Generate a Weekly Report
Export the aggregated intelligence to a readable format:
import json
from datetime import datetime, timezone

def generate_weekly_report(aggregate_data, filename="market-intel-report.json"):
    """Save the weekly market intelligence report."""
    report = {
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "period": "weekly",
        "summary": {
            "total_mentions": aggregate_data["total_mentions"],
            "competitors_tracked": list(aggregate_data["competitor_breakdown"].keys()),
            "top_entities": [
                {"entity": e, "count": c}
                for e, c in aggregate_data["trending_entities"]
            ],
        },
        "competitor_mentions": aggregate_data["competitor_breakdown"],
        "detailed_analysis": aggregate_data["detailed_reports"],
    }
    with open(filename, "w") as f:
        json.dump(report, f, indent=2, default=str)
    print(f"Report saved to {filename}")
    return report

weekly_report = generate_weekly_report(report)
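JSON is convenient for downstream tooling, but stakeholders usually want something readable. As a sketch using only the standard library (field names match the report structure built above), the same data can be rendered to Markdown:

```python
def render_markdown(report):
    """Render the weekly report dict to a short Markdown summary."""
    lines = [f"# Market Intelligence Report ({report['generated_at'][:10]})", ""]
    lines.append(f"Total mentions: {report['summary']['total_mentions']}")
    lines.append("")
    lines.append("## Mentions by competitor")
    for name, count in report["competitor_mentions"].items():
        lines.append(f"- {name}: {count}")
    lines.append("")
    lines.append("## Top entities")
    for item in report["summary"]["top_entities"]:
        lines.append(f"- {item['entity']} ({item['count']})")
    return "\n".join(lines)

# Example with the same shape generate_weekly_report produces
sample = {
    "generated_at": "2026-01-05T09:00:00",
    "summary": {"total_mentions": 42, "top_entities": [{"entity": "pricing", "count": 7}]},
    "competitor_mentions": {"Competitor A": 30, "Competitor B": 12},
}
print(render_markdown(sample))
```

Writing the result to a `.md` file alongside the JSON makes the report easy to drop into email or Slack.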
Step 8: Schedule Automatic Runs
Automate the pipeline to run weekly using cron or a task scheduler:
# Add to crontab for weekly runs (every Monday at 9 AM)
0 9 * * 1 cd /path/to/project && python3 market_intel.py >> logs/intel.log 2>&1
Or use Python's schedule library for more control:
import schedule
import time

def weekly_job():
    mentions = []
    for comp in MONITORING_CONFIG["competitors"]:
        for kw in comp["keywords"]:
            mentions.extend(collect_mentions(kw))
    scraped = scrape_top_pages(mentions)
    intel = []
    for page in scraped[:5]:
        try:
            intel.append(analyze_page(page["url"]))
        except Exception:
            continue
    aggregate = aggregate_intelligence(mentions, intel)
    generate_weekly_report(aggregate)

schedule.every().monday.at("09:00").do(weekly_job)

while True:
    schedule.run_pending()
    time.sleep(60)  # Check every minute so the 09:00 slot isn't missed
Common Issues and Solutions
Rate limiting: SearchHive returns 429 when you exceed plan limits. Implement exponential backoff (see API security best practices).
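A simple backoff wrapper might look like the following; the `fetch` callable and retry parameters are illustrative, and you should narrow the `except` to whatever error the SearchHive client actually raises on a 429:

```python
import random
import time

def with_backoff(fetch, max_retries=5, base_delay=1.0):
    """Call fetch(), retrying with exponential backoff plus jitter on failure."""
    for attempt in range(max_retries):
        try:
            return fetch()
        except Exception as exc:  # narrow this to the client's rate-limit error
            if attempt == max_retries - 1:
                raise
            # Double the wait each attempt; jitter avoids synchronized retries
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            print(f"Attempt {attempt + 1} failed ({exc}); retrying in {delay:.1f}s")
            time.sleep(delay)
```

You could then wrap each search as `with_backoff(lambda: collect_mentions(keyword))` without changing the rest of the pipeline.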
Blocking: Some sites block automated scrapers. Use ScrapeForge's proxy rotation and headless browser rendering to improve success rates.
Data quality: Not all search results are relevant. Filter by recency, add negative keywords, and use DeepDive's sentiment scoring to prioritize actionable intelligence.
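For the negative-keyword idea, a basic filter over mention titles and snippets might look like this (the keyword list is purely illustrative; tune it to the noise you actually see):

```python
NEGATIVE_KEYWORDS = ["job opening", "hiring", "coupon code"]  # illustrative

def filter_mentions(mentions, negative_keywords=NEGATIVE_KEYWORDS):
    """Drop mentions whose title or snippet contains a negative keyword."""
    kept = []
    for m in mentions:
        text = f"{m.get('title', '')} {m.get('snippet', '')}".lower()
        if any(kw in text for kw in negative_keywords):
            continue
        kept.append(m)
    return kept
```

Running this right after `collect_mentions` keeps irrelevant results from ever consuming scraping or analysis credits.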
Next Steps
Once your basic pipeline is running, consider extending it:
- Add social media monitoring: Scrape LinkedIn, Twitter, and Reddit for competitor mentions
- Pricing tracking: Scrape competitor pricing pages weekly and build a price history database
- Alert system: Send notifications when unusual activity is detected (spikes in mentions, negative sentiment)
- Dashboard: Build a simple Streamlit or Flask dashboard to visualize trends
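As a starting point for the alert idea, you could compare this week's mention count against a trailing average of past weeks and flag large jumps; the 2x threshold here is an arbitrary assumption to tune against your own data:

```python
def detect_spike(history, current, threshold=2.0):
    """Flag a spike when current count exceeds threshold x the trailing average."""
    if not history:
        return False  # No baseline yet
    avg = sum(history) / len(history)
    return avg > 0 and current > threshold * avg

# Example: 40 mentions this week vs. a trailing average of 12
print(detect_spike([10, 12, 14], 40))  # True
print(detect_spike([10, 12, 14], 15))  # False
```

Persisting each week's `total_mentions` per competitor (even as a JSON file) gives you the `history` list this check needs.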
Get Started with SearchHive
SearchHive's free tier gives you 500 credits to start building your market intelligence pipeline immediately. The $9/month Starter plan includes 5K credits -- enough for weekly competitor monitoring across multiple targets.
Sign up for free and explore the API documentation to get started.
Related: /blog/best-web-scraping-api-for-competitive-intelligence | /blog/how-to-track-competitor-pricing-with-python