Complete Guide to Process Automation with Search APIs
Process automation is the practice of using technology to execute recurring tasks with minimal human intervention. For development teams, data teams, and operations, that means automating the collection, processing, and distribution of web data -- a process that traditionally requires manual scraping scripts, fragile CSS selectors, and constant maintenance.
This guide walks through how a mid-size SaaS company automated 80% of their competitive intelligence workflow using SearchHive's API suite, cutting manual research time from 15 hours per week to under 3.
Key Takeaways
- Process automation with search APIs eliminates repetitive data collection tasks and reduces human error
- SearchHive's three-product stack (SwiftSearch, ScrapeForge, DeepDive) covers the full pipeline from discovery to extraction to analysis
- Python-based automation can reduce competitive research time by 80% with fewer than 200 lines of code
- Pay-as-you-go pricing makes automation accessible without upfront infrastructure investment
- Async/parallel patterns in Python let you process hundreds of pages in minutes
Background
TechPulse, a 50-person SaaS company in the project management space, relied on a manual competitive intelligence process. Every week, a team of three analysts spent 15+ hours researching competitor pricing pages, feature updates, customer reviews, and market positioning. The process involved:
- Visiting 40+ competitor websites individually
- Copying pricing tiers into spreadsheets
- Noting new features from changelogs and release notes
- Compiling quarterly competitive landscape reports
The output was valuable but the process was slow, inconsistent, and didn't scale. New competitors entered the market faster than the team could track them.
The Challenge
TechPulse needed to automate competitive intelligence collection while maintaining data accuracy. Their specific requirements were:
- Daily pricing monitoring across 40+ competitor sites
- Feature tracking from changelogs and documentation pages
- Sentiment analysis from review sites (G2, Capterra, TrustRadius)
- Structured output feeding directly into their internal Notion knowledge base
Previous attempts with in-house scraping failed because competitor sites used anti-bot measures, JavaScript-rendered content, and frequently changed page structures.
The Solution with SearchHive
SearchHive provided three APIs that covered each stage of the automation pipeline:
- SwiftSearch -- Discover relevant pages (competitor docs, reviews, press releases)
- ScrapeForge -- Extract structured data from JavaScript-heavy pages
- DeepDive -- Get AI-generated summaries and comparisons from multiple sources
Here's the architecture they built:
```
[SwiftSearch]  --> Discover competitor pages, reviews, news
      |
[ScrapeForge]  --> Extract pricing tables, feature lists, DOM content
      |
[DeepDive]     --> Generate summaries, comparisons, sentiment analysis
      |
[Notion API]   --> Push structured data into knowledge base
```
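Stripped of the API specifics, the architecture is just four async stages chained together. A minimal sketch of that composition, with dummy stage functions standing in for the real implementations covered in the steps below:

```python
import asyncio

async def run_pipeline(discover, extract, analyze, publish):
    """Chain the four pipeline stages; each argument is an async callable."""
    pages = await discover()
    data = await extract(pages)
    analyses = await analyze(data)
    return await publish(analyses)

# Dummy stages showing the shape of data flowing through the pipeline
async def discover():
    return ["https://example.com/pricing"]

async def extract(pages):
    return {url: "raw page content" for url in pages}

async def analyze(data):
    return [f"summary of {url}" for url in data]

async def publish(analyses):
    return len(analyses)

print(asyncio.run(run_pipeline(discover, extract, analyze, publish)))  # 1
```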
Implementation
Step 1: Competitor Discovery Pipeline
The first step was automating the discovery of relevant pages across competitors.
```python
import asyncio
import os

import httpx

# Read the key from the environment rather than hardcoding it
SEARCHHIVE_API_KEY = os.getenv("SEARCHHIVE_API_KEY", "your-api-key")
BASE_URL = "https://api.searchhive.dev/v1"

headers = {
    "Authorization": f"Bearer {SEARCHHIVE_API_KEY}",
    "Content-Type": "application/json",
}

competitors = [
    "Monday.com", "Asana", "ClickUp", "Notion",
    "Linear", "Basecamp", "Wrike", "Smartsheet",
]

async def search_competitor(client, name, query_type):
    """SwiftSearch: find relevant pages for a competitor."""
    payload = {
        "query": f"{name} {query_type} 2026",
        "num_results": 10,
        "include_domains": [],
    }
    resp = await client.post(
        f"{BASE_URL}/swift/search",
        headers=headers,
        json=payload,
        timeout=30.0,
    )
    data = resp.json()
    return name, query_type, data.get("results", [])

async def discover_all():
    queries = ["pricing", "features", "changelog", "vs alternatives"]
    async with httpx.AsyncClient() as client:
        tasks = [
            search_competitor(client, comp, q)
            for comp in competitors
            for q in queries
        ]
        results = await asyncio.gather(*tasks, return_exceptions=True)
    return [r for r in results if not isinstance(r, Exception)]

# Run discovery
pages = asyncio.run(discover_all())
print(f"Discovered {len(pages)} relevant pages")
```
Step 2: Structured Data Extraction
Once relevant pages were identified, ScrapeForge extracted structured content from JavaScript-rendered competitor pages.
```python
async def extract_pricing(client, url):
    """ScrapeForge: extract content from JS-heavy pricing pages."""
    payload = {
        "url": url,
        "format": "markdown",
        "remove_selectors": ["nav", "footer", ".cookie-banner"],
        "wait_for": ".pricing-table",  # wait for dynamic content to render
    }
    resp = await client.post(
        f"{BASE_URL}/scrape/extract",
        headers=headers,
        json=payload,
        timeout=60.0,
    )
    return resp.json()

async def process_competitor_pages(pages):
    pricing_data = {}
    async with httpx.AsyncClient() as client:
        # Process pricing pages in parallel (max 5 concurrent requests)
        semaphore = asyncio.Semaphore(5)

        async def extract_with_limit(url):
            async with semaphore:
                try:
                    result = await extract_pricing(client, url)
                    return url, result.get("content", "")
                except Exception as e:
                    return url, f"Error: {e}"

        pricing_pages = [
            p for _, _, results in pages
            for p in results
            if "pricing" in p.get("url", "").lower()
        ]
        tasks = [extract_with_limit(p["url"]) for p in pricing_pages]
        results = await asyncio.gather(*tasks)

    for url, content in results:
        pricing_data[url] = content
    return pricing_data
```
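The extracted markdown still has to be turned into comparable numbers before it is useful in a report. A minimal parsing sketch (the table layout and regex are assumptions about typical pricing-page content, not part of the ScrapeForge response format):

```python
import re

def parse_price_points(markdown):
    """Pull dollar amounts out of extracted pricing-page markdown.

    Matches patterns like "$9", "$19.50", or "$1,299" and returns them
    as floats, lowest first, so tiers can be compared across daily runs.
    """
    matches = re.findall(r"\$(\d{1,3}(?:,\d{3})*(?:\.\d{2})?)", markdown)
    return sorted(float(m.replace(",", "")) for m in matches)

sample = """
| Plan | Price |
|---|---|
| Basic | $9 per seat/month |
| Business | $19.50 per seat/month |
| Enterprise | $1,299 per year |
"""
print(parse_price_points(sample))  # [9.0, 19.5, 1299.0]
```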
Step 3: AI-Powered Analysis
DeepDive synthesized the raw data into actionable competitive intelligence.
```python
async def analyze_competitor(client, competitor_name, raw_data):
    """DeepDive: AI analysis of extracted data."""
    payload = {
        "query": (
            f"Summarize {competitor_name}'s current pricing structure, "
            f"key features, and competitive positioning. Focus on "
            f"enterprise vs SMB pricing differences."
        ),
        "context": raw_data[:8000],  # keep the context within size limits
        "output_format": "structured",
    }
    resp = await client.post(
        f"{BASE_URL}/deep/analyze",
        headers=headers,
        json=payload,
        timeout=45.0,
    )
    return resp.json()
```
Step 4: Automation Scheduling
The full pipeline runs daily, scheduled with the Python schedule library:
```python
import schedule
import time

async def analyze_all(pricing):
    """Run a DeepDive analysis for every competitor with extracted data."""
    analyses = []
    async with httpx.AsyncClient() as client:
        for comp in competitors:
            # pricing is keyed by URL; match pages to a competitor
            # by looking for its name in the URL (a simple heuristic)
            comp_data = "\n\n".join(
                content for url, content in pricing.items()
                if comp.lower().split(".")[0] in url.lower()
            )
            if comp_data:
                analyses.append(await analyze_competitor(client, comp, comp_data))
    return analyses

def run_daily_competitive_intel():
    print(f"[{time.strftime('%Y-%m-%d %H:%M')}] Starting daily intel run")
    # 1. Discover new pages
    pages = asyncio.run(discover_all())
    # 2. Extract structured data
    pricing = asyncio.run(process_competitor_pages(pages))
    # 3. Analyze and summarize
    analyses = asyncio.run(analyze_all(pricing))
    # 4. Generate report and push to Notion
    report = generate_report(analyses)
    push_to_notion(report)
    print(f"[{time.strftime('%Y-%m-%d %H:%M')}] Daily intel complete")

schedule.every().day.at("06:00").do(run_daily_competitive_intel)

while True:
    schedule.run_pending()
    time.sleep(60)
```
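The final push_to_notion step uses the Notion API's create-page endpoint. A dependency-free sketch (stdlib urllib rather than the httpx used elsewhere; the "Name" and "Summary" property names are assumptions about how TechPulse's database is set up):

```python
import json
import os
import urllib.request

NOTION_API = "https://api.notion.com/v1/pages"
NOTION_DB_ID = os.getenv("NOTION_DATABASE_ID", "your-database-id")

def build_notion_page(title, summary):
    """Request body for creating one page in a Notion database.

    Assumes the target database has a "Name" title property and a
    "Summary" rich-text property (capped at 2000 chars per Notion limits).
    """
    return {
        "parent": {"database_id": NOTION_DB_ID},
        "properties": {
            "Name": {"title": [{"text": {"content": title}}]},
            "Summary": {"rich_text": [{"text": {"content": summary[:2000]}}]},
        },
    }

def push_to_notion(report):
    """POST one page per competitor summary in the report dict."""
    for competitor, summary in report.items():
        req = urllib.request.Request(
            NOTION_API,
            data=json.dumps(build_notion_page(competitor, summary)).encode(),
            headers={
                "Authorization": f"Bearer {os.getenv('NOTION_API_KEY', '')}",
                "Notion-Version": "2022-06-28",
                "Content-Type": "application/json",
            },
            method="POST",
        )
        with urllib.request.urlopen(req, timeout=30) as resp:
            resp.read()
```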
Results
After implementing the SearchHive-powered automation pipeline, TechPulse measured the following improvements:
| Metric | Before | After | Improvement |
|---|---|---|---|
| Weekly research hours | 15 hours | 3 hours | 80% reduction |
| Competitors tracked | 12 | 40+ | 3x coverage |
| Data freshness | Weekly | Daily | 7x frequency |
| Report turnaround | 2 days | 4 hours | 12x faster |
| Missed competitor moves | 8/month | 1/month | 87% fewer |
The cost of the SearchHive API usage averaged $47/month across all three products -- far less than the analyst hours it replaced.
Lessons Learned
1. Start with discovery, not extraction. Many teams jump straight to scraping specific pages and then have to redo the work when URLs change. Using SwiftSearch to dynamically discover the right pages each run makes the pipeline resilient to site structure changes.
2. Use ScrapeForge for JS-heavy pages. Traditional HTTP scraping breaks on React, Vue, and Angular sites. ScrapeForge renders JavaScript before extraction, which handled 95% of the competitor pricing pages TechPulse needed.
3. Batch requests with concurrency limits. Processing 40 competitors sequentially would take too long. Using asyncio.Semaphore(5) kept the pipeline fast without overwhelming the API.
4. Cache aggressively. Not every competitor page changes daily. TechPulse added a 48-hour cache for pages that hadn't changed, cutting API calls by 40%.
5. Structure output for downstream systems. The DeepDive structured output format fed directly into their Notion knowledge base without manual formatting.
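The 48-hour cache from lesson 4 can be sketched in a few lines. This version keeps entries in an in-process dict keyed by URL hash (an assumption for illustration; a production setup might use Redis or a file store instead):

```python
import hashlib
import time

CACHE_TTL = 48 * 3600  # 48 hours, in seconds
_cache = {}  # sha256(url) -> (fetched_at, content)

def _cache_key(url):
    return hashlib.sha256(url.encode()).hexdigest()

def get_cached(url):
    """Return cached content if it is younger than the TTL, else None."""
    entry = _cache.get(_cache_key(url))
    if entry is None:
        return None
    fetched_at, content = entry
    if time.time() - fetched_at > CACHE_TTL:
        return None
    return content

def set_cached(url, content):
    _cache[_cache_key(url)] = (time.time(), content)

set_cached("https://example.com/pricing", "## Pricing\n$9/seat")
print(get_cached("https://example.com/pricing") is not None)  # True
```

Before calling ScrapeForge, the pipeline checks get_cached first and only issues an extraction request on a miss, which is where the 40% reduction in API calls came from.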
Getting Started with Process Automation
If you want to replicate this approach, SearchHive's free tier includes 500 API credits to get started. The three-product stack covers the full automation pipeline:
- SwiftSearch for programmatic web search
- ScrapeForge for structured data extraction
- DeepDive for AI-powered analysis
Compare SearchHive with alternatives in our comparison guides or explore the API docs to see which plan fits your automation needs.