Complete Guide to Process Automation with Search APIs
Process automation is the practice of using technology to execute recurring tasks with minimal human intervention. For development teams, data teams, and operations, that means automating the collection, processing, and distribution of web data -- a process that traditionally requires manual scraping scripts, fragile CSS selectors, and constant maintenance.
This guide walks through how a mid-size SaaS company automated 80% of their competitive intelligence workflow using SearchHive's API suite, cutting manual research time from 15 hours per week to under 3.
Key Takeaways
- Process automation with search APIs eliminates repetitive data collection tasks and reduces human error
- SearchHive's three-product stack (SwiftSearch, ScrapeForge, DeepDive) covers the full pipeline from discovery to extraction to analysis
- Python-based automation can reduce competitive research time by 80% with fewer than 200 lines of code
- Pay-as-you-go pricing makes automation accessible without upfront infrastructure investment
- Async/parallel patterns in Python let you process hundreds of pages in minutes
Background
TechPulse, a 50-person SaaS company in the project management space, relied on a manual competitive intelligence process. Every week, a team of three analysts spent 15+ hours researching competitor pricing pages, feature updates, customer reviews, and market positioning. The process involved:
- Visiting 40+ competitor websites individually
- Copying pricing tiers into spreadsheets
- Noting new features from changelogs and release notes
- Compiling quarterly competitive landscape reports
The output was valuable but the process was slow, inconsistent, and didn't scale. New competitors entered the market faster than the team could track them.
The Challenge
TechPulse needed to automate competitive intelligence collection while maintaining data accuracy. Their specific requirements were:
- Daily pricing monitoring across 40+ competitor sites
- Feature tracking from changelogs and documentation pages
- Sentiment analysis from review sites (G2, Capterra, TrustRadius)
- Structured output feeding directly into their internal Notion knowledge base
Previous attempts with in-house scraping failed because competitor sites used anti-bot measures, JavaScript-rendered content, and frequently changed page structures.
The Solution with SearchHive
SearchHive provided three APIs that covered each stage of the automation pipeline:
- SwiftSearch -- Discover relevant pages (competitor docs, reviews, press releases)
- ScrapeForge -- Extract structured data from JavaScript-heavy pages
- DeepDive -- Get AI-generated summaries and comparisons from multiple sources
Here's the architecture they built:
```
[SwiftSearch]  --> Discover competitor pages, reviews, news
      |
[ScrapeForge]  --> Extract pricing tables, feature lists, DOM content
      |
[DeepDive]     --> Generate summaries, comparisons, sentiment analysis
      |
[Notion API]   --> Push structured data into knowledge base
```
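Stripped of the API specifics, the architecture is just four async stages chained together. A minimal sketch of that composition, with dummy stage functions standing in for the real implementations covered in the steps below:

```python
import asyncio

async def run_pipeline(discover, extract, analyze, publish):
    """Chain the four pipeline stages; each argument is an async callable."""
    pages = await discover()
    data = await extract(pages)
    analyses = await analyze(data)
    return await publish(analyses)

# Dummy stages showing the shape of data flowing through the pipeline
async def discover():
    return ["https://example.com/pricing"]

async def extract(pages):
    return {url: "raw page content" for url in pages}

async def analyze(data):
    return [f"summary of {url}" for url in data]

async def publish(analyses):
    return len(analyses)

print(asyncio.run(run_pipeline(discover, extract, analyze, publish)))  # 1
```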
Implementation
Step 1: Competitor Discovery Pipeline
The first step was automating the discovery of relevant pages across competitors.
```python
import asyncio
import os

import httpx

# Read the key from the environment rather than hardcoding it
SEARCHHIVE_API_KEY = os.getenv("SEARCHHIVE_API_KEY", "your-api-key")
BASE_URL = "https://api.searchhive.dev/v1"

headers = {
    "Authorization": f"Bearer {SEARCHHIVE_API_KEY}",
    "Content-Type": "application/json",
}

competitors = [
    "Monday.com", "Asana", "ClickUp", "Notion",
    "Linear", "Basecamp", "Wrike", "Smartsheet",
]

async def search_competitor(client, name, query_type):
    """SwiftSearch: find relevant pages for a competitor."""
    payload = {
        "query": f"{name} {query_type} 2026",
        "num_results": 10,
        "include_domains": [],
    }
    resp = await client.post(
        f"{BASE_URL}/swift/search",
        headers=headers,
        json=payload,
        timeout=30.0,
    )
    data = resp.json()
    return name, query_type, data.get("results", [])

async def discover_all():
    queries = ["pricing", "features", "changelog", "vs alternatives"]
    async with httpx.AsyncClient() as client:
        tasks = [
            search_competitor(client, comp, q)
            for comp in competitors
            for q in queries
        ]
        results = await asyncio.gather(*tasks, return_exceptions=True)
    return [r for r in results if not isinstance(r, Exception)]

# Run discovery
pages = asyncio.run(discover_all())
print(f"Discovered {len(pages)} relevant pages")
```
Step 2: Structured Data Extraction
Once relevant pages were identified, ScrapeForge extracted structured content from JavaScript-rendered competitor pages.
```python
async def extract_pricing(client, url):
    """ScrapeForge: extract content from JS-heavy pricing pages."""
    payload = {
        "url": url,
        "format": "markdown",
        "remove_selectors": ["nav", "footer", ".cookie-banner"],
        "wait_for": ".pricing-table",  # wait for dynamic content to render
    }
    resp = await client.post(
        f"{BASE_URL}/scrape/extract",
        headers=headers,
        json=payload,
        timeout=60.0,
    )
    return resp.json()

async def process_competitor_pages(pages):
    pricing_data = {}
    async with httpx.AsyncClient() as client:
        # Process pricing pages in parallel (max 5 concurrent requests)
        semaphore = asyncio.Semaphore(5)

        async def extract_with_limit(url):
            async with semaphore:
                try:
                    result = await extract_pricing(client, url)
                    return url, result.get("content", "")
                except Exception as e:
                    return url, f"Error: {e}"

        pricing_pages = [
            p for _, _, results in pages
            for p in results
            if "pricing" in p.get("url", "").lower()
        ]
        tasks = [extract_with_limit(p["url"]) for p in pricing_pages]
        results = await asyncio.gather(*tasks)

    for url, content in results:
        pricing_data[url] = content
    return pricing_data
```
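The extracted markdown still has to be turned into comparable numbers before it is useful in a report. A minimal parsing sketch (the table layout and regex are assumptions about typical pricing-page content, not part of the ScrapeForge response format):

```python
import re

def parse_price_points(markdown):
    """Pull dollar amounts out of extracted pricing-page markdown.

    Matches patterns like "$9", "$19.50", or "$1,299" and returns them
    as floats, lowest first, so tiers can be compared across daily runs.
    """
    matches = re.findall(r"\$(\d{1,3}(?:,\d{3})*(?:\.\d{2})?)", markdown)
    return sorted(float(m.replace(",", "")) for m in matches)

sample = """
| Plan | Price |
|---|---|
| Basic | $9 per seat/month |
| Business | $19.50 per seat/month |
| Enterprise | $1,299 per year |
"""
print(parse_price_points(sample))  # [9.0, 19.5, 1299.0]
```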
Step 3: AI-Powered Analysis
DeepDive synthesized the raw data into actionable competitive intelligence.
```python
async def analyze_competitor(client, competitor_name, raw_data):
    """DeepDive: AI analysis of extracted data."""
    payload = {
        "query": (
            f"Summarize {competitor_name}'s current pricing structure, "
            f"key features, and competitive positioning. Focus on "
            f"enterprise vs SMB pricing differences."
        ),
        "context": raw_data[:8000],  # keep the context within size limits
        "output_format": "structured",
    }
    resp = await client.post(
        f"{BASE_URL}/deep/analyze",
        headers=headers,
        json=payload,
        timeout=45.0,
    )
    return resp.json()
```
Step 4: Automation Scheduling
The full pipeline runs daily, scheduled with the Python schedule library:
```python
import schedule
import time

async def analyze_all(pricing):
    """Run a DeepDive analysis for every competitor with extracted data."""
    analyses = []
    async with httpx.AsyncClient() as client:
        for comp in competitors:
            # pricing is keyed by URL; match pages to a competitor
            # by looking for its name in the URL (a simple heuristic)
            comp_data = "\n\n".join(
                content for url, content in pricing.items()
                if comp.lower().split(".")[0] in url.lower()
            )
            if comp_data:
                analyses.append(await analyze_competitor(client, comp, comp_data))
    return analyses

def run_daily_competitive_intel():
    print(f"[{time.strftime('%Y-%m-%d %H:%M')}] Starting daily intel run")
    # 1. Discover new pages
    pages = asyncio.run(discover_all())
    # 2. Extract structured data
    pricing = asyncio.run(process_competitor_pages(pages))
    # 3. Analyze and summarize
    analyses = asyncio.run(analyze_all(pricing))
    # 4. Generate report and push to Notion
    report = generate_report(analyses)
    push_to_notion(report)
    print(f"[{time.strftime('%Y-%m-%d %H:%M')}] Daily intel complete")

schedule.every().day.at("06:00").do(run_daily_competitive_intel)

while True:
    schedule.run_pending()
    time.sleep(60)
```
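The final push_to_notion step uses the Notion API's create-page endpoint. A dependency-free sketch (stdlib urllib rather than the httpx used elsewhere; the "Name" and "Summary" property names are assumptions about how TechPulse's database is set up):

```python
import json
import os
import urllib.request

NOTION_API = "https://api.notion.com/v1/pages"
NOTION_DB_ID = os.getenv("NOTION_DATABASE_ID", "your-database-id")

def build_notion_page(title, summary):
    """Request body for creating one page in a Notion database.

    Assumes the target database has a "Name" title property and a
    "Summary" rich-text property (capped at 2000 chars per Notion limits).
    """
    return {
        "parent": {"database_id": NOTION_DB_ID},
        "properties": {
            "Name": {"title": [{"text": {"content": title}}]},
            "Summary": {"rich_text": [{"text": {"content": summary[:2000]}}]},
        },
    }

def push_to_notion(report):
    """POST one page per competitor summary in the report dict."""
    for competitor, summary in report.items():
        req = urllib.request.Request(
            NOTION_API,
            data=json.dumps(build_notion_page(competitor, summary)).encode(),
            headers={
                "Authorization": f"Bearer {os.getenv('NOTION_API_KEY', '')}",
                "Notion-Version": "2022-06-28",
                "Content-Type": "application/json",
            },
            method="POST",
        )
        with urllib.request.urlopen(req, timeout=30) as resp:
            resp.read()
```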
Results
After implementing the SearchHive-powered automation pipeline, TechPulse measured the following improvements:
| Metric | Before | After | Improvement |
|---|---|---|---|
| Weekly research hours | 15 hours | 3 hours | 80% reduction |
| Competitors tracked | 12 | 40+ | 3x coverage |
| Data freshness | Weekly | Daily | 7x frequency |
| Report turnaround | 2 days | 4 hours | 12x faster |
| Missed competitor moves | 8/month | 1/month | 87% fewer |
The cost of the SearchHive API usage averaged $47/month across all three products -- far less than the analyst hours it replaced.
Lessons Learned
1. Start with discovery, not extraction. Many teams jump straight to scraping specific pages and then have to redo the work when URLs change. Using SwiftSearch to dynamically discover the right pages each run makes the pipeline resilient to site structure changes.
2. Use ScrapeForge for JS-heavy pages. Traditional HTTP scraping breaks on React, Vue, and Angular sites. ScrapeForge renders JavaScript before extraction, which handled 95% of the competitor pricing pages TechPulse needed.
3. Batch requests with concurrency limits. Processing 40 competitors sequentially would take too long. Using asyncio.Semaphore(5) kept the pipeline fast without overwhelming the API.
4. Cache aggressively. Not every competitor page changes daily. TechPulse added a 48-hour cache for pages that hadn't changed, cutting API calls by 40%.
5. Structure output for downstream systems. The DeepDive structured output format fed directly into their Notion knowledge base without manual formatting.
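The 48-hour cache from lesson 4 can be sketched in a few lines. This version keeps entries in an in-process dict keyed by URL hash (an assumption for illustration; a production setup might use Redis or a file store instead):

```python
import hashlib
import time

CACHE_TTL = 48 * 3600  # 48 hours, in seconds
_cache = {}  # sha256(url) -> (fetched_at, content)

def _cache_key(url):
    return hashlib.sha256(url.encode()).hexdigest()

def get_cached(url):
    """Return cached content if it is younger than the TTL, else None."""
    entry = _cache.get(_cache_key(url))
    if entry is None:
        return None
    fetched_at, content = entry
    if time.time() - fetched_at > CACHE_TTL:
        return None
    return content

def set_cached(url, content):
    _cache[_cache_key(url)] = (time.time(), content)

set_cached("https://example.com/pricing", "## Pricing\n$9/seat")
print(get_cached("https://example.com/pricing") is not None)  # True
```

Before calling ScrapeForge, the pipeline checks get_cached first and only issues an extraction request on a miss, which is where the 40% reduction in API calls came from.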
Getting Started with Process Automation
If you want to replicate this approach, SearchHive's free tier includes 500 API credits to get started. The three-product stack covers the full automation pipeline:
- SwiftSearch for programmatic web search
- ScrapeForge for structured data extraction
- DeepDive for AI-powered analysis
Compare SearchHive with alternatives in our comparison guides or explore the API docs to see which plan fits your automation needs.