How to Automate Finance Workflows — Step-by-Step Tutorial
Finance teams spend countless hours on repetitive tasks: pulling data from multiple sources, reconciling accounts, generating reports, and tracking expenses. Automating these workflows saves time, reduces errors, and frees your team to focus on analysis instead of data entry.
This tutorial shows you how to build automated finance workflows using Python and SearchHive's web data APIs — from collecting financial data to generating reports automatically.
Key Takeaways
- Most finance automation starts with data collection — extracting data from websites, APIs, and internal systems
- SearchHive's SwiftSearch and ScrapeForge APIs handle financial website scraping with built-in anti-bot bypass
- Python + FastAPI gives you a solid foundation for finance automation pipelines
- Scheduling, error handling, and audit trails are essential for production finance workflows
- Always consider compliance (GDPR, SOX, data retention policies) when automating finance processes
Prerequisites
- Python 3.9+ installed
- A SearchHive API key (free tier — 1,000 requests/month)
- Familiarity with Python basics
- Access to the financial websites or data sources you want to automate
pip install requests pandas fastapi uvicorn schedule python-dotenv openpyxl
Step 1: Identify Automatable Finance Tasks
Common finance tasks suitable for automation:
| Task | Data Source | Automation Method |
|---|---|---|
| Competitor pricing monitoring | E-commerce sites | Web scraping |
| Invoice processing | Email + PDF | OCR + parsing |
| Bank statement reconciliation | CSV/PDF exports | Data matching |
| Financial news aggregation | News websites | Web scraping + NLP |
| Expense report generation | Internal tools + receipts | API + OCR |
| Tax rate monitoring | Government websites | Web scraping |
| Market data collection | Financial sites/APIs | Web scraping + API |
For this tutorial, we'll focus on three practical workflows: competitor pricing, financial news aggregation, and automated reporting.
Step 2: Set Up the Data Collection Layer
The foundation of any finance automation pipeline is reliable data collection. Here's a reusable module for extracting financial data:
# data_collector.py
import requests
import os
from datetime import datetime
from typing import Optional
SEARCHHIVE_API_KEY = os.getenv("SEARCHHIVE_API_KEY")
SEARCHHIVE_BASE = "https://api.searchhive.dev/v1"
def scrape_financial_site(url: str, extract_config: Optional[dict] = None) -> dict:
"""Scrape a financial website with JS rendering and bot bypass."""
payload = {
"url": url,
"render_js": True,
"anti_bot": True
}
if extract_config:
payload["extraction"] = extract_config
response = requests.post(
f"{SEARCHHIVE_BASE}/scrapeforge",
headers={"Authorization": f"Bearer {SEARCHHIVE_API_KEY}"},
json=payload,
timeout=30
)
response.raise_for_status()
return response.json()
def search_financial_data(query: str, max_results: int = 10) -> dict:
"""Search for financial data using SearchHive SwiftSearch."""
response = requests.post(
f"{SEARCHHIVE_BASE}/swiftsearch",
headers={"Authorization": f"Bearer {SEARCHHIVE_API_KEY}"},
json={"query": query, "max_results": max_results},
timeout=15
)
response.raise_for_status()
return response.json()
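Calls to any third-party API fail transiently (timeouts, rate limits), so it's worth wrapping the collectors above in retries before putting them on a schedule. Here's a minimal sketch with exponential backoff; the decorator name and defaults are our own choices, not part of SearchHive's SDK:

```python
import time
from functools import wraps

def with_retries(max_attempts: int = 3, base_delay: float = 1.0):
    """Retry a flaky call with exponential backoff (1s, 2s, 4s, ...)."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    # Re-raise on the final attempt; otherwise back off and retry
                    if attempt == max_attempts - 1:
                        raise
                    time.sleep(base_delay * (2 ** attempt))
        return wrapper
    return decorator
```

Stack it above `scrape_financial_site` and `search_financial_data` with `@with_retries()` so transient failures don't kill a whole pipeline run.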
Step 3: Automate Competitor Pricing Monitoring
Track competitor prices across multiple retailers and get alerts when prices change:
# pricing_monitor.py
import json
from data_collector import scrape_financial_site
from datetime import datetime
import hashlib
PRICE_HISTORY_FILE = "price_history.json"
COMPETITOR_CONFIGS = {
"competitor_a": {
"url": "https://competitor-a.com/products",
"extraction": {
"type": "structured",
"fields": {
"products": {
"selector": ".product-item",
"multiple": True,
"fields": {
"name": ".product-name::text",
"price": ".product-price::text",
"sku": "data-sku::attr(data-sku)"
}
}
}
}
},
"competitor_b": {
"url": "https://competitor-b.com/catalog",
"extraction": {
"type": "structured",
"fields": {
"products": {
"selector": ".catalog-item",
"multiple": True,
"fields": {
"name": ".item-title::text",
"price": ".item-price::text",
"sku": ".item-sku::text"
}
}
}
}
}
}
def load_price_history() -> dict:
try:
with open(PRICE_HISTORY_FILE, "r") as f:
return json.load(f)
except FileNotFoundError:
return {}
def save_price_history(data: dict):
with open(PRICE_HISTORY_FILE, "w") as f:
json.dump(data, f, indent=2)
def parse_price(raw: str) -> float:
import re
if not raw:
return 0.0
cleaned = re.sub(r'[^\d.]', '', raw)
try:
return float(cleaned)
except ValueError:
return 0.0
def monitor_prices() -> list:
"""Check competitor prices and detect changes."""
history = load_price_history()
alerts = []
today = datetime.utcnow().strftime("%Y-%m-%d")
for competitor, config in COMPETITOR_CONFIGS.items():
try:
data = scrape_financial_site(config["url"], config["extraction"])
products = data.get("data", {}).get("products", [])
for product in products:
sku = product.get("sku", product.get("name", ""))
sku_hash = hashlib.md5(sku.encode()).hexdigest()[:12]
key = f"{competitor}:{sku_hash}"
current_price = parse_price(product.get("price", ""))
if key in history:
old_price = history[key].get("price", 0)
if old_price > 0 and current_price != old_price:
change_pct = ((current_price - old_price) / old_price) * 100
alert = {
"competitor": competitor,
"product": product.get("name", "Unknown"),
"old_price": old_price,
"new_price": current_price,
"change_pct": round(change_pct, 2),
"date": today
}
alerts.append(alert)
print(f"⚠️ Price change: {alert['product']} ${old_price} → ${current_price} ({change_pct:+.1f}%)")
history[key] = {
"price": current_price,
"name": product.get("name", ""),
"last_seen": today
}
except Exception as e:
print(f"Error scraping {competitor}: {e}")
save_price_history(history)
return alerts
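In practice you rarely want an alert for every one-cent fluctuation. A small filter over the alert dicts keeps only meaningful moves; the 5% default is an arbitrary example, so tune it for your market:

```python
def significant_alerts(alerts: list, threshold_pct: float = 5.0) -> list:
    """Keep only alerts whose absolute percentage change meets the threshold."""
    return [a for a in alerts if abs(a["change_pct"]) >= threshold_pct]
```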
Step 4: Automate Financial News Aggregation
Collect financial news from multiple sources and categorize by topic:
# news_aggregator.py
from data_collector import search_financial_data, scrape_financial_site
from datetime import datetime
FINANCIAL_NEWS_QUERIES = [
"fintech funding rounds 2025",
"Federal Reserve interest rate news",
"cryptocurrency regulation updates",
"bank earnings reports Q1 2025"
]
def collect_financial_news(max_articles: int = 50) -> list:
"""Collect financial news from multiple queries."""
articles = []
seen_urls = set()
for query in FINANCIAL_NEWS_QUERIES:
results = search_financial_data(query, max_results=5)
for item in results.get("results", []):
url = item.get("url", "")
if url in seen_urls:
continue
seen_urls.add(url)
article = {
"title": item.get("title", ""),
"url": url,
"snippet": item.get("snippet", ""),
"source": item.get("domain", ""),
"query": query,
"collected_at": datetime.utcnow().isoformat()
}
articles.append(article)
# Sort by recency
articles.sort(key=lambda x: x["collected_at"], reverse=True)
return articles[:max_articles]
def generate_news_summary(articles: list) -> str:
"""Generate a text summary of collected articles."""
summary_lines = [f"Financial News Summary — {datetime.utcnow().strftime('%Y-%m-%d')}"]
summary_lines.append(f"Total articles collected: {len(articles)}")
summary_lines.append("")
for article in articles[:20]: # Top 20
summary_lines.append(f"- **{article['title']}** ({article['source']})")
summary_lines.append(f" {article['snippet'][:200]}")
summary_lines.append(f" [Read more]({article['url']})")
summary_lines.append("")
return "\n".join(summary_lines)
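This step promised categorization by topic, so here is one simple way to do it: keyword matching over the title and snippet. The topics and keyword lists below are illustrative placeholders; swap in your own taxonomy, or an NLP classifier for anything serious:

```python
# Order matters: the first topic whose keywords match wins
TOPIC_KEYWORDS = {
    "rates": ["interest rate", "federal reserve"],
    "crypto": ["crypto", "bitcoin"],
    "funding": ["funding round", "raised", "series a", "series b"],
    "earnings": ["earnings", "revenue", "quarterly"],
}

def categorize_article(article: dict) -> str:
    """Assign the first topic whose keywords appear in the title or snippet."""
    text = f"{article.get('title', '')} {article.get('snippet', '')}".lower()
    for topic, keywords in TOPIC_KEYWORDS.items():
        if any(kw in text for kw in keywords):
            return topic
    return "other"
```

Tag each article dict with `article["topic"] = categorize_article(article)` inside `collect_financial_news()` and the summary can group by topic.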
Step 5: Build Automated Reporting
Generate formatted financial reports automatically:
# report_generator.py
import pandas as pd
from datetime import datetime
from pathlib import Path
def generate_pricing_report(price_alerts: list, output_path: str = "reports/") -> str:
"""Generate an Excel report from pricing alerts."""
Path(output_path).mkdir(exist_ok=True)
if not price_alerts:
return "No price changes detected."
df = pd.DataFrame(price_alerts)
timestamp = datetime.utcnow().strftime("%Y%m%d_%H%M%S")
filename = f"{output_path}pricing_report_{timestamp}.xlsx"
# Create Excel with formatting
with pd.ExcelWriter(filename, engine="openpyxl") as writer:
df.to_excel(writer, sheet_name="Price Changes", index=False)
# Auto-adjust column widths
worksheet = writer.sheets["Price Changes"]
for col in worksheet.columns:
max_length = max(len(str(cell.value or "")) for cell in col)
worksheet.column_dimensions[col[0].column_letter].width = min(max_length + 2, 50)
return filename
def generate_daily_summary(news: list, alerts: list, output_path: str = "reports/") -> str:
"""Generate a combined daily finance summary."""
Path(output_path).mkdir(exist_ok=True)
timestamp = datetime.utcnow().strftime("%Y%m%d")
filename = f"{output_path}daily_summary_{timestamp}.md"
content = f"# Daily Finance Summary\n"
content += f"**Date**: {datetime.utcnow().strftime('%Y-%m-%d %H:%M UTC')}\n\n"
# Price alerts section
content += f"## Price Changes ({len(alerts)} detected)\n\n"
if alerts:
for alert in alerts:
emoji = "📉" if alert["change_pct"] < 0 else "📈"
content += f"{emoji} **{alert['product']}** ({alert['competitor']}): "
content += f"${alert['old_price']} → ${alert['new_price']} ({alert['change_pct']:+.1f}%)\n"
else:
content += "No price changes detected.\n"
# News section
content += f"\n## Financial News ({len(news)} articles)\n\n"
for article in news[:10]:
content += f"- [{article['title']}]({article['url']}) — {article['source']}\n"
with open(filename, "w") as f:
f.write(content)
return filename
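If you also want the price changes as a markdown table (for the daily summary, a Slack message, or an email body), a small pure helper covers it. The field names match the alert dicts produced in Step 3:

```python
def alerts_to_markdown_table(alerts: list) -> str:
    """Render price alerts as a markdown table."""
    lines = [
        "| Competitor | Product | Old | New | Change |",
        "|---|---|---|---|---|",
    ]
    for a in alerts:
        lines.append(
            f"| {a['competitor']} | {a['product']} | "
            f"${a['old_price']} | ${a['new_price']} | {a['change_pct']:+.1f}% |"
        )
    return "\n".join(lines)
```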
Step 6: Schedule Everything
Run your automation pipeline on a schedule:
# scheduler.py
import schedule
import time
from pricing_monitor import monitor_prices
from news_aggregator import collect_financial_news, generate_news_summary
from report_generator import generate_pricing_report, generate_daily_summary
import json
from datetime import datetime
def daily_finance_pipeline():
"""Run the complete daily finance automation pipeline."""
print(f"\n{'='*60}")
print(f"Starting daily finance pipeline: {datetime.utcnow().isoformat()}")
print(f"{'='*60}\n")
# Step 1: Monitor competitor prices
print("📊 Checking competitor prices...")
alerts = monitor_prices()
print(f" Found {len(alerts)} price changes")
# Step 2: Collect financial news
print("\n📰 Collecting financial news...")
news = collect_financial_news()
print(f" Collected {len(news)} articles")
# Step 3: Generate reports
print("\n📝 Generating reports...")
pricing_report = generate_pricing_report(alerts)
daily_report = generate_daily_summary(news, alerts)
print(f" Pricing report: {pricing_report}")
print(f" Daily summary: {daily_report}")
# Step 4: Save pipeline metadata
metadata = {
"run_time": datetime.utcnow().isoformat(),
"price_alerts": len(alerts),
"news_collected": len(news),
"reports_generated": [pricing_report, daily_report]
}
with open("pipeline_log.json", "a") as f:
f.write(json.dumps(metadata) + "\n")
print("\n✅ Daily pipeline complete")
# Schedule: Run every weekday at 8 AM
schedule.every().monday.at("08:00").do(daily_finance_pipeline)
schedule.every().tuesday.at("08:00").do(daily_finance_pipeline)
schedule.every().wednesday.at("08:00").do(daily_finance_pipeline)
schedule.every().thursday.at("08:00").do(daily_finance_pipeline)
schedule.every().friday.at("08:00").do(daily_finance_pipeline)
if __name__ == "__main__":
    # Run once on startup so you can verify the pipeline before the first scheduled run
    daily_finance_pipeline()
    print("Finance automation scheduler running. Press Ctrl+C to stop.")
    while True:
        schedule.run_pending()
        time.sleep(60)
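The pipeline appends one JSON object per run to pipeline_log.json (JSON Lines format), which makes it easy to roll up past runs later. A quick sketch:

```python
import json

def summarize_runs(log_path: str = "pipeline_log.json") -> dict:
    """Aggregate the JSON Lines pipeline log into simple totals."""
    runs = []
    with open(log_path) as f:
        for line in f:
            if line.strip():  # skip blank lines
                runs.append(json.loads(line))
    return {
        "total_runs": len(runs),
        "total_price_alerts": sum(r.get("price_alerts", 0) for r in runs),
        "total_news_collected": sum(r.get("news_collected", 0) for r in runs),
    }
```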
Step 7: Add Error Handling and Audit Logging
Production finance automation needs proper error handling and audit trails:
# error_handler.py
import logging
import json
from datetime import datetime
from functools import wraps
from typing import Callable
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s [%(levelname)s] %(message)s",
handlers=[
logging.FileHandler("finance_automation.log"),
logging.StreamHandler()
]
)
audit_log = []
def audit_log_step(func: Callable) -> Callable:
"""Decorator to log each automation step for compliance."""
@wraps(func)
def wrapper(*args, **kwargs):
step_name = func.__name__
start = datetime.utcnow()
logging.info(f"Starting: {step_name}")
try:
result = func(*args, **kwargs)
entry = {
"step": step_name,
"status": "success",
"started_at": start.isoformat(),
"completed_at": datetime.utcnow().isoformat(),
"records_processed": len(result) if isinstance(result, list) else 1
}
audit_log.append(entry)
logging.info(f"Completed: {step_name} ({len(audit_log)} total steps)")
return result
except Exception as e:
entry = {
"step": step_name,
"status": "error",
"error": str(e),
"started_at": start.isoformat(),
"failed_at": datetime.utcnow().isoformat()
}
audit_log.append(entry)
logging.error(f"Failed: {step_name} — {e}")
raise
return wrapper
def save_audit_log(path: str = "audit_logs/"):
from pathlib import Path
Path(path).mkdir(exist_ok=True)
filename = f"{path}audit_{datetime.utcnow().strftime('%Y%m%d')}.json"
with open(filename, "w") as f:
json.dump(audit_log, f, indent=2)
return filename
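When reviewing a saved audit file later, you mostly care about failure counts and which steps failed. A small summary helper over the entry format defined above:

```python
def audit_summary(entries: list) -> dict:
    """Count successes and errors, and collect the names of failed steps."""
    summary = {"success": 0, "error": 0, "failed_steps": []}
    for entry in entries:
        status = entry.get("status", "error")
        summary[status] = summary.get(status, 0) + 1
        if status == "error":
            summary["failed_steps"].append(entry.get("step", "unknown"))
    return summary
```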
Common Issues and Solutions
1. Financial Sites Block Your Scrapes
Problem: Banks, stock exchanges, and financial data providers have strong bot protection.
Solution: SearchHive's anti-bot bypass handles this automatically. The anti_bot: True parameter (enabled by default on paid plans) manages Cloudflare challenges, CAPTCHAs, and browser fingerprint detection.
2. Data Format Inconsistency
Problem: Different sources return prices, dates, and numbers in different formats.
Solution: Create a normalization layer (like parse_price() in Step 3) that standardizes all data before it enters your pipeline.
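Dates need the same treatment as prices. A sketch that tries a few common formats and normalizes to ISO; extend DATE_FORMATS with whatever your sources actually emit:

```python
from datetime import datetime

# Formats to try, in order; add more as you encounter them
DATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%d %b %Y", "%B %d, %Y"]

def parse_date(raw: str):
    """Normalize a date string to ISO format, or return None if unrecognized."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return None
```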
3. Compliance and Data Retention
Problem: Financial data may be subject to GDPR, SOX, or other regulatory requirements.
Solution: Implement the audit logging pattern from Step 7. Set up automated data retention policies — archive old data and delete records beyond your required retention period.
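Retention can start as simply as pruning the price-history file from Step 3. The comparison works because last_seen is stored as an ISO date, which sorts lexicographically; the 365-day default is only an example, not legal advice:

```python
from datetime import datetime, timedelta

def prune_history(history: dict, retention_days: int = 365) -> dict:
    """Drop records whose last_seen date falls outside the retention window."""
    cutoff = (datetime.utcnow() - timedelta(days=retention_days)).strftime("%Y-%m-%d")
    # ISO dates compare correctly as strings, so this is a plain string comparison
    return {k: v for k, v in history.items() if v.get("last_seen", "") >= cutoff}
```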
4. Handling Site Changes
Problem: Target websites update their HTML structure, breaking your extraction rules.
Solution: Use SearchHive's DeepDive API for content-level extraction (extracts clean text regardless of HTML structure changes). Monitor for extraction failures and alert your team.
Next Steps
- Add database storage: Replace file-based JSON storage with PostgreSQL or SQLite
- Set up alerting: Email or Slack notifications for significant price changes
- Deploy to production: Containerize with Docker and deploy to your cloud provider
- Add more data sources: Extend competitor configs, add API integrations with financial data providers
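For the alerting item above, a Slack incoming webhook is usually the fastest path. A sketch using only the standard library; the webhook URL is a placeholder you would create in your own Slack workspace:

```python
import json
import urllib.request

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/T000/B000/XXXX"  # placeholder

def format_alert_text(alert: dict) -> str:
    """Turn a price alert dict from Step 3 into a one-line Slack message."""
    return (
        f"Price change: {alert['product']} ({alert['competitor']}) "
        f"${alert['old_price']} -> ${alert['new_price']} ({alert['change_pct']:+.1f}%)"
    )

def send_slack_alert(alert: dict) -> None:
    """POST the alert to a Slack incoming webhook."""
    payload = json.dumps({"text": format_alert_text(alert)}).encode()
    req = urllib.request.Request(
        SLACK_WEBHOOK_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req, timeout=10)
```

Call `send_slack_alert()` for each entry returned by `significant_alerts()` at the end of the daily pipeline.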
Ready to automate your finance workflows? Get started with SearchHive's free tier — 1,000 requests per month, no credit card required. Check the API docs for SwiftSearch and ScrapeForge integration guides.