How to Scrape Trustpilot Reviews for Brand Monitoring
Trustpilot hosts over 250 million reviews across millions of businesses. For brands, monitoring these reviews means catching customer complaints early, tracking sentiment trends, and benchmarking against competitors.
This tutorial walks through scraping Trustpilot reviews using Python and SearchHive's ScrapeForge API — with structured data extraction via DeepDive.
Key Takeaways
- Trustpilot reviews are loaded dynamically via JavaScript, so simple HTTP requests won't work
- SearchHive's ScrapeForge renders JavaScript automatically — no Playwright/Selenium setup needed
- DeepDive extracts review fields (rating, text, date, author) into clean JSON
- You can monitor multiple companies and track sentiment changes over time
- 500 free credits to start, no credit card required
Prerequisites
- Python 3.8+
- requests library (pip install requests)
- A SearchHive API key (get one free)
- Basic familiarity with Python and REST APIs
Step 1: Understand Trustpilot's Page Structure
Each company on Trustpilot has a reviews page at:
https://www.trustpilot.com/review/{company_name}
Pages include:
- Overall rating and review count
- Individual reviews with star rating, text, author name, date
- Pagination (up to ~10 reviews per page in the default view)
Trustpilot loads reviews via JavaScript — the initial HTML doesn't contain review content. This is why plain requests.get() alone won't work.
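To make that URL pattern concrete, here is a small helper for building review-page URLs. One caveat, noted as an assumption: in practice Trustpilot slugs are usually the company's domain (e.g. "amazon.com") rather than a bare name, so check the actual URL in your browser first.

```python
def build_review_url(company_slug, page=1):
    """Build the Trustpilot review-page URL for a company slug.

    Note: slugs are typically the company's domain (e.g. "amazon.com"),
    so verify the slug against the live URL before scraping.
    """
    base = f"https://www.trustpilot.com/review/{company_slug}"
    return base if page <= 1 else f"{base}?page={page}"
```

For example, `build_review_url("amazon", page=3)` produces the third review page for that slug.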
Step 2: Set Up Your SearchHive Client
```python
import requests
import json
import time
from datetime import datetime

SEARCHHIVE_API_KEY = "your_api_key_here"
BASE_URL = "https://api.searchhive.dev/v1"

def scrape_trustpilot_page(company_name, page_num=1):
    """Scrape a single page of Trustpilot reviews."""
    url = f"https://www.trustpilot.com/review/{company_name}?page={page_num}"
    response = requests.post(
        f"{BASE_URL}/scrape",
        headers={"Authorization": f"Bearer {SEARCHHIVE_API_KEY}"},
        json={
            "url": url,
            "render_js": True,
            "format": "markdown"
        }
    )
    response.raise_for_status()
    return response.json()

def extract_reviews(content):
    """Extract structured review data using DeepDive."""
    response = requests.post(
        f"{BASE_URL}/deepdive",
        headers={"Authorization": f"Bearer {SEARCHHIVE_API_KEY}"},
        json={
            "content": content,
            "extract": [
                "review_author",
                "review_rating",
                "review_title",
                "review_text",
                "review_date",
                "company_name"
            ]
        }
    )
    response.raise_for_status()
    return response.json()["data"]
```
Step 3: Scrape Reviews from Multiple Pages
```python
def scrape_all_reviews(company_name, max_pages=5):
    """Scrape reviews across multiple pages."""
    all_reviews = []
    for page in range(1, max_pages + 1):
        print(f"Scraping page {page}...")
        result = scrape_trustpilot_page(company_name, page)
        if not result.get("markdown"):
            print(f"No content on page {page}, stopping.")
            break
        # Extract structured data from the page content
        reviews = extract_reviews(result["markdown"])
        all_reviews.extend(reviews)
        # Be respectful - add a delay between requests
        time.sleep(2)
    return all_reviews
```
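One caveat worth handling: because new reviews push older ones onto later pages while you scrape, consecutive pages can return overlapping entries. A small dedupe pass, keyed on the same field names the DeepDive extraction above returns, is a reasonable sketch:

```python
def dedupe_reviews(reviews):
    """Remove duplicate reviews, keyed on author + date + title.

    Uses the DeepDive field names from the extraction step; reviews
    sharing all three key fields are treated as the same review.
    """
    seen = set()
    unique = []
    for review in reviews:
        key = (
            review.get("review_author"),
            review.get("review_date"),
            review.get("review_title"),
        )
        if key not in seen:
            seen.add(key)
            unique.append(review)
    return unique
```

Call it once on the combined list (`dedupe_reviews(all_reviews)`) before analysis.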
Step 4: Analyze Sentiment and Track Over Time
```python
from collections import Counter

def analyze_reviews(reviews):
    """Calculate basic review metrics."""
    if not reviews:
        return {}
    ratings = [r.get("review_rating", 0) for r in reviews]
    valid_ratings = [r for r in ratings if r]
    rating_counts = Counter(str(r) for r in valid_ratings)
    # Average over reviews that actually have a rating, not the full list
    avg_rating = sum(valid_ratings) / len(valid_ratings) if valid_ratings else 0
    return {
        "total_reviews": len(reviews),
        "average_rating": round(avg_rating, 1),
        "rating_distribution": dict(rating_counts),
        "latest_review_date": max(
            (r.get("review_date") for r in reviews if r.get("review_date")),
            default=None
        )
    }

def save_reviews(reviews, filename):
    """Save reviews to JSON for later analysis."""
    with open(filename, "w", encoding="utf-8") as f:
        json.dump(reviews, f, indent=2, ensure_ascii=False)
    print(f"Saved {len(reviews)} reviews to {filename}")
```
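Building on those metrics, a month-by-month breakdown makes rating drops visible over time. A sketch, assuming review_date comes back as an ISO-style YYYY-MM-DD string (verify the format against your actual DeepDive output):

```python
from collections import defaultdict

def rating_trend_by_month(reviews):
    """Average rating per month, assuming ISO-style review_date strings
    (YYYY-MM-DD...). Reviews missing a date or rating are skipped."""
    buckets = defaultdict(list)
    for r in reviews:
        date, rating = r.get("review_date"), r.get("review_rating")
        if not date or not rating:
            continue
        buckets[date[:7]].append(float(rating))  # "2024-01-05" -> "2024-01"
    return {
        month: round(sum(vals) / len(vals), 2)
        for month, vals in sorted(buckets.items())
    }
```

Plotting this dict, or diffing it between monitoring runs, shows whether sentiment is trending up or down.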
Step 5: Monitor Competitors Side by Side
```python
def monitor_companies(companies, pages_per_company=3):
    """Scrape and compare reviews across multiple companies."""
    report = {}
    for company in companies:
        print(f"\n{'='*50}")
        print(f"Monitoring: {company}")
        print(f"{'='*50}")
        reviews = scrape_all_reviews(company, max_pages=pages_per_company)
        metrics = analyze_reviews(reviews)
        report[company] = {
            "metrics": metrics,
            "recent_reviews": reviews[:5]  # Keep the five most recent as a snapshot
        }
        print(f"  Average rating: {metrics.get('average_rating', 'N/A')}")
        print(f"  Total scraped: {metrics.get('total_reviews', 0)}")
        time.sleep(3)  # Delay between companies
    return report

# Usage
if __name__ == "__main__":
    companies = ["amazon", "ebay", "shopify"]
    report = monitor_companies(companies, pages_per_company=3)

    # Save the full report
    save_reviews(report, "trustpilot_report.json")

    # Print a summary
    for company, data in report.items():
        m = data["metrics"]
        print(f"\n{company.upper()}")
        print(f"  Rating: {m.get('average_rating', 'N/A')}/5")
        print(f"  Reviews scraped: {m.get('total_reviews', 0)}")
```
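Once the report dict exists, ranking companies for the summary is a one-liner worth extracting. A sketch over the report structure built by monitor_companies above:

```python
def rank_companies(report):
    """Sort company slugs by average rating, highest first.

    Expects the {company: {"metrics": {...}}} structure built by
    monitor_companies; companies with no rating sort last.
    """
    return sorted(
        report,
        key=lambda c: report[c]["metrics"].get("average_rating") or 0,
        reverse=True,
    )
```

Printing the summary in `rank_companies(report)` order puts the best-reviewed competitor at the top.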
Step 6: Set Up Automated Scheduled Monitoring
For continuous monitoring, wrap the scraper in a scheduled job using the schedule library (pip install schedule):
```python
import schedule

def daily_monitor():
    """Run daily brand monitoring."""
    companies = ["your-brand", "competitor-1", "competitor-2"]
    report = monitor_companies(companies, pages_per_company=2)
    timestamp = datetime.now().strftime("%Y-%m-%d")
    save_reviews(report, f"monitoring_{timestamp}.json")

    # Alert on significant rating drops
    for company, data in report.items():
        avg = data["metrics"].get("average_rating", 5)
        if avg < 3.0:
            print(f"ALERT: {company} average rating dropped to {avg}")

# Schedule to run daily at 9 AM
schedule.every().day.at("09:00").do(daily_monitor)

# For quick testing, run once
daily_monitor()

# In production, keep the process alive so scheduled jobs actually fire
while True:
    schedule.run_pending()
    time.sleep(60)
```
Common Issues and Solutions
Issue: Empty results from ScrapeForge
Trustpilot may block requests from certain IPs. SearchHive routes through rotating proxies, but if you still get empty pages, increase the wait_for parameter:
```python
json={"url": url, "render_js": True, "wait_for": 3000}
```
Issue: Incomplete review extraction
Trustpilot paginates with infinite scroll. The ?page=N parameter loads different batches. If DeepDive misses some reviews, try extracting with raw HTML format instead of markdown.
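For example, swapping the format field in the ScrapeForge request body from Step 2 (same endpoint and parameters as before):

```python
json={"url": url, "render_js": True, "format": "html"}
```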
Issue: Rate limiting
Trustpilot is aggressive about rate limiting. Keep delays between requests (2-5 seconds) and limit concurrent scrapes. SearchHive's proxy rotation helps, but don't hammer a single company page.
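A retry wrapper with exponential backoff keeps transient failures from killing a long monitoring run. This is an illustrative sketch (backoff_delay and with_retries are not part of the SearchHive client); it assumes the wrapped call raises on failure, as scrape_trustpilot_page does via raise_for_status():

```python
import random
import time

def backoff_delay(attempt, base=2.0, cap=30.0):
    """Exponential backoff with jitter: ~2s, ~4s, ~8s... capped at 30s."""
    return min(cap, base * (2 ** attempt)) * (0.5 + random.random() / 2)

def with_retries(fn, max_attempts=4):
    """Call fn(), retrying on any exception with increasing delays."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            time.sleep(backoff_delay(attempt))
```

Usage: `with_retries(lambda: scrape_trustpilot_page("amazon", 1))` retries a flaky page up to four times before giving up.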
Next Steps
- Set up automated monitoring for your brand and top 5 competitors
- Export review data to a dashboard tool (Grafana, Looker, or a simple spreadsheet)
- Combine with our Google Scholar scraper tutorial (/blog/google-scholar-scraper-tutorial) for academic citation monitoring
- For competitor SEO research, pair with our SerpApi comparison (/compare/serpapi) for search ranking data
Start scraping Trustpilot reviews today with 500 free credits. No credit card needed — just grab an API key and start building.