Amazon is one of the hardest websites to scrape. They invest heavily in anti-bot technology -- CAPTCHAs, IP blocking, behavioral analysis, JavaScript challenges, and device fingerprinting. If you try to scrape Amazon with basic requests and no precautions, you'll get blocked within minutes.
This FAQ covers every technique for scraping Amazon data reliably, from simple approaches to production-grade solutions, with code examples and honest cost comparisons.
Key Takeaways
- Amazon blocks aggressively -- basic HTTP requests and datacenter proxies get detected and banned quickly
- Residential proxies are mandatory for any volume beyond a few requests
- API-based services (SearchHive ScrapeForge, ScrapingBee) handle Amazon's anti-bot measures for you
- Pre-built scrapers (Apify Amazon Scraper) work but are expensive for ongoing use
- SearchHive DeepDive can research Amazon product data without you writing any scraper logic
- Respect rate limits and terms of service -- aggressive scraping risks legal action
Why is Amazon so hard to scrape?
Amazon uses multiple layers of anti-bot protection:
- IP reputation tracking: Datacenter IPs get flagged fast. Amazon maintains a massive database of known proxy and VPN IPs.
- Behavioral analysis: They track mouse movements, scroll patterns, click timing, and navigation patterns to distinguish bots from humans.
- JavaScript challenges: Pages load dynamic content via JS that requires a real browser engine to execute.
- CAPTCHAs: Triggered when suspicious activity is detected -- image puzzles, checkbox challenges.
- Device fingerprinting: Browser fingerprinting (canvas, WebGL, fonts, plugins) identifies automated browsers.
- Rate limiting per session: Even legitimate-looking sessions get throttled if they request too many pages too fast.
- TLS fingerprinting: Amazon checks the TLS handshake to verify you're using a real browser's SSL implementation.
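One practical consequence of these layers: a blocked scraper often receives an HTTP 200 carrying Amazon's robot-check interstitial instead of the product page, so you must detect blocks by inspecting the body. A minimal sketch (the marker strings are assumptions based on commonly reported block pages, not an official list -- verify against responses you actually receive):

```python
# Hypothetical helper: detect Amazon's "robot check" interstitial so a
# scraper can back off instead of parsing garbage. These markers are
# assumptions -- adjust them to the block pages you actually observe.
BLOCK_MARKERS = (
    "Enter the characters you see below",
    "api-services-support@amazon.com",
    "To discuss automated access to Amazon data",
)

def looks_blocked(html: str) -> bool:
    """Return True if the response body resembles Amazon's bot-check page."""
    return any(marker in html for marker in BLOCK_MARKERS)
```

Checking every response this way lets you pause and rotate IPs early, before Amazon escalates to a hard ban.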
What's the easiest way to scrape Amazon?
The easiest approach: use an API that handles Amazon's anti-bot measures for you.
SearchHive ScrapeForge
```python
import requests

headers = {"Authorization": "Bearer YOUR_API_KEY"}

# Scrape an Amazon product page -- proxy rotation and JS rendering included
response = requests.get(
    "https://api.searchhive.dev/scrapeforge",
    headers=headers,
    params={
        "url": "https://www.amazon.com/dp/B0C4JVT6KQ",
        "format": "markdown",
        "js_render": True
    }
).json()

print(response["markdown"][:1000])
```
ScrapeForge automatically handles:
- Residential proxy rotation (different IP for each request)
- JavaScript rendering (waits for dynamic content to load)
- CAPTCHA solving (when triggered)
- Browser fingerprint spoofing
At $49/month for 100K credits, this is significantly cheaper than managing your own proxy infrastructure.
SearchHive DeepDive for product research
If you don't need to scrape individual pages but want product data:
```python
import requests

response = requests.post(
    "https://api.searchhive.dev/deepdive",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "query": "Find the top 5 best-selling wireless earbuds on Amazon under $100 in 2026, with prices and ratings",
        "depth": "comprehensive"
    }
).json()

print(response["answer"])
```
How do I scrape Amazon with Python (manual approach)?
If you want to build your own scraper, here's what you need:
1. Residential Proxies
Datacenter proxies won't work. You need residential proxies -- IPs from real ISPs that look like normal users.
Popular residential proxy providers:
- Bright Data: From $5.50/GB, large proxy pool
- Oxylabs: From $8/GB, good reliability
- Smartproxy: From $4.40/GB, budget option
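Once you have a proxy endpoint, wiring it into a request is straightforward. A sketch, assuming the common user:pass@host:port gateway format (the hostname, port, and credentials below are placeholders -- each provider documents its own):

```python
# Placeholder proxy URL -- substitute your provider's gateway and credentials.
# Most residential providers use the user:pass@host:port form shown here.
PROXY = "http://USERNAME:PASSWORD@gateway.example-provider.com:7000"

proxies = {"http": PROXY, "https": PROXY}
headers = {
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    )
}

# requests routes through the proxy when you pass `proxies=`:
# requests.get("https://www.amazon.com/dp/B0C4JVT6KQ",
#              headers=headers, proxies=proxies, timeout=30)
```

Note that proxying alone is not enough against Amazon -- you still need a real browser engine for the JS challenges, which is what the next step covers.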
2. Undetected Browser
Use a stealth-hardened browser such as undetected-chromedriver or playwright-stealth. The example below uses plain Playwright with a realistic user agent and viewport; layering a stealth plugin on top improves survival further:
```python
from playwright.sync_api import sync_playwright
import time

def scrape_amazon_product(asin, proxy):
    with sync_playwright() as p:
        browser = p.chromium.launch(
            headless=True,
            proxy={"server": proxy}
        )
        context = browser.new_context(
            user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
            viewport={"width": 1920, "height": 1080}
        )
        page = context.new_page()
        page.goto(f"https://www.amazon.com/dp/{asin}")
        page.wait_for_selector("#productTitle", timeout=10000)
        time.sleep(2)  # Simulate human reading time

        title = page.locator("#productTitle").inner_text().strip()
        price = page.locator(".a-price .a-offscreen").first.inner_text().strip()
        rating = page.locator("#acrPopover .a-size-base").first.inner_text().strip()

        browser.close()
        return {"title": title, "price": price, "rating": rating}
```
3. Rate Limiting
Add random delays between requests to mimic human behavior:
```python
import random
import time

asins = ["B0C4JVT6KQ", "B09V3KXJPB", "B0CJ5J7TVR"]

for asin in asins:
    result = scrape_amazon_product(asin, "http://your-proxy:port")
    print(result)
    time.sleep(random.uniform(5, 15))  # 5-15 second delay
```
4. Session Management
Maintain cookies across requests to appear as a returning user:
```python
context = browser.new_context(
    storage_state="amazon_cookies.json",  # Load saved session
    user_agent="..."
)

# After a successful scrape, save cookies
context.storage_state(path="amazon_cookies.json")
```
What about Apify's Amazon scraper?
Apify offers a pre-built Amazon scraper (Actor) that handles proxy rotation and CAPTCHAs:
- Amazon Product Scraper: $49/month (Personal plan), ~$0.001-0.003 per product
- Amazon Reviews Scraper: Same pricing
- Amazon Search Scraper: Same pricing
Pros: Works out of the box, handles anti-bot measures, good for non-developers.
Cons: Expensive at scale, limited customization, and Apify's credit system is confusing.
At 100K products/month, Apify costs $100-300/month depending on credit usage. SearchHive ScrapeForge does the same for $49/month.
What data can you legally scrape from Amazon?
Generally considered fair game:
- Product titles, prices, ratings: Publicly available on product pages
- Product descriptions and specs: Public information
- Customer reviews: Publicly available, but be careful with user-generated content
- Seller information: Public on storefront pages
- Best seller rankings: Publicly displayed
Avoid:
- Scraping personal information (buyer names, addresses)
- Bypassing paywalls or login-gated content
- Scraping at volumes that cause service disruption
- Using scraped data to compete directly on Amazon (may violate their Terms of Service)
The hiQ v. LinkedIn litigation (Ninth Circuit, 2022) suggested that scraping publicly available data generally does not violate the US Computer Fraud and Abuse Act. However, Amazon's ToS explicitly prohibits automated access, which creates a contractual risk even where the scraping itself is lawful.
How do I scrape Amazon search results?
To get product listings from Amazon search:
```python
import requests

headers = {"Authorization": "Bearer YOUR_API_KEY"}

# Scrape an Amazon search results page
response = requests.get(
    "https://api.searchhive.dev/scrapeforge",
    headers=headers,
    params={
        "url": "https://www.amazon.com/s?k=wireless+earbuds&ref=nb_sb_noss",
        "format": "markdown",
        "js_render": True
    }
).json()

# Parse the markdown output for product data
# In production, use structured extraction or an LLM to parse the markdown
print(response["markdown"][:2000])
```
For structured extraction, use SearchHive's JSON output format or parse the markdown with regex or an LLM.
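A hedged sketch of the regex route, assuming the markdown renders prices like "$79.99" and ratings like "4.3 out of 5 stars" -- the exact shapes depend on the page, so verify the patterns against real output before relying on them:

```python
import re

# Hypothetical patterns for the markdown a scrape returns. These assume
# prices appear as "$1,234.56" and ratings as "4.3 out of 5 stars".
def extract_prices(markdown: str) -> list[str]:
    return re.findall(r"\$\d{1,4}(?:,\d{3})*\.\d{2}", markdown)

def extract_ratings(markdown: str) -> list[str]:
    return re.findall(r"(\d\.\d) out of 5 stars", markdown)

sample = "Echo Buds ... 4.3 out of 5 stars ... $49.99 (was $119.99)"
print(extract_prices(sample))   # ['$49.99', '$119.99']
print(extract_ratings(sample))  # ['4.3']
```

Regex is brittle across layout changes; an LLM-based extractor costs more per page but survives markup churn.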
How do I scrape Amazon at scale?
For large-scale Amazon scraping (thousands to millions of products):
- Use a queue system: Redis or SQS to manage URLs to scrape
- Distribute across workers: Multiple scraper instances with different proxy IPs
- Implement exponential backoff: When blocked, wait longer before retrying
- Monitor success rates: Track blocks, CAPTCHAs, and empty responses
- Rotate user agents: Cycle through hundreds of realistic user agent strings
- Respect crawl rate: 1 request per 5-15 seconds per IP minimum
```python
import json
import random
import requests
import time

API_KEY = "YOUR_API_KEY"
headers = {"Authorization": f"Bearer {API_KEY}"}

def batch_scrape_amazon(asins, output_file="amazon_products.json"):
    results = []
    for i, asin in enumerate(asins):
        try:
            response = requests.get(
                "https://api.searchhive.dev/scrapeforge",
                headers=headers,
                params={
                    "url": f"https://www.amazon.com/dp/{asin}",
                    "format": "markdown",
                    "js_render": True
                },
                timeout=30
            )
            if response.status_code == 200:
                data = response.json()
                results.append({"asin": asin, "content": data.get("markdown", "")})
            # Rate limiting
            time.sleep(random.uniform(2, 5))
        except Exception as e:
            print(f"Error scraping {asin}: {e}")
            continue

        # Save a checkpoint every 100 products
        if (i + 1) % 100 == 0:
            with open(output_file, "w") as f:
                json.dump(results, f)
            print(f"Checkpoint: {i+1}/{len(asins)} products scraped")
    return results
```
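The batch function above simply skips failures; the exponential-backoff point from the checklist can be sketched as a standalone helper (the base delay, cap, and jitter values here are illustrative, not tuned for Amazon specifically):

```python
import random
import time

# Illustrative backoff schedule: wait base * 2^attempt seconds, capped,
# plus up to one second of jitter so workers don't retry in lockstep.
def backoff_delay(attempt: int, base: float = 2.0, cap: float = 300.0) -> float:
    """Delay in seconds before retry number `attempt` (0-indexed)."""
    return min(cap, base * (2 ** attempt)) + random.uniform(0, 1)

def fetch_with_backoff(fetch, max_retries: int = 5):
    """Call `fetch()` until it succeeds or retries are exhausted."""
    for attempt in range(max_retries):
        try:
            return fetch()
        except Exception:
            time.sleep(backoff_delay(attempt))
    raise RuntimeError("still blocked after retries")
```

Wrap each product fetch in `fetch_with_backoff` so a burst of blocks slows the worker down instead of burning proxy bandwidth on doomed retries.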
What are the alternatives to scraping Amazon?
If you don't want to deal with scraping at all:
- Amazon Product Advertising API (PA-API): Official API, but requires Amazon Associates registration and has strict usage limits
- Amazon SP-API (Selling Partner API): Only available to registered Amazon sellers
- Third-party data providers: Keepa, Jungle Scout, Helium 10 -- provide historical price data, sales estimates, and BSR tracking
- SearchHive DeepDive: Research product categories, pricing trends, and competitor analysis without scraping individual pages
Get started
Don't fight Amazon's anti-bot systems alone. SearchHive ScrapeForge handles proxy rotation, JS rendering, and CAPTCHA solving so you can focus on your application.
- Free tier: 500 credits to test
- Starter plan: $9/month for 5,000 credits
- Builder plan: $49/month for 100,000 credits
Get your API key: https://searchhive.dev
For more scraping guides, see /blog/how-to-automate-web-scraping-complete-answer and /blog/what-is-the-best-web-scraping-api-complete-answer.