How to Solve CAPTCHAs for Web Scraping: Step-by-Step

Web scraping hits a wall the moment a site throws a CAPTCHA. Whether you're building a price monitor, a lead generation pipeline, or a research tool, CAPTCHAs are the most common barrier between your scraper and the data you need.

This guide covers the main approaches to solving CAPTCHAs while scraping -- from human-powered solving services to fully automated API-based solutions -- with code examples using SearchHive's built-in CAPTCHA handling and standalone solving services.

Key Takeaways

CAPTCHAs come in 6 major types: reCAPTCHA v2, reCAPTCHA v3, hCaptcha, Cloudflare Turnstile, Cloudflare Challenge, and text/audio CAPTCHAs
Third-party solving services (2Captcha, CapSolver, Anti-Captcha) cost $0.50-$3.00 per 1000 solves
SearchHive handles CAPTCHAs automatically as part of its scraping API -- no separate service needed
The most reliable approach combines proxy rotation, request fingerprinting, and automatic CAPTCHA solving

Prerequisites

Before we start, you'll need:

Python 3.8+ installed
A SearchHive account (free tier includes CAPTCHA solving on 500 requests/month)
Basic familiarity with HTTP requests and web scraping concepts
For standalone services: accounts with 2Captcha, CapSolver, or Anti-Captcha

Install the required packages:

pip install requests searchhive-sdk

If you're using standalone solving services:

pip install 2captcha-python anticaptchaofficial capsolver

Step 1: Identify the CAPTCHA Type

Before solving anything, you need to know what you're dealing with. Here's a quick reference:

CAPTCHA Type	Visual	Common Sites	Difficulty
reCAPTCHA v2	Checkbox + image grids	Google services, broad adoption	Medium
reCAPTCHA v3	Invisible (no user interaction)	Google services, enterprise	Hard
hCaptcha	Image selection (boats, crosswalks)	Cloudflare sites, Discord	Medium
Cloudflare Turnstile	Invisible checkbox	Modern Cloudflare-protected sites	Medium
Cloudflare Challenge	Full-page interstitial with JS challenge	E-commerce, SaaS dashboards	Very Hard
Text/Audio	Distorted text or audio clips	Legacy sites, government forms	Low

Detecting the CAPTCHA type programmatically:

import re
import requests

def detect_captcha_type(html):
    # Check for reCAPTCHA v3 first: it loads the same api.js script as v2,
    # but with ?render=<site key> instead of render=explicit or no render param
    if re.search(r'recaptcha/(?:api|enterprise)\.js\?[^"\']*render=(?!explicit)', html):
        return 'recaptcha_v3'
    # Check for reCAPTCHA v2
    if 'google.com/recaptcha/api.js' in html or 'g-recaptcha' in html:
        return 'recaptcha_v2'
    # Check for hCaptcha
    if 'hcaptcha.com' in html or 'h-captcha' in html:
        return 'hcaptcha'
    # Check for Cloudflare Turnstile
    if 'challenges.cloudflare.com/turnstile' in html or 'cf-turnstile' in html:
        return 'turnstile'
    # Check for Cloudflare Challenge
    if 'challenge-platform' in html or 'cf-browser-verification' in html:
        return 'cloudflare_challenge'
    return 'unknown'

url = 'https://example.com/protected-page'
resp = requests.get(url, timeout=10)
captcha_type = detect_captcha_type(resp.text)
print(f'Detected CAPTCHA type: {captcha_type}')
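
The solver calls in the later steps need the page's site key. Here is a minimal sketch of pulling it out of the HTML, assuming the widget exposes it in a `data-sitekey` attribute (the usual pattern for reCAPTCHA v2, hCaptcha, and Turnstile) or, for reCAPTCHA v3, in the `render=` query parameter of the script URL. The function name `extract_sitekey` is illustrative and not part of any library:

```python
import re
from typing import Optional

# Matches data-sitekey="..." with single or double quotes.
_DATA_SITEKEY = re.compile(r'data-sitekey=["\']([^"\']+)["\']')
# Matches recaptcha/api.js?render=<key> (reCAPTCHA v3 style script tag).
_RENDER_PARAM = re.compile(r'recaptcha/(?:api|enterprise)\.js\?[^"\']*?render=([\w-]+)')


def extract_sitekey(html: str) -> Optional[str]:
    """Return the first CAPTCHA site key found in the HTML, or None."""
    match = _DATA_SITEKEY.search(html)
    if match:
        return match.group(1)
    match = _RENDER_PARAM.search(html)
    # render=explicit is a v2 loading mode, not a site key.
    if match and match.group(1) != 'explicit':
        return match.group(1)
    return None
```

Pages that build the widget in JavaScript may not contain the key in the initial HTML at all. In that case it has to be read from the rendered DOM or from the network requests.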

Step 2: Use SearchHive's Built-in CAPTCHA Solving

The simplest approach is to let SearchHive handle everything. SearchHive's ScrapeForge API includes automatic CAPTCHA detection and solving -- no extra configuration required.

from searchhive import ScrapeForge

client = ScrapeForge(api_key='your-api-key')

# Single page scrape with automatic CAPTCHA handling
result = client.scrape(
    url='https://example.com/protected-page',
    render_js=True,       # Enable for SPAs and JS-rendered CAPTCHAs
    anti_bot=True,        # Enable anti-bot fingerprinting
    solve_captchas=True   # Automatically solve detected CAPTCHAs
)

if result.success:
    print(f'Title: {result.data.get("title")}')
    print(f'Content length: {len(result.html)} chars')
else:
    print(f'Error: {result.error}')

SearchHive supports all major CAPTCHA types out of the box:

reCAPTCHA v2 and v3 -- solved automatically
hCaptcha -- solved automatically
Cloudflare Turnstile -- bypassed via browser fingerprinting
Cloudflare Challenge -- handled with residential proxies + headless browser

Step 3: Integrate 2Captcha for Standalone Solving

If you need more control over the solving process, 2Captcha is the most popular standalone service at around $2.99 per 1000 reCAPTCHA v2 solves.

from twocaptcha import TwoCaptcha

solver = TwoCaptcha('YOUR_2CAPTCHA_API_KEY')

# Solve reCAPTCHA v2
result = solver.recaptcha(
    sitekey='6Le-wvkSAAAAAPBMRTvw0Q4Muexq9bi0DJwx_mJ-',
    url='https://example.com/page-with-recaptcha'
)
print(f'Token: {result["code"]}')

# Solve hCaptcha
result = solver.hcaptcha(
    sitekey='SITE_KEY_HERE',
    url='https://example.com/page-with-hcaptcha'
)
print(f'Token: {result["code"]}')

# Solve an image-based text CAPTCHA (normal() takes an image file path or a base64 string)
result = solver.normal('PATH_OR_BASE64_IMAGE')
print(f'Solution: {result["code"]}')
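
A solved token only helps once it is sent back to the target site. As a sketch, assuming the page is a plain HTML form, the token goes into the hidden field that the widget would normally fill in: `g-recaptcha-response` for reCAPTCHA v2, `h-captcha-response` for hCaptcha, and `cf-turnstile-response` for Turnstile. The helper name `build_form_payload` and the example form field are hypothetical:

```python
# Hidden form field each widget writes its token into.
TOKEN_FIELDS = {
    'recaptcha_v2': 'g-recaptcha-response',
    'hcaptcha': 'h-captcha-response',
    'turnstile': 'cf-turnstile-response',
}


def build_form_payload(captcha_type, token, form_fields):
    """Return a copy of form_fields with the CAPTCHA token added."""
    if captcha_type not in TOKEN_FIELDS:
        raise ValueError(f'No known token field for {captcha_type!r}')
    payload = dict(form_fields)
    payload[TOKEN_FIELDS[captcha_type]] = token
    return payload


payload = build_form_payload('recaptcha_v2', 'TOKEN_FROM_SOLVER', {'q': 'laptops'})
print(payload)
```

The payload can then be sent with `requests.post(form_url, data=payload)`. Sites that submit through JavaScript may expect the token somewhere else, so it is worth inspecting the real request in the browser's network tab first.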

Step 4: Integrate CapSolver for Faster Solving

CapSolver offers AI-powered solving with faster response times. Pricing starts at $0.80 per 1000 reCAPTCHA v2 solves -- significantly cheaper than 2Captcha.

import capsolver

capsolver.api_key = 'YOUR_CAPSOLVER_API_KEY'

# Solve reCAPTCHA v2
solution = capsolver.solve({
    "type": "ReCaptchaV2TaskProxyLess",
    "websiteURL": "https://example.com",
    "websiteKey": "6Le-wvkSAAAAAPBMRTvw0Q4Muexq9bi0DJwx_mJ-"
})
print(f'g-recaptcha-response: {solution["gRecaptchaResponse"]}')

# Solve Cloudflare Turnstile
solution = capsolver.solve({
    "type": "AntiTurnstileTaskProxyLess",
    "websiteURL": "https://example.com",
    "websiteKey": "SITE_KEY_HERE"
})
print(f'Turnstile token: {solution["token"]}')

Step 5: Build a Retry Pipeline with CAPTCHA Fallback

Production scrapers need robust error handling. Here's a pipeline that sends the request with CAPTCHA solving enabled and, if the response still looks blocked, retries with exponential backoff:

import time
from searchhive import ScrapeForge

def robust_scrape(url, max_retries=3):
    client = ScrapeForge(api_key='your-api-key')

    for attempt in range(max_retries):
        result = client.scrape(
            url=url,
            render_js=True,
            anti_bot=True,
            solve_captchas=True
        )

        if result.success:
            return result

        # Check if blocked by CAPTCHA
        if result.status_code in (403, 503):
            print(f'Attempt {attempt + 1}: CAPTCHA detected, retrying...')
            time.sleep(2 ** attempt)  # Exponential backoff
        else:
            print(f'Attempt {attempt + 1}: Error {result.status_code}')
            break

    return None

# Scrape multiple pages with rate limiting
urls = [
    'https://example.com/product/1',
    'https://example.com/product/2',
    'https://example.com/product/3',
]

for url in urls:
    result = robust_scrape(url)
    if result and result.success:
        print(f'Scraped: {result.data.get("title")}')
    time.sleep(1)  # Be polite between requests
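
The pipeline above retries a single provider. If a fallback between providers is wanted, one way to sketch it is a function that tries a list of solver callables in order and returns the first token. Everything here is illustrative: `solve_with_fallback` is a hypothetical helper, and each callable would wrap a real call such as `solver.recaptcha(...)` from Step 3 or `capsolver.solve(...)` from Step 4.

```python
def solve_with_fallback(solvers):
    """Try each (name, callable) pair in order; return (name, token).

    Each callable takes no arguments and returns a token string,
    raising an exception on failure.
    """
    errors = []
    for name, solve in solvers:
        try:
            token = solve()
        except Exception as exc:  # provider SDKs raise different error types
            errors.append(f'{name}: {exc}')
            continue
        if token:
            return name, token
        errors.append(f'{name}: empty token')
    raise RuntimeError('All solvers failed: ' + '; '.join(errors))


# Stand-in solvers for demonstration; real ones would call a provider SDK.
def failing_solver():
    raise TimeoutError('no worker available')


def working_solver():
    return 'demo-token'


provider, token = solve_with_fallback([
    ('primary', failing_solver),
    ('backup', working_solver),
])
print(provider, token)
```

Passing callables rather than provider names keeps the helper independent of any one SDK, so the order of providers can be changed without touching the retry logic.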

Step 6: Handle Cloudflare Challenge Pages

Cloudflare Challenge pages are the hardest CAPTCHAs to bypass because they combine JS challenges, browser fingerprinting, and rate limiting. SearchHive uses residential proxies and headless browsers to handle these:

from searchhive import ScrapeForge

client = ScrapeForge(api_key='your-api-key')

# Cloudflare-protected pages need residential proxies
result = client.scrape(
    url='https://cloudflare-protected-site.com/data',
    render_js=True,
    anti_bot=True,
    solve_captchas=True,
    proxy_type='residential',  # Use residential proxies
    country='us'               # Geo-target if needed
)

if result.success:
    print('Cloudflare challenge bypassed successfully')
    print(result.html[:500])
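
Before switching to residential proxies, it helps to confirm that a blocked response really is a Cloudflare challenge. Below is a rough heuristic, assuming Cloudflare's documented `cf-mitigated: challenge` response header and the `challenge-platform` marker used in Step 1. The function name is illustrative, and the check can miss custom block pages:

```python
def is_cloudflare_challenge(status_code, headers, body=''):
    """Heuristically decide whether a response is a Cloudflare challenge page."""
    # Header names are case-insensitive, so normalise them first.
    lowered = {k.lower(): v for k, v in headers.items()}
    if lowered.get('cf-mitigated', '').lower() == 'challenge':
        return True
    served_by_cloudflare = 'cloudflare' in lowered.get('server', '').lower()
    looks_blocked = status_code in (403, 503)
    has_marker = 'challenge-platform' in body or 'cf-browser-verification' in body
    return served_by_cloudflare and looks_blocked and has_marker
```

With a plain `requests` response this would be called as `is_cloudflare_challenge(resp.status_code, resp.headers, resp.text)`.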

Step 7: Cost Optimization

CAPTCHA solving costs add up fast. Here's how to keep them under control:

Approach	Cost per 1000 solves	Best For
SearchHive (included)	$0 (bundled with API calls)	All-purpose scraping
2Captcha	$2.99 (reCAPTCHA v2)	Budget scraping, text CAPTCHAs
CapSolver	$0.80 (reCAPTCHA v2)	High-volume, cost-sensitive
Anti-Captcha	$1.80 (reCAPTCHA v2)	Balanced cost/reliability
CapMonster Cloud	$0.70 (reCAPTCHA v2)	Maximum throughput
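
To compare providers for a given workload, the per-1000 prices in the table can be turned into a monthly figure. This is a small sketch using the table's reCAPTCHA v2 prices. The volume of 50,000 solves per month is an assumed example, and real prices change often:

```python
from decimal import Decimal

# Price per 1000 reCAPTCHA v2 solves, taken from the table above.
PRICE_PER_1000 = {
    '2Captcha': Decimal('2.99'),
    'CapSolver': Decimal('0.80'),
    'Anti-Captcha': Decimal('1.80'),
    'CapMonster Cloud': Decimal('0.70'),
}


def monthly_cost(solves, price_per_1000):
    """Cost in dollars for a number of solves at a per-1000 price."""
    return (Decimal(solves) / Decimal(1000)) * price_per_1000


SOLVES_PER_MONTH = 50_000  # assumed workload for illustration
for provider, price in sorted(PRICE_PER_1000.items(), key=lambda kv: kv[1]):
    print(f'{provider}: ${monthly_cost(SOLVES_PER_MONTH, price):.2f}')
```

At this volume the table's prices work out to $35.00 for CapMonster Cloud, $40.00 for CapSolver, $90.00 for Anti-Captcha, and $149.50 for 2Captcha.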

Cost-saving strategies:

Use SearchHive first -- CAPTCHA solving is included in every request, so you don't pay per-solve fees
Time your solves -- reCAPTCHA tokens expire after 2 minutes and can be verified only once, so request each token right before you submit it instead of solving in advance
Rotate proxies -- changing your IP reduces the frequency of CAPTCHA triggers
Respect rate limits -- space requests 1-3 seconds apart to avoid triggering anti-bot systems
Use session cookies -- maintain login sessions to avoid repeated CAPTCHA challenges
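
The rate-limit advice is easy to apply with a small helper. Here is a sketch, assuming the 1-3 second spacing suggested above. `polite_delays` is a hypothetical name, and randomizing the gap avoids a fixed rhythm that is easy to fingerprint:

```python
import random
import time


def polite_delays(count, min_seconds=1.0, max_seconds=3.0, rng=random):
    """Return `count` random delays, each between min_seconds and max_seconds."""
    if min_seconds > max_seconds:
        raise ValueError('min_seconds must not exceed max_seconds')
    return [rng.uniform(min_seconds, max_seconds) for _ in range(count)]


urls = ['https://example.com/product/1', 'https://example.com/product/2']
for url, delay in zip(urls, polite_delays(len(urls))):
    print(f'Would fetch {url}, then sleep {delay:.2f}s')
    # time.sleep(delay)  # uncomment in a real scraper
```

The `rng` parameter accepts a seeded `random.Random` instance, which makes the delays reproducible in tests.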

Step 8: Legal and Ethical Considerations

Before implementing CAPTCHA solving in production:

Check the site's Terms of Service -- many sites explicitly prohibit automated access
Respect robots.txt -- check the site's robots.txt before scraping
Rate limit your requests -- aggressive scraping degrades service for other users
Don't bypass security for malicious purposes -- CAPTCHA solving for legitimate data collection is generally acceptable; bypassing security for credential stuffing or DDoS is not
GDPR and data privacy -- ensure you're not scraping personal data without a legal basis
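
The robots.txt check can be automated with Python's standard-library `urllib.robotparser`. This minimal sketch parses rules from a string so that it runs offline. The sample rules and the `my-scraper` user agent are made up for illustration:

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

print(is_allowed(SAMPLE_ROBOTS, 'my-scraper', 'https://example.com/products'))
print(is_allowed(SAMPLE_ROBOTS, 'my-scraper', 'https://example.com/private/data'))
```

In a real scraper, calling `set_url(...)` and then `read()` on the parser fetches the live file instead of a hard-coded string.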

Complete Code Example

Here's a fuller scraper that handles CAPTCHAs automatically, with retries and per-URL result tracking:

import json
import time
from searchhive import ScrapeForge

def scrape_with_retry(urls, api_key, delay=1.5, max_retries=2):
    client = ScrapeForge(api_key=api_key)
    results = []

    for i, url in enumerate(urls):
        print(f'[{i+1}/{len(urls)}] Scraping {url}...')

        for attempt in range(max_retries + 1):
            result = client.scrape(
                url=url,
                render_js=True,
                anti_bot=True,
                solve_captchas=True
            )

            if result.success:
                results.append({
                    'url': url,
                    'title': result.data.get('title', ''),
                    'status': 'success'
                })
                print(f'  Success: {result.data.get("title", "")[:60]}')
                break

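            # Retry blocked responses while attempts remain
            if result.status_code in (403, 503) and attempt < max_retries:
                wait = delay * (2 ** attempt)
                print(f'  CAPTCHA/block detected, waiting {wait}s...')
                time.sleep(wait)
            else:
                # Non-retryable error or retries exhausted: record the failure
                results.append({
                    'url': url,
                    'status': 'failed',
                    'error': str(result.error)
                })
                print(f'  Failed: {result.error}')
                break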

        time.sleep(delay)

    return results

if __name__ == '__main__':
    API_KEY = 'your-searchhive-api-key'
    urls = [
        'https://httpbin.org/html',
        'https://example.com',
    ]

    results = scrape_with_retry(urls, API_KEY)
    print(json.dumps(results, indent=2))

Common Issues

CAPTCHA solving takes too long. Most solving services return results in 10-30 seconds. If you need faster turnaround, use SearchHive's built-in solving, which averages 3-5 seconds because it combines solving with the scrape request.

High failure rate on Cloudflare sites. Datacenter proxies get flagged quickly. Use residential proxies (available on SearchHive Builder and Unicorn plans) for Cloudflare-protected sites.

CAPTCHAs keep appearing even after solving. This means your request pattern is being detected. Add random delays (2-5 seconds), rotate user agents, and use session management to maintain cookies.

Next Steps

Start with SearchHive's free tier -- 500 requests/month with CAPTCHA solving included
Read the SearchHive API docs for advanced anti-bot configuration
Check out /blog/searchhive-vs-2captcha-captcha-solving for a detailed service comparison
For large-scale scraping, see /blog/how-to-build-a-web-scraper-that-scales

Ready to scrape without worrying about CAPTCHAs? Get started with SearchHive free -- no credit card required, CAPTCHA solving included on every plan.
