Whether you are building a visual search engine, training computer vision models, aggregating product images, or archiving web content, you need an API that can extract images reliably at scale. But image scraping has unique challenges — lazy-loaded images, responsive srcsets, CDN URLs, Base64-encoded thumbnails, and anti-hotlinking protection all complicate the process.
This guide compares the best APIs for scraping images from websites, with pricing, code examples, and specific guidance on handling the quirks of modern image delivery.
Key Takeaways
- Most scraping APIs can extract image URLs but few handle lazy loading, srcsets, and base64 images well
- SearchHive's ScrapeForge extracts full-resolution image URLs with automatic lazy-load resolution
- ScrapingBee and ZenRows render JavaScript before extraction, catching dynamically loaded images
- For bulk image downloading, combine a scraping API with an async download queue
- Always check robots.txt and the site's terms of service before scraping images at scale
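The async download queue mentioned in the takeaways can be sketched with nothing but the standard library. This is a minimal bounded-concurrency queue, not a production downloader: `fetch` is a placeholder coroutine you supply (in practice it might wrap `aiohttp`, or `asyncio.to_thread` around `requests.get`), and all names here are illustrative.

```python
import asyncio

async def download_images(urls, fetch, concurrency=10):
    # Semaphore-bounded queue: at most `concurrency` downloads in flight
    sem = asyncio.Semaphore(concurrency)
    results = {}

    async def worker(url):
        async with sem:
            try:
                results[url] = await fetch(url)
            except Exception as exc:  # keep failures alongside successes
                results[url] = exc

    await asyncio.gather(*(worker(u) for u in urls))
    return results
```

The semaphore is what keeps you from hammering a CDN with thousands of simultaneous requests; tune `concurrency` to stay under the target's rate limits.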
1. SearchHive (ScrapeForge)
SearchHive's ScrapeForge endpoint extracts image URLs from rendered pages, resolving lazy-loaded images and responsive srcsets automatically.
Pricing: Free 500 credits, Starter $9/5K, Builder $49/100K, Unicorn $199/500K.
import requests
# Scrape a page and extract all images
response = requests.post(
    "https://api.searchhive.dev/v1/scrapeforge",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "url": "https://example-store.com/products/shoes",
        "format": "json",
        "render_js": True,
        "extract": {
            "images": {
                "selector": "img",
                "type": "list",
                "attributes": ["src", "data-src", "data-lazy-src", "srcset"]
            },
            "product_images": {
                "selector": ".product-gallery img",
                "type": "list",
                "attributes": ["src", "data-src"]
            }
        }
    }
)
data = response.json()
all_images = set()

# Collect image URLs from all lazy-load attributes
for img in data.get("data", {}).get("images", []):
    for attr in ["src", "data-src", "data-lazy-src"]:
        url = img.get(attr, "")
        if url and url.startswith("http"):
            all_images.add(url)

print(f"Found {len(all_images)} images")
for url in list(all_images)[:10]:
    print(url)
Why it leads: ScrapeForge resolves data-src and data-lazy-src attributes that JavaScript lazy-loading libraries use. This catches images that basic HTML parsers miss. The render_js: True parameter ensures images loaded by JavaScript (infinite scroll, virtualized lists) are also captured.
2. ScrapingBee
ScrapingBee renders pages in headless Chrome and provides a screenshot API for capturing full-page or element screenshots.
Pricing: Freelance $49/250K credits, Startup $99/1M, Business $249/3M. JS rendering = 5 credits, screenshots = 5 credits.
import requests
import json

# Extract image URLs from a rendered page
response = requests.get(
    "https://app.scrapingbee.com/api/v1/",
    params={
        "api_key": "YOUR_KEY",
        "url": "https://photo-gallery.com/album",
        "render_js": "true",
        "wait": 3000,
        "extract_rules": json.dumps({
            "images": {"selector": "img", "output": ["src", "alt"]}
        })
    }
)

images = response.json().get("images", [])
for img in images:
    print(f"{img.get('alt', 'No alt')}: {img['src']}")

# Take a screenshot of the full page
screenshot_response = requests.get(
    "https://app.scrapingbee.com/api/v1/screenshot",
    params={
        "api_key": "YOUR_KEY",
        "url": "https://example.com/page",
        "full_page": "true",
        "output_format": "png"
    }
)

with open("screenshot.png", "wb") as f:
    f.write(screenshot_response.content)
Why it stands out: The dedicated screenshot API lets you capture visual page states, not just image URLs. Combined with JS rendering and proxy rotation, it handles most image extraction scenarios.
3. Firecrawl
Firecrawl converts pages to Markdown with embedded image references, making it useful for extracting image URLs from content pages.
Pricing: Free 500 credits (one-time), Hobby $16/3K/mo, Standard $83/100K/mo.
from firecrawl import FirecrawlApp
import re

app = FirecrawlApp(api_key="YOUR_KEY")
result = app.scrape_url(
    "https://blog.example.com/post-with-images",
    params={"formats": ["markdown", "html"]}
)

# Image URLs are embedded in the Markdown output
markdown = result.get("markdown", "")
images = re.findall(r"!\[.*?\]\((https?://[^\s)]+)\)", markdown)

print(f"Found {len(images)} images in content")
for url in images[:10]:
    print(url)
Best for: Blog content and documentation sites where images are embedded in Markdown-style content.
4. ZenRows
ZenRows renders JavaScript and bypasses anti-bot protection, making it suitable for scraping images from heavily protected sites.
Pricing: Starts at $49/month for 250K requests.
import requests
import re

response = requests.get(
    "https://api.zenrows.com/v1/",
    params={
        "apikey": "YOUR_KEY",
        "url": "https://protected-gallery.com",
        "js_render": "true",
        "antibot": "true",
        "premium_proxy": "true"
    }
)

# ZenRows returns the rendered page body directly
html = response.text
images = re.findall(r'src="(https?://[^"]+\.(jpg|jpeg|png|webp|gif|svg))"', html, re.I)

print(f"Found {len(images)} images")
for url, fmt in images[:10]:
    print(f"[{fmt}] {url}")
Best for: Image scraping from sites with Cloudflare, DataDome, or similar protection.
5. Apify Web Scraper
Apify provides pre-built scraping actors with JavaScript rendering and site-wide crawling capabilities.
Pricing: Free $5 credit/mo, Starter $49/100K, Business $199/750K.
from apify_client import ApifyClient

client = ApifyClient("YOUR_KEY")

# apify/web-scraper takes a JavaScript pageFunction that runs on each page
run = client.actor("apify/web-scraper").call(
    run_input={
        "startUrls": [{"url": "https://store.com/category/shoes"}],
        "maxPagesPerCrawl": 50,
        "pageFunction": """
            async function pageFunction(context) {
                const images = Array.from(document.querySelectorAll('img'))
                    .map((img) => img.src || img.dataset.src)
                    .filter(Boolean);
                return { url: context.request.url, images };
            }
        """
    }
)

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    for img in item.get("images", []):
        print(img)
Best for: Crawling entire sites to collect all images across multiple pages.
6. ScrapeGraphAI
ScrapeGraphAI uses AI to extract images based on semantic descriptions of what you are looking for.
Pricing: Free 50 credits, Starter $17/60K/year, Growth $85/480K/year.
from scrapegraphai.graphs import SmartScraperGraph

graph = SmartScraperGraph(
    prompt="Extract all product images with their URLs, alt text, and dimensions",
    source="https://shop.example.com/product/12345",
    config={"llm": {"model": "gpt-4o-mini", "api_key": "YOUR_OPENAI_KEY"}}
)

result = graph.run()
print(result)
Best for: Extracting images with contextual understanding of what they represent.
7. imgproxy (Self-Hosted)
imgproxy is a fast, secure image processing proxy that can fetch, resize, and cache images. Essential infrastructure for image scraping pipelines.
Pricing: Free (open source). MIT license.
import requests

# imgproxy processing options come before the /plain/ source URL
proxy_url = (
    "https://imgproxy.your-domain.com/insecure"
    "/rs:fit:800:600/f:webp/q:80"
    "/plain/https://example.com/image.jpg"
)

response = requests.get(proxy_url)
with open("resized.webp", "wb") as f:
    f.write(response.content)
Best for: Post-processing scraped images — resizing, format conversion, face detection — at scale.
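The `insecure` prefix above is only for development. In production, imgproxy expects a signed path: per its documented signing scheme, the signature is an unpadded URL-safe Base64 HMAC-SHA256 over salt + path, with the key and salt supplied as hex strings. A sketch (the hex values and domain are placeholders):

```python
import base64
import hashlib
import hmac

def sign_imgproxy_path(key_hex, salt_hex, path):
    # HMAC-SHA256 over salt + path, URL-safe Base64, padding stripped
    key = bytes.fromhex(key_hex)
    salt = bytes.fromhex(salt_hex)
    digest = hmac.new(key, salt + path.encode(), hashlib.sha256).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode()

path = "/rs:fit:800:600/f:webp/q:80/plain/https://example.com/image.jpg"
signature = sign_imgproxy_path("deadbeef", "cafebabe", path)
signed_url = f"https://imgproxy.your-domain.com/{signature}{path}"
```

The key and salt must match the `IMGPROXY_KEY` and `IMGPROXY_SALT` values configured on the server, or imgproxy will reject the request.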
Comparison Table
| API | JS Rendering | Lazy Load Support | Anti-Bot | Bulk Crawl | Entry Price |
|---|---|---|---|---|---|
| SearchHive | Yes | Yes | Yes | Via DeepDive | $9/mo |
| ScrapingBee | Yes | Yes | Partial | No | $49/mo |
| Firecrawl | Yes | Yes | Partial | Yes | $16/mo |
| ZenRows | Yes | Yes | Yes | No | $49/mo |
| Apify | Yes | Yes | Partial | Yes | $49/mo |
| ScrapeGraphAI | Yes | Yes | Yes | No | $17/mo |
| imgproxy | N/A | N/A | N/A | N/A | Free |
Handling Common Image Scraping Challenges
Lazy-Loaded Images
Many sites use Intersection Observer to load images only when they scroll into view. Solution: use JS rendering with scroll simulation, or use an API such as SearchHive or ZenRows that resolves lazy-load attributes automatically.
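If you are parsing raw HTML yourself, a fallback is to check the common lazy-load attributes in priority order. The attribute list below is an assumption based on popular lazy-loading libraries, not an exhaustive set:

```python
def resolve_image_url(attrs):
    # Lazy-loading libraries park the real URL in a data-* attribute and
    # leave a tiny placeholder (often a data: URI) in src
    for key in ("data-src", "data-lazy-src", "data-original", "src"):
        url = attrs.get(key, "")
        if url.startswith(("http://", "https://", "//")):
            return url
    return None
```

Checking `data-src` before `src` matters: on a lazy image, `src` usually holds a 1-pixel placeholder that you do not want in your dataset.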
Responsive Srcsets
Modern sites serve different image sizes via srcset attributes. Extract the full srcset string and parse it to select the resolution you need.
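A naive but workable parser splits the srcset on commas and picks the candidate with the largest width or density descriptor. This sketch assumes URLs contain no commas (the common case; a spec-complete parser would need more care):

```python
def largest_from_srcset(srcset):
    # Each candidate is "URL descriptor", e.g. "img-800.jpg 800w" or
    # "img@2x.jpg 2x"; a real srcset uses one descriptor kind throughout
    best_url, best_size = None, -1.0
    for candidate in srcset.split(","):
        parts = candidate.strip().split()
        if not parts:
            continue
        url, size = parts[0], 0.0
        if len(parts) > 1:
            try:
                size = float(parts[1].rstrip("wx"))
            except ValueError:
                pass
        if size > best_size:
            best_url, best_size = url, size
    return best_url
```

For training data you generally want the largest candidate; for thumbnails, flip the comparison and take the smallest.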
Base64-Encoded Images
Some sites embed small images as base64 data URIs. These will not have HTTP URLs. Filter these out if you only need downloadable images.
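A small helper can split scraped URLs into downloadable links and inline data URIs, and decode the latter if you do want their bytes (tracking pixels and placeholder thumbnails are usually what you find there):

```python
import base64

def split_data_uris(urls):
    # Separate downloadable HTTP(S) URLs from inline data: URIs
    http_urls = [u for u in urls if u.startswith(("http://", "https://"))]
    data_uris = [u for u in urls if u.startswith("data:")]
    return http_urls, data_uris

def data_uri_bytes(uri):
    # Decode the payload of a base64 data: URI to raw bytes
    header, _, payload = uri.partition(",")
    return base64.b64decode(payload) if ";base64" in header else payload.encode()
```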
CDN URLs
Images are often served from CDNs with signed URLs that expire. Download images promptly after scraping.
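Because the signature lives in the query string, hashing only the URL path gives you a local filename that stays stable across re-scrapes even as the signed parameters change. A sketch (the URL and directory names are illustrative):

```python
import hashlib
import os
from urllib.parse import urlparse
from urllib.request import urlopen

def local_name(url):
    # Signed CDN URLs differ only in their query string, so hash the
    # path (not the query) for a stable, collision-resistant name
    parsed = urlparse(url)
    ext = os.path.splitext(parsed.path)[1] or ".img"
    digest = hashlib.sha256(parsed.path.encode()).hexdigest()[:16]
    return digest + ext

def download_now(url, dest_dir="."):
    # Fetch immediately after scraping -- signed URLs can expire in minutes
    path = os.path.join(dest_dir, local_name(url))
    with urlopen(url) as resp, open(path, "wb") as f:
        f.write(resp.read())
    return path
```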
Verdict
For general-purpose image URL extraction, SearchHive's ScrapeForge handles lazy loading, JavaScript rendering, and structured data extraction in one call. The Builder plan at $49/100K credits gives you plenty of room for image-heavy scraping tasks.
For scraping images from sites with strong bot protection, ZenRows is the most reliable option. For site-wide image crawling, Apify provides the best crawl infrastructure.
Pair any of these with imgproxy for post-processing, and you have a complete image scraping pipeline.
Start with 500 free SearchHive credits — JS rendering and lazy-load resolution are included from the start.