Supabase Edge Functions for Web Scraping: Complete Serverless Guide
Supabase Edge Functions run Deno TypeScript at the edge, giving you serverless compute with access to the broader Supabase ecosystem (PostgreSQL, Auth, Storage). For web scraping, Edge Functions offer a compelling serverless option -- but like all edge platforms, they have real limitations for production scraping workloads. This guide covers what works, what does not, and how to combine Supabase with a dedicated scraping API for the best results.
Key Takeaways
- Supabase Edge Functions run Deno TypeScript with a 50MB memory limit and no headless browser
- They work well for API-based data fetching and orchestrating calls to scraping APIs
- For JS-rendered sites, proxy rotation, and CAPTCHA handling, pair Edge Functions with SearchHive ScrapeForge
- The Supabase + SearchHive combo gives you serverless compute + production-grade scraping
Prerequisites
- A Supabase project (free tier works)
- Supabase CLI installed: npm install -g supabase
- Deno basics (TypeScript runs natively)
Step 1: Set Up Your Edge Function
Initialize Supabase locally and create a new edge function:
supabase init
supabase functions new web-scraper
This creates supabase/functions/web-scraper/index.ts.
Step 2: Basic Page Fetching
The simplest scraping pattern -- fetch a URL and extract data:
// supabase/functions/web-scraper/index.ts
import { serve } from "https://deno.land/std@0.168.0/http/server.ts"
const corsHeaders = {
"Access-Control-Allow-Origin": "*",
"Access-Control-Allow-Headers":
"authorization, x-client-info, apikey, content-type",
}
serve(async (req) => {
if (req.method === "OPTIONS") {
return new Response("ok", { headers: corsHeaders })
}
try {
const { url } = await req.json()
if (!url) {
return new Response(
JSON.stringify({ error: "Missing url parameter" }),
{ status: 400, headers: { ...corsHeaders, "Content-Type": "application/json" } }
)
}
const response = await fetch(url, {
headers: {
"User-Agent":
"Mozilla/5.0 (compatible; MyScraper/1.0)",
"Accept": "text/html,application/xhtml+xml",
},
})
const html = await response.text()
// Extract title
const titleMatch = html.match(/<title>(.*?)<\/title>/s)
const title = titleMatch ? titleMatch[1].trim() : "No title"
// Strip HTML tags for text content
const text = html
.replace(/<script[^>]*>[\s\S]*?<\/script>/gi, "")
.replace(/<style[^>]*>[\s\S]*?<\/style>/gi, "")
.replace(/<[^>]+>/g, " ")
.replace(/\s+/g, " ")
.trim()
.slice(0, 5000)
return new Response(
JSON.stringify({ url, title, content: text }),
{ headers: { ...corsHeaders, "Content-Type": "application/json" } }
)
} catch (err) {
return new Response(
JSON.stringify({ error: err instanceof Error ? err.message : String(err) }),
{ status: 500, headers: { ...corsHeaders, "Content-Type": "application/json" } }
)
}
})
Deploy with:
supabase functions deploy web-scraper
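The extraction logic above is simple enough to factor into a standalone helper, which keeps index.ts short and makes the parsing testable without deploying anything. A minimal sketch in plain TypeScript (no Deno-specific APIs, so it runs anywhere):

```typescript
// Extract the <title> text and a plain-text body preview from raw HTML.
function extractPage(html: string): { title: string; content: string } {
  const titleMatch = html.match(/<title[^>]*>(.*?)<\/title>/is)
  const title = titleMatch ? titleMatch[1].trim() : "No title"
  const content = html
    .replace(/<script[^>]*>[\s\S]*?<\/script>/gi, "") // drop inline scripts
    .replace(/<style[^>]*>[\s\S]*?<\/style>/gi, "")   // drop inline styles
    .replace(/<[^>]+>/g, " ")                          // strip remaining tags
    .replace(/\s+/g, " ")                              // collapse whitespace
    .trim()
    .slice(0, 5000)                                    // cap payload size
  return { title, content }
}
```

Regex-based extraction is fine for titles and rough text content; for structured data you are better served by a real HTML parser such as deno-dom.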
Step 3: The JavaScript Rendering Problem
Supabase Edge Functions run Deno -- they have a V8 engine but no DOM, no headless browser, and no document object. This means:
- Static HTML pages: works fine
- Server-side rendered pages (SSR): works fine
- Client-side rendered pages (React, Vue, Angular SPAs): returns empty shells
- Pages that load data via AJAX after initial load: misses the data
Deno does have a Puppeteer port (deno-puppeteer, via https://deno.land/x/puppeteer), but with a 50MB memory limit and no GPU access, running a headless browser inside a Supabase Edge Function is impractical, if not impossible.
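A cheap way to tell whether a fetch came back as a client-rendered shell is to compare the visible text against the markup. The sketch below uses two heuristics -- an empty mount point like an empty div with id "root" or "app", and a visible-text threshold; both the id list and the 200-character cutoff are arbitrary assumptions to tune per site:

```typescript
// Heuristic: a SPA shell is mostly markup and script tags with almost no visible text.
function looksLikeSpaShell(html: string): boolean {
  const visible = html
    .replace(/<script[^>]*>[\s\S]*?<\/script>/gi, "")
    .replace(/<style[^>]*>[\s\S]*?<\/style>/gi, "")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim()
  // An empty mount point like <div id="root"></div> is a strong signal.
  const hasEmptyMount = /<div[^>]*id=["'](root|app)["'][^>]*>\s*<\/div>/i.test(html)
  return hasEmptyMount || visible.length < 200
}
```

When this returns true, fall back to a scraping API with JS rendering (Step 4) instead of returning the empty shell to your caller.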
Step 4: Using SearchHive from Edge Functions
The practical solution: use your Edge Function to call SearchHive's APIs, which handle JS rendering, proxy rotation, and anti-bot bypass on their infrastructure:
// supabase/functions/scrape-orchestrate/index.ts
import { serve } from "https://deno.land/std@0.168.0/http/server.ts"
const SEARCHHIVE_KEY = Deno.env.get("SEARCHHIVE_API_KEY") ?? ""
const corsHeaders = {
"Access-Control-Allow-Origin": "*",
"Access-Control-Allow-Headers":
"authorization, x-client-info, apikey, content-type",
}
serve(async (req) => {
if (req.method === "OPTIONS") {
return new Response("ok", { headers: corsHeaders })
}
try {
const { action, url, query } = await req.json()
if (action === "search") {
// SwiftSearch - web search
const resp = await fetch(
"https://api.searchhive.dev/v1/swiftsearch",
{
method: "POST",
headers: {
Authorization: `Bearer ${SEARCHHIVE_KEY}`,
"Content-Type": "application/json",
},
body: JSON.stringify({
query: query,
num_results: 5,
}),
}
)
const data = await resp.json()
return new Response(
JSON.stringify(data),
{ headers: { ...corsHeaders, "Content-Type": "application/json" } }
)
}
if (action === "scrape") {
// ScrapeForge - JS-rendered scraping
const resp = await fetch(
"https://api.searchhive.dev/v1/scrapeforge",
{
method: "POST",
headers: {
Authorization: `Bearer ${SEARCHHIVE_KEY}`,
"Content-Type": "application/json",
},
body: JSON.stringify({
url: url,
render_js: true,
format: "markdown",
}),
}
)
const data = await resp.json()
return new Response(
JSON.stringify(data),
{ headers: { ...corsHeaders, "Content-Type": "application/json" } }
)
}
if (action === "research") {
// DeepDive - full page extraction
const resp = await fetch(
"https://api.searchhive.dev/v1/deepdive",
{
method: "POST",
headers: {
Authorization: `Bearer ${SEARCHHIVE_KEY}`,
"Content-Type": "application/json",
},
body: JSON.stringify({
url: url,
extract_text: true,
}),
}
)
const data = await resp.json()
return new Response(
JSON.stringify(data),
{ headers: { ...corsHeaders, "Content-Type": "application/json" } }
)
}
return new Response(
JSON.stringify({ error: "Unknown action. Use: search, scrape, or research" }),
{ status: 400, headers: { ...corsHeaders, "Content-Type": "application/json" } }
)
} catch (err) {
return new Response(
JSON.stringify({ error: err instanceof Error ? err.message : String(err) }),
{ status: 500, headers: { ...corsHeaders, "Content-Type": "application/json" } }
)
}
})
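From a browser or server client, the function is invoked with a JSON body selecting the action. A small helper that validates the payload before the round trip; the endpoint URL follows the standard Supabase function URL shape, and the project ref and anon key placeholders are hypothetical:

```typescript
type Action = "search" | "scrape" | "research"

// Build the JSON payload the scrape-orchestrate function expects,
// failing fast on missing parameters instead of burning an invocation.
function buildPayload(action: Action, params: { url?: string; query?: string }) {
  if (action === "search" && !params.query) throw new Error("search requires query")
  if (action !== "search" && !params.url) throw new Error(`${action} requires url`)
  return { action, ...params }
}

// Usage (hypothetical project ref and anon key):
async function invoke(action: Action, params: { url?: string; query?: string }) {
  const res = await fetch("https://YOUR-PROJECT.supabase.co/functions/v1/scrape-orchestrate", {
    method: "POST",
    headers: {
      Authorization: "Bearer YOUR_ANON_KEY",
      "Content-Type": "application/json",
    },
    body: JSON.stringify(buildPayload(action, params)),
  })
  return res.json()
}
```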
Step 5: Store Results in Supabase PostgreSQL
One of Supabase's strengths is built-in PostgreSQL. Store scraped data directly:
// After scraping, save to Supabase
const SUPABASE_URL = Deno.env.get("SUPABASE_URL") ?? ""
const SUPABASE_KEY = Deno.env.get("SUPABASE_SERVICE_ROLE_KEY") ?? ""
async function saveToDb(data: { title: string; content: string; url: string }) {
await fetch(`${SUPABASE_URL}/rest/v1/scraped_pages`, {
method: "POST",
headers: {
"apikey": SUPABASE_KEY,
"Authorization": `Bearer ${SUPABASE_KEY}`,
"Content-Type": "application/json",
"Prefer": "return=representation",
},
body: JSON.stringify(data),
})
}
Create the table first in the Supabase dashboard:
CREATE TABLE scraped_pages (
id BIGSERIAL PRIMARY KEY,
url TEXT NOT NULL,
title TEXT,
content TEXT,
scraped_at TIMESTAMPTZ DEFAULT NOW()
);
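Repeated scrapes of the same URL will pile up duplicate rows. If you add a unique constraint on url (for example `ALTER TABLE scraped_pages ADD CONSTRAINT scraped_pages_url_key UNIQUE (url);`), PostgREST can upsert instead: the on_conflict query parameter names the unique column and the Prefer: resolution=merge-duplicates header turns the insert into an upsert. A sketch of a saveToDb variant, split into a pure request builder so it is easy to test:

```typescript
// Build an upsert request against the scraped_pages table.
// Assumes a UNIQUE constraint on the url column exists.
function buildUpsertRequest(
  supabaseUrl: string,
  serviceKey: string,
  row: { url: string; title: string; content: string },
) {
  return {
    endpoint: `${supabaseUrl}/rest/v1/scraped_pages?on_conflict=url`,
    init: {
      method: "POST",
      headers: {
        apikey: serviceKey,
        Authorization: `Bearer ${serviceKey}`,
        "Content-Type": "application/json",
        Prefer: "resolution=merge-duplicates", // upsert instead of insert
      },
      body: JSON.stringify(row),
    },
  }
}
```

Then `const { endpoint, init } = buildUpsertRequest(SUPABASE_URL, SUPABASE_KEY, data); await fetch(endpoint, init)` replaces the plain insert in saveToDb.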
Step 6: Set Environment Secrets
Configure your SearchHive key as a Supabase secret:
supabase secrets set SEARCHHIVE_API_KEY=your-key-here
Note that SUPABASE_URL and SUPABASE_SERVICE_ROLE_KEY are injected into Edge Functions automatically -- the CLI rejects secret names with the SUPABASE_ prefix, so you do not (and cannot) set them yourself.
Supabase Edge Functions vs SearchHive for Scraping
| Feature | Supabase Edge Functions | SearchHive ScrapeForge |
|---|---|---|
| Runtime | Deno TypeScript | Cloud infrastructure |
| JS Rendering | No | Yes (headless Chrome) |
| Memory limit | 50MB | No practical limit |
| Execution timeout | 150s | 60s+ configurable |
| Proxy rotation | No | Built-in |
| CAPTCHA handling | No | Automatic |
| Storage | PostgreSQL (built-in) | API response only |
| Pricing | Free: 500K invocations | $49/mo for 100K credits |
| Auth | Supabase Auth (built-in) | API key |
The combination is powerful: Supabase handles auth, storage, and orchestration. SearchHive handles the actual scraping. Neither replaces the other -- they complement each other.
Common Issues
CORS errors in browser calls: Edge Functions need explicit CORS headers. The corsHeaders object in the examples above handles this.
Timeout on slow pages: Edge Functions have a 150-second limit. For very slow sites, use ScrapeForge which handles timeouts gracefully.
Deno module compatibility: Not all npm packages work in Deno. Stick to web-standard APIs (fetch, Request, Response) and Deno-compatible imports.
Rate limits on free tier: Supabase's free tier caps Edge Function invocations at 500K/month. For high-volume scraping, upgrade to Pro ($25/mo), which raises the invocation quota substantially.
IP blocking: Supabase Edge Functions run on shared infrastructure. Target sites may block Supabase's IP ranges. ScrapeForge's rotating proxies avoid this.
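Transient failures (rate limits, slow origins, flaky networks) are routine in scraping, so wrapping the outbound SearchHive call in a bounded retry with exponential backoff is cheap insurance. A minimal sketch; the delay schedule and attempt count are arbitrary assumptions:

```typescript
// Delay before retry attempt n (0-based): 500ms, 1s, 2s, ... capped at 8s.
function backoffDelayMs(attempt: number): number {
  return Math.min(500 * 2 ** attempt, 8000)
}

// Run fn up to maxAttempts times, backing off between failures.
async function withRetry<T>(fn: () => Promise<T>, maxAttempts = 3): Promise<T> {
  let lastErr: unknown
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn()
    } catch (err) {
      lastErr = err
      if (attempt < maxAttempts - 1) {
        await new Promise((r) => setTimeout(r, backoffDelayMs(attempt)))
      }
    }
  }
  throw lastErr
}
```

Keep maxAttempts small inside an Edge Function: the retries all count against the 150-second execution limit.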
Next Steps
- Get a free SearchHive API key (500 credits, no credit card)
- Read the ScrapeForge documentation for JS rendering and proxy options
- Check our Cloudflare Workers scraping guide for another edge computing approach
- See the OpenAI function calling comparison for AI-powered scraping workflows
Get Started with SearchHive
SearchHive provides search, scraping, and deep research through a unified API. The free tier gives you 500 credits to test everything. At scale, the Builder plan ($49/mo) delivers 100K credits with JS rendering, proxy rotation, and CAPTCHA handling built in.
Sign up free and start building serverless scraping pipelines today.