Supabase Edge Functions for Web Scraping: Complete Serverless Guide
Supabase Edge Functions run Deno TypeScript at the edge, giving you serverless compute with access to the broader Supabase ecosystem (PostgreSQL, Auth, Storage). For web scraping, Edge Functions offer a compelling serverless option -- but like all edge platforms, they have real limitations for production scraping workloads. This guide covers what works, what does not, and how to combine Supabase with a dedicated scraping API for the best results.
Key Takeaways
- Supabase Edge Functions run Deno TypeScript with a 50MB memory limit and no headless browser
- They work well for API-based data fetching and orchestrating calls to scraping APIs
- For JS-rendered sites, proxy rotation, and CAPTCHA handling, pair Edge Functions with SearchHive ScrapeForge
- The Supabase + SearchHive combo gives you serverless compute + production-grade scraping
Prerequisites
- A Supabase project (free tier works)
- Supabase CLI installed: npm install -g supabase
- Deno basics (TypeScript runs natively)
Step 1: Set Up Your Edge Function
Initialize Supabase locally and create a new edge function:
supabase init
supabase functions new web-scraper
This creates supabase/functions/web-scraper/index.ts.
Step 2: Basic Page Fetching
The simplest scraping pattern -- fetch a URL and extract data:
// supabase/functions/web-scraper/index.ts
import { serve } from "https://deno.land/std@0.168.0/http/server.ts"
const corsHeaders = {
"Access-Control-Allow-Origin": "*",
"Access-Control-Allow-Headers":
"authorization, x-client-info, apikey, content-type",
}
serve(async (req) => {
if (req.method === "OPTIONS") {
return new Response("ok", { headers: corsHeaders })
}
try {
const { url } = await req.json()
if (!url) {
return new Response(
JSON.stringify({ error: "Missing url parameter" }),
{ status: 400, headers: { ...corsHeaders, "Content-Type": "application/json" } }
)
}
const response = await fetch(url, {
headers: {
"User-Agent":
"Mozilla/5.0 (compatible; MyScraper/1.0)",
"Accept": "text/html,application/xhtml+xml",
},
})
const html = await response.text()
// Extract title
const titleMatch = html.match(/<title>(.*?)<\/title>/s)
const title = titleMatch ? titleMatch[1].trim() : "No title"
// Strip HTML tags for text content
const text = html
.replace(/<script[^>]*>[\s\S]*?<\/script>/gi, "")
.replace(/<style[^>]*>[\s\S]*?<\/style>/gi, "")
.replace(/<[^>]+>/g, " ")
.replace(/\s+/g, " ")
.trim()
.slice(0, 5000)
return new Response(
JSON.stringify({ url, title, content: text }),
{ headers: { ...corsHeaders, "Content-Type": "application/json" } }
)
} catch (err) {
return new Response(
JSON.stringify({ error: err instanceof Error ? err.message : String(err) }),
{ status: 500, headers: { ...corsHeaders, "Content-Type": "application/json" } }
)
}
})
Deploy with:
supabase functions deploy web-scraper
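The extraction logic above is simple enough to factor into a standalone helper, which keeps index.ts short and makes the parsing testable without deploying anything. A minimal sketch in plain TypeScript (no Deno-specific APIs, so it runs anywhere):

```typescript
// Extract the <title> text and a plain-text body preview from raw HTML.
function extractPage(html: string): { title: string; content: string } {
  const titleMatch = html.match(/<title[^>]*>(.*?)<\/title>/is)
  const title = titleMatch ? titleMatch[1].trim() : "No title"
  const content = html
    .replace(/<script[^>]*>[\s\S]*?<\/script>/gi, "") // drop inline scripts
    .replace(/<style[^>]*>[\s\S]*?<\/style>/gi, "")   // drop inline styles
    .replace(/<[^>]+>/g, " ")                          // strip remaining tags
    .replace(/\s+/g, " ")                              // collapse whitespace
    .trim()
    .slice(0, 5000)                                    // cap payload size
  return { title, content }
}
```

Regex-based extraction is fine for titles and rough text content; for structured data you are better served by a real HTML parser such as deno-dom.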
Step 3: The JavaScript Rendering Problem
Supabase Edge Functions run Deno -- they have a V8 engine but no DOM, no headless browser, and no document object. This means:
- Static HTML pages: works fine
- Server-side rendered pages (SSR): works fine
- Client-side rendered pages (React, Vue, Angular SPAs): returns empty shells
- Pages that load data via AJAX after initial load: misses the data
Deno does have a Puppeteer port (deno-puppeteer, via https://deno.land/x/puppeteer), but with a 50MB memory limit and no GPU access, running a headless browser inside a Supabase Edge Function is impractical, if not impossible.
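A cheap way to tell whether a fetch came back as a client-rendered shell is to compare the visible text against the markup. The sketch below uses two heuristics -- an empty mount point like an empty div with id "root" or "app", and a visible-text threshold; both the id list and the 200-character cutoff are arbitrary assumptions to tune per site:

```typescript
// Heuristic: a SPA shell is mostly markup and script tags with almost no visible text.
function looksLikeSpaShell(html: string): boolean {
  const visible = html
    .replace(/<script[^>]*>[\s\S]*?<\/script>/gi, "")
    .replace(/<style[^>]*>[\s\S]*?<\/style>/gi, "")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim()
  // An empty mount point like <div id="root"></div> is a strong signal.
  const hasEmptyMount = /<div[^>]*id=["'](root|app)["'][^>]*>\s*<\/div>/i.test(html)
  return hasEmptyMount || visible.length < 200
}
```

When this returns true, fall back to a scraping API with JS rendering (Step 4) instead of returning the empty shell to your caller.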
Step 4: Using SearchHive from Edge Functions
The practical solution: use your Edge Function to call SearchHive's APIs, which handle JS rendering, proxy rotation, and anti-bot bypass on their infrastructure:
// supabase/functions/scrape-orchestrate/index.ts
import { serve } from "https://deno.land/std@0.168.0/http/server.ts"
const SEARCHHIVE_KEY = Deno.env.get("SEARCHHIVE_API_KEY") ?? ""
const corsHeaders = {
"Access-Control-Allow-Origin": "*",
"Access-Control-Allow-Headers":
"authorization, x-client-info, apikey, content-type",
}
serve(async (req) => {
if (req.method === "OPTIONS") {
return new Response("ok", { headers: corsHeaders })
}
try {
const { action, url, query } = await req.json()
if (action === "search") {
// SwiftSearch - web search
const resp = await fetch(
"https://api.searchhive.dev/v1/swiftsearch",
{
method: "POST",
headers: {
Authorization: `Bearer ${SEARCHHIVE_KEY}`,
"Content-Type": "application/json",
},
body: JSON.stringify({
query: query,
num_results: 5,
}),
}
)
const data = await resp.json()
return new Response(
JSON.stringify(data),
{ headers: { ...corsHeaders, "Content-Type": "application/json" } }
)
}
if (action === "scrape") {
// ScrapeForge - JS-rendered scraping
const resp = await fetch(
"https://api.searchhive.dev/v1/scrapeforge",
{
method: "POST",
headers: {
Authorization: `Bearer ${SEARCHHIVE_KEY}`,
"Content-Type": "application/json",
},
body: JSON.stringify({
url: url,
render_js: true,
format: "markdown",
}),
}
)
const data = await resp.json()
return new Response(
JSON.stringify(data),
{ headers: { ...corsHeaders, "Content-Type": "application/json" } }
)
}
if (action === "research") {
// DeepDive - full page extraction
const resp = await fetch(
"https://api.searchhive.dev/v1/deepdive",
{
method: "POST",
headers: {
Authorization: `Bearer ${SEARCHHIVE_KEY}`,
"Content-Type": "application/json",
},
body: JSON.stringify({
url: url,
extract_text: true,
}),
}
)
const data = await resp.json()
return new Response(
JSON.stringify(data),
{ headers: { ...corsHeaders, "Content-Type": "application/json" } }
)
}
return new Response(
JSON.stringify({ error: "Unknown action. Use: search, scrape, or research" }),
{ status: 400, headers: { ...corsHeaders, "Content-Type": "application/json" } }
)
} catch (err) {
return new Response(
JSON.stringify({ error: err instanceof Error ? err.message : String(err) }),
{ status: 500, headers: { ...corsHeaders, "Content-Type": "application/json" } }
)
}
})
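From a browser or server client, the function is invoked with a JSON body selecting the action. A small helper that validates the payload before the round trip; the endpoint URL follows the standard Supabase function URL shape, and the project ref and anon key placeholders are hypothetical:

```typescript
type Action = "search" | "scrape" | "research"

// Build the JSON payload the scrape-orchestrate function expects,
// failing fast on missing parameters instead of burning an invocation.
function buildPayload(action: Action, params: { url?: string; query?: string }) {
  if (action === "search" && !params.query) throw new Error("search requires query")
  if (action !== "search" && !params.url) throw new Error(`${action} requires url`)
  return { action, ...params }
}

// Usage (hypothetical project ref and anon key):
async function invoke(action: Action, params: { url?: string; query?: string }) {
  const res = await fetch("https://YOUR-PROJECT.supabase.co/functions/v1/scrape-orchestrate", {
    method: "POST",
    headers: {
      Authorization: "Bearer YOUR_ANON_KEY",
      "Content-Type": "application/json",
    },
    body: JSON.stringify(buildPayload(action, params)),
  })
  return res.json()
}
```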
Step 5: Store Results in Supabase PostgreSQL
One of Supabase's strengths is built-in PostgreSQL. Store scraped data directly:
// After scraping, save to Supabase
const SUPABASE_URL = Deno.env.get("SUPABASE_URL") ?? ""
const SUPABASE_KEY = Deno.env.get("SUPABASE_SERVICE_ROLE_KEY") ?? ""
async function saveToDb(data: { title: string; content: string; url: string }) {
await fetch(`${SUPABASE_URL}/rest/v1/scraped_pages`, {
method: "POST",
headers: {
"apikey": SUPABASE_KEY,
"Authorization": `Bearer ${SUPABASE_KEY}`,
"Content-Type": "application/json",
"Prefer": "return=representation",
},
body: JSON.stringify(data),
})
}
Create the table first in the Supabase dashboard:
CREATE TABLE scraped_pages (
id BIGSERIAL PRIMARY KEY,
url TEXT NOT NULL,
title TEXT,
content TEXT,
scraped_at TIMESTAMPTZ DEFAULT NOW()
);
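Repeated scrapes of the same URL will pile up duplicate rows. If you add a unique constraint on url (for example `ALTER TABLE scraped_pages ADD CONSTRAINT scraped_pages_url_key UNIQUE (url);`), PostgREST can upsert instead: the on_conflict query parameter names the unique column and the Prefer: resolution=merge-duplicates header turns the insert into an upsert. A sketch of a saveToDb variant, split into a pure request builder so it is easy to test:

```typescript
// Build an upsert request against the scraped_pages table.
// Assumes a UNIQUE constraint on the url column exists.
function buildUpsertRequest(
  supabaseUrl: string,
  serviceKey: string,
  row: { url: string; title: string; content: string },
) {
  return {
    endpoint: `${supabaseUrl}/rest/v1/scraped_pages?on_conflict=url`,
    init: {
      method: "POST",
      headers: {
        apikey: serviceKey,
        Authorization: `Bearer ${serviceKey}`,
        "Content-Type": "application/json",
        Prefer: "resolution=merge-duplicates", // upsert instead of insert
      },
      body: JSON.stringify(row),
    },
  }
}
```

Then `const { endpoint, init } = buildUpsertRequest(SUPABASE_URL, SUPABASE_KEY, data); await fetch(endpoint, init)` replaces the plain insert in saveToDb.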
Step 6: Set Environment Secrets
Configure your SearchHive key as a Supabase secret:
supabase secrets set SEARCHHIVE_API_KEY=your-key-here
Note that SUPABASE_URL and SUPABASE_SERVICE_ROLE_KEY are injected into Edge Functions automatically -- the CLI rejects secret names with the SUPABASE_ prefix, so you do not (and cannot) set them yourself.
Supabase Edge Functions vs SearchHive for Scraping
| Feature | Supabase Edge Functions | SearchHive ScrapeForge |
|---|---|---|
| Runtime | Deno TypeScript | Cloud infrastructure |
| JS Rendering | No | Yes (headless Chrome) |
| Memory limit | 50MB | No practical limit |
| Execution timeout | 150s | 60s+ configurable |
| Proxy rotation | No | Built-in |
| CAPTCHA handling | No | Automatic |
| Storage | PostgreSQL (built-in) | API response only |
| Pricing | Free: 500K invocations | $49/mo for 100K credits |
| Auth | Supabase Auth (built-in) | API key |
The combination is powerful: Supabase handles auth, storage, and orchestration. SearchHive handles the actual scraping. Neither replaces the other -- they complement each other.
Common Issues
CORS errors in browser calls: Edge Functions need explicit CORS headers. The corsHeaders object in the examples above handles this.
Timeout on slow pages: Edge Functions have a 150-second limit. For very slow sites, use ScrapeForge which handles timeouts gracefully.
Deno module compatibility: Not all npm packages work in Deno. Stick to web-standard APIs (fetch, Request, Response) and Deno-compatible imports.
Rate limits on free tier: Supabase's free tier caps Edge Function invocations at 500K/month. For high-volume scraping, upgrade to Pro ($25/mo), which raises the invocation quota substantially.
IP blocking: Supabase Edge Functions run on shared infrastructure. Target sites may block Supabase's IP ranges. ScrapeForge's rotating proxies avoid this.
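Transient failures (rate limits, slow origins, flaky networks) are routine in scraping, so wrapping the outbound SearchHive call in a bounded retry with exponential backoff is cheap insurance. A minimal sketch; the delay schedule and attempt count are arbitrary assumptions:

```typescript
// Delay before retry attempt n (0-based): 500ms, 1s, 2s, ... capped at 8s.
function backoffDelayMs(attempt: number): number {
  return Math.min(500 * 2 ** attempt, 8000)
}

// Run fn up to maxAttempts times, backing off between failures.
async function withRetry<T>(fn: () => Promise<T>, maxAttempts = 3): Promise<T> {
  let lastErr: unknown
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn()
    } catch (err) {
      lastErr = err
      if (attempt < maxAttempts - 1) {
        await new Promise((r) => setTimeout(r, backoffDelayMs(attempt)))
      }
    }
  }
  throw lastErr
}
```

Keep maxAttempts small inside an Edge Function: the retries all count against the 150-second execution limit.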
Next Steps
- Get a free SearchHive API key (500 credits, no credit card)
- Read the ScrapeForge documentation for JS rendering and proxy options
- Check our Cloudflare Workers scraping guide for another edge computing approach
- See the OpenAI function calling comparison for AI-powered scraping workflows
Get Started with SearchHive
SearchHive provides search, scraping, and deep research through a unified API. The free tier gives you 500 credits to test everything. At scale, the Builder plan ($49/mo) delivers 100K credits with JS rendering, proxy rotation, and CAPTCHA handling built in.
Sign up free and start building serverless scraping pipelines today.