API pagination is one of those things every developer encounters but few get right on the first try. Return too many results at once and your API is slow and unreliable. Return too few and clients make excessive requests. Get the pagination scheme wrong and you introduce duplicate results, missed data, or security vulnerabilities.
This guide covers the five main pagination patterns, when to use each, and how to implement them correctly -- with practical examples using SearchHive APIs.
Background
Most APIs return collections of data: search results, user records, product listings, log entries. Returning all results in a single response works for small datasets, but breaks down quickly:
- Memory pressure: 100K search results in one response can consume hundreds of MB
- Latency: Serializing and transmitting large payloads is slow
- Timeouts: Clients time out waiting for responses that take too long
- Failures: A network error mid-transfer means re-fetching everything
Pagination solves this by splitting large result sets into smaller "pages" that clients can request individually. The challenge is designing the pagination scheme so that it is efficient, consistent, and easy to use.
Key Takeaways
- Five main patterns: Offset, cursor, keyset, time-based, and token-based pagination
- Cursor pagination is best for real-time data and large datasets
- Offset pagination is simplest but has performance and consistency problems at scale
- SearchHive uses offset pagination with a page parameter and consistent 10-item pages
- Common pitfalls: skipping items, returning duplicates, and inconsistent page sizes
Challenge: Choosing the Right Pagination Pattern
A SearchHive user was building a competitive intelligence dashboard that needed to fetch all 500+ blog posts from the API and sync them to a local database. Their first implementation used offset-based pagination but kept getting inconsistent results -- some posts appeared on multiple pages, others were missing entirely.
The root cause: new posts were being published between pagination requests, shifting the offset positions. This is the classic "moving target" problem with offset pagination on mutable datasets.
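The shift is easy to reproduce in a few lines. This simulation models a newest-first feed where one post is published between the page-1 and page-2 requests; the item sitting at the page boundary shows up on both pages:

```python
# Simulate the "moving target" problem with offset pagination
posts = [f"post-{i}" for i in range(1, 7)]  # newest first: post-1 ... post-6

page1 = posts[0:3]             # client fetches page 1 (limit=3)
posts.insert(0, "post-new")    # a new post is published, shifting every offset
page2 = posts[3:6]             # page 2 now starts one item too early

print(page1)  # ['post-1', 'post-2', 'post-3']
print(page2)  # ['post-3', 'post-4', 'post-5'] -- post-3 is duplicated
```

The same shift in the other direction (a post deleted between requests) makes an item disappear from the result set entirely.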
Solution: Pagination Patterns Explained
1. Offset Pagination (Search and scrape APIs)
The simplest pattern. The client specifies a starting offset and a page size. Total results are known upfront.
GET /api/posts?page=1&limit=20
GET /api/posts?page=2&limit=20
Pros: Simple to implement, supports random access (jump to page 50), total count available.
Cons: Inconsistent results when data changes between requests. Performance degrades at large offsets, because the database must scan and discard every preceding row before returning the requested page.
SearchHive uses this pattern for its blog API. Here is how to paginate through all posts correctly:
```python
import requests

BASE = "https://searchhive.dev/api/blog"

all_posts = []
page = 1
per_page = 100

while True:
    resp = requests.get(f"{BASE}/posts", params={"limit": per_page, "page": page})
    resp.raise_for_status()
    data = resp.json()
    posts = data.get("posts", [])
    if not posts:
        break
    all_posts.extend(posts)
    # Check if we have reached the last page
    total = data.get("total")
    if total is not None and len(all_posts) >= total:
        break
    page += 1

print(f"Fetched {len(all_posts)} posts across {page} pages")
```
When to use: Small to medium datasets, read-mostly data, when you need random page access.
2. Cursor-Based Pagination
Instead of offsets, use an opaque cursor that points to the last item fetched. The next page starts after that cursor.
GET /api/posts?cursor=abc123&limit=20
Response: {"data": [...], "next_cursor": "def456"}
GET /api/posts?cursor=def456&limit=20
Pros: Consistent results even when data is added or removed. Constant-time performance regardless of position. No "moving target" problem.
Cons: Cannot jump to a specific page. Cursors may expire.
```python
import requests

API_KEY = "your_searchhive_key"
headers = {"Authorization": f"Bearer {API_KEY}"}

# Example: paginating through search results with a hypothetical cursor API
cursor = None
all_results = []

while True:
    params = {"query": "web scraping tools", "limit": 20}
    if cursor:
        params["cursor"] = cursor
    resp = requests.post(
        "https://api.searchhive.dev/v1/search",
        headers=headers,
        json=params,
    )
    data = resp.json().get("data", {})
    results = data.get("organic", [])
    if not results:
        break
    all_results.extend(results)
    cursor = data.get("next_cursor")
    if not cursor:
        break

print(f"Total results: {len(all_results)}")
```
When to use: Real-time data, infinite scrolling, large datasets, streaming APIs.
3. Keyset (Seek) Pagination
Filter by the value of a sortable column (usually an ID or timestamp) instead of an offset.
GET /api/posts?after_id=1234&limit=20
GET /api/posts?after_id=5678&limit=20
Pros: Consistent results, fast queries (uses index), no counting queries.
Cons: Only works with sorted data, cannot go backwards easily.
```python
# Keyset pagination for a hypothetical posts API
last_id = 0
all_posts = []

while True:
    resp = requests.get(
        f"{BASE}/posts",
        params={"after_id": last_id, "limit": 20},
    )
    posts = resp.json().get("posts", [])
    if not posts:
        break
    all_posts.extend(posts)
    last_id = posts[-1]["id"]  # Use the last item's ID as the keyset
```
When to use: Chronological feeds, log data, append-only datasets.
4. Time-Based Pagination
Use timestamps as pagination boundaries. Common for time-series data and activity feeds.
GET /api/logs?before=2025-01-15T10:00:00Z&limit=100
Pros: Natural for time-ordered data, supports date range queries.
Cons: Multiple items sharing the same timestamp can cause gaps or duplicates unless a secondary tiebreaker (such as an ID) is included.
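The loop is the same shape as keyset pagination, with a timestamp as the boundary. In this sketch (the endpoint and field names are illustrative, not a real SearchHive API), the fetch function is injected so the paging logic itself can be exercised without a network:

```python
def paginate_by_time(fetch_page, before, page_size=100):
    """Collect items by walking backwards through time.

    fetch_page(before, limit) must return items newest-first, each with an
    ISO-8601 UTC "timestamp" strictly older than `before`. Items that share
    a timestamp can still be skipped or repeated; real APIs add an ID
    tiebreaker to avoid this.
    """
    items = []
    while True:
        batch = fetch_page(before, page_size)
        if not batch:
            break
        items.extend(batch)
        before = batch[-1]["timestamp"]  # next page: strictly older than this
        if len(batch) < page_size:
            break  # a short page means we reached the oldest item
    return items

# An in-memory "log store" standing in for the API
logs = [{"id": i, "timestamp": f"2025-01-15T10:00:{i:02d}Z"} for i in range(5)]
logs.sort(key=lambda l: l["timestamp"], reverse=True)  # newest first

def fake_fetch(before, limit):
    # ISO-8601 UTC strings in a uniform format compare correctly as strings
    return [l for l in logs if l["timestamp"] < before][:limit]

history = paginate_by_time(fake_fetch, "2025-01-15T10:01:00Z", page_size=2)
print(len(history))  # 5
```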
5. Token-Based (Link Header) Pagination
The server returns a Link header with URLs for the next, previous, first, and last pages. GitHub API uses this pattern.
Link: <https://api.github.com/repos?page=2>; rel="next",
<https://api.github.com/repos?page=10>; rel="last"
Pros: Server controls pagination completely. Client just follows links.
Cons: Less transparent, harder to cache, client cannot predict page structure.
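A client only needs to extract the rel="next" URL and follow it until it stops appearing. Here is a minimal parser for the header format shown above (simplified: it assumes no commas inside the URLs themselves):

```python
import re

def parse_link_header(value):
    """Parse an RFC 8288-style Link header into a {rel: url} dict."""
    links = {}
    for part in value.split(","):
        match = re.search(r'<([^>]+)>\s*;\s*rel="([^"]+)"', part)
        if match:
            url, rel = match.group(1), match.group(2)
            links[rel] = url
    return links

header = ('<https://api.github.com/repos?page=2>; rel="next", '
          '<https://api.github.com/repos?page=10>; rel="last"')
print(parse_link_header(header)["next"])  # https://api.github.com/repos?page=2
```

In practice you rarely need to write this yourself: the requests library parses the header for you, exposing it as resp.links["next"]["url"], so the paging loop reduces to "follow next until resp.links has no next key."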
Implementation: Robust Pagination Client
Here is a reusable pagination client that handles both offset and cursor patterns with retry logic:
```python
import requests
import time

class PaginatedAPI:
    """Reusable pagination client for REST APIs."""

    def __init__(self, base_url, headers=None, max_retries=3):
        self.base_url = base_url
        self.headers = headers or {}
        self.max_retries = max_retries

    def _get(self, endpoint, params):
        """GET with exponential-backoff retries."""
        for attempt in range(self.max_retries):
            try:
                resp = requests.get(
                    f"{self.base_url}{endpoint}",
                    params=params,
                    headers=self.headers,
                    timeout=30,
                )
                resp.raise_for_status()
                return resp
            except requests.exceptions.RequestException:
                if attempt == self.max_retries - 1:
                    raise
                time.sleep(2 ** attempt)

    def paginate_offset(self, endpoint, per_page=100, max_items=None):
        """Paginate using offset/page numbers."""
        all_items = []
        page = 1
        while True:
            resp = self._get(endpoint, {"page": page, "limit": per_page})
            data = resp.json()
            items = data.get("posts") or data.get("data") or data.get("results") or []
            if not items:
                break
            all_items.extend(items)
            if max_items and len(all_items) >= max_items:
                all_items = all_items[:max_items]
                break
            total = data.get("total", data.get("count"))
            if total and len(all_items) >= total:
                break
            page += 1
        return all_items

    def paginate_cursor(self, endpoint, per_page=100, cursor_key="cursor"):
        """Paginate using a cursor-based approach."""
        all_items = []
        cursor = None
        while True:
            params = {"limit": per_page}
            if cursor:
                params[cursor_key] = cursor
            data = self._get(endpoint, params).json()
            items = data.get("data", [])
            if not items:
                break
            all_items.extend(items)
            cursor = data.get("next_cursor")
            if not cursor:
                break
        return all_items

# Usage: Fetch all SearchHive blog posts
api = PaginatedAPI("https://searchhive.dev/api")
posts = api.paginate_offset("/blog/posts", per_page=100)
print(f"Total posts: {len(posts)}")
```
Results
Implementing robust pagination solved the original problem. The competitive intelligence dashboard now reliably syncs all posts without duplicates or gaps. Key improvements:
- Deduplication: Track post IDs in a set and skip duplicates from overlapping pages
- Rate limiting: Add delays between page requests to avoid throttling
- Resumability: Store the last page/offset so interrupted syncs can resume
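The deduplication step is small enough to sketch. Assuming each item carries a unique id field (true for the blog posts above), track which IDs you have already stored and drop repeats as pages arrive:

```python
def merge_unique(pages, key="id"):
    """Flatten paginated batches, keeping the first occurrence of each ID."""
    seen = set()
    merged = []
    for batch in pages:
        for item in batch:
            if item[key] in seen:
                continue  # duplicate produced by an overlapping page
            seen.add(item[key])
            merged.append(item)
    return merged

# Overlapping pages, as produced by the "moving target" problem
pages = [
    [{"id": 1}, {"id": 2}, {"id": 3}],
    [{"id": 3}, {"id": 4}],  # id 3 shifted onto page 2
]
print([p["id"] for p in merge_unique(pages)])  # [1, 2, 3, 4]
```

Because the set only holds IDs, not full items, this stays cheap even for syncs spanning thousands of records.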
Lessons Learned
Always check the total count. Some APIs return a total field. Use it to know when you are done instead of relying on empty pages, which can be ambiguous.
Handle inconsistent page sizes. Not all APIs return exactly limit items per page. A request for 20 items might return 15 on the last page -- that is not an error, that is the end.
Deduplicate results. Even with offset pagination, mutable datasets can produce duplicates. Always deduplicate by ID on the client side.
Use page not offset for SearchHive. The SearchHive blog API uses a page parameter (1-indexed), not a raw offset. Using offset will not work correctly.
Rate limit your pagination. Fetching 100 pages in rapid succession will trigger rate limits on any API. Add a small delay (100-500ms) between requests.
Log your progress. When paginating through thousands of items, log progress so you can debug issues. At minimum, log the current page and total items fetched.
SearchHive makes pagination straightforward with consistent page sizes and clear total counts. Start building with the free tier -- 500 credits, no credit card required.
Related: /blog/search-api-for-ai-common-questions-answered | /blog/unstructured-data-extraction-common-questions-answered | /compare/serpapi