API pagination is one of those things every developer encounters but few get right on the first try. Return too many results at once and your API is slow and unreliable. Return too few and clients make excessive requests. Get the pagination scheme wrong and you introduce duplicate results, missed data, or security vulnerabilities.
This guide covers the five main pagination patterns, when to use each, and how to implement them correctly -- with practical examples using SearchHive APIs.
Background
Most APIs return collections of data: search results, user records, product listings, log entries. Returning all results in a single response works for small datasets, but breaks down quickly:
- Memory pressure: 100K search results in one response can consume hundreds of MB
- Latency: Serializing and transmitting large payloads is slow
- Timeouts: Clients time out waiting for responses that take too long
- Failures: A network error mid-transfer means re-fetching everything
Pagination solves this by splitting large result sets into smaller "pages" that clients can request individually. The challenge is designing the pagination scheme so that it is efficient, consistent, and easy to use.
Key Takeaways
- Five main patterns: Offset, cursor, keyset, time-based, and token-based pagination
- Cursor pagination is best for real-time data and large datasets
- Offset pagination is simplest but has performance and consistency problems at scale
- SearchHive uses offset pagination with a page parameter and consistent 10-item pages
- Common pitfalls: skipping items, returning duplicates, and inconsistent page sizes
Challenge: Choosing the Right Pagination Pattern
A SearchHive user was building a competitive intelligence dashboard that needed to fetch all 500+ blog posts from the API and sync them to a local database. Their first implementation used offset-based pagination but kept getting inconsistent results -- some posts appeared on multiple pages, others were missing entirely.
The root cause: new posts were being published between pagination requests, shifting the offset positions. This is the classic "moving target" problem with offset pagination on mutable datasets.
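The shift is easy to reproduce in a few lines. This simulation models a newest-first feed where one post is published between the page-1 and page-2 requests; the item sitting at the page boundary shows up on both pages:

```python
# Simulate the "moving target" problem with offset pagination
posts = [f"post-{i}" for i in range(1, 7)]  # newest first: post-1 ... post-6

page1 = posts[0:3]             # client fetches page 1 (limit=3)
posts.insert(0, "post-new")    # a new post is published, shifting every offset
page2 = posts[3:6]             # page 2 now starts one item too early

print(page1)  # ['post-1', 'post-2', 'post-3']
print(page2)  # ['post-3', 'post-4', 'post-5'] -- post-3 is duplicated
```

The same shift in the other direction (a post deleted between requests) makes an item disappear from the result set entirely.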
Solution: Pagination Patterns Explained
1. Offset Pagination (Search and scrape APIs)
The simplest pattern. The client specifies a starting offset and a page size. Total results are known upfront.
GET /api/posts?page=1&limit=20
GET /api/posts?page=2&limit=20
Pros: Simple to implement, supports random access (jump to page 50), total count available.
Cons: Inconsistent results when data changes between requests. Performance degrades at large offsets, because the database must scan and discard every preceding row before returning the requested page.
SearchHive uses this pattern for its blog API. Here is how to paginate through all posts correctly:
```python
import requests

BASE = "https://searchhive.dev/api/blog"

all_posts = []
page = 1
per_page = 100

while True:
    resp = requests.get(f"{BASE}/posts", params={"limit": per_page, "page": page})
    resp.raise_for_status()
    data = resp.json()
    posts = data.get("posts", [])
    if not posts:
        break
    all_posts.extend(posts)
    # Check if we have reached the last page
    total = data.get("total")
    if total is not None and len(all_posts) >= total:
        break
    page += 1

print(f"Fetched {len(all_posts)} posts across {page} pages")
```
When to use: Small to medium datasets, read-mostly data, when you need random page access.
2. Cursor-Based Pagination
Instead of offsets, use an opaque cursor that points to the last item fetched. The next page starts after that cursor.
GET /api/posts?cursor=abc123&limit=20
Response: {"data": [...], "next_cursor": "def456"}
GET /api/posts?cursor=def456&limit=20
Pros: Consistent results even when data is added or removed. Constant-time performance regardless of position. No "moving target" problem.
Cons: Cannot jump to a specific page. Cursors may expire.
```python
import requests

API_KEY = "your_searchhive_key"
headers = {"Authorization": f"Bearer {API_KEY}"}

# Example: paginating through search results with a hypothetical cursor API
cursor = None
all_results = []

while True:
    params = {"query": "web scraping tools", "limit": 20}
    if cursor:
        params["cursor"] = cursor
    resp = requests.post(
        "https://api.searchhive.dev/v1/search",
        headers=headers,
        json=params,
    )
    data = resp.json().get("data", {})
    results = data.get("organic", [])
    if not results:
        break
    all_results.extend(results)
    cursor = data.get("next_cursor")
    if not cursor:
        break

print(f"Total results: {len(all_results)}")
```
When to use: Real-time data, infinite scrolling, large datasets, streaming APIs.
3. Keyset (Seek) Pagination
Filter by the value of a sortable column (usually an ID or timestamp) instead of an offset.
GET /api/posts?after_id=1234&limit=20
GET /api/posts?after_id=5678&limit=20
Pros: Consistent results, fast queries (uses index), no counting queries.
Cons: Only works with sorted data, cannot go backwards easily.
```python
# Keyset pagination for a hypothetical posts API
last_id = 0
all_posts = []

while True:
    resp = requests.get(
        f"{BASE}/posts",
        params={"after_id": last_id, "limit": 20},
    )
    posts = resp.json().get("posts", [])
    if not posts:
        break
    all_posts.extend(posts)
    last_id = posts[-1]["id"]  # Use the last item's ID as the keyset
```
When to use: Chronological feeds, log data, append-only datasets.
4. Time-Based Pagination
Use timestamps as pagination boundaries. Common for time-series data and activity feeds.
GET /api/logs?before=2025-01-15T10:00:00Z&limit=100
Pros: Natural for time-ordered data, supports date range queries.
Cons: Multiple items sharing the same timestamp can cause gaps or duplicates unless a secondary tiebreaker (such as an ID) is included.
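The loop is the same shape as keyset pagination, with a timestamp as the boundary. In this sketch (the endpoint and field names are illustrative, not a real SearchHive API), the fetch function is injected so the paging logic itself can be exercised without a network:

```python
def paginate_by_time(fetch_page, before, page_size=100):
    """Collect items by walking backwards through time.

    fetch_page(before, limit) must return items newest-first, each with an
    ISO-8601 UTC "timestamp" strictly older than `before`. Items that share
    a timestamp can still be skipped or repeated; real APIs add an ID
    tiebreaker to avoid this.
    """
    items = []
    while True:
        batch = fetch_page(before, page_size)
        if not batch:
            break
        items.extend(batch)
        before = batch[-1]["timestamp"]  # next page: strictly older than this
        if len(batch) < page_size:
            break  # a short page means we reached the oldest item
    return items

# An in-memory "log store" standing in for the API
logs = [{"id": i, "timestamp": f"2025-01-15T10:00:{i:02d}Z"} for i in range(5)]
logs.sort(key=lambda l: l["timestamp"], reverse=True)  # newest first

def fake_fetch(before, limit):
    # ISO-8601 UTC strings in a uniform format compare correctly as strings
    return [l for l in logs if l["timestamp"] < before][:limit]

history = paginate_by_time(fake_fetch, "2025-01-15T10:01:00Z", page_size=2)
print(len(history))  # 5
```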
5. Token-Based (Link Header) Pagination
The server returns a Link header with URLs for the next, previous, first, and last pages. GitHub API uses this pattern.
Link: <https://api.github.com/repos?page=2>; rel="next",
<https://api.github.com/repos?page=10>; rel="last"
Pros: Server controls pagination completely. Client just follows links.
Cons: Less transparent, harder to cache, client cannot predict page structure.
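A client only needs to extract the rel="next" URL and follow it until it stops appearing. Here is a minimal parser for the header format shown above (simplified: it assumes no commas inside the URLs themselves):

```python
import re

def parse_link_header(value):
    """Parse an RFC 8288-style Link header into a {rel: url} dict."""
    links = {}
    for part in value.split(","):
        match = re.search(r'<([^>]+)>\s*;\s*rel="([^"]+)"', part)
        if match:
            url, rel = match.group(1), match.group(2)
            links[rel] = url
    return links

header = ('<https://api.github.com/repos?page=2>; rel="next", '
          '<https://api.github.com/repos?page=10>; rel="last"')
print(parse_link_header(header)["next"])  # https://api.github.com/repos?page=2
```

In practice you rarely need to write this yourself: the requests library parses the header for you, exposing it as resp.links["next"]["url"], so the paging loop reduces to "follow next until resp.links has no next key."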
Implementation: Robust Pagination Client
Here is a reusable pagination client that handles both offset and cursor patterns with retry logic:
```python
import requests
import time

class PaginatedAPI:
    """Reusable pagination client for REST APIs."""

    def __init__(self, base_url, headers=None, max_retries=3):
        self.base_url = base_url
        self.headers = headers or {}
        self.max_retries = max_retries

    def _get(self, endpoint, params):
        """GET with exponential-backoff retries."""
        for attempt in range(self.max_retries):
            try:
                resp = requests.get(
                    f"{self.base_url}{endpoint}",
                    params=params,
                    headers=self.headers,
                    timeout=30,
                )
                resp.raise_for_status()
                return resp
            except requests.exceptions.RequestException:
                if attempt == self.max_retries - 1:
                    raise
                time.sleep(2 ** attempt)

    def paginate_offset(self, endpoint, per_page=100, max_items=None):
        """Paginate using offset/page numbers."""
        all_items = []
        page = 1
        while True:
            resp = self._get(endpoint, {"page": page, "limit": per_page})
            data = resp.json()
            items = data.get("posts") or data.get("data") or data.get("results") or []
            if not items:
                break
            all_items.extend(items)
            if max_items and len(all_items) >= max_items:
                all_items = all_items[:max_items]
                break
            total = data.get("total", data.get("count"))
            if total and len(all_items) >= total:
                break
            page += 1
        return all_items

    def paginate_cursor(self, endpoint, per_page=100, cursor_key="cursor"):
        """Paginate using a cursor-based approach."""
        all_items = []
        cursor = None
        while True:
            params = {"limit": per_page}
            if cursor:
                params[cursor_key] = cursor
            data = self._get(endpoint, params).json()
            items = data.get("data", [])
            if not items:
                break
            all_items.extend(items)
            cursor = data.get("next_cursor")
            if not cursor:
                break
        return all_items

# Usage: Fetch all SearchHive blog posts
api = PaginatedAPI("https://searchhive.dev/api")
posts = api.paginate_offset("/blog/posts", per_page=100)
print(f"Total posts: {len(posts)}")
```
Results
Implementing robust pagination solved the original problem. The competitive intelligence dashboard now reliably syncs all posts without duplicates or gaps. Key improvements:
- Deduplication: Track post IDs in a set and skip duplicates from overlapping pages
- Rate limiting: Add delays between page requests to avoid throttling
- Resumability: Store the last page/offset so interrupted syncs can resume
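The deduplication step is small enough to sketch. Assuming each item carries a unique id field (true for the blog posts above), track which IDs you have already stored and drop repeats as pages arrive:

```python
def merge_unique(pages, key="id"):
    """Flatten paginated batches, keeping the first occurrence of each ID."""
    seen = set()
    merged = []
    for batch in pages:
        for item in batch:
            if item[key] in seen:
                continue  # duplicate produced by an overlapping page
            seen.add(item[key])
            merged.append(item)
    return merged

# Overlapping pages, as produced by the "moving target" problem
pages = [
    [{"id": 1}, {"id": 2}, {"id": 3}],
    [{"id": 3}, {"id": 4}],  # id 3 shifted onto page 2
]
print([p["id"] for p in merge_unique(pages)])  # [1, 2, 3, 4]
```

Because the set only holds IDs, not full items, this stays cheap even for syncs spanning thousands of records.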
Lessons Learned
Always check the total count. Some APIs return a total field. Use it to know when you are done instead of relying on empty pages, which can be ambiguous.
Handle inconsistent page sizes. Not all APIs return exactly limit items per page. A request for 20 items might return 15 on the last page -- that is not an error, that is the end.
Deduplicate results. Even with offset pagination, mutable datasets can produce duplicates. Always deduplicate by ID on the client side.
Use page not offset for SearchHive. The SearchHive blog API uses a page parameter (1-indexed), not a raw offset. Using offset will not work correctly.
Rate limit your pagination. Fetching 100 pages in rapid succession will trigger rate limits on any API. Add a small delay (100-500ms) between requests.
Log your progress. When paginating through thousands of items, log progress so you can debug issues. At minimum, log the current page and total items fetched.
SearchHive makes pagination straightforward with consistent page sizes and clear total counts. Start building with the free tier -- 500 credits, no credit card required.
Related: /blog/search-api-for-ai-common-questions-answered | /blog/unstructured-data-extraction-common-questions-answered | /compare/serpapi