How to Design a Python SDK: A Step-by-Step Guide
A well-designed Python SDK turns a complex API into a developer-friendly experience. It abstracts authentication, request handling, error management, and response parsing into a clean interface that other developers actually want to use. A poorly designed one generates support tickets, bad reviews, and frustrated integrations.
This guide walks through the complete process of designing and building a production-quality Python SDK, using SearchHive's API as a real-world example throughout.
Key Takeaways
- Good SDK design follows Python conventions -- PEP 8, type hints, context managers, and async support
- Start with the API contract -- your SDK is a thin, opinionated wrapper around HTTP calls
- Error handling and retry logic separate production SDKs from toy scripts
- PyPI publishing with proper metadata, tests, and documentation is non-negotiable
- SearchHive's Python SDK demonstrates all these patterns with real API integration
Prerequisites
Before starting, you should be comfortable with:
- Python 3.9+ and pip
- HTTP APIs (REST, JSON)
- Basic package structure (`setup.py` or `pyproject.toml`)
- Virtual environments
You'll also need:
- An API to wrap (we'll use SearchHive as our example)
- A PyPI account (for publishing)
- pytest (for testing)
Step 1: Define the API Contract
Before writing SDK code, document every endpoint your SDK will cover. List the HTTP method, path, parameters, response format, and error codes.
For SearchHive, the core endpoints are:
| Endpoint | Method | Purpose |
|---|---|---|
| `/v1/swiftsearch` | POST | Web search with real-time results |
| `/v1/scrapeforge` | POST | Extract structured data from URLs |
| `/v1/deepdive` | POST | Full page content extraction |
Write this down as a reference. Your SDK implementation should map 1:1 to this contract.
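One lightweight way to record the contract is as plain data your tests can assert against. This is a sketch; any field beyond the table above (for example `recency`) is an assumption carried through the rest of this guide, not official SearchHive documentation:

```python
# Illustrative contract entry for the SwiftSearch endpoint.
# Field names such as "recency" are assumptions based on the SDK
# parameters used later in this guide.
SWIFTSEARCH_CONTRACT = {
    "method": "POST",
    "path": "/v1/swiftsearch",
    "request": {
        "query": "str (required)",
        "limit": "int (default 10)",
        "recency": "str | None ('day', 'week', 'month')",
    },
    "response": {"results": "list of result objects", "total": "int"},
    "errors": {400: "invalid request", 401: "bad API key", 429: "rate limited"},
}
```

A contract file like this doubles as a checklist: every key under `request` and `errors` should have a corresponding SDK parameter or exception.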
Step 2: Set Up the Package Structure
A standard Python SDK package follows this structure:
```
searchhive-sdk/
    pyproject.toml
    src/
        searchhive/
            __init__.py
            client.py
            exceptions.py
            resources/
                __init__.py
                search.py
                scrape.py
                deepdive.py
    tests/
        test_client.py
        test_search.py
    README.md
```
Create the `pyproject.toml`:

```toml
# pyproject.toml
[build-system]
requires = ["setuptools>=68.0", "wheel"]
build-backend = "setuptools.build_meta"

[project]
name = "searchhive"
version = "1.0.0"
description = "Python SDK for SearchHive web scraping and search APIs"
readme = "README.md"
license = {text = "MIT"}
requires-python = ">=3.9"
dependencies = [
    "requests>=2.28.0",
    "pydantic>=2.0.0",
]

[project.optional-dependencies]
dev = ["pytest>=7.0", "pytest-mock", "ruff"]
async = ["httpx>=0.24.0"]

[tool.ruff]
line-length = 100
target-version = "py39"
```
Step 3: Build the Client Class
The client is the main entry point. It manages authentication, HTTP sessions, and provides access to resource objects.
```python
# src/searchhive/client.py
import time

import requests

from .exceptions import SearchHiveError, RateLimitError
from .resources.search import SearchResource
from .resources.scrape import ScrapeResource
from .resources.deepdive import DeepDiveResource

DEFAULT_BASE_URL = "https://api.searchhive.dev/v1"


class SearchHiveClient:
    def __init__(
        self,
        api_key: str,
        base_url: str = DEFAULT_BASE_URL,
        timeout: int = 30,
        max_retries: int = 3,
    ):
        if not api_key:
            raise ValueError("API key is required")
        self.api_key = api_key
        self.base_url = base_url.rstrip("/")
        self.timeout = timeout
        self.max_retries = max_retries
        self._session = requests.Session()
        self._session.headers.update({
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "User-Agent": "searchhive-python/1.0.0",
        })
        # Resource accessors
        self.search = SearchResource(self)
        self.scrape = ScrapeResource(self)
        self.deepdive = DeepDiveResource(self)

    def _request(self, method: str, path: str, **kwargs) -> dict:
        url = f"{self.base_url}/{path.lstrip('/')}"
        kwargs.setdefault("timeout", self.timeout)
        for attempt in range(self.max_retries):
            try:
                response = self._session.request(method, url, **kwargs)
                response.raise_for_status()
                return response.json()
            except requests.exceptions.HTTPError:
                if response.status_code == 429:
                    # Rate limited -- surface Retry-After so callers can back off
                    retry_after = int(response.headers.get("Retry-After", "5"))
                    raise RateLimitError(
                        f"Rate limited. Retry after {retry_after}s.",
                        retry_after=retry_after,
                    )
                if response.status_code >= 500 and attempt < self.max_retries - 1:
                    time.sleep(2 ** attempt)  # Exponential backoff: 1s, 2s, 4s...
                    continue  # Retry server errors
                raise SearchHiveError(
                    f"HTTP {response.status_code}: {response.text}",
                    status_code=response.status_code,
                )
            except requests.exceptions.ConnectionError as e:
                if attempt < self.max_retries - 1:
                    time.sleep(2 ** attempt)
                    continue
                raise SearchHiveError(f"Connection error: {e}")

    def close(self):
        self._session.close()

    def __enter__(self):
        return self

    def __exit__(self, *args):
        self.close()
```
Key design decisions:
- Context manager support (`__enter__`/`__exit__`) ensures proper resource cleanup
- Session reuse via `requests.Session()` for connection pooling
- Retry logic with exponential backoff for transient errors
- Resource accessor pattern -- `client.search.query(...)` instead of `client.query_search(...)`
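The backoff pattern inside `_request` can also be isolated into a reusable helper. This is a sketch, with `send` standing in for a single `session.request(...)` attempt that raises `ConnectionError` on transient failure:

```python
import time

def request_with_retries(send, max_retries=3, base_delay=1.0):
    """Retry a single-attempt callable with exponential backoff.

    `send` is a zero-argument callable standing in for one HTTP attempt;
    it raises ConnectionError on a transient failure.
    """
    for attempt in range(max_retries):
        try:
            return send()
        except ConnectionError:
            if attempt == max_retries - 1:
                raise  # out of retries -- surface the error to the caller
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```

Keeping `base_delay` as a parameter lets tests pass `base_delay=0` so retries run instantly.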
Step 4: Create Resource Classes
Each API domain gets its own resource class. This keeps the client clean and lets users discover functionality through autocomplete.
```python
# src/searchhive/resources/search.py
from typing import Optional


class SearchResource:
    def __init__(self, client):
        self._client = client

    def query(
        self,
        query: str,
        limit: int = 10,
        recency: Optional[str] = None,
    ) -> dict:
        """Search the web using SwiftSearch.

        Args:
            query: The search query string.
            limit: Maximum number of results to return (default: 10).
            recency: Time filter -- 'day', 'week', 'month', or None for any time.

        Returns:
            dict with 'results' list containing search results.
        """
        payload = {"query": query, "limit": limit}
        if recency:
            payload["recency"] = recency
        return self._client._request("POST", "/swiftsearch", json=payload)
```
```python
# src/searchhive/resources/scrape.py
from typing import Optional


class ScrapeResource:
    def __init__(self, client):
        self._client = client

    def extract(
        self,
        url: str,
        format: str = "json",
        timeout: Optional[int] = None,
    ) -> dict:
        """Extract structured data from a URL using ScrapeForge.

        Args:
            url: The URL to scrape.
            format: Output format -- 'json', 'markdown', 'html', or 'raw'.
            timeout: Custom timeout in seconds for slow pages.

        Returns:
            dict with extracted data.
        """
        payload = {"url": url, "format": format}
        kwargs = {}
        if timeout:
            kwargs["timeout"] = timeout
        return self._client._request("POST", "/scrapeforge", json=payload, **kwargs)
```
```python
# src/searchhive/resources/deepdive.py
class DeepDiveResource:
    def __init__(self, client):
        self._client = client

    def extract(
        self,
        url: str,
        extract: str = "full",
    ) -> dict:
        """Deep extract full page content using DeepDive.

        Args:
            url: The URL to extract content from.
            extract: Extraction mode -- 'full', 'article', 'text', or 'schema'.

        Returns:
            dict with full page content and metadata.
        """
        payload = {"url": url, "extract": extract}
        return self._client._request("POST", "/deepdive", json=payload)
```
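The delegation between resources and client is easiest to see with a stub transport. `StubClient` below is a hypothetical stand-in for `SearchHiveClient` that records requests instead of sending them; the wiring is the same:

```python
# Minimal resource + stub client showing the delegation pattern.
class SearchResource:
    def __init__(self, client):
        self._client = client

    def query(self, query: str, limit: int = 10) -> dict:
        return self._client._request(
            "POST", "/swiftsearch", json={"query": query, "limit": limit}
        )


class StubClient:
    """Stands in for SearchHiveClient; records requests instead of sending."""

    def __init__(self):
        self.sent = []
        self.search = SearchResource(self)  # same wiring as the real client

    def _request(self, method, path, **kwargs):
        self.sent.append((method, path, kwargs))
        return {"results": [], "total": 0}


client = StubClient()
client.search.query("python sdk design", limit=5)
```

Every call surfaces through `client.<resource>.<method>`, so IDE autocomplete naturally groups functionality by API domain.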
Step 5: Implement Proper Error Handling
Custom exception classes give users clear, actionable error information:
```python
# src/searchhive/exceptions.py
from typing import Optional


class SearchHiveError(Exception):
    """Base exception for all SearchHive SDK errors."""

    def __init__(self, message: str, status_code: Optional[int] = None):
        super().__init__(message)
        self.status_code = status_code


class AuthenticationError(SearchHiveError):
    """Raised when API key is invalid or missing."""


class RateLimitError(SearchHiveError):
    """Raised when rate limit is exceeded."""

    def __init__(self, message: str, retry_after: Optional[int] = None):
        super().__init__(message)
        self.retry_after = retry_after


class InvalidRequestError(SearchHiveError):
    """Raised when request parameters are invalid."""


class ServerError(SearchHiveError):
    """Raised when the API returns a 5xx error."""
```
Map HTTP status codes to exception types in the client:
```python
# Add to _request's error handling. 429 is omitted from the map because
# it is handled earlier in _request, where RateLimitError gets retry_after.
STATUS_EXCEPTIONS = {
    400: InvalidRequestError,
    401: AuthenticationError,
    403: AuthenticationError,
}

# In the error handling block:
exc_class = STATUS_EXCEPTIONS.get(response.status_code, SearchHiveError)
if response.status_code >= 500:
    exc_class = ServerError
raise exc_class(
    f"HTTP {response.status_code}: {response.text}",
    status_code=response.status_code,
)
```
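On the caller's side, typed exceptions make recovery logic straightforward. Here is a self-contained sketch: the exception classes are minimal stand-ins for the real `searchhive.exceptions` imports, and `query` fakes an SDK call that gets rate-limited once:

```python
import time

# Stand-ins so this snippet runs on its own; with the real package you
# would import these from searchhive.exceptions instead.
class SearchHiveError(Exception):
    pass

class RateLimitError(SearchHiveError):
    def __init__(self, message, retry_after=None):
        super().__init__(message)
        self.retry_after = retry_after

attempts = []

def query():
    """Fakes client.search.query(...): rate-limited on the first call."""
    attempts.append(1)
    if len(attempts) == 1:
        raise RateLimitError("Rate limited.", retry_after=0)
    return {"results": [], "total": 0}

try:
    results = query()
except RateLimitError as e:
    time.sleep(e.retry_after or 0)  # honor the server's Retry-After hint
    results = query()
```

Because `retry_after` rides on the exception, callers can back off for exactly as long as the server asked.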
Step 6: Add Type Hints and Response Models
Pydantic models give users autocompletion, validation, and documentation:
```python
# src/searchhive/models.py
from typing import Optional

from pydantic import BaseModel


class SearchResult(BaseModel):
    title: str
    url: str
    snippet: str
    source: Optional[str] = None


class SearchResponse(BaseModel):
    results: list[SearchResult]
    total: int


class ScrapeResponse(BaseModel):
    url: str
    title: Optional[str] = None
    content: dict
    format: str
```
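The resource methods above return plain dicts, so one way to layer these models on top is to validate responses at the call site with pydantic v2's `model_validate`. A sketch, with an inline sample payload standing in for a real API response:

```python
from typing import Optional

from pydantic import BaseModel


class SearchResult(BaseModel):
    title: str
    url: str
    snippet: str
    source: Optional[str] = None


class SearchResponse(BaseModel):
    results: list[SearchResult]
    total: int


# Stand-in for what client.search.query(...) would return.
raw = {
    "results": [
        {"title": "Example", "url": "https://example.com", "snippet": "A result"}
    ],
    "total": 1,
}

parsed = SearchResponse.model_validate(raw)  # raises ValidationError on bad data
```

Validation failures surface immediately as `pydantic.ValidationError`, which is much easier to debug than a `KeyError` three layers deep in user code.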
Step 7: Write Tests
Testing an SDK means testing against mocked HTTP responses, not the real API:
```python
# tests/test_client.py
from unittest.mock import patch, MagicMock

import pytest

from searchhive import SearchHiveClient


def test_search_query():
    with SearchHiveClient(api_key="test-key") as client:
        mock_response = MagicMock()
        mock_response.status_code = 200
        mock_response.json.return_value = {
            "results": [
                {"title": "Test", "url": "https://example.com", "snippet": "A test result"}
            ],
            "total": 1,
        }
        mock_response.headers = {}
        with patch.object(client._session, "request", return_value=mock_response):
            result = client.search.query("test query")
        assert len(result["results"]) == 1
        assert result["results"][0]["title"] == "Test"


def test_invalid_api_key():
    with pytest.raises(ValueError):
        SearchHiveClient(api_key="")
```
Run tests with `pytest tests/ -v`.
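One pitfall when mocking this way: a bare `MagicMock`'s `raise_for_status()` silently succeeds, so the error-handling paths in `_request` never run unless you set `side_effect`. A dependency-free sketch of the technique, where `FakeHTTPError` stands in for `requests.exceptions.HTTPError`:

```python
from unittest.mock import MagicMock

# Stand-in for requests.exceptions.HTTPError, keeping this snippet
# free of third-party imports.
class FakeHTTPError(Exception):
    pass

mock_response = MagicMock()
mock_response.status_code = 429
mock_response.headers = {"Retry-After": "5"}
# Without this line, raise_for_status() would do nothing and the
# rate-limit branch would go untested.
mock_response.raise_for_status.side_effect = FakeHTTPError("429 Too Many Requests")

handled = False
try:
    mock_response.raise_for_status()
except FakeHTTPError:
    handled = True  # this is where the SDK would raise RateLimitError
```

In a real test you would patch `client._session.request` to return this mock and assert that `client.search.query(...)` raises `RateLimitError` with the expected `retry_after`.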
Step 8: Publish to PyPI
Build and publish:
```bash
# Install build tools
pip install build twine

# Build the package
python -m build

# Upload to PyPI (use --repository testpypi for testing first)
twine upload dist/*
```
Set up PyPI tokens in `~/.pypirc` or use `TWINE_USERNAME=__token__` with a PyPI API token.
Common Issues and Solutions
Import path confusion. Make sure `src/searchhive/__init__.py` exports the client: `from .client import SearchHiveClient`.
SSL certificate errors in corporate environments. Add a `verify` parameter to the client that defaults to `True` but can be set to a custom cert bundle path.
Timeout handling. Different operations need different timeouts. A simple search should time out in 10 seconds, but a deep scrape might need 60. Let users override per request.
Version compatibility. Pin minimum versions in dependencies, test against the oldest supported Python version, and use `pyupgrade` to catch deprecated syntax.
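For the timeout point, one pattern is a small resolver with per-endpoint defaults that callers can override. The values below are illustrative choices, not documented SearchHive limits:

```python
from typing import Optional

# Illustrative defaults: fast search, slower scraping, slowest deep extraction.
DEFAULT_TIMEOUTS = {"swiftsearch": 10, "scrapeforge": 30, "deepdive": 60}

def resolve_timeout(endpoint: str, override: Optional[int] = None, fallback: int = 30) -> int:
    """Pick a per-endpoint timeout unless the caller overrides it."""
    if override is not None:
        return override
    return DEFAULT_TIMEOUTS.get(endpoint, fallback)
```

`_request` could call this instead of always using `self.timeout`, keeping the per-endpoint knowledge in one place.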
Next Steps
Once your SDK is published and stable, consider adding:
- Async support using `httpx.AsyncClient` for concurrent API calls
- Pagination helpers that automatically page through large result sets
- CLI interface using `click` or `typer` for quick testing and debugging
- Response caching with configurable TTL for expensive API calls
- Webhook integration for push-based data delivery
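As a sketch of the pagination idea, a generator can keep fetching until a page comes back empty. The `offset` parameter here is an assumption for illustration, not a documented SearchHive feature:

```python
from typing import Callable, Iterator

def paginate(fetch_page: Callable[[int], dict], page_size: int = 10) -> Iterator[dict]:
    """Yield individual results across pages until the API runs out.

    `fetch_page(offset)` stands in for an SDK call such as a hypothetical
    client.search.query(..., offset=offset).
    """
    offset = 0
    while True:
        page = fetch_page(offset)
        results = page.get("results", [])
        if not results:
            return  # empty page means we've consumed everything
        yield from results
        offset += page_size
```

Users then iterate naturally (`for item in paginate(...)`) without ever thinking about offsets or page boundaries.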
The SearchHive Python SDK implements all of these patterns. Get your free API key at searchhive.dev -- 500 credits, no credit card required. Full SDK docs and examples at docs.searchhive.dev.
For more on building data extraction pipelines, check out our guide on data extraction techniques.