How to Design a Python SDK: A Step-by-Step Guide
A well-designed Python SDK turns a complex API into a developer-friendly experience. It abstracts authentication, request handling, error management, and response parsing into a clean interface that other developers actually want to use. A poorly designed one generates support tickets, bad reviews, and frustrated integrations.
This guide walks through the complete process of designing and building a production-quality Python SDK, using SearchHive's API as a real-world example throughout.
Key Takeaways
- Good SDK design follows Python conventions -- PEP 8, type hints, context managers, and async support
- Start with the API contract -- your SDK is a thin, opinionated wrapper around HTTP calls
- Error handling and retry logic separate production SDKs from toy scripts
- PyPI publishing with proper metadata, tests, and documentation is non-negotiable
- SearchHive's Python SDK demonstrates all these patterns with real API integration
Prerequisites
Before starting, you should be comfortable with:
- Python 3.9+ and pip
- HTTP APIs (REST, JSON)
- Basic package structure (`setup.py` or `pyproject.toml`)
- Virtual environments
You'll also need:
- An API to wrap (we'll use SearchHive as our example)
- A PyPI account (for publishing)
- pytest (for testing)
Step 1: Define the API Contract
Before writing SDK code, document every endpoint your SDK will cover. List the HTTP method, path, parameters, response format, and error codes.
For SearchHive, the core endpoints are:
| Endpoint | Method | Purpose |
|---|---|---|
| `/v1/swiftsearch` | POST | Web search with real-time results |
| `/v1/scrapeforge` | POST | Extract structured data from URLs |
| `/v1/deepdive` | POST | Full page content extraction |
Write this down as a reference. Your SDK implementation should map 1:1 to this contract.
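One lightweight way to record the contract is as plain data your tests can assert against. This is a sketch; any field beyond the table above (for example `recency`) is an assumption carried through the rest of this guide, not official SearchHive documentation:

```python
# Illustrative contract entry for the SwiftSearch endpoint.
# Field names such as "recency" are assumptions based on the SDK
# parameters used later in this guide.
SWIFTSEARCH_CONTRACT = {
    "method": "POST",
    "path": "/v1/swiftsearch",
    "request": {
        "query": "str (required)",
        "limit": "int (default 10)",
        "recency": "str | None ('day', 'week', 'month')",
    },
    "response": {"results": "list of result objects", "total": "int"},
    "errors": {400: "invalid request", 401: "bad API key", 429: "rate limited"},
}
```

A contract file like this doubles as a checklist: every key under `request` and `errors` should have a corresponding SDK parameter or exception.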
Step 2: Set Up the Package Structure
A standard Python SDK package follows this structure:
```
searchhive-sdk/
    pyproject.toml
    src/
        searchhive/
            __init__.py
            client.py
            exceptions.py
            resources/
                __init__.py
                search.py
                scrape.py
                deepdive.py
    tests/
        test_client.py
        test_search.py
    README.md
```
Create the `pyproject.toml`:

```toml
# pyproject.toml
[build-system]
requires = ["setuptools>=68.0", "wheel"]
build-backend = "setuptools.build_meta"

[project]
name = "searchhive"
version = "1.0.0"
description = "Python SDK for SearchHive web scraping and search APIs"
readme = "README.md"
license = {text = "MIT"}
requires-python = ">=3.9"
dependencies = [
    "requests>=2.28.0",
    "pydantic>=2.0.0",
]

[project.optional-dependencies]
dev = ["pytest>=7.0", "pytest-mock", "ruff"]
async = ["httpx>=0.24.0"]

[tool.ruff]
line-length = 100
target-version = "py39"
```
Step 3: Build the Client Class
The client is the main entry point. It manages authentication, HTTP sessions, and provides access to resource objects.
```python
# src/searchhive/client.py
import time

import requests

from .exceptions import SearchHiveError, RateLimitError
from .resources.search import SearchResource
from .resources.scrape import ScrapeResource
from .resources.deepdive import DeepDiveResource

DEFAULT_BASE_URL = "https://api.searchhive.dev/v1"


class SearchHiveClient:
    def __init__(
        self,
        api_key: str,
        base_url: str = DEFAULT_BASE_URL,
        timeout: int = 30,
        max_retries: int = 3,
    ):
        if not api_key:
            raise ValueError("API key is required")
        self.api_key = api_key
        self.base_url = base_url.rstrip("/")
        self.timeout = timeout
        self.max_retries = max_retries
        self._session = requests.Session()
        self._session.headers.update({
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "User-Agent": "searchhive-python/1.0.0",
        })
        # Resource accessors
        self.search = SearchResource(self)
        self.scrape = ScrapeResource(self)
        self.deepdive = DeepDiveResource(self)

    def _request(self, method: str, path: str, **kwargs) -> dict:
        url = f"{self.base_url}/{path.lstrip('/')}"
        kwargs.setdefault("timeout", self.timeout)
        for attempt in range(self.max_retries):
            try:
                response = self._session.request(method, url, **kwargs)
                response.raise_for_status()
                return response.json()
            except requests.exceptions.HTTPError:
                if response.status_code == 429:
                    # Rate limited -- surface Retry-After so callers can back off
                    retry_after = int(response.headers.get("Retry-After", "5"))
                    raise RateLimitError(
                        f"Rate limited. Retry after {retry_after}s.",
                        retry_after=retry_after,
                    )
                if response.status_code >= 500 and attempt < self.max_retries - 1:
                    time.sleep(2 ** attempt)  # Exponential backoff: 1s, 2s, 4s...
                    continue  # Retry server errors
                raise SearchHiveError(
                    f"HTTP {response.status_code}: {response.text}",
                    status_code=response.status_code,
                )
            except requests.exceptions.ConnectionError as e:
                if attempt < self.max_retries - 1:
                    time.sleep(2 ** attempt)
                    continue
                raise SearchHiveError(f"Connection error: {e}")

    def close(self):
        self._session.close()

    def __enter__(self):
        return self

    def __exit__(self, *args):
        self.close()
```
Key design decisions:
- Context manager support (`__enter__`/`__exit__`) ensures proper resource cleanup
- Session reuse via `requests.Session()` for connection pooling
- Retry logic with exponential backoff for transient errors
- Resource accessor pattern -- `client.search.query(...)` instead of `client.query_search(...)`
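The backoff pattern inside `_request` can also be isolated into a reusable helper. This is a sketch, with `send` standing in for a single `session.request(...)` attempt that raises `ConnectionError` on transient failure:

```python
import time

def request_with_retries(send, max_retries=3, base_delay=1.0):
    """Retry a single-attempt callable with exponential backoff.

    `send` is a zero-argument callable standing in for one HTTP attempt;
    it raises ConnectionError on a transient failure.
    """
    for attempt in range(max_retries):
        try:
            return send()
        except ConnectionError:
            if attempt == max_retries - 1:
                raise  # out of retries -- surface the error to the caller
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```

Keeping `base_delay` as a parameter lets tests pass `base_delay=0` so retries run instantly.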
Step 4: Create Resource Classes
Each API domain gets its own resource class. This keeps the client clean and lets users discover functionality through autocomplete.
```python
# src/searchhive/resources/search.py
from typing import Optional


class SearchResource:
    def __init__(self, client):
        self._client = client

    def query(
        self,
        query: str,
        limit: int = 10,
        recency: Optional[str] = None,
    ) -> dict:
        """Search the web using SwiftSearch.

        Args:
            query: The search query string.
            limit: Maximum number of results to return (default: 10).
            recency: Time filter -- 'day', 'week', 'month', or None for any time.

        Returns:
            dict with 'results' list containing search results.
        """
        payload = {"query": query, "limit": limit}
        if recency:
            payload["recency"] = recency
        return self._client._request("POST", "/swiftsearch", json=payload)
```
```python
# src/searchhive/resources/scrape.py
from typing import Optional


class ScrapeResource:
    def __init__(self, client):
        self._client = client

    def extract(
        self,
        url: str,
        format: str = "json",
        timeout: Optional[int] = None,
    ) -> dict:
        """Extract structured data from a URL using ScrapeForge.

        Args:
            url: The URL to scrape.
            format: Output format -- 'json', 'markdown', 'html', or 'raw'.
            timeout: Custom timeout in seconds for slow pages.

        Returns:
            dict with extracted data.
        """
        payload = {"url": url, "format": format}
        kwargs = {}
        if timeout:
            kwargs["timeout"] = timeout
        return self._client._request("POST", "/scrapeforge", json=payload, **kwargs)
```
```python
# src/searchhive/resources/deepdive.py
class DeepDiveResource:
    def __init__(self, client):
        self._client = client

    def extract(
        self,
        url: str,
        extract: str = "full",
    ) -> dict:
        """Deep extract full page content using DeepDive.

        Args:
            url: The URL to extract content from.
            extract: Extraction mode -- 'full', 'article', 'text', or 'schema'.

        Returns:
            dict with full page content and metadata.
        """
        payload = {"url": url, "extract": extract}
        return self._client._request("POST", "/deepdive", json=payload)
```
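The delegation between resources and client is easiest to see with a stub transport. `StubClient` below is a hypothetical stand-in for `SearchHiveClient` that records requests instead of sending them; the wiring is the same:

```python
# Minimal resource + stub client showing the delegation pattern.
class SearchResource:
    def __init__(self, client):
        self._client = client

    def query(self, query: str, limit: int = 10) -> dict:
        return self._client._request(
            "POST", "/swiftsearch", json={"query": query, "limit": limit}
        )


class StubClient:
    """Stands in for SearchHiveClient; records requests instead of sending."""

    def __init__(self):
        self.sent = []
        self.search = SearchResource(self)  # same wiring as the real client

    def _request(self, method, path, **kwargs):
        self.sent.append((method, path, kwargs))
        return {"results": [], "total": 0}


client = StubClient()
client.search.query("python sdk design", limit=5)
```

Every call surfaces through `client.<resource>.<method>`, so IDE autocomplete naturally groups functionality by API domain.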
Step 5: Implement Proper Error Handling
Custom exception classes give users clear, actionable error information:
```python
# src/searchhive/exceptions.py
from typing import Optional


class SearchHiveError(Exception):
    """Base exception for all SearchHive SDK errors."""

    def __init__(self, message: str, status_code: Optional[int] = None):
        super().__init__(message)
        self.status_code = status_code


class AuthenticationError(SearchHiveError):
    """Raised when API key is invalid or missing."""


class RateLimitError(SearchHiveError):
    """Raised when rate limit is exceeded."""

    def __init__(self, message: str, retry_after: Optional[int] = None):
        super().__init__(message)
        self.retry_after = retry_after


class InvalidRequestError(SearchHiveError):
    """Raised when request parameters are invalid."""


class ServerError(SearchHiveError):
    """Raised when the API returns a 5xx error."""
```
Map HTTP status codes to exception types in the client:
```python
# Add to _request's error handling. 429 is omitted from the map because
# it is handled earlier in _request, where RateLimitError gets retry_after.
STATUS_EXCEPTIONS = {
    400: InvalidRequestError,
    401: AuthenticationError,
    403: AuthenticationError,
}

# In the error handling block:
exc_class = STATUS_EXCEPTIONS.get(response.status_code, SearchHiveError)
if response.status_code >= 500:
    exc_class = ServerError
raise exc_class(
    f"HTTP {response.status_code}: {response.text}",
    status_code=response.status_code,
)
```
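On the caller's side, typed exceptions make recovery logic straightforward. Here is a self-contained sketch: the exception classes are minimal stand-ins for the real `searchhive.exceptions` imports, and `query` fakes an SDK call that gets rate-limited once:

```python
import time

# Stand-ins so this snippet runs on its own; with the real package you
# would import these from searchhive.exceptions instead.
class SearchHiveError(Exception):
    pass

class RateLimitError(SearchHiveError):
    def __init__(self, message, retry_after=None):
        super().__init__(message)
        self.retry_after = retry_after

attempts = []

def query():
    """Fakes client.search.query(...): rate-limited on the first call."""
    attempts.append(1)
    if len(attempts) == 1:
        raise RateLimitError("Rate limited.", retry_after=0)
    return {"results": [], "total": 0}

try:
    results = query()
except RateLimitError as e:
    time.sleep(e.retry_after or 0)  # honor the server's Retry-After hint
    results = query()
```

Because `retry_after` rides on the exception, callers can back off for exactly as long as the server asked.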
Step 6: Add Type Hints and Response Models
Pydantic models give users autocompletion, validation, and documentation:
```python
# src/searchhive/models.py
from typing import Optional

from pydantic import BaseModel


class SearchResult(BaseModel):
    title: str
    url: str
    snippet: str
    source: Optional[str] = None


class SearchResponse(BaseModel):
    results: list[SearchResult]
    total: int


class ScrapeResponse(BaseModel):
    url: str
    title: Optional[str] = None
    content: dict
    format: str
```
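The resource methods above return plain dicts, so one way to layer these models on top is to validate responses at the call site with pydantic v2's `model_validate`. A sketch, with an inline sample payload standing in for a real API response:

```python
from typing import Optional

from pydantic import BaseModel


class SearchResult(BaseModel):
    title: str
    url: str
    snippet: str
    source: Optional[str] = None


class SearchResponse(BaseModel):
    results: list[SearchResult]
    total: int


# Stand-in for what client.search.query(...) would return.
raw = {
    "results": [
        {"title": "Example", "url": "https://example.com", "snippet": "A result"}
    ],
    "total": 1,
}

parsed = SearchResponse.model_validate(raw)  # raises ValidationError on bad data
```

Validation failures surface immediately as `pydantic.ValidationError`, which is much easier to debug than a `KeyError` three layers deep in user code.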
Step 7: Write Tests
Testing an SDK means testing against mocked HTTP responses, not the real API:
```python
# tests/test_client.py
from unittest.mock import patch, MagicMock

import pytest

from searchhive import SearchHiveClient


def test_search_query():
    with SearchHiveClient(api_key="test-key") as client:
        mock_response = MagicMock()
        mock_response.status_code = 200
        mock_response.json.return_value = {
            "results": [
                {"title": "Test", "url": "https://example.com", "snippet": "A test result"}
            ],
            "total": 1,
        }
        mock_response.headers = {}
        with patch.object(client._session, "request", return_value=mock_response):
            result = client.search.query("test query")
        assert len(result["results"]) == 1
        assert result["results"][0]["title"] == "Test"


def test_invalid_api_key():
    with pytest.raises(ValueError):
        SearchHiveClient(api_key="")
```
Run tests with `pytest tests/ -v`.
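One pitfall when mocking this way: a bare `MagicMock`'s `raise_for_status()` silently succeeds, so the error-handling paths in `_request` never run unless you set `side_effect`. A dependency-free sketch of the technique, where `FakeHTTPError` stands in for `requests.exceptions.HTTPError`:

```python
from unittest.mock import MagicMock

# Stand-in for requests.exceptions.HTTPError, keeping this snippet
# free of third-party imports.
class FakeHTTPError(Exception):
    pass

mock_response = MagicMock()
mock_response.status_code = 429
mock_response.headers = {"Retry-After": "5"}
# Without this line, raise_for_status() would do nothing and the
# rate-limit branch would go untested.
mock_response.raise_for_status.side_effect = FakeHTTPError("429 Too Many Requests")

handled = False
try:
    mock_response.raise_for_status()
except FakeHTTPError:
    handled = True  # this is where the SDK would raise RateLimitError
```

In a real test you would patch `client._session.request` to return this mock and assert that `client.search.query(...)` raises `RateLimitError` with the expected `retry_after`.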
Step 8: Publish to PyPI
Build and publish:
```bash
# Install build tools
pip install build twine

# Build the package
python -m build

# Upload to PyPI (use --repository testpypi for testing first)
twine upload dist/*
```
Set up PyPI tokens in `~/.pypirc` or use `TWINE_USERNAME=__token__` with a PyPI API token.
Common Issues and Solutions
Import path confusion. Make sure `src/searchhive/__init__.py` exports the client: `from .client import SearchHiveClient`.
SSL certificate errors in corporate environments. Add a `verify` parameter to the client that defaults to `True` but can be set to a custom cert bundle path.
Timeout handling. Different operations need different timeouts. A simple search should time out in 10 seconds, but a deep scrape might need 60. Let users override per request.
Version compatibility. Pin minimum versions in dependencies, test against the oldest supported Python version, and use `pyupgrade` to catch deprecated syntax.
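For the timeout point, one pattern is a small resolver with per-endpoint defaults that callers can override. The values below are illustrative choices, not documented SearchHive limits:

```python
from typing import Optional

# Illustrative defaults: fast search, slower scraping, slowest deep extraction.
DEFAULT_TIMEOUTS = {"swiftsearch": 10, "scrapeforge": 30, "deepdive": 60}

def resolve_timeout(endpoint: str, override: Optional[int] = None, fallback: int = 30) -> int:
    """Pick a per-endpoint timeout unless the caller overrides it."""
    if override is not None:
        return override
    return DEFAULT_TIMEOUTS.get(endpoint, fallback)
```

`_request` could call this instead of always using `self.timeout`, keeping the per-endpoint knowledge in one place.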
Next Steps
Once your SDK is published and stable, consider adding:
- Async support using `httpx.AsyncClient` for concurrent API calls
- Pagination helpers that automatically page through large result sets
- CLI interface using `click` or `typer` for quick testing and debugging
- Response caching with configurable TTL for expensive API calls
- Webhook integration for push-based data delivery
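As a sketch of the pagination idea, a generator can keep fetching until a page comes back empty. The `offset` parameter here is an assumption for illustration, not a documented SearchHive feature:

```python
from typing import Callable, Iterator

def paginate(fetch_page: Callable[[int], dict], page_size: int = 10) -> Iterator[dict]:
    """Yield individual results across pages until the API runs out.

    `fetch_page(offset)` stands in for an SDK call such as a hypothetical
    client.search.query(..., offset=offset).
    """
    offset = 0
    while True:
        page = fetch_page(offset)
        results = page.get("results", [])
        if not results:
            return  # empty page means we've consumed everything
        yield from results
        offset += page_size
```

Users then iterate naturally (`for item in paginate(...)`) without ever thinking about offsets or page boundaries.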
The SearchHive Python SDK implements all of these patterns. Get your free API key at searchhive.dev -- 500 credits, no credit card required. Full SDK docs and examples at docs.searchhive.dev.
For more on building data extraction pipelines, check out our guide on data extraction techniques.