How to Build a Semantic Search API: Step-by-Step Tutorial
Semantic search understands the meaning behind queries, not just matching keywords. Instead of finding pages with the exact words "best CRM for small teams," a semantic search API returns results about team collaboration tools, contact management platforms, and sales automation software -- even if those exact words never appear.
This tutorial shows you how to build a semantic search API using Python, sentence embeddings, and SearchHive for web data collection.
Key Takeaways
- Semantic search uses vector embeddings to match queries by meaning, not keywords
- Sentence-transformers provides free, open-source embeddings that run on CPU
- FAISS handles fast similarity search over millions of vectors
- SearchHive's SwiftSearch and ScrapeForge APIs collect the training corpus
- You can build a working semantic search API in under 150 lines of Python
Prerequisites
- Python 3.9+
- A SearchHive API key (free tier with 500 credits)
- 2GB free RAM (for the embedding model)
Install dependencies:
pip install sentence-transformers faiss-cpu requests fastapi uvicorn numpy
Step 1: Collect Your Document Corpus
Before you can search semantically, you need documents to search over. Use SearchHive to collect relevant pages from the web.
import requests
import json

API_KEY = "your-api-key"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

def collect_documents(queries, pages_per_query=5):
    docs = []
    for query in queries:
        # Search for relevant pages
        resp = requests.get(
            "https://api.searchhive.dev/v1/search",
            headers=HEADERS,
            params={"q": query, "num": pages_per_query}
        )
        results = resp.json().get("results", [])
        for r in results:
            url = r["url"]
            try:
                # Scrape full page content
                scrape_resp = requests.get(
                    "https://api.searchhive.dev/v1/scrape",
                    headers=HEADERS,
                    params={"url": url}
                )
                data = scrape_resp.json()
                content = data.get("markdown", "")
                # Clean and chunk the content
                chunks = chunk_text(content, max_chars=1000)
                for i, chunk in enumerate(chunks):
                    docs.append({
                        "text": chunk,
                        "source": url,
                        "title": r.get("title", ""),
                        "chunk_index": i
                    })
            except Exception as e:
                print(f"Error scraping {url}: {e}")
    return docs
def chunk_text(text, max_chars=1000):
    chunks = []
    paragraphs = text.split("\n\n")
    current = ""
    for p in paragraphs:
        if len(current) + len(p) > max_chars:
            if current:
                chunks.append(current.strip())
            current = p
        else:
            current += "\n\n" + p if current else p
    if current.strip():
        chunks.append(current.strip())
    return chunks
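To see the chunker's behavior, here is a quick self-contained check with toy paragraphs (not real scraped content): paragraphs are greedily packed into a chunk until adding the next one would exceed max_chars.

```python
def chunk_text(text, max_chars=1000):
    chunks = []
    current = ""
    for p in text.split("\n\n"):
        if len(current) + len(p) > max_chars:
            if current:
                chunks.append(current.strip())
            current = p
        else:
            current += "\n\n" + p if current else p
    if current.strip():
        chunks.append(current.strip())
    return chunks

# Three ~180-character toy paragraphs
text = "\n\n".join(["alpha " * 30, "beta " * 30, "gamma " * 30])
chunks = chunk_text(text, max_chars=400)
# The first two paragraphs pack into one chunk; the third starts a new one
print(len(chunks))  # 2
```

One caveat of this greedy approach: a single paragraph longer than max_chars still becomes its own oversized chunk, which is usually acceptable for embedding but worth knowing.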
Step 2: Generate Embeddings
Convert each document chunk into a vector using sentence-transformers.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dim vectors, fast on CPU

def generate_embeddings(docs):
    texts = [d["text"] for d in docs]
    embeddings = model.encode(texts, show_progress_bar=True, batch_size=32)
    return embeddings
The all-MiniLM-L6-v2 model produces 384-dimensional vectors. It runs on CPU at ~1,000 documents/minute -- fast enough for most corpora.
Step 3: Build the FAISS Index
FAISS provides millisecond-level similarity search over large vector collections.
import faiss
import numpy as np

def build_index(embeddings):
    dim = embeddings.shape[1]
    # Normalize vectors for cosine similarity
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    normalized = embeddings / norms
    # Build an IndexFlatIP (inner product = cosine similarity for normalized vectors)
    index = faiss.IndexFlatIP(dim)
    index.add(normalized.astype(np.float32))
    return index
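The reason IndexFlatIP works here: for unit-length vectors, the inner product and cosine similarity are the same number. A quick NumPy check with random toy vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
docs = rng.normal(size=(5, 8)).astype(np.float32)  # toy "document" vectors
query = rng.normal(size=8).astype(np.float32)

# Cosine similarity computed directly
cos = (docs @ query) / (np.linalg.norm(docs, axis=1) * np.linalg.norm(query))

# Inner product of pre-normalized vectors (what IndexFlatIP computes)
docs_n = docs / np.linalg.norm(docs, axis=1, keepdims=True)
query_n = query / np.linalg.norm(query)
ip = docs_n @ query_n

print(np.allclose(cos, ip, atol=1e-6))  # True
```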
Step 4: Query the Index
Convert a user's query to an embedding and find the most similar documents.
def search(query, index, docs, model, top_k=5):
    query_embedding = model.encode([query])
    # Normalize query vector
    norm = np.linalg.norm(query_embedding, axis=1, keepdims=True)
    query_normalized = query_embedding / norm
    # Search
    distances, indices = index.search(query_normalized.astype(np.float32), top_k)
    results = []
    for dist, idx in zip(distances[0], indices[0]):
        # FAISS pads with -1 when fewer than top_k matches exist
        if 0 <= idx < len(docs):
            results.append({
                "score": float(dist),
                "text": docs[idx]["text"][:300],
                "source": docs[idx]["source"],
                "title": docs[idx]["title"]
            })
    return results
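If you want to sanity-check the scoring logic without FAISS, the same top-k retrieval can be done brute-force in NumPy. This sketch uses hypothetical 2-D vectors rather than real embeddings:

```python
import numpy as np

def brute_force_search(query_vec, doc_vecs, top_k=3):
    # Normalize both sides so the dot product equals cosine similarity
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q                       # one cosine score per document
    order = np.argsort(-scores)[:top_k]  # best first
    return [(int(i), float(scores[i])) for i in order]

doc_vecs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
query = np.array([1.0, 0.05])
results = brute_force_search(query, doc_vecs, top_k=2)
print(results[0][0], results[1][0])  # 0 1
```

This is O(n) per query, which is exactly what IndexFlatIP does internally; FAISS just does it much faster.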
Step 5: Build the REST API
Wrap the search function in a FastAPI endpoint.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="Semantic Search API")

class SearchRequest(BaseModel):
    query: str
    top_k: int = 5

class SearchResponse(BaseModel):
    results: list

# Initialize globally (load model + index at startup)
docs = []
index = None

@app.post("/search", response_model=SearchResponse)
def semantic_search(req: SearchRequest):
    if index is None:
        return SearchResponse(results=[])
    results = search(req.query, index, docs, model, req.top_k)
    return SearchResponse(results=results)

@app.get("/health")
def health():
    return {"status": "ok", "documents_indexed": len(docs)}
# Startup: load data and build index
@app.on_event("startup")
def startup():
    global docs, index
    print("Collecting documents...")
    docs = collect_documents([
        "semantic search API tutorial",
        "vector database comparison",
        "embedding models for search",
    ])
    print(f"Collected {len(docs)} document chunks")
    print("Generating embeddings...")
    embeddings = generate_embeddings(docs)
    print("Building FAISS index...")
    index = build_index(embeddings)
    print(f"Index ready with {len(docs)} documents")
Step 6: Run and Test
uvicorn main:app --host 0.0.0.0 --port 8000
Test with curl:
curl -X POST http://localhost:8000/search \
-H "Content-Type: application/json" \
-d '{"query": "How do I find similar documents using AI?", "top_k": 3}'
Expected response:
{
  "results": [
    {
      "score": 0.72,
      "text": "Semantic search uses vector embeddings to find documents...",
      "source": "https://example.com/semantic-search-guide",
      "title": "Understanding Semantic Search"
    }
  ]
}
Step 7: Add Persistence
Save and load the index to avoid re-embedding on every restart.
def save_index(index, docs, path="semantic_index"):
    faiss.write_index(index, f"{path}.faiss")
    with open(f"{path}_docs.json", "w") as f:
        json.dump(docs, f)

def load_index(path="semantic_index"):
    index = faiss.read_index(f"{path}.faiss")
    with open(f"{path}_docs.json") as f:
        docs = json.load(f)
    return index, docs
Step 8: Optimize Performance
For larger corpora, use FAISS's IVF index for approximate nearest neighbor search:
def build_ivf_index(embeddings, nlist=100):
    dim = embeddings.shape[1]
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    normalized = embeddings / norms
    quantizer = faiss.IndexFlatIP(dim)
    index = faiss.IndexIVFFlat(quantizer, dim, nlist, faiss.METRIC_INNER_PRODUCT)
    index.train(normalized.astype(np.float32))
    index.add(normalized.astype(np.float32))
    return index
IVF indexes cut per-query work from scanning all n vectors to scanning only the probed clusters -- roughly O(sqrt(n)) when nlist is around sqrt(n) -- with minimal accuracy loss. Use nlist = sqrt(num_documents) as a starting point.
Common Issues
Out of memory: The all-MiniLM-L6-v2 model uses ~400MB RAM. Each embedding is 384 float32 values (~1.5KB). 100K documents = ~150MB for embeddings plus ~400MB for the model. Stay under 500K documents on a machine with 4GB RAM.
Slow indexing: CPU encoding runs at ~1,000 docs/min. For 10K+ documents, consider using a GPU or switching to a smaller model such as paraphrase-MiniLM-L3-v2 (3 transformer layers instead of 6, roughly 2x faster at some cost in quality).
Low quality results: The all-MiniLM-L6-v2 model is good for general English text. For domain-specific search, fine-tune the model on your corpus or use a larger model like all-mpnet-base-v2 (768-dim, better quality but 2x slower).
Or use SearchHive directly: If you don't need a custom corpus, SearchHive's SwiftSearch API already provides high-quality semantic web search out of the box -- no embedding infrastructure required.
Next Steps
- Scale up: Use Pinecone, Weaviate, or Qdrant instead of FAISS for managed vector search
- Add reranking: Use a cross-encoder to rerank the top-k results for better precision
- Hybrid search: Combine keyword search (BM25) with vector search for the best of both worlds
- Real-time updates: Stream new documents into the index as they're discovered
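For the hybrid-search idea, a common way to combine a BM25 ranking with a vector ranking is reciprocal rank fusion (RRF), which needs only the two ranked ID lists and no score calibration. The lists below are made-up examples:

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: score(d) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_ranking = ["a", "b", "c"]    # keyword results, best first
vector_ranking = ["b", "d", "a"]  # semantic results, best first
fused = rrf_fuse([bm25_ranking, vector_ranking])
print(fused)  # ['b', 'a', 'd', 'c']
```

Documents near the top of both lists ("b" here) rise above documents that rank highly in only one, which is exactly the behavior you want from hybrid search.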
Get started with SearchHive -- 500 free credits for search, scraping, and research APIs. No credit card needed. Check the API documentation for advanced usage patterns.