How to Build a Semantic Search API: Step-by-Step Tutorial
Semantic search understands the meaning behind queries, not just matching keywords. Instead of finding pages with the exact words "best CRM for small teams," a semantic search API returns results about team collaboration tools, contact management platforms, and sales automation software -- even if those exact words never appear.
This tutorial shows you how to build a semantic search API using Python, sentence embeddings, and SearchHive for web data collection.
Key Takeaways
- Semantic search uses vector embeddings to match queries by meaning, not keywords
- Sentence-transformers provides free, open-source embeddings that run on CPU
- FAISS handles fast similarity search over millions of vectors
- SearchHive's SwiftSearch and ScrapeForge APIs collect the training corpus
- You can build a working semantic search API in under 150 lines of Python
Prerequisites
- Python 3.9+
- A SearchHive API key (free tier with 500 credits)
- 2GB free RAM (for the embedding model)
Install dependencies:
pip install sentence-transformers faiss-cpu requests fastapi uvicorn numpy
Step 1: Collect Your Document Corpus
Before you can search semantically, you need documents to search over. Use SearchHive to collect relevant pages from the web.
import requests
import json

API_KEY = "your-api-key"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

def collect_documents(queries, pages_per_query=5):
    docs = []
    for query in queries:
        # Search for relevant pages
        resp = requests.get(
            "https://api.searchhive.dev/v1/search",
            headers=HEADERS,
            params={"q": query, "num": pages_per_query}
        )
        results = resp.json().get("results", [])
        for r in results:
            url = r["url"]
            try:
                # Scrape full page content
                scrape_resp = requests.get(
                    "https://api.searchhive.dev/v1/scrape",
                    headers=HEADERS,
                    params={"url": url}
                )
                data = scrape_resp.json()
                content = data.get("markdown", "")
                # Clean and chunk the content
                chunks = chunk_text(content, max_chars=1000)
                for i, chunk in enumerate(chunks):
                    docs.append({
                        "text": chunk,
                        "source": url,
                        "title": r.get("title", ""),
                        "chunk_index": i
                    })
            except Exception as e:
                print(f"Error scraping {url}: {e}")
    return docs
def chunk_text(text, max_chars=1000):
    chunks = []
    paragraphs = text.split("\n\n")
    current = ""
    for p in paragraphs:
        if len(current) + len(p) > max_chars:
            if current:
                chunks.append(current.strip())
            current = p
        else:
            current += "\n\n" + p if current else p
    if current.strip():
        chunks.append(current.strip())
    return chunks
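To see the chunker's behavior, here is a quick self-contained check with toy paragraphs (not real scraped content): paragraphs are greedily packed into a chunk until adding the next one would exceed max_chars.

```python
def chunk_text(text, max_chars=1000):
    chunks = []
    current = ""
    for p in text.split("\n\n"):
        if len(current) + len(p) > max_chars:
            if current:
                chunks.append(current.strip())
            current = p
        else:
            current += "\n\n" + p if current else p
    if current.strip():
        chunks.append(current.strip())
    return chunks

# Three ~180-character toy paragraphs
text = "\n\n".join(["alpha " * 30, "beta " * 30, "gamma " * 30])
chunks = chunk_text(text, max_chars=400)
# The first two paragraphs pack into one chunk; the third starts a new one
print(len(chunks))  # 2
```

One caveat of this greedy approach: a single paragraph longer than max_chars still becomes its own oversized chunk, which is usually acceptable for embedding but worth knowing.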
Step 2: Generate Embeddings
Convert each document chunk into a vector using sentence-transformers.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dim vectors, fast on CPU

def generate_embeddings(docs):
    texts = [d["text"] for d in docs]
    embeddings = model.encode(texts, show_progress_bar=True, batch_size=32)
    return embeddings
The all-MiniLM-L6-v2 model produces 384-dimensional vectors. It runs on CPU at ~1,000 documents/minute -- fast enough for most corpora.
Step 3: Build the FAISS Index
FAISS provides millisecond-level similarity search over large vector collections.
import faiss
import numpy as np

def build_index(embeddings):
    dim = embeddings.shape[1]
    # Normalize vectors for cosine similarity
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    normalized = embeddings / norms
    # Build an IndexFlatIP (inner product = cosine similarity for normalized vectors)
    index = faiss.IndexFlatIP(dim)
    index.add(normalized.astype(np.float32))
    return index
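The reason IndexFlatIP works here: for unit-length vectors, the inner product and cosine similarity are the same number. A quick NumPy check with random toy vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
docs = rng.normal(size=(5, 8)).astype(np.float32)  # toy "document" vectors
query = rng.normal(size=8).astype(np.float32)

# Cosine similarity computed directly
cos = (docs @ query) / (np.linalg.norm(docs, axis=1) * np.linalg.norm(query))

# Inner product of pre-normalized vectors (what IndexFlatIP computes)
docs_n = docs / np.linalg.norm(docs, axis=1, keepdims=True)
query_n = query / np.linalg.norm(query)
ip = docs_n @ query_n

print(np.allclose(cos, ip, atol=1e-6))  # True
```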
Step 4: Query the Index
Convert a user's query to an embedding and find the most similar documents.
def search(query, index, docs, model, top_k=5):
    query_embedding = model.encode([query])
    # Normalize query vector
    norm = np.linalg.norm(query_embedding, axis=1, keepdims=True)
    query_normalized = query_embedding / norm
    # Search
    distances, indices = index.search(query_normalized.astype(np.float32), top_k)
    results = []
    for dist, idx in zip(distances[0], indices[0]):
        # FAISS pads with -1 when fewer than top_k matches exist
        if 0 <= idx < len(docs):
            results.append({
                "score": float(dist),
                "text": docs[idx]["text"][:300],
                "source": docs[idx]["source"],
                "title": docs[idx]["title"]
            })
    return results
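If you want to sanity-check the scoring logic without FAISS, the same top-k retrieval can be done brute-force in NumPy. This sketch uses hypothetical 2-D vectors rather than real embeddings:

```python
import numpy as np

def brute_force_search(query_vec, doc_vecs, top_k=3):
    # Normalize both sides so the dot product equals cosine similarity
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q                       # one cosine score per document
    order = np.argsort(-scores)[:top_k]  # best first
    return [(int(i), float(scores[i])) for i in order]

doc_vecs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
query = np.array([1.0, 0.05])
results = brute_force_search(query, doc_vecs, top_k=2)
print(results[0][0], results[1][0])  # 0 1
```

This is O(n) per query, which is exactly what IndexFlatIP does internally; FAISS just does it much faster.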
Step 5: Build the REST API
Wrap the search function in a FastAPI endpoint.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="Semantic Search API")

class SearchRequest(BaseModel):
    query: str
    top_k: int = 5

class SearchResponse(BaseModel):
    results: list

# Initialize globally (load model + index at startup)
docs = []
index = None

@app.post("/search", response_model=SearchResponse)
def semantic_search(req: SearchRequest):
    if index is None:
        return SearchResponse(results=[])
    results = search(req.query, index, docs, model, req.top_k)
    return SearchResponse(results=results)

@app.get("/health")
def health():
    return {"status": "ok", "documents_indexed": len(docs)}
# Startup: load data and build index
@app.on_event("startup")
def startup():
    global docs, index
    print("Collecting documents...")
    docs = collect_documents([
        "semantic search API tutorial",
        "vector database comparison",
        "embedding models for search",
    ])
    print(f"Collected {len(docs)} document chunks")
    print("Generating embeddings...")
    embeddings = generate_embeddings(docs)
    print("Building FAISS index...")
    index = build_index(embeddings)
    print(f"Index ready with {len(docs)} documents")
Step 6: Run and Test
uvicorn main:app --host 0.0.0.0 --port 8000
Test with curl:
curl -X POST http://localhost:8000/search \
-H "Content-Type: application/json" \
-d '{"query": "How do I find similar documents using AI?", "top_k": 3}'
Expected response:
{
  "results": [
    {
      "score": 0.72,
      "text": "Semantic search uses vector embeddings to find documents...",
      "source": "https://example.com/semantic-search-guide",
      "title": "Understanding Semantic Search"
    }
  ]
}
Step 7: Add Persistence
Save and load the index to avoid re-embedding on every restart.
def save_index(index, docs, path="semantic_index"):
    faiss.write_index(index, f"{path}.faiss")
    with open(f"{path}_docs.json", "w") as f:
        json.dump(docs, f)

def load_index(path="semantic_index"):
    index = faiss.read_index(f"{path}.faiss")
    with open(f"{path}_docs.json") as f:
        docs = json.load(f)
    return index, docs
Step 8: Optimize Performance
For larger corpora, use FAISS's IVF index for approximate nearest neighbor search:
def build_ivf_index(embeddings, nlist=100):
    dim = embeddings.shape[1]
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    normalized = embeddings / norms
    quantizer = faiss.IndexFlatIP(dim)
    index = faiss.IndexIVFFlat(quantizer, dim, nlist, faiss.METRIC_INNER_PRODUCT)
    index.train(normalized.astype(np.float32))
    index.add(normalized.astype(np.float32))
    return index
IVF indexes cut per-query work from scanning all n vectors to scanning only the probed clusters -- roughly O(sqrt(n)) when nlist is around sqrt(n) -- with minimal accuracy loss. Use nlist = sqrt(num_documents) as a starting point.
Common Issues
Out of memory: The all-MiniLM-L6-v2 model uses ~400MB RAM. Each embedding is 384 float32 values (~1.5KB). 100K documents = ~150MB for embeddings plus ~400MB for the model. Stay under 500K documents on a machine with 4GB RAM.
Slow indexing: CPU encoding runs at ~1,000 docs/min. For 10K+ documents, consider using a GPU or switching to a smaller model such as paraphrase-MiniLM-L3-v2 (3 transformer layers instead of 6, roughly 2x faster at some cost in quality).
Low quality results: The all-MiniLM-L6-v2 model is good for general English text. For domain-specific search, fine-tune the model on your corpus or use a larger model like all-mpnet-base-v2 (768-dim, better quality but 2x slower).
Or use SearchHive directly: If you don't need a custom corpus, SearchHive's SwiftSearch API already provides high-quality semantic web search out of the box -- no embedding infrastructure required.
Next Steps
- Scale up: Use Pinecone, Weaviate, or Qdrant instead of FAISS for managed vector search
- Add reranking: Use a cross-encoder to rerank the top-k results for better precision
- Hybrid search: Combine keyword search (BM25) with vector search for the best of both worlds
- Real-time updates: Stream new documents into the index as they're discovered
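For the hybrid-search idea, a common way to combine a BM25 ranking with a vector ranking is reciprocal rank fusion (RRF), which needs only the two ranked ID lists and no score calibration. The lists below are made-up examples:

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: score(d) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_ranking = ["a", "b", "c"]    # keyword results, best first
vector_ranking = ["b", "d", "a"]  # semantic results, best first
fused = rrf_fuse([bm25_ranking, vector_ranking])
print(fused)  # ['b', 'a', 'd', 'c']
```

Documents near the top of both lists ("b" here) rise above documents that rank highly in only one, which is exactly the behavior you want from hybrid search.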
Get started with SearchHive -- 500 free credits for search, scraping, and research APIs. No credit card needed. Check the API documentation for advanced usage patterns.