Vector search APIs enable semantic search -- finding content by meaning rather than exact keyword matches. Instead of matching the word "car" to "car," vector search understands that "automobile," "vehicle," and "sedan" express the same concept and ranks results by semantic similarity.
This guide covers how vector search works, when to use it (and when not to), how to choose a vector search API, and how to integrate it into production applications.
Key Takeaways
- Vector search converts text into numerical embeddings and finds nearest neighbors in vector space, enabling semantic understanding
- Use vector search when you need fuzzy matching, concept similarity, or cross-lingual retrieval
- Don't use vector search when exact keyword matching is sufficient (IDs, codes, exact names)
- Major providers include Pinecone, Weaviate, Qdrant, and cloud-native options from AWS, GCP, and Azure
- SearchHive's DeepDive API combines web extraction with embedding generation, letting you build vector search indexes directly from web content
How Vector Search Works
The Embedding Process
- Text input -- a document, query, or chunk of text
- Embedding model -- converts text into a fixed-length vector (typically 384 to 1536 dimensions)
- Vector database -- stores and indexes vectors for fast similarity search
- Query -- converts a search query into a vector and finds the closest matches using distance metrics (cosine similarity, dot product, Euclidean distance)
"best project management tools for startups"
--> Embedding Model (e.g., text-embedding-3-small)
--> [0.023, -0.041, 0.087, ..., 0.015] (1536 dimensions)
--> Vector Database lookup
--> Returns documents with highest cosine similarity
Embedding Models
The quality of your vector search depends entirely on the embedding model. Current state of the art:
| Model | Dimensions | Context | Cost | Best For |
|---|---|---|---|---|
| OpenAI text-embedding-3-small | 1536 | 8191 tokens | $0.02/1M tokens | General purpose |
| OpenAI text-embedding-3-large | 3072 | 8191 tokens | $0.13/1M tokens | Higher accuracy |
| Cohere embed-v3 | 1024 | 512 tokens | Free tier available | Multilingual |
| Voyage AI voyage-3 | 1024 | 32000 tokens | $0.06/1M tokens | Long documents |
| Alibaba GTE-large | 1024 | 8192 tokens | Open source | Self-hosted, free |
Distance Metrics
- Cosine similarity -- most common, measures angle between vectors. Range: -1 to 1.
- Dot product -- faster to compute; equivalent to cosine similarity when vectors are normalized to unit length. Range: -inf to inf.
- Euclidean distance -- measures straight-line distance. Range: 0 to inf.
Cosine similarity is the standard choice for text embeddings because it ignores vector magnitude and compares only direction: a long document and a short one with the same meaning score the same.
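These metrics are simple enough to compute by hand. A minimal plain-Python sketch (no external libraries; in practice you would use NumPy or the vector database's built-in metric):

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Angle-based similarity in [-1, 1]; ignores vector magnitude."""
    norm_a = math.sqrt(dot(a, a))
    norm_b = math.sqrt(dot(b, b))
    return dot(a, b) / (norm_a * norm_b) if norm_a and norm_b else 0.0

def euclidean_distance(a, b):
    """Straight-line distance in [0, inf); sensitive to magnitude."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# A vector and a scaled copy point in the same direction:
a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]  # same direction, twice the magnitude
# cosine_similarity(a, b) is exactly 1.0 (magnitude ignored),
# while euclidean_distance(a, b) is nonzero (magnitude matters).
```

This is why cosine similarity is the default for text: scaling a vector does not change its cosine score, but it does change its Euclidean distance.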
When to Use Vector Search
Vector search shines in these scenarios:
Semantic Search
Users search with natural language, not exact keywords. "Why is my code slow" should match "performance optimization techniques" even though no words overlap.
Recommendation Systems
"Products similar to this one" -- vector search finds items with similar embeddings based on descriptions, reviews, or metadata.
Document Q&A (RAG)
Retrieval-Augmented Generation pipelines use vector search to find the most relevant document chunks for a user's question, then feed those chunks to an LLM for answer generation.
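The generation half of that pipeline reduces to formatting the retrieved chunks into a grounded prompt. A minimal sketch, assuming chunks are dicts with hypothetical `text` and `source` keys (the shape is an illustrative choice, not a standard):

```python
def build_rag_prompt(question, chunks):
    """Format retrieved chunks into a grounded prompt for an LLM.

    `chunks` is a list of dicts with "text" and "source" keys;
    chunks are numbered so the model can cite them.
    """
    context = "\n\n".join(
        f"[{i + 1}] ({c['source']})\n{c['text']}" for i, c in enumerate(chunks)
    )
    return (
        "Answer the question using only the context below. "
        "Cite sources by number.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

The resulting string is what you would pass as the user message to your LLM of choice.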
Deduplication
Find near-duplicate documents by computing embedding similarity. More robust than exact text matching.
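As a sketch, near-duplicate detection is a pairwise similarity scan over precomputed embeddings. The 0.95 threshold here is an illustrative choice to tune per domain, and the O(n^2) loop is only suitable for small sets; at scale you would query an ANN index instead:

```python
def find_near_duplicates(embeddings, threshold=0.95):
    """Return index pairs whose cosine similarity meets the threshold."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = sum(x * x for x in a) ** 0.5
        norm_b = sum(x * x for x in b) ** 0.5
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    pairs = []
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            if cos(embeddings[i], embeddings[j]) >= threshold:
                pairs.append((i, j))
    return pairs
```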
When NOT to Use Vector Search
- Exact lookups -- finding a user by email, looking up a product by SKU, matching a transaction ID
- Small datasets -- with under 1,000 documents, a brute-force similarity scan is fast enough that a dedicated vector database adds complexity without benefit
- Structured queries -- "price under $100 AND category = electronics" is a filter, not a vector search
Vector Search API Providers
Pinecone
Fully managed vector database with serverless and pod-based deployments. Supports metadata filtering, sparse-dense hybrid search, and namespaced indexes.
- Free tier: 1 index, 100K vectors (768 dims), sufficient for prototyping
- Serverless: $0.11/million read units, $0.23/million write units
- Pod-based: Starts at $70/month for a single pod (1M vectors)
Weaviate
Open-source vector database with managed cloud offering. Supports multiple embedding modules, GraphQL API, and hybrid search (BM25 + vector).
- Free sandbox: 1M vectors, limited compute
- Serverless: Pay per use from $25/month
- Dedicated: Starts at $65/month
Qdrant
Open-source vector database written in Rust. High performance, low resource usage. Available as open source, Qdrant Cloud (managed), or Qdrant Hybrid Cloud (managed control plane on your own infrastructure).
- Free tier: 1 GB RAM, 10K vectors
- Cloud: From $25/month
- Self-hosted: Free, runs on any server
Cloud-Native Options
- AWS OpenSearch Serverless -- vector search via OpenSearch, billed per OCU-hour plus storage
- GCP Vertex AI Vector Search -- managed vector search, autoscaling. $4.17 per node-hour minimum
- Azure AI Search -- hybrid keyword + vector search. Standard tier from $250/month
- Azure Cosmos DB -- vector search add-on for document databases
Building Vector Search from Web Content with SearchHive
The biggest challenge in building a vector search application isn't the search itself -- it's getting quality content to index. SearchHive's DeepDive API extracts clean, structured content from web pages, ready for embedding.
```python
import requests

SEARCHHIVE_KEY = "your-key"

def build_vector_index_from_web(urls):
    """Extract content from web pages and prepare for vector indexing."""
    documents = []
    for url in urls:
        # Extract clean content with DeepDive
        resp = requests.post(
            "https://api.searchhive.dev/v1/deepdive",
            headers={
                "Authorization": f"Bearer {SEARCHHIVE_KEY}",
                "Content-Type": "application/json"
            },
            json={
                "url": url,
                "depth": "full",
                "extract": ["headings", "entities", "metadata"],
                "format": "markdown"
            }
        )
        data = resp.json()

        # Chunk the content for embedding
        content = data.get("content", "")
        chunks = chunk_text(content, max_chars=1000, overlap=200)
        for i, chunk in enumerate(chunks):
            documents.append({
                "id": f"{url}#{i}",
                "text": chunk,
                "source": url,
                "title": data.get("metadata", {}).get("title", ""),
                "chunk_index": i
            })
    return documents

def chunk_text(text, max_chars=1000, overlap=200):
    """Split text into overlapping chunks for embedding."""
    chunks = []
    start = 0
    while start < len(text):
        end = start + max_chars
        chunk = text[start:end]
        if chunk.strip():
            chunks.append(chunk.strip())
        start += max_chars - overlap
    return chunks
```
Searching Your Index
```python
import openai

OPENAI_KEY = "your-openai-key"
client = openai.OpenAI(api_key=OPENAI_KEY)

def semantic_search(query, documents, top_k=5):
    """Search documents by semantic similarity."""
    # Generate query embedding
    query_embedding = client.embeddings.create(
        model="text-embedding-3-small",
        input=query
    ).data[0].embedding

    # In production, embed documents once at index time and store them in a
    # vector DB. For small sets, computing similarity directly is fine.
    results = []
    for doc in documents:
        doc_embedding = client.embeddings.create(
            model="text-embedding-3-small",
            input=doc["text"]
        ).data[0].embedding
        similarity = cosine_similarity(query_embedding, doc_embedding)
        results.append({**doc, "score": similarity})

    results.sort(key=lambda x: x["score"], reverse=True)
    return results[:top_k]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x ** 2 for x in a) ** 0.5
    norm_b = sum(x ** 2 for x in b) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```
Hybrid Search: Keywords + Vectors
Pure vector search misses exact matches. Pure keyword search misses semantics. Hybrid search combines both.
Most production systems use a weighted blend:
- Vector score * alpha + BM25 score * (1 - alpha), where alpha is typically 0.5-0.7
Pinecone, Weaviate, and Qdrant all support native hybrid search with a single query.
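The weighted blend above can be sketched as a small scoring function. Min-max normalization is used here as one common way to put BM25 and cosine scores on the same scale before blending (an illustrative choice, not the only option):

```python
def hybrid_scores(vector_scores, bm25_scores, alpha=0.6):
    """Blend per-document vector and BM25 scores with a weighted sum.

    Both score dicts map document id -> raw score. Each is min-max
    normalized to [0, 1] so the two scales are comparable.
    """
    def normalize(scores):
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        return {doc: (s - lo) / span for doc, s in scores.items()}

    v = normalize(vector_scores)
    b = normalize(bm25_scores)
    ids = set(v) | set(b)
    # Documents missing from one ranking contribute 0 for that component.
    return {doc: alpha * v.get(doc, 0.0) + (1 - alpha) * b.get(doc, 0.0)
            for doc in ids}
```

Reciprocal rank fusion is a popular alternative that avoids score normalization entirely by blending ranks instead of raw scores.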
Performance Considerations
Index Size and Memory
Vector databases need memory proportional to vector dimensions and document count. A million documents with 1536-dimensional float32 vectors require about 6 GB of memory just for the vectors.
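The estimate is straightforward arithmetic (raw vector storage only; ANN index structures add further overhead on top):

```python
def index_memory_bytes(num_vectors, dims, bytes_per_value=4):
    """Raw storage for the vectors alone (float32 = 4 bytes per value)."""
    return num_vectors * dims * bytes_per_value

# 1M vectors x 1536 dims x 4 bytes = 6,144,000,000 bytes, roughly 6.1 GB
```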
Approximate Nearest Neighbor (ANN)
Exact nearest-neighbor search is O(n) -- too slow for large datasets. ANN algorithms (HNSW, IVF, ScaNN) trade a small accuracy loss for orders-of-magnitude speed improvement. All production vector databases use ANN by default.
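As a toy illustration of the IVF idea (cluster the vectors, then scan only the closest clusters at query time), not a production implementation:

```python
import random

def dist2(a, b):
    """Squared Euclidean distance (monotonic with distance, cheaper)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(vectors):
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]

def build_ivf(vectors, k=4, iters=10, seed=0):
    """Tiny k-means producing k centroids and their 'inverted lists'."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, k)
    buckets = [[] for _ in range(k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for v in vectors:
            buckets[min(range(k), key=lambda c: dist2(v, centroids[c]))].append(v)
        centroids = [mean(b) if b else centroids[i] for i, b in enumerate(buckets)]
    return centroids, buckets

def ivf_search(query, centroids, buckets, nprobe=1):
    """Scan only the nprobe nearest buckets instead of every vector."""
    order = sorted(range(len(centroids)), key=lambda c: dist2(query, centroids[c]))
    candidates = [v for c in order[:nprobe] for v in buckets[c]]
    return min(candidates, key=lambda v: dist2(query, v)) if candidates else None
```

Raising `nprobe` trades speed back for recall, which is exactly the accuracy/speed dial that production ANN indexes expose.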
Pre-filtering vs. Post-filtering
When combining vector search with metadata filters (e.g., "only documents from 2026"), pre-filtering (restricting the candidate set before the ANN lookup) guarantees top-k results drawn from the filtered set, but highly selective filters can degrade the performance of graph-based indexes. Post-filtering (running the vector search first, then discarding non-matching results) keeps the ANN query fast but wastes computation and may return fewer than k results.
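A toy sketch of the trade-off, with hypothetical helper names and precomputed similarity scores standing in for the ANN lookup. Post-filtering can come back with fewer than `top_k` results because the filter runs after the candidate list has already been cut:

```python
def post_filter_search(query_scores, metadata, predicate, top_k=3):
    """Rank first, filter after: may return fewer than top_k results."""
    ann_hits = sorted(query_scores, key=query_scores.get, reverse=True)[:top_k]
    return [doc for doc in ann_hits if predicate(metadata[doc])]

def pre_filter_search(query_scores, metadata, predicate, top_k=3):
    """Filter first, then rank: fills top_k whenever enough docs match."""
    eligible = [doc for doc in query_scores if predicate(metadata[doc])]
    return sorted(eligible, key=query_scores.get, reverse=True)[:top_k]
```

With four documents where only the two lowest-scoring ones match the filter, post-filtering returns a single result while pre-filtering returns both.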
Quantization
Reducing vector precision from float32 to float16 or int8 halves or quarters memory usage with minimal accuracy loss. Most vector databases support this.
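A minimal sketch of symmetric int8 quantization; the exact scheme (symmetric vs. asymmetric, per-vector vs. per-dimension scales) varies by database:

```python
def quantize_int8(vector):
    """Map floats to int8 plus a per-vector scale (4x smaller than float32)."""
    scale = max(abs(x) for x in vector) / 127 or 1.0
    return scale, [round(x / scale) for x in vector]

def dequantize(scale, values):
    """Approximate reconstruction of the original floats."""
    return [v * scale for v in values]
```

Each value now fits in one byte instead of four, at the cost of a small rounding error bounded by half the scale.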
Best Practices
- Choose your embedding model first. The model determines your vector dimension and search quality, and switching models later means re-embedding everything.
- Chunk size matters. Too small (100 characters) loses context; too large (5,000 characters) dilutes the embedding. 500-1,000 characters with 100-200 characters of overlap is a good starting point.
- Add metadata at index time. Filter by source, date, category, and author at query time; pure vector search without metadata is rarely useful in production.
- Benchmark with real queries. Vector search quality depends on your specific domain, so test with actual user queries, not synthetic ones.
- Monitor recall and latency. Track what percentage of relevant results appear in the top-k, and how long queries take at the 95th percentile.
- Use hybrid search for production. Pure semantic search misses exact matches; combine it with keyword search for the best results.
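Recall@k, the retrieval-quality metric from the monitoring point above, is simple to compute offline against a labeled set of relevant documents:

```python
def recall_at_k(retrieved, relevant, k=5):
    """Fraction of relevant doc ids that appear in the top-k retrieved list.

    `retrieved` is an ordered list of doc ids; `relevant` is a set of
    doc ids judged relevant for the query.
    """
    if not relevant:
        return 0.0
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    return hits / len(relevant)
```

Averaging this over a held-out set of real user queries gives a single number to track as you change models, chunk sizes, or hybrid weights.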
Conclusion
Vector search is a powerful tool for building intelligent, semantic search experiences. The technology is mature, the APIs are accessible, and the cloud providers make it easy to get started. The key decisions are choosing the right embedding model, chunking strategy, and hybrid search approach for your specific use case.
Need to build a vector search index from web content? SearchHive's DeepDive API extracts clean, structured content from any URL, ready for chunking and embedding. Combined with SwiftSearch for source discovery and ScrapeForge for batch extraction, it's the fastest way to build a production vector search pipeline. Start with 500 free credits at searchhive.dev.
Related reading: Complete Guide to Web Data Mining | Top 10 Market Data Extraction Tools