Vector search APIs enable semantic search -- finding content by meaning rather than exact keyword matches. Instead of matching the word "car" to "car," vector search understands that "automobile," "vehicle," and "sedan" express the same concept and ranks results by semantic similarity.
This guide covers how vector search works, when to use it (and when not to), how to choose a vector search API, and how to integrate it into production applications.
Key Takeaways
- Vector search converts text into numerical embeddings and finds nearest neighbors in vector space, enabling semantic understanding
- Use vector search when you need fuzzy matching, concept similarity, or cross-lingual retrieval
- Don't use vector search when exact keyword matching is sufficient (IDs, codes, exact names)
- Major providers include Pinecone, Weaviate, Qdrant, and cloud-native options from AWS, GCP, and Azure
- SearchHive's DeepDive API combines web extraction with embedding generation, letting you build vector search indexes directly from web content
How Vector Search Works
The Embedding Process
- Text input -- a document, query, or chunk of text
- Embedding model -- converts text into a fixed-length vector (typically 384 to 1536 dimensions)
- Vector database -- stores and indexes vectors for fast similarity search
- Query -- converts a search query into a vector and finds the closest matches using distance metrics (cosine similarity, dot product, Euclidean distance)
"best project management tools for startups"
--> Embedding Model (e.g., text-embedding-3-small)
--> [0.023, -0.041, 0.087, ..., 0.015] (1536 dimensions)
--> Vector Database lookup
--> Returns documents with highest cosine similarity
Embedding Models
The quality of your vector search depends entirely on the embedding model. Current state of the art:
| Model | Dimensions | Context | Cost | Best For |
|---|---|---|---|---|
| OpenAI text-embedding-3-small | 1536 | 8191 tokens | $0.02/1M tokens | General purpose |
| OpenAI text-embedding-3-large | 3072 | 8191 tokens | $0.13/1M tokens | Higher accuracy |
| Cohere embed-v3 | 1024 | 512 tokens | Free tier available | Multilingual |
| Voyage AI voyage-3 | 1024 | 32000 tokens | $0.06/1M tokens | Long documents |
| Alibaba GTE-large | 1024 | 8192 tokens | Open source | Self-hosted, free |
Distance Metrics
- Cosine similarity -- most common, measures angle between vectors. Range: -1 to 1.
- Dot product -- faster to compute; equivalent to cosine similarity when vectors are normalized to unit length. Range: -inf to inf.
- Euclidean distance -- measures straight-line distance. Range: 0 to inf.
Cosine similarity is the standard choice for text embeddings because it ignores vector magnitude and compares only direction: a long document and a short one with the same meaning score the same.
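These metrics are simple enough to compute by hand. A minimal plain-Python sketch (no external libraries; in practice you would use NumPy or the vector database's built-in metric):

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Angle-based similarity in [-1, 1]; ignores vector magnitude."""
    norm_a = math.sqrt(dot(a, a))
    norm_b = math.sqrt(dot(b, b))
    return dot(a, b) / (norm_a * norm_b) if norm_a and norm_b else 0.0

def euclidean_distance(a, b):
    """Straight-line distance in [0, inf); sensitive to magnitude."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# A vector and a scaled copy point in the same direction:
a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]  # same direction, twice the magnitude
# cosine_similarity(a, b) is exactly 1.0 (magnitude ignored),
# while euclidean_distance(a, b) is nonzero (magnitude matters).
```

This is why cosine similarity is the default for text: scaling a vector does not change its cosine score, but it does change its Euclidean distance.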
When to Use Vector Search
Vector search shines in these scenarios:
Semantic Search
Users search with natural language, not exact keywords. "Why is my code slow" should match "performance optimization techniques" even though no words overlap.
Recommendation Systems
"Products similar to this one" -- vector search finds items with similar embeddings based on descriptions, reviews, or metadata.
Document Q&A (RAG)
Retrieval-Augmented Generation pipelines use vector search to find the most relevant document chunks for a user's question, then feed those chunks to an LLM for answer generation.
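The generation half of that pipeline reduces to formatting the retrieved chunks into a grounded prompt. A minimal sketch, assuming chunks are dicts with hypothetical `text` and `source` keys (the shape is an illustrative choice, not a standard):

```python
def build_rag_prompt(question, chunks):
    """Format retrieved chunks into a grounded prompt for an LLM.

    `chunks` is a list of dicts with "text" and "source" keys;
    chunks are numbered so the model can cite them.
    """
    context = "\n\n".join(
        f"[{i + 1}] ({c['source']})\n{c['text']}" for i, c in enumerate(chunks)
    )
    return (
        "Answer the question using only the context below. "
        "Cite sources by number.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

The resulting string is what you would pass as the user message to your LLM of choice.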
Deduplication
Find near-duplicate documents by computing embedding similarity. More robust than exact text matching.
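As a sketch, near-duplicate detection is a pairwise similarity scan over precomputed embeddings. The 0.95 threshold here is an illustrative choice to tune per domain, and the O(n^2) loop is only suitable for small sets; at scale you would query an ANN index instead:

```python
def find_near_duplicates(embeddings, threshold=0.95):
    """Return index pairs whose cosine similarity meets the threshold."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = sum(x * x for x in a) ** 0.5
        norm_b = sum(x * x for x in b) ** 0.5
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    pairs = []
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            if cos(embeddings[i], embeddings[j]) >= threshold:
                pairs.append((i, j))
    return pairs
```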
When NOT to Use Vector Search
- Exact lookups -- finding a user by email, looking up a product by SKU, matching a transaction ID
- Small datasets -- with under 1,000 documents, a brute-force similarity scan is fast enough that a dedicated vector database adds complexity without benefit
- Structured queries -- "price under $100 AND category = electronics" is a filter, not a vector search
Vector Search API Providers
Pinecone
Fully managed vector database with serverless and pod-based deployments. Supports metadata filtering, sparse-dense hybrid search, and namespaced indexes.
- Free tier: 1 index, 100K vectors (768 dims), sufficient for prototyping
- Serverless: $0.11/million read units, $0.23/million write units
- Pod-based: Starts at $70/month for a single pod (1M vectors)
Weaviate
Open-source vector database with managed cloud offering. Supports multiple embedding modules, GraphQL API, and hybrid search (BM25 + vector).
- Free sandbox: 1M vectors, limited compute
- Serverless: Pay per use from $25/month
- Dedicated: Starts at $65/month
Qdrant
Open-source vector database written in Rust. High performance, low resource usage. Available as open source, Qdrant Cloud (managed), or Qdrant Hybrid Cloud (managed control plane on your own infrastructure).
- Free tier: 1 GB RAM, 10K vectors
- Cloud: From $25/month
- Self-hosted: Free, runs on any server
Cloud-Native Options
- AWS OpenSearch Serverless -- vector search via OpenSearch, billed per OCU-hour plus storage
- GCP Vertex AI Vector Search -- managed vector search, autoscaling. $4.17 per node-hour minimum
- Azure AI Search -- hybrid keyword + vector search. Standard tier from $250/month
- Azure Cosmos DB -- vector search add-on for document databases
Building Vector Search from Web Content with SearchHive
The biggest challenge in building a vector search application isn't the search itself -- it's getting quality content to index. SearchHive's DeepDive API extracts clean, structured content from web pages, ready for embedding.
```python
import requests

SEARCHHIVE_KEY = "your-key"

def build_vector_index_from_web(urls):
    """Extract content from web pages and prepare for vector indexing."""
    documents = []
    for url in urls:
        # Extract clean content with DeepDive
        resp = requests.post(
            "https://api.searchhive.dev/v1/deepdive",
            headers={
                "Authorization": f"Bearer {SEARCHHIVE_KEY}",
                "Content-Type": "application/json"
            },
            json={
                "url": url,
                "depth": "full",
                "extract": ["headings", "entities", "metadata"],
                "format": "markdown"
            }
        )
        data = resp.json()

        # Chunk the content for embedding
        content = data.get("content", "")
        chunks = chunk_text(content, max_chars=1000, overlap=200)
        for i, chunk in enumerate(chunks):
            documents.append({
                "id": f"{url}#{i}",
                "text": chunk,
                "source": url,
                "title": data.get("metadata", {}).get("title", ""),
                "chunk_index": i
            })
    return documents

def chunk_text(text, max_chars=1000, overlap=200):
    """Split text into overlapping chunks for embedding."""
    chunks = []
    start = 0
    while start < len(text):
        end = start + max_chars
        chunk = text[start:end]
        if chunk.strip():
            chunks.append(chunk.strip())
        start += max_chars - overlap
    return chunks
```
Searching Your Index
```python
import openai

OPENAI_KEY = "your-openai-key"
client = openai.OpenAI(api_key=OPENAI_KEY)

def semantic_search(query, documents, top_k=5):
    """Search documents by semantic similarity."""
    # Generate query embedding
    query_embedding = client.embeddings.create(
        model="text-embedding-3-small",
        input=query
    ).data[0].embedding

    # In production, embed documents once at index time and store them in a
    # vector DB. For small sets, computing similarity directly is fine.
    results = []
    for doc in documents:
        doc_embedding = client.embeddings.create(
            model="text-embedding-3-small",
            input=doc["text"]
        ).data[0].embedding
        similarity = cosine_similarity(query_embedding, doc_embedding)
        results.append({**doc, "score": similarity})

    results.sort(key=lambda x: x["score"], reverse=True)
    return results[:top_k]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x ** 2 for x in a) ** 0.5
    norm_b = sum(x ** 2 for x in b) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```
Hybrid Search: Keywords + Vectors
Pure vector search misses exact matches. Pure keyword search misses semantics. Hybrid search combines both.
Most production systems use a weighted blend:
- Vector score * alpha + BM25 score * (1 - alpha), where alpha is typically 0.5-0.7
Pinecone, Weaviate, and Qdrant all support native hybrid search with a single query.
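The weighted blend above can be sketched as a small scoring function. Min-max normalization is used here as one common way to put BM25 and cosine scores on the same scale before blending (an illustrative choice, not the only option):

```python
def hybrid_scores(vector_scores, bm25_scores, alpha=0.6):
    """Blend per-document vector and BM25 scores with a weighted sum.

    Both score dicts map document id -> raw score. Each is min-max
    normalized to [0, 1] so the two scales are comparable.
    """
    def normalize(scores):
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        return {doc: (s - lo) / span for doc, s in scores.items()}

    v = normalize(vector_scores)
    b = normalize(bm25_scores)
    ids = set(v) | set(b)
    # Documents missing from one ranking contribute 0 for that component.
    return {doc: alpha * v.get(doc, 0.0) + (1 - alpha) * b.get(doc, 0.0)
            for doc in ids}
```

Reciprocal rank fusion is a popular alternative that avoids score normalization entirely by blending ranks instead of raw scores.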
Performance Considerations
Index Size and Memory
Vector databases need memory proportional to vector dimensions and document count. A million documents with 1536-dimensional float32 vectors require about 6 GB of memory just for the vectors.
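The estimate is straightforward arithmetic (raw vector storage only; ANN index structures add further overhead on top):

```python
def index_memory_bytes(num_vectors, dims, bytes_per_value=4):
    """Raw storage for the vectors alone (float32 = 4 bytes per value)."""
    return num_vectors * dims * bytes_per_value

# 1M vectors x 1536 dims x 4 bytes = 6,144,000,000 bytes, roughly 6.1 GB
```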
Approximate Nearest Neighbor (ANN)
Exact nearest-neighbor search is O(n) -- too slow for large datasets. ANN algorithms (HNSW, IVF, ScaNN) trade a small accuracy loss for orders-of-magnitude speed improvement. All production vector databases use ANN by default.
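As a toy illustration of the IVF idea (cluster the vectors, then scan only the closest clusters at query time), not a production implementation:

```python
import random

def dist2(a, b):
    """Squared Euclidean distance (monotonic with distance, cheaper)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(vectors):
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]

def build_ivf(vectors, k=4, iters=10, seed=0):
    """Tiny k-means producing k centroids and their 'inverted lists'."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, k)
    buckets = [[] for _ in range(k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for v in vectors:
            buckets[min(range(k), key=lambda c: dist2(v, centroids[c]))].append(v)
        centroids = [mean(b) if b else centroids[i] for i, b in enumerate(buckets)]
    return centroids, buckets

def ivf_search(query, centroids, buckets, nprobe=1):
    """Scan only the nprobe nearest buckets instead of every vector."""
    order = sorted(range(len(centroids)), key=lambda c: dist2(query, centroids[c]))
    candidates = [v for c in order[:nprobe] for v in buckets[c]]
    return min(candidates, key=lambda v: dist2(query, v)) if candidates else None
```

Raising `nprobe` trades speed back for recall, which is exactly the accuracy/speed dial that production ANN indexes expose.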
Pre-filtering vs. Post-filtering
When combining vector search with metadata filters (e.g., "only documents from 2026"), pre-filtering (restricting the candidate set before the ANN lookup) guarantees top-k results drawn from the filtered set, but highly selective filters can degrade the performance of graph-based indexes. Post-filtering (running the vector search first, then discarding non-matching results) keeps the ANN query fast but wastes computation and may return fewer than k results.
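A toy sketch of the trade-off, with hypothetical helper names and precomputed similarity scores standing in for the ANN lookup. Post-filtering can come back with fewer than `top_k` results because the filter runs after the candidate list has already been cut:

```python
def post_filter_search(query_scores, metadata, predicate, top_k=3):
    """Rank first, filter after: may return fewer than top_k results."""
    ann_hits = sorted(query_scores, key=query_scores.get, reverse=True)[:top_k]
    return [doc for doc in ann_hits if predicate(metadata[doc])]

def pre_filter_search(query_scores, metadata, predicate, top_k=3):
    """Filter first, then rank: fills top_k whenever enough docs match."""
    eligible = [doc for doc in query_scores if predicate(metadata[doc])]
    return sorted(eligible, key=query_scores.get, reverse=True)[:top_k]
```

With four documents where only the two lowest-scoring ones match the filter, post-filtering returns a single result while pre-filtering returns both.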
Quantization
Reducing vector precision from float32 to float16 or int8 halves or quarters memory usage with minimal accuracy loss. Most vector databases support this.
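A minimal sketch of symmetric int8 quantization; the exact scheme (symmetric vs. asymmetric, per-vector vs. per-dimension scales) varies by database:

```python
def quantize_int8(vector):
    """Map floats to int8 plus a per-vector scale (4x smaller than float32)."""
    scale = max(abs(x) for x in vector) / 127 or 1.0
    return scale, [round(x / scale) for x in vector]

def dequantize(scale, values):
    """Approximate reconstruction of the original floats."""
    return [v * scale for v in values]
```

Each value now fits in one byte instead of four, at the cost of a small rounding error bounded by half the scale.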
Best Practices
- Choose your embedding model first. The model determines your vector dimension and search quality, and switching models later means re-embedding everything.
- Chunk size matters. Too small (100 characters) loses context; too large (5,000 characters) dilutes the embedding. 500-1,000 characters with 100-200 characters of overlap is a good starting point.
- Add metadata at index time. Filter by source, date, category, and author at query time; pure vector search without metadata is rarely useful in production.
- Benchmark with real queries. Vector search quality depends on your specific domain, so test with actual user queries, not synthetic ones.
- Monitor recall and latency. Track what percentage of relevant results appear in the top-k, and how long queries take at the 95th percentile.
- Use hybrid search for production. Pure semantic search misses exact matches; combine it with keyword search for the best results.
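Recall@k, the retrieval-quality metric from the monitoring point above, is simple to compute offline against a labeled set of relevant documents:

```python
def recall_at_k(retrieved, relevant, k=5):
    """Fraction of relevant doc ids that appear in the top-k retrieved list.

    `retrieved` is an ordered list of doc ids; `relevant` is a set of
    doc ids judged relevant for the query.
    """
    if not relevant:
        return 0.0
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    return hits / len(relevant)
```

Averaging this over a held-out set of real user queries gives a single number to track as you change models, chunk sizes, or hybrid weights.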
Conclusion
Vector search is a powerful tool for building intelligent, semantic search experiences. The technology is mature, the APIs are accessible, and the cloud providers make it easy to get started. The key decisions are choosing the right embedding model, chunking strategy, and hybrid search approach for your specific use case.
Need to build a vector search index from web content? SearchHive's DeepDive API extracts clean, structured content from any URL, ready for chunking and embedding. Combined with SwiftSearch for source discovery and ScrapeForge for batch extraction, it's the fastest way to build a production vector search pipeline. Start with 500 free credits at searchhive.dev.
Related reading: Complete Guide to Web Data Mining | Top 10 Market Data Extraction Tools