Vector Search

Approximate nearest-neighbor retrieval over embeddings. The substrate under RAG, semantic caching, recommendations, and many modern ML retrievers. pgvector, Milvus, Qdrant.

Embed the corpus. Embed the query. Find the K nearest neighbors. Status: STUB, to be promoted to OUTLINE in Y5 Phase 42.

What this pattern is

Vector search is approximate nearest-neighbor (ANN) retrieval over high-dimensional embeddings. Each item in a corpus is represented as a vector produced by an embedding model, typically with 768 to 4096 dimensions. Queries are embedded with the same model; the search returns the K items whose vectors are closest to the query under a chosen metric (cosine similarity, dot product, or Euclidean distance). Brute-force search compares the query against every vector, so it costs O(N·d) per query for N vectors of dimension d: fine for thousands of items, impractical at interactive latency for hundreds of millions. ANN indexes (HNSW, IVF, ScaNN, DiskANN) trade exactness for speed, typically reaching millisecond-scale latencies on corpora of that size at the cost of occasionally missing a true neighbor. pgvector brings vector search to Postgres; Milvus is a K8s-native dedicated vector database; Qdrant, Weaviate, and Pinecone are alternatives.
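
The brute-force baseline described above can be sketched in a few lines. This is a minimal illustration, not how any of the named databases implement search: the three-dimensional vectors and the item ids are made up, and real embeddings would come from an embedding model.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention for zero vectors (an assumption of this sketch)
    return dot / (norm_a * norm_b)


def brute_force_top_k(query, corpus, k):
    """Score every corpus vector against the query: O(N * d) work per query.

    corpus maps item id -> vector. Returns [(item_id, similarity), ...],
    most similar first.
    """
    scored = ((cosine_similarity(query, vec), item_id) for item_id, vec in corpus.items())
    best = heapq.nlargest(k, scored)
    return [(item_id, score) for score, item_id in best]


# Toy 3-dimensional "embeddings" with hypothetical ids.
corpus = {
    "restart-postgres": [0.9, 0.1, 0.0],
    "rotate-tls-certs": [0.0, 0.2, 0.9],
    "postgres-failover": [0.8, 0.3, 0.1],
}
results = brute_force_top_k([1.0, 0.2, 0.0], corpus, k=2)
print(results)
```

Every query touches every vector, which is the cost that ANN indexes are designed to avoid. The same function is also a useful ground truth when measuring the recall of an approximate index.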

The pattern underlies almost every modern retrieval system. RAG (rag-as-pattern) retrieves documents by semantic similarity. Semantic caching (llm-caching) deduplicates similar prompts. Recommendation systems use vector similarity over user/item embeddings.

The choice between “vector search in an existing database” and a “dedicated vector database” is one of the most-debated ML platform decisions of 2024-2026. pgvector’s advantages: no new operational surface, transactional semantics, and joins to relational data. The advantages of Milvus and Qdrant: purpose-built performance at large scale (billions of vectors), richer indexing options, and better distributed operation. For basecamp, pgvector is the starting point because Postgres is already deployed; moving to a dedicated vector database is a phase-3 decision if scale demands it.
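
To make the "join to relational data" advantage concrete, here is a sketch of the SQL that a pgvector setup might use, held as Python strings so that nothing needs a live database. The table names (runbook_chunks, runbooks), the team column, and the helper names are hypothetical; the %s placeholders assume a psycopg-style driver. The vector type, the hnsw index method with vector_cosine_ops, and the <=> cosine-distance operator are pgvector's own.

```python
def to_vector_literal(values):
    """Format a Python sequence as a pgvector text literal such as '[0.1,0.2,0.3]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


def schema_statements(dimensions):
    """DDL for a hypothetical runbook_chunks table with an HNSW cosine index."""
    return [
        "CREATE EXTENSION IF NOT EXISTS vector;",
        "CREATE TABLE runbook_chunks ("
        " id bigserial PRIMARY KEY,"
        " runbook_id bigint NOT NULL,"
        " body text NOT NULL,"
        f" embedding vector({int(dimensions)}) NOT NULL);",
        "CREATE INDEX ON runbook_chunks USING hnsw (embedding vector_cosine_ops);",
    ]


# <=> is pgvector's cosine-distance operator; smaller means closer.
# The join to a hypothetical runbooks table (not created above) shows the
# relational filter that motivates keeping vectors in Postgres.
# Parameters, in order: query vector literal, team name, query vector literal, K.
TOP_K_QUERY = (
    "SELECT c.id, c.body, c.embedding <=> %s::vector AS distance "
    "FROM runbook_chunks c JOIN runbooks r ON r.id = c.runbook_id "
    "WHERE r.team = %s "
    "ORDER BY c.embedding <=> %s::vector LIMIT %s;"
)

for statement in schema_statements(768):
    print(statement)
print(TOP_K_QUERY)
print(to_vector_literal([0.1, 2, 3.5]))
```

The point of the sketch is that the similarity ranking and the relational filter live in one query and one transaction, which a separate vector database would have to reproduce with metadata filters or a second lookup.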

The pattern’s underlying algorithms have specific trade-offs. HNSW (Hierarchical Navigable Small World) is memory-hungry but very fast at query time. IVF (Inverted File) uses less memory and builds faster, but usually has to scan more vectors to reach the same recall. DiskANN, from Microsoft Research, targets billion-scale corpora stored on disk with acceptable latency. ScaNN is Google’s library, built around anisotropic vector quantization and tuned for maximum inner-product search. Understanding which algorithm your vector database uses matters when tuning for scale, because at large corpus sizes vector search performance is dominated by index choice.
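
The recall-versus-work trade-off is easiest to see in a toy IVF index. The sketch below is illustrative only: the class name ToyIVF is hypothetical, the centroids are sampled data points rather than trained with k-means as production IVF implementations do, and the data is random Gaussian noise. The nprobe parameter follows the common IVF naming for "how many lists to scan".

```python
import random


def squared_l2(a, b):
    """Squared Euclidean distance (ordering is the same as for the true distance)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class ToyIVF:
    """Inverted-file index: vectors are bucketed by their nearest centroid."""

    def __init__(self, vectors, n_lists, seed=0):
        rng = random.Random(seed)
        self.vectors = vectors
        # Simplification: centroids are sampled data points, not k-means output.
        self.centroids = [vectors[i] for i in rng.sample(range(len(vectors)), n_lists)]
        self.lists = [[] for _ in range(n_lists)]
        for idx, vec in enumerate(vectors):
            self.lists[self._nearest_centroids(vec, 1)[0]].append(idx)

    def _nearest_centroids(self, vec, n):
        order = sorted(
            range(len(self.centroids)),
            key=lambda c: squared_l2(vec, self.centroids[c]),
        )
        return order[:n]

    def search(self, query, k, nprobe):
        """Scan only the nprobe closest lists. Returns (top-k ids, vectors scanned)."""
        candidates = [i for c in self._nearest_centroids(query, nprobe) for i in self.lists[c]]
        candidates.sort(key=lambda i: squared_l2(query, self.vectors[i]))
        return candidates[:k], len(candidates)


rng = random.Random(42)
data = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(2000)]
index = ToyIVF(data, n_lists=20)
query = [rng.gauss(0, 1) for _ in range(8)]

# Exact answer by brute force, used as ground truth for recall.
exact = sorted(range(len(data)), key=lambda i: squared_l2(query, data[i]))[:10]
for nprobe in (1, 5, 20):
    found, scanned = index.search(query, k=10, nprobe=nprobe)
    recall = len(set(found) & set(exact)) / len(exact)
    print(f"nprobe={nprobe:2d} scanned={scanned:4d} recall@10={recall:.2f}")
```

With nprobe equal to the number of lists the index scans everything and matches brute force; smaller values scan fewer vectors and may miss true neighbors that fell into an unprobed list. HNSW exposes the same kind of knob through ef_search.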

Concrete instances in the wild

  • pgvector. Postgres extension for vector search. HNSW and IVFFlat indexes. basecamp default.
  • Milvus. OSS vector database, K8s-native. Purpose-built for billions of vectors.
  • Qdrant. OSS vector database in Rust. Popular for smaller-scale production.
  • Weaviate. OSS vector database with built-in ML models for embedding.
  • Pinecone. Commercial hosted vector database. Popular in prototyping.
  • Chroma. OSS embedded vector database. Common for local development.
  • Elasticsearch / OpenSearch with vector fields. Existing search infrastructure with vector support added.
  • Vespa. OSS from Yahoo. Combines vector search with structured search at scale.
  • FAISS (Facebook AI Similarity Search). OSS library from Meta. Underlies many vector databases.
  • ScaNN. Google’s ANN library. Used inside Google’s internal retrieval systems.
  • Cloudflare Vectorize. Managed vector database on Cloudflare’s edge network.

Why this pattern matters

Before dense embeddings and vector search, retrieval was keyword-based (TF-IDF, BM25). This works for lexical matches but fails on semantic ones. Searching for “auto” doesn’t find documents about “car.” Searching for “how do I fix bug X” doesn’t find documents about “workaround for issue Y.” Vector search embeds meaning, not just tokens, so semantically similar content ranks high even when word-level overlap is zero.

The pattern is what makes LLM-powered retrieval possible. RAG works because vector search finds semantically relevant documents given a natural-language query. Semantic caching works because vector search finds previously answered queries that are semantically equivalent to the new query. Personalization works because vector search finds items whose embeddings match the user’s preference embedding. Each of these depends on fast nearest-neighbor retrieval over embeddings.
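
As one illustration of these uses, a semantic cache can be sketched as a nearest-neighbor lookup with a similarity cutoff. The class name SemanticCache, the 0.95 threshold, and the toy vectors are assumptions made for this example; a real cache would take embeddings from a model and replace the linear scan with an ANN index.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class SemanticCache:
    """Return a cached answer when a new query embedding is close enough to an old one."""

    def __init__(self, threshold=0.95):
        self.threshold = threshold  # hypothetical cutoff; tune per embedding model
        self.entries = []  # list of (embedding, answer)

    def get(self, query_embedding):
        best_score, best_answer = -1.0, None
        # Linear scan for clarity; an ANN index would replace this at scale.
        for embedding, answer in self.entries:
            score = cosine_similarity(query_embedding, embedding)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None

    def put(self, query_embedding, answer):
        self.entries.append((list(query_embedding), answer))


cache = SemanticCache(threshold=0.95)
cache.put([0.9, 0.1, 0.0], "cached answer about restarting Postgres")
hit = cache.get([0.88, 0.12, 0.01])  # near-duplicate phrasing of the same question
miss = cache.get([0.0, 0.1, 0.9])    # unrelated question
print(hit, miss)
```

The threshold is the design decision that matters: too low and the cache returns answers to questions that only look similar, too high and it rarely hits.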

For basecamp specifically, vector search is the substrate under the ops-handbook RAG chatbot (Y5 Phase 42). The runbooks are ingested and embedded; on-call queries retrieve the relevant runbook sections; an LLM synthesizes an answer with citations. Without vector search, the chatbot would either miss relevant context (keyword search) or drown in irrelevant results (no ranking). Retrieval quality is what makes the chatbot useful.

The pattern also underlies model behaviors that don’t look like search. Recommender systems use vector similarity between user and item embeddings. Anomaly detection uses distance from cluster centroids in embedding space. Clustering uses embedding proximity. Deduplication uses embedding similarity thresholds. The mental model “everything is embeddings + nearest neighbor” is more accurate than it sounds for a surprising fraction of modern ML.
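
The anomaly-detection case can be sketched with nothing more than a centroid and a distance. This is one simple variant under stated assumptions: a single cluster, Euclidean distance, and a z-score cutoff of 2.0 chosen only for this toy data. The function names and the two-dimensional points are hypothetical.

```python
import math
import statistics


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dims = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dims)]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def find_outliers(vectors, z_cutoff=2.0):
    """Flag indices whose distance to the centroid is more than z_cutoff
    population standard deviations above the mean distance."""
    center = centroid(vectors)
    distances = [euclidean(v, center) for v in vectors]
    mean = statistics.mean(distances)
    spread = statistics.pstdev(distances)
    if spread == 0.0:
        return []  # all points equally far from the centroid
    return [i for i, d in enumerate(distances) if (d - mean) / spread > z_cutoff]


embeddings = [
    [1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.2],
    [1.2, 1.0], [0.8, 0.9], [1.1, 1.1], [6.0, 6.0],  # last point is far from the rest
]
outliers = find_outliers(embeddings)
print(outliers)
```

Deduplication and clustering follow the same shape: compute distances in embedding space, then apply a threshold or grouping rule on top.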

Modern platforms make vector search operationally accessible. pgvector deploys as a Postgres extension. Milvus deploys as a K8s operator. Managed services (Pinecone, Cloudflare Vectorize) remove operational burden entirely. What was cutting-edge in 2020 is a commodity in 2026.

The failure modes to know: embedding drift when the embedding model changes (the corpus must be re-embedded); dimensionality mismatch between corpus and query embeddings; ANN recall trade-offs at scale (tune parameters such as HNSW’s ef_search or IVF’s nprobe); index build time on large corpora (billions of vectors can take hours or longer); memory pressure from HNSW indexes at scale. Each has known mitigations, but operating vector search at scale means owning them.
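
Two of these failure modes, embedding drift and dimensionality mismatch, can be caught with a cheap guard that records how the index was built. The sketch below is an assumption about how such a guard might look, not a feature of any listed database; EmbeddingSpec, EmbeddingMismatch, and the model name "example-embedder" are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbeddingSpec:
    """Identifies how vectors were produced. Field values below are hypothetical."""
    model_name: str
    model_version: str
    dimensions: int


class EmbeddingMismatch(ValueError):
    """Raised when a query cannot be meaningfully compared with the indexed corpus."""


def check_query(index_spec, query_spec, query_vector):
    """Reject queries whose dimensions or embedding model differ from the index."""
    if len(query_vector) != index_spec.dimensions:
        raise EmbeddingMismatch(
            f"query has {len(query_vector)} dimensions, index expects {index_spec.dimensions}"
        )
    if (query_spec.model_name, query_spec.model_version) != (
        index_spec.model_name,
        index_spec.model_version,
    ):
        raise EmbeddingMismatch(
            f"query embedded with {query_spec.model_name}@{query_spec.model_version}, "
            f"corpus with {index_spec.model_name}@{index_spec.model_version}; "
            "re-embed the corpus"
        )


def needs_reembedding(index_spec, current_spec):
    """True when the serving model no longer matches the model that built the index."""
    return index_spec != current_spec


indexed = EmbeddingSpec("example-embedder", "v1", 4)
serving = EmbeddingSpec("example-embedder", "v2", 4)
check_query(indexed, indexed, [0.1, 0.2, 0.3, 0.4])  # passes silently
print(needs_reembedding(indexed, serving))
```

Storing the spec next to the index turns a silent quality regression (vectors from two models compared as if they shared a space) into an explicit error at query time.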

Depth progression

STUB     ← you are here.
OUTLINE  Promoted when Y5 Phase 42 deploys pgvector on basecamp.
DEEP     Promoted after Y5 Phase 42 — at least one production-shaped use case
         operational (RAG on ops-handbook is the natural one).

Preview: what OUTLINE will answer

When Y5 Phase 42 promotes this entry to OUTLINE, it will name:

  • PROBLEM. How do you find semantically relevant items from a large corpus given a natural-language or embedding-based query?
  • PRINCIPLES. Embed items; embed queries; find nearest neighbors. Use ANN indexes for scale. Trade recall for latency deliberately. Match embedding model between corpus and query. Re-embed when model changes.
  • TRADE-OFFS. pgvector (embedded in Postgres, familiar) vs dedicated vector DB (Milvus, Qdrant — scales further). HNSW (fast, memory-hungry) vs IVF (slower, less memory) vs DiskANN (billion-scale, disk-based). Managed (Pinecone) vs self-hosted. High recall (accurate, slow) vs low recall (fast, less accurate).
  • TOOLS (time-stamped as of 2026-06): pgvector (basecamp default), Milvus, Qdrant, Weaviate, Pinecone, Chroma, Elasticsearch/OpenSearch, Vespa, FAISS, ScaNN, Cloudflare Vectorize.

The DEEP promotion, after Y5 Phase 42 with production RAG operational, will add MASTERY (operating vector search on basecamp), COMPARE (pgvector vs Milvus vs Qdrant at different scales), OPERATE (a specific tuning event or corpus re-embedding), and CONTRIBUTE (a pgvector or Milvus documentation improvement).

Canonical references

  • pgvector documentation. Free at github.com/pgvector/pgvector.
  • Milvus documentation. Free at milvus.io.
  • Qdrant documentation. Free at qdrant.tech.
  • Malkov & Yashunin, “Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs” (HNSW paper, 2016). Free.
  • Meta’s FAISS documentation and papers. Free.

Cross-references