Scaling Vector Search: Beyond the Basics of Pgvector and Pinecone
When building your first RAG application, index creation is simple. You dump text chunks into Pinecone, run a similarity query, and call it a day. But once your database climbs past 10 million vectors, latency spikes, index builds run out of memory, and standard keyword search begins to look surprisingly attractive again.
Moving beyond a simple proof of concept requires balancing precision against compute costs. In this guide, we discuss practical tuning mechanisms for relational vector engines (specifically Postgres pgvector) and dedicated managed clusters, showing how to balance indexing parameters for millisecond retrieval times.
1. The Pitfalls of HNSW Index Building
Hierarchical Navigable Small World (HNSW) graphs are the de facto standard for fast approximate nearest-neighbor lookups. However, building an HNSW index is CPU- and memory-intensive. In pgvector, the build is fastest when the whole graph fits in the maintenance memory budget (`maintenance_work_mem` in PostgreSQL). If that setting is too low for the graph, Postgres has to finish the build against disk rather than memory, which can make index creation many times slower.
-- Correctly tuning Postgres memory settings before building a large vector index
SET maintenance_work_mem = '4GB';
CREATE INDEX ON document_embeddings
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
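In the code snippet above, `m` sets the maximum number of connections per node in each layer of the graph, while `ef_construction` sets the size of the candidate list kept while the graph is being built. Lowering either value reduces build time and memory at the cost of lower recall. IVFFlat indices, which partition the vector space into simple lists rather than building a multi-layered graph, build faster and use less memory than HNSW, so they can sometimes be the better fit when an index has to be rebuilt often. The trade-off is a weaker balance of speed and recall at query time, and because the lists are computed from the rows present at build time, a table that keeps appending records may need periodic reindexing.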
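To make the trade-off concrete, the sketch below shows two knobs pgvector exposes beyond the build parameters. It assumes the `document_embeddings` table and `embedding` column from the snippet above. The specific values (an `ef_search` of 100, a list count near the square root of roughly 10 million rows, and a probe count near the square root of the list count) are illustrative starting points, not tuned recommendations, and in practice you would normally keep only one index type per column.

```sql
-- Option A: keep the HNSW index and widen the search at query time.
-- hnsw.ef_search defaults to 40; larger values raise recall at the cost of latency.
SET hnsw.ef_search = 100;

-- Option B: an IVFFlat index on the same column (assumed table and column names).
-- Build it only after the table holds representative data, because the list
-- centroids are computed from the rows present at build time.
CREATE INDEX ON document_embeddings
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 3000);  -- assumption: roughly sqrt(10 million rows)

-- ivfflat.probes defaults to 1; probing more lists raises recall but slows queries.
SET ivfflat.probes = 55;  -- assumption: roughly sqrt(lists)
```

Either setting can be scoped to a single transaction with SET LOCAL, which makes it easier to compare recall and latency across several values before committing to one.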
2. Why Hybrid Search is Mandatory
Vector embeddings are excellent at matching semantic concepts, but they struggle with exact matches (like serial numbers, acronyms, or specific database keys). A user search for "ACME-9023" will often return conceptually related but irrelevant documents, because an embedding captures meaning rather than literal text overlap.
"An AI search engine that cannot handle an exact-match lookup for a SKU or a user name is useless to B2B startups. You must combine relational keyword filters with your vector search."
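One way to combine the two signals in plain SQL is sketched below. It is a minimal illustration, not a prescribed design: it fuses a vector ranking and a PostgreSQL full-text ranking with reciprocal rank fusion (RRF), a technique chosen here for the example. The `id` and `content` columns are hypothetical additions to the `document_embeddings` table, `$1` and `$2` are placeholders bound by your database driver, and the constant 60 is the conventional RRF smoothing value rather than a tuned one.

```sql
-- $1 = query embedding (vector), $2 = raw user query text, e.g. 'ACME-9023'
-- Assumed (hypothetical) schema: document_embeddings(id, content text, embedding vector)
WITH semantic AS (
    -- Top candidates by cosine distance
    SELECT id, RANK() OVER (ORDER BY embedding <=> $1) AS rank
    FROM document_embeddings
    ORDER BY embedding <=> $1
    LIMIT 20
),
keyword AS (
    -- Top candidates by full-text match; the 'simple' configuration skips
    -- stemming and stop-word removal, which tends to suit identifiers
    SELECT id,
           RANK() OVER (ORDER BY ts_rank_cd(to_tsvector('simple', content), query) DESC) AS rank
    FROM document_embeddings, plainto_tsquery('simple', $2) query
    WHERE to_tsvector('simple', content) @@ query
    ORDER BY ts_rank_cd(to_tsvector('simple', content), query) DESC
    LIMIT 20
)
-- Reciprocal rank fusion: a document scores higher the nearer the top it
-- appears in either list, and highest when it appears in both
SELECT COALESCE(semantic.id, keyword.id) AS id,
       COALESCE(1.0 / (60 + semantic.rank), 0.0) +
       COALESCE(1.0 / (60 + keyword.rank), 0.0) AS score
FROM semantic
FULL OUTER JOIN keyword ON semantic.id = keyword.id
ORDER BY score DESC
LIMIT 5;
```

For a table of any real size, the keyword branch would normally be backed by a GIN index on the same `to_tsvector('simple', content)` expression so that it does not scan every row.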
Conclusion
Optimizing vector retrieval is an iterative balancing act. By sizing HNSW memory structures correctly and pairing vector matching with traditional SQL filtering, you protect your system against high query costs and scaling bottlenecks.
Ananya Iyer
Head of AI & Engineering at AICraftGen. Former systems architect specializing in secure LLM pipelines and workflow orchestration.