Align Vector DB Choice to Infrastructure and Scale
If your app already runs on PostgreSQL with under 10M vectors, install the pgvector extension for free: vectors join relational data inside ACID transactions, with no new infrastructure or sync lag. MongoDB Atlas Vector Search unifies embeddings, JSON documents, and metadata in one collection (HNSW indexing up to 4096 dimensions); the M0 free tier offers 512MB, Flex caps at $30/mo, dedicated starts at $57/mo (M10), with one-click Voyage AI embeddings. Both eliminate dual writes and tool sprawl, making them ideal for full-stack apps where vectors augment operational data.
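A minimal pgvector sketch of the pattern above, in plain SQL (the table, column names, and 3-dim vectors are illustrative; real embeddings are typically hundreds of dimensions):

```sql
-- Enable the extension once per database
CREATE EXTENSION IF NOT EXISTS vector;

-- Embeddings live next to relational columns in the same table
CREATE TABLE items (
  id bigserial PRIMARY KEY,
  title text,
  embedding vector(3)
);

-- Nearest neighbors by cosine distance, metadata included, in one query
SELECT id, title
FROM items
ORDER BY embedding <=> '[0.1, 0.2, 0.3]'::vector
LIMIT 5;

-- Optional HNSW index for approximate search as the table grows
CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops);
```

Because inserts and vector updates run in ordinary transactions, there is no second store to keep in sync.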
For billion-scale RAG and agentic workloads without DevOps, Pinecone's serverless SaaS handles billions of vectors (Rust engine, multi-tenant isolation). Tiers: free Starter, $20/mo Builder (new in 2026, aimed at solo developers), $50/mo Standard minimum, $500/mo Enterprise. Add-ons include BYOC on AWS/GCP/Azure, Inference for hosted embeddings and rerankers, Assistant for chat agents, and Dedicated Read Nodes for read-heavy loads. Milvus (OSS) and the managed Zilliz Cloud target 100B+ vectors with the Cardinal engine (claimed 10x throughput and 3x faster indexing vs HNSW) plus GPU acceleration; they pair well with Kafka/Spark but add operational overhead for metadata and object storage.
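A non-runnable sketch of the Pinecone serverless flow with the current Python SDK (requires a real API key; the index name, dimension, region, and namespace are illustrative):

```python
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="YOUR_API_KEY")  # placeholder credential
pc.create_index(
    name="docs",                        # illustrative index name
    dimension=1536,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)

index = pc.Index("docs")
# Namespaces give per-tenant isolation inside one serverless index
index.upsert(
    vectors=[{"id": "doc-1", "values": [0.0] * 1536, "metadata": {"source": "faq"}}],
    namespace="tenant-a",
)
matches = index.query(
    vector=[0.0] * 1536,
    top_k=3,
    namespace="tenant-a",
    include_metadata=True,
)
```

Capacity planning, sharding, and replication are handled by the service, which is the core of the zero-DevOps pitch.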
Self-host Qdrant (Rust-native, 29k GitHub stars) for the best price-performance up to roughly 50M vectors on a $30-50/mo VPS: composable queries fuse dense and sparse vectors, filters, and custom scoring; the free tier gives 1GB RAM/4GB disk (no credit card), and it deploys to the edge. Weaviate excels at hybrid search (BM25 keywords + dense vectors + filters in one query, multimodal across text/images/audio); $45/mo Flex minimum (the $25 tier was retired in Oct 2025), $280/mo Plus on annual billing, and embedding models can be swapped modularly.
Tradeoffs: Prototyping Speed vs Production Scale
Prototype LLM apps fastest with Chroma OSS (embedded or server): an intuitive API, high-recall ANN, and no database expertise needed; Cloud Starter is $0 plus usage, Team $250/mo plus usage, well suited to small-to-medium-scale scaffolding. LanceDB (OSS and cloud) goes serverless on S3/GCS (the Lance columnar format enables on-disk filtering with no memory overhead), is AWS-validated for billion-scale elastic queries, and is strong for multimodal retrieval across text, images, and structured data.
For research or custom pipelines, skip a full database and use the Faiss library (Meta AI, CUDA GPU support) with IVF/HNSW/PQ indexes; tune nlist/nprobe to trade speed against accuracy, but you must build your own persistence and query API.
| DB | Max Scale | Start Price | Key Tradeoff |
|---|---|---|---|
| Pinecone | Billions | Free / $20/mo | Serverless SaaS, zero ops |
| Milvus/Zilliz | 100B+ | OSS free | GPU scale, ops complexity |
| Qdrant | ~50M | Free tier | Price-perf leader at $30-50/mo |
| Weaviate | Large | $45/mo | Native hybrid search |
| pgvector | <10M | Free | Postgres only |
| Mongo Atlas | Millions | $0-30/mo | Document unification |
| Chroma | Small-med | Free + usage | Dev speed, not extreme scale |
| LanceDB | Billions | Free | Serverless on S3 |
| Faiss | Custom | Free | Library only, no ops |