Turbovec: High-Performance Vector Search via TurboQuant

Eliminating Training Overhead with Data-Oblivious Quantization

Most production-grade vector search libraries, such as FAISS, rely on Product Quantization (PQ), which requires a codebook training step. Training runs k-means over a representative sample of vectors, which ties the index to the data distribution. If the corpus shifts or grows, the index often needs retraining.

Turbovec uses Google's TurboQuant algorithm to bypass this requirement entirely. TurboQuant is a data-oblivious quantizer: it relies on the analytical properties of randomly rotated vectors to precompute bucket boundaries and centroids. Because there is no training pass, new vectors can be encoded as they arrive, which makes it a better fit for dynamic datasets whose distribution may change over time.
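As a rough illustration of how bucket boundaries and centroids can be precomputed from a known distribution alone, the sketch below runs Lloyd-Max iterations on a standard normal using only the closed-form Gaussian density and CDF. It is not Turbovec's code: the function names are invented for illustration, and the standard normal stands in for the high-dimensional limit of the rotated coordinates.

```python
import math


def pdf(x: float) -> float:
    """Standard normal density; treated as 0 at +/- infinity."""
    if math.isinf(x):
        return 0.0
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    if math.isinf(x):
        return 0.0 if x < 0 else 1.0
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def lloyd_max_gaussian(bits: int, iters: int = 200):
    """Return (boundaries, centroids) of a Lloyd-Max quantizer for N(0, 1)."""
    levels = 1 << bits
    # Start from evenly spaced centroids in [-2, 2].
    centroids = [-2.0 + 4.0 * (i + 0.5) / levels for i in range(levels)]
    for _ in range(iters):
        # Boundaries are the midpoints between neighboring centroids.
        inner = [(a + b) / 2.0 for a, b in zip(centroids, centroids[1:])]
        edges = [-math.inf] + inner + [math.inf]
        # Each centroid becomes the conditional mean of its bucket:
        # E[X | a < X < b] = (pdf(a) - pdf(b)) / (cdf(b) - cdf(a)).
        centroids = [
            (pdf(a) - pdf(b)) / (cdf(b) - cdf(a))
            for a, b in zip(edges, edges[1:])
        ]
    boundaries = [(a + b) / 2.0 for a, b in zip(centroids, centroids[1:])]
    return boundaries, centroids


# No corpus vectors are involved: the tables depend only on the bit width.
boundaries, centroids = lloyd_max_gaussian(bits=2)
```

For 2 bits the iteration settles near centroids of ±0.453 and ±1.510, with boundaries near 0 and ±0.982. To apply these tables to a rotated unit vector in d dimensions, the values would be divided by sqrt(d), since each coordinate of such a vector has variance 1/d.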

The Quantization Pipeline

Turbovec achieves a 16x compression ratio at 2 bits per coordinate (for example, a 1536-dimensional vector shrinks from 6,144 bytes in FP32 to 384 bytes, not counting the separately stored norm) through a four-step process:
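The arithmetic behind that figure can be checked in a few lines. The helper name `packed_bytes` is made up for this example, and the 4-byte width of the stored norm is an assumption (the text only says it is kept as a float).

```python
def packed_bytes(dim: int, bits: int) -> int:
    """Bytes needed to store `dim` coordinates at `bits` bits each, rounded up."""
    return (dim * bits + 7) // 8


dim = 1536
fp32_bytes = dim * 4                  # 4 bytes per FP32 coordinate
code_bytes = packed_bytes(dim, 2)     # 2 bits per coordinate
ratio = fp32_bytes / code_bytes

# Assumption: the norm is a 4-byte float stored next to the code.
ratio_with_norm = fp32_bytes / (code_bytes + 4)

print(fp32_bytes, code_bytes, ratio, round(ratio_with_norm, 2))
```

At 4 bits per coordinate the same vector would take 768 bytes, an 8x reduction, so the ratio quoted depends on the bit width chosen.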

1. Normalization: Each vector is scaled to a unit direction on the high-dimensional hypersphere, and its original norm is stored separately as a float.
2. Random Rotation: Each unit vector is multiplied by a random orthogonal matrix, so that every coordinate follows a known Beta distribution that converges to a Gaussian in high dimensions.
3. Lloyd-Max Scalar Quantization: Because the distribution is known analytically, the optimal bucket boundaries are precomputed without inspecting the input data.
4. Bit-Packing: The quantized coordinates are packed into bytes for storage efficiency.
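The four steps can be sketched end to end in plain Python. This is a toy illustration under stated assumptions, not Turbovec's implementation: the function names are hypothetical, the rotation is built with Gram-Schmidt rather than whatever the library uses, the 2-bit tables are the textbook Lloyd-Max values for a standard normal, and input vectors are assumed to be nonzero.

```python
import math
import random

# 2-bit Lloyd-Max tables for a standard normal (textbook values, rounded).
BOUNDARIES = (-0.9816, 0.0, 0.9816)
CENTROIDS = (-1.510, -0.4528, 0.4528, 1.510)


def random_orthogonal(dim: int, seed: int) -> list[list[float]]:
    """Orthonormal rows via modified Gram-Schmidt on Gaussian random rows."""
    rng = random.Random(seed)
    rows: list[list[float]] = []
    for _ in range(dim):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for u in rows:  # remove the components along earlier rows
            dot = sum(a * b for a, b in zip(v, u))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        rows.append([a / norm for a in v])
    return rows


def encode(vec: list[float], rotation: list[list[float]]) -> tuple[float, bytes]:
    """Return (norm, packed 2-bit codes) for one nonzero vector."""
    dim = len(vec)
    norm = math.sqrt(sum(a * a for a in vec))            # 1. normalization
    unit = [a / norm for a in vec]
    rotated = [sum(r * a for r, a in zip(row, unit))     # 2. random rotation
               for row in rotation]
    scale = math.sqrt(dim)  # rotated coordinates have variance 1/dim
    codes = [sum(x * scale > b for b in BOUNDARIES)      # 3. scalar quantization
             for x in rotated]
    packed = bytearray((dim + 3) // 4)                   # 4. bit-packing
    for i, code in enumerate(codes):
        packed[i // 4] |= code << (2 * (i % 4))          # four codes per byte
    return norm, bytes(packed)


def decode(norm: float, packed: bytes, rotation: list[list[float]]) -> list[float]:
    """Approximate the original vector from its norm and packed codes."""
    dim = len(rotation)
    scale = math.sqrt(dim)
    approx = [CENTROIDS[(packed[i // 4] >> (2 * (i % 4))) & 3] / scale
              for i in range(dim)]
    # Undo the rotation with the transpose, then restore the stored norm.
    return [norm * sum(rotation[j][i] * approx[j] for j in range(dim))
            for i in range(dim)]


dim = 64
rotation = random_orthogonal(dim, seed=0)
rng = random.Random(1)
vec = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
norm, packed = encode(vec, rotation)
restored = decode(norm, packed, rotation)
```

Here a 64-dimensional vector is stored as 16 bytes plus its norm, and `restored` points in roughly the same direction as `vec`. The rotation matrix depends only on the seed, never on the data, which is what keeps the scheme free of training.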

Performance and Integration

Turbovec is written in Rust with Python bindings, making it accessible for RAG pipelines. It utilizes SIMD intrinsics (NEON for ARM, AVX-512BW/AVX2 for x86) to accelerate search. Benchmarks indicate that on ARM hardware (Apple M3 Max), it outperforms FAISS IndexPQFastScan by 12–20%. On x86, it remains competitive, generally matching or slightly exceeding FAISS performance in 4-bit configurations.

The library includes an IdMapIndex for scenarios requiring stable uint64 IDs and O(1) deletion, and it provides native integrations for LangChain, LlamaIndex, and Haystack.
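One common way to get stable external IDs with constant-time deletion is a hash map from ID to storage slot combined with a swap-with-last removal. The sketch below shows that general technique only; the class `IdMapSketch` and its methods are hypothetical and say nothing about how Turbovec's IdMapIndex is actually implemented or what its API looks like.

```python
class IdMapSketch:
    """Hypothetical ID-to-slot map with O(1) average add and remove."""

    def __init__(self) -> None:
        self.slot_of: dict[int, int] = {}  # external ID -> row in storage
        self.ids: list[int] = []           # row -> external ID
        self.codes: list[bytes] = []       # row -> packed code

    def add(self, ext_id: int, code: bytes) -> None:
        if ext_id in self.slot_of:
            raise KeyError(f"duplicate id {ext_id}")
        self.slot_of[ext_id] = len(self.ids)
        self.ids.append(ext_id)
        self.codes.append(code)

    def remove(self, ext_id: int) -> None:
        slot = self.slot_of.pop(ext_id)  # raises KeyError if the ID is unknown
        last_id, last_code = self.ids.pop(), self.codes.pop()
        if last_id != ext_id:
            # Move the last row into the freed slot so storage stays dense.
            self.ids[slot], self.codes[slot] = last_id, last_code
            self.slot_of[last_id] = slot


index = IdMapSketch()
index.add(10, b"a")
index.add(20, b"b")
index.add(30, b"c")
index.remove(10)  # row 0 is refilled with the entry for ID 30
```

External IDs stay valid after a deletion because callers never see row positions; only the internal slot of the moved entry changes.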
