TurboQuant: 3-Bit KV Cache Slashes Memory in llama.cpp
Google's TurboQuant quantizes the KV cache to 2.67 bits per value with under 1% perplexity loss, enabling 110K+-token contexts on consumer GPUs; community forks of llama.cpp add CUDA/ROCm support and roughly 5x compression.
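To see why the bit rate matters, here is a back-of-envelope estimate of KV-cache memory at a 110K-token context. The model geometry (32 layers, 8 KV heads, head dimension 128, i.e. a Llama-3-8B-style GQA layout) is an illustrative assumption, not a detail from the announcement:

```python
# Rough KV-cache memory estimate; model dimensions are assumptions
# (Llama-3-8B-like GQA geometry), not figures from the article.

def kv_cache_bytes(seq_len, n_layers=32, n_kv_heads=8, head_dim=128,
                   bits_per_value=16.0):
    """Total KV-cache size in bytes: keys + values across all layers."""
    n_values = 2 * n_layers * n_kv_heads * head_dim * seq_len  # K and V
    return n_values * bits_per_value / 8

ctx = 110_000
fp16 = kv_cache_bytes(ctx, bits_per_value=16.0)
tq   = kv_cache_bytes(ctx, bits_per_value=2.67)  # TurboQuant's reported rate

print(f"fp16:     {fp16 / 2**30:.1f} GiB")
print(f"2.67-bit: {tq / 2**30:.1f} GiB ({fp16 / tq:.1f}x smaller)")
```

The raw bit-rate ratio (16 / 2.67) is close to 6x; a deployed quantizer also stores per-block scale metadata, which plausibly accounts for the ~5x figure cited above.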