Fixing KV Selection Instability from RoPE Rotation
Standard KV cache compression ranks keys by attention scores from a handful of recent post-RoPE queries, but RoPE rotates queries with position, so those few samples do not represent future queries. The result is poor top-key selection and unstable long reasoning. TriAttention sidesteps this by working in the pre-RoPE space, where query (Q) and key (K) vectors concentrate tightly around fixed, non-zero centers that remain stable across positions, a property termed Q/K concentration.
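A minimal sketch of how one might check this concentration empirically, using synthetic pre-RoPE keys (the shapes, noise scale, and data here are illustrative assumptions, not the paper's setup):

```python
import numpy as np

# Minimal sketch (not the paper's code): check Q/K concentration by
# measuring how tightly synthetic pre-RoPE keys cluster around their
# mean direction. Shapes and noise scale are illustrative assumptions.
rng = np.random.default_rng(0)
head_dim, seq_len = 64, 4096

center = rng.normal(size=head_dim)                           # fixed, non-zero center
keys = center + 0.1 * rng.normal(size=(seq_len, head_dim))   # pre-RoPE keys

def unit(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

cos_to_center = unit(keys) @ unit(center)                    # cosine similarity per key
print(f"mean cosine to center: {cos_to_center.mean():.3f}")  # near 1.0 => concentrated
```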
This concentration induces position-specific attention biases: queries favor keys at particular relative distances (such as nearest neighbors), with the preferences determined by the center angles through a trigonometric series expansion. The Q/K vector norms supply an additional importance signal.
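To make the mechanism concrete, here is the standard RoPE identity written for the centers (a sketch; the notation below for the centers, their polar sub-pairs, and the frequencies is assumed here, not taken from the paper):

```latex
% Assumed notation: \mu_q, \mu_k are the pre-RoPE Q/K centers; the j-th
% 2D sub-pair of each has polar form (a_j, \alpha_j) and (b_j, \beta_j);
% \theta_j are the RoPE frequencies and d = m - n the relative distance.
\langle R_m \mu_q,\; R_n \mu_k \rangle
  = \sum_j a_j b_j \,\cos\!\bigl(d\,\theta_j + \alpha_j - \beta_j\bigr)
```

The right-hand side depends only on the relative distance d: its peaks are the preferred distances, the phases come from the center angles, and the amplitudes a_j b_j carry the norm signal.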
TriAttention's Position-Aware Scoring
Key importance is computed directly from the trigonometric series induced by the Q/K centers, scoring each key by its relative position and avoiding the rotation problem entirely. No query sampling is needed: distance preferences are derived analytically from the stable pre-RoPE geometry.
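A hypothetical sketch of this analytic distance score: given pre-RoPE Q/K centers, evaluate the trigonometric series over relative distances. The names (mu_q, mu_k), the adjacent-pair RoPE convention, and the default base are assumptions, not the repo's API:

```python
import numpy as np

# Hypothetical sketch: evaluate the trigonometric series from the
# pre-RoPE Q/K centers over relative distances 0..max_dist-1.
# Adjacent-pair RoPE convention and base 10000 are assumed here.
def distance_scores(mu_q, mu_k, max_dist, rope_base=10000.0):
    dim = mu_q.shape[-1]
    theta = rope_base ** (-np.arange(0, dim, 2) / dim)       # RoPE frequencies
    # Polar form of each 2D sub-pair of the centers.
    q2, k2 = mu_q.reshape(-1, 2), mu_k.reshape(-1, 2)
    a, alpha = np.linalg.norm(q2, axis=1), np.arctan2(q2[:, 1], q2[:, 0])
    b, beta = np.linalg.norm(k2, axis=1), np.arctan2(k2[:, 1], k2[:, 0])
    d = np.arange(max_dist)[:, None]                          # distances as a column
    return (a * b * np.cos(d * theta + (alpha - beta))).sum(axis=1)
```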
The implementation integrates this scoring into KV eviction, retaining the top keys by a combination of the trigonometric position score and the norm signal. This preserves reasoning fidelity while sharply reducing cache size.
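Continuing the sketch above (an assumed interface, not the released code), an eviction step might look like this; the multiplicative combination of the two signals is one plausible choice, made here purely for illustration:

```python
# Assumed interface, not the released code. Reuses distance_scores from
# the sketch above; requires the cached keys to sit at positions
# 0..n-1 with n-1 <= query_pos.
def evict(keys_pre_rope, values, mu_q, mu_k, keep, query_pos):
    n = keys_pre_rope.shape[0]
    dists = query_pos - np.arange(n)               # relative distance to the query
    pos_score = distance_scores(mu_q, mu_k, query_pos + 1)[dists]
    norm_score = np.linalg.norm(keys_pre_rope, axis=-1)
    keep_idx = np.argsort(pos_score * norm_score)[-keep:]  # top combined score
    return keys_pre_rope[keep_idx], values[keep_idx]
```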
10.7x KV Savings with Full Accuracy
On the AIME25 benchmark with 32K-token generation, TriAttention matches full-attention accuracy while delivering 2.5x higher throughput or a 10.7x reduction in KV memory; leading baselines lose roughly half their accuracy at equivalent efficiency. This makes it possible to deploy the OpenClaw model on a single consumer GPU, avoiding the out-of-memory failures that full attention hits at long context.
Code is available at https://github.com/WeianMao/triattention, demonstrating the approach's practicality for efficient long-reasoning LLMs.