Positional Bias Buries Middle Context
LLMs exhibit a U-shaped performance curve: accuracy peaks when relevant information sits at the start or end of the prompt and drops sharply in the middle. Stanford's 2023 "Lost in the Middle" study hid the answer at different positions across a stack of documents; even models optimized for long contexts faltered when the answer sat mid-prompt. The bias appears even before training: University of Rochester's 2026 analysis of freshly initialized Qwen2 and GPT-2 found that token influence innately favors the edges, like a desk where the top and bottom papers are visible while the stack's core blurs. The trade-off: long contexts enable complex tasks but risk burying critical data unless it is positioned deliberately.
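A minimal way to see the U-shape for yourself is a needle-in-a-haystack sweep: ask the same question while inserting the answer at different depths of a padded context. The sketch below is illustrative only; `call_llm`, the filler text, and the needle are placeholder assumptions, not any study's actual harness.

```python
# Minimal sketch of a positional-bias probe (needle-in-a-haystack style).
# Assumption: call_llm is a stand-in for whatever chat-completion client you use.

FILLER_DOC = "Quarterly report boilerplate text. " * 40  # irrelevant distractor
NEEDLE = "The access code for the staging server is 7431."
QUESTION = "What is the access code for the staging server?"
EXPECTED = "7431"

def call_llm(prompt: str) -> str:
    """Placeholder: swap in your actual model client here."""
    raise NotImplementedError

def build_prompt(needle_depth: float, n_docs: int = 20) -> str:
    """Insert the needle at a relative depth (0.0 = start, 1.0 = end)."""
    docs = [FILLER_DOC] * n_docs
    docs.insert(int(needle_depth * n_docs), NEEDLE)
    return "\n\n".join(docs) + f"\n\nQuestion: {QUESTION}"

def sweep_positions(depths=(0.0, 0.25, 0.5, 0.75, 1.0), trials: int = 10):
    """Measure accuracy at each insertion depth; middle depths typically dip."""
    results = {}
    for depth in depths:
        correct = sum(EXPECTED in call_llm(build_prompt(depth)) for _ in range(trials))
        results[depth] = correct / trials
    return results
```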
Distance and Noise Accelerate 'Context Rot'
Pure length hurts, even without distractions. University of Washington and Amazon's 2025 experiment padded short prompts with whitespace or masked-out irrelevant tokens; increasing the distance alone cut accuracy by 7-48%. Llama dropped roughly 50% on variable tracking; Mistral lost about 30% on arithmetic. KAIST's 2026 NoisyBench added realistic noise (irrelevant search results, stale chat history, plausible fakes): reasoning models lost up to 80%, and longer chains of thought amplified the errors as each step latched onto distractions. Chroma and Anthropic formalized this as "context rot": tokens are a depleting budget in which extras yield diminishing returns, turning context from an asset into a liability.
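A rough distance-only probe looks like the sketch below: keep the fact and the question fixed, vary only the amount of filler between them, and track accuracy. The function names, filler text, and pad levels are illustrative assumptions, not the papers' setups.

```python
# Minimal sketch of a "distance only" probe: same QA pair, growing separation.

def pad_prompt(fact: str, question: str, pad_words: int) -> str:
    """Separate the fact from the question with irrelevant filler,
    so only the distance changes, not the task itself."""
    filler = ("lorem " * pad_words).strip()  # whitespace-only padding also works
    return f"{fact}\n\n{filler}\n\n{question}"

def accuracy_vs_distance(call_llm, fact, question, expected,
                         pad_levels=(0, 500, 2000, 8000), trials: int = 5):
    """Re-run the same question at increasing pad lengths and record accuracy."""
    scores = {}
    for pad in pad_levels:
        prompt = pad_prompt(fact, question, pad)
        hits = sum(expected in call_llm(prompt) for _ in range(trials))
        scores[pad] = hits / trials
    return scores
```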
Optimize by Trimming and Repositioning
Treat context as finite desk space: excise irrelevancies such as old emails and boilerplate, since they actively degrade performance rather than sitting idle. The rules: (1) Start with the key documents and end with the query or instruction. (2) For a new task, open a fresh chat to shed accumulated history. (3) Restate the essentials just before asking for the answer; this shifts them into the privileged end position, mimicking targeted duplication for reliability. Rewriting a bloated prompt (e.g., a full email thread plus briefs) down to the essentials boosts precision without changing models. The outcome: reproducible gains on production tasks, sidestepping architectural limits until better architectures emerge.
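As a sketch of these rules in code, the helper below (all names illustrative, assuming you supply your own relevance filter) assembles a lean prompt: irrelevant documents are dropped, key documents lead, and restated essentials plus the query close out the context.

```python
# Minimal sketch of the trim-and-reposition rules above.
# The relevance filter is a stand-in for whatever selection logic you use.

def build_lean_prompt(docs: list[str], essentials: list[str],
                      query: str, is_relevant=lambda d: True) -> str:
    """Drop irrelevant material, lead with the documents that matter,
    and close with restated essentials plus the instruction or query,
    putting the critical tokens in the privileged end position."""
    kept = [d for d in docs if is_relevant(d)]  # excise irrelevancies
    parts = kept                                 # rule 1: key docs first
    parts.append("Key facts to use:\n" +
                 "\n".join(f"- {e}" for e in essentials))  # rule 3: restate essentials
    parts.append(query)                          # rule 1: query/instruction last
    return "\n\n".join(parts)
```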