Positional Bias Buries Middle Context
LLMs exhibit a U-shaped performance curve: accuracy peaks when relevant information sits at the start or end of the prompt and drops sharply in the middle. Stanford's 2023 "Lost in the Middle" study hid the answer at different positions across a stack of documents; even models optimized for long contexts faltered when the answer sat mid-prompt. The bias appears even before training: University of Rochester's 2026 analysis of freshly initialized Qwen2 and GPT-2 found that token influence innately favors the edges, like a desk where the top and bottom papers are visible while the stack's core blurs. The trade-off: long contexts enable complex tasks but risk burying critical data unless it is positioned deliberately.
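A minimal way to see the U-shape for yourself is a needle-in-a-haystack sweep: ask the same question while inserting the answer at different depths of a padded context. The sketch below is illustrative only; `call_llm`, the filler text, and the needle are placeholder assumptions, not any study's actual harness.

```python
# Minimal sketch of a positional-bias probe (needle-in-a-haystack style).
# Assumption: call_llm is a stand-in for whatever chat-completion client you use.

FILLER_DOC = "Quarterly report boilerplate text. " * 40  # irrelevant distractor
NEEDLE = "The access code for the staging server is 7431."
QUESTION = "What is the access code for the staging server?"
EXPECTED = "7431"

def call_llm(prompt: str) -> str:
    """Placeholder: swap in your actual model client here."""
    raise NotImplementedError

def build_prompt(needle_depth: float, n_docs: int = 20) -> str:
    """Insert the needle at a relative depth (0.0 = start, 1.0 = end)."""
    docs = [FILLER_DOC] * n_docs
    docs.insert(int(needle_depth * n_docs), NEEDLE)
    return "\n\n".join(docs) + f"\n\nQuestion: {QUESTION}"

def sweep_positions(depths=(0.0, 0.25, 0.5, 0.75, 1.0), trials: int = 10):
    """Measure accuracy at each insertion depth; middle depths typically dip."""
    results = {}
    for depth in depths:
        correct = sum(EXPECTED in call_llm(build_prompt(depth)) for _ in range(trials))
        results[depth] = correct / trials
    return results
```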
Distance and Noise Accelerate 'Context Rot'
Pure length hurts, even without distractions. University of Washington and Amazon's 2025 experiment padded short prompts with whitespace or masked-out irrelevant tokens; increasing the distance alone cut accuracy by 7-48%. Llama dropped roughly 50% on variable tracking; Mistral lost about 30% on arithmetic. KAIST's 2026 NoisyBench added realistic noise (irrelevant search results, stale chat history, plausible fakes): reasoning models lost up to 80%, and longer chains of thought amplified the errors as each step latched onto distractions. Chroma and Anthropic formalized this as "context rot": tokens are a depleting budget in which extras yield diminishing returns, turning context from an asset into a liability.
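A rough distance-only probe looks like the sketch below: keep the fact and the question fixed, vary only the amount of filler between them, and track accuracy. The function names, filler text, and pad levels are illustrative assumptions, not the papers' setups.

```python
# Minimal sketch of a "distance only" probe: same QA pair, growing separation.

def pad_prompt(fact: str, question: str, pad_words: int) -> str:
    """Separate the fact from the question with irrelevant filler,
    so only the distance changes, not the task itself."""
    filler = ("lorem " * pad_words).strip()  # whitespace-only padding also works
    return f"{fact}\n\n{filler}\n\n{question}"

def accuracy_vs_distance(call_llm, fact, question, expected,
                         pad_levels=(0, 500, 2000, 8000), trials: int = 5):
    """Re-run the same question at increasing pad lengths and record accuracy."""
    scores = {}
    for pad in pad_levels:
        prompt = pad_prompt(fact, question, pad)
        hits = sum(expected in call_llm(prompt) for _ in range(trials))
        scores[pad] = hits / trials
    return scores
```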
Optimize by Trimming and Repositioning
Treat context as finite desk space: excise irrelevancies such as old emails and boilerplate, since they actively degrade performance rather than sitting idle. The rules: (1) Start with the key documents and end with the query or instruction. (2) For a new task, open a fresh chat to shed accumulated history. (3) Restate the essentials just before asking for the answer; this shifts them into the privileged end position, mimicking targeted duplication for reliability. Rewriting a bloated prompt (e.g., a full email thread plus briefs) down to the essentials boosts precision without changing models. The outcome: reproducible gains on production tasks, sidestepping architectural limits until better architectures emerge.
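As a sketch of these rules in code, the helper below (all names illustrative, assuming you supply your own relevance filter) assembles a lean prompt: irrelevant documents are dropped, key documents lead, and restated essentials plus the query close out the context.

```python
# Minimal sketch of the trim-and-reposition rules above.
# The relevance filter is a stand-in for whatever selection logic you use.

def build_lean_prompt(docs: list[str], essentials: list[str],
                      query: str, is_relevant=lambda d: True) -> str:
    """Drop irrelevant material, lead with the documents that matter,
    and close with restated essentials plus the instruction or query,
    putting the critical tokens in the privileged end position."""
    kept = [d for d in docs if is_relevant(d)]  # excise irrelevancies
    parts = kept                                 # rule 1: key docs first
    parts.append("Key facts to use:\n" +
                 "\n".join(f"- {e}" for e in essentials))  # rule 3: restate essentials
    parts.append(query)                          # rule 1: query/instruction last
    return "\n\n".join(parts)
```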