Instruction Bleed: The Hidden Risk of Prompt Composition

The Mechanism of Compositional Behavioral Leakage

Prompt-composed agentic systems, in which multiple prompt modules are concatenated into a single context window, are vulnerable to a failure mode called Compositional Behavioral Leakage (CBL). Because transformer self-attention imposes no formal boundaries between concatenated modules, the model processes the entire context as a single, unified input. As a result of this architectural non-isolation, editing the text of one module can unintentionally alter the behavior of a separate module that does not depend on it.
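A minimal Python sketch of the composition step follows. The module names, their text, and the `compose_prompt` helper are hypothetical illustrations, not the studied agent's code. The sketch only shows that composition yields one undifferentiated string, so an edit confined to one module still changes the single input the model conditions on for every module.

```python
# Hypothetical prompt modules; names and contents are illustrative only.
MODULES = {
    "persona": "You are a careful job-evaluation assistant.",
    "rubric": "Score each candidate from 1 to 10 on relevant experience.",
    "formatting": "Return the score as a single integer.",
}
ORDER = ["persona", "rubric", "formatting"]


def compose_prompt(modules: dict[str, str], order: list[str]) -> str:
    """Concatenate modules into one context string.

    The "## name" headers are ordinary text: nothing in self-attention
    treats them as isolation boundaries between modules.
    """
    return "\n\n".join(f"## {name}\n{modules[name]}" for name in order)


baseline = compose_prompt(MODULES, ORDER)

# Edit only the "formatting" module...
edited = dict(MODULES, formatting="Return the score as a single integer. Be concise.")
variant = compose_prompt(edited, ORDER)

# ...yet the model now receives a different single input, so nothing
# guarantees that the rubric behavior it conditions on stays identical.
print(baseline == variant)  # False
```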

Measuring Sub-Threshold Interference

Researchers tested CBL on a deployed job-evaluation agent using Claude Sonnet 4.6 across 144 trials. They employed a three-channel perturbation protocol, modifying non-focal modules (modules other than the one whose behavior was being measured) along three axes:

Volume: Changing the length or quantity of text.
Content: Altering the semantic information.
Form: Changing the formatting or structure.
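The source does not describe how each channel was implemented, so the functions below are hypothetical stand-ins, one per channel, applied to an invented non-focal module: repetition for volume, a targeted word substitution for content, and a sentence-to-bullet rewrite for form.

```python
def perturb_volume(text: str, factor: int = 2) -> str:
    """Volume (assumed form): more text, no new information, via repetition."""
    return "\n".join([text] * factor)


def perturb_content(text: str, old: str, new: str) -> str:
    """Content (assumed form): change the semantics with a targeted substitution."""
    if old not in text:
        raise ValueError(f"{old!r} not found in module text")
    return text.replace(old, new)


def perturb_form(text: str) -> str:
    """Form (assumed form): same sentences, restructured as a bullet list."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return "\n".join(f"- {s}." for s in sentences)


# Hypothetical non-focal module text.
non_focal = "Prefer remote roles. Flag salary ranges below market."

variants = {
    "volume": perturb_volume(non_focal),
    "content": perturb_content(non_focal, "remote", "on-site"),
    "form": perturb_form(non_focal),
}

for channel, text in variants.items():
    print(f"[{channel}]\n{text}\n")
```

Keeping each perturbation confined to a single channel is what allows any measured effect to be attributed to volume, content, or form separately.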

While volume and form perturbations showed no significant impact, content-based changes produced a statistically significant effect (Cohen's d = 0.63, a medium effect by Cohen's conventional benchmarks). Crucially, these shifts were "sub-threshold": they did not trigger explicit failures or flip final recommendations. However, in a production environment where an agent makes thousands of decisions, these small, silent drifts compound, leading to systemic degradation that standard QA processes are currently ill-equipped to detect.
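Cohen's d is the difference between two group means divided by a pooled standard deviation, so d = 0.63 indicates that content perturbations shifted the measured output by roughly six-tenths of a standard deviation. The sketch below uses the common pooled-sample-variance form. The source does not say which variant the authors used, and the input scores are made up for illustration.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(treatment: list[float], control: list[float]) -> float:
    """Standardized mean difference using the pooled sample standard deviation."""
    n1, n2 = len(treatment), len(control)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n1 - 1) * variance(treatment) + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    return (mean(treatment) - mean(control)) / sqrt(pooled_var)


# Hypothetical focal-module scores, not data from the study.
print(cohens_d([4.0, 6.0, 8.0], [2.0, 4.0, 6.0]))  # 1.0
```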

Implications for Agentic Architecture

CBL is distinct from other known failure modes like adversarial injection, privacy leakage, or multi-agent fault propagation. It is an inherent property of how current LLMs process concatenated prompts. The authors argue that measuring cross-module interference must become a standard requirement for evaluating prompt-composed systems. Developers should treat prompt modules not as isolated functions, but as interdependent components that require rigorous, holistic testing to ensure that changes in one area do not silently degrade performance elsewhere.
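One way to act on this recommendation is a regression check that runs the same inputs before and after a change to a non-focal module and then inspects the focal module's scores. The sketch below is illustrative and rests on assumptions: `check_interference`, the score scale, the decision `threshold`, and the `max_drift` tolerance are all hypothetical, and real scores would come from repeated model calls rather than hard-coded lists.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class InterferenceResult:
    flips: int          # final recommendations that changed
    mean_drift: float   # average focal-module score change
    passed: bool


def check_interference(baseline: list[float], perturbed: list[float],
                       threshold: float, max_drift: float) -> InterferenceResult:
    """Compare paired focal-module scores from before and after a non-focal edit.

    A pass/fail check on the final recommendation alone (score >= threshold)
    would miss sub-threshold drift, so this also bounds the mean score shift.
    """
    if len(baseline) != len(perturbed):
        raise ValueError("baseline and perturbed runs must be paired")
    pairs = list(zip(baseline, perturbed))
    flips = sum((b >= threshold) != (p >= threshold) for b, p in pairs)
    drift = mean(p - b for b, p in pairs)
    return InterferenceResult(flips, drift, flips == 0 and abs(drift) <= max_drift)


# Hypothetical scores: no recommendation flips, but every score moved upward.
result = check_interference([5.0, 6.0, 8.0], [5.5, 6.5, 8.5], threshold=7.0, max_drift=0.25)
print(result)  # flips=0, yet passed=False because the mean drift is 0.5
```

The second condition is the point of the design: it fails the check in exactly the case a recommendation-level test would wave through.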
