The Shift from Static Alignment to Runtime Immunity

Traditional AI security relies on perimeter defense and training-time alignment (e.g., RLHF). These methods are insufficient for autonomous agents, which operate in dynamic environments with persistent memory and external tool access. The authors argue that a fully aligned agent remains vulnerable to runtime exploits like memory poisoning, tool-chain manipulation, and multi-agent protocol attacks. They propose the Agent-Native Immune System (ANIS), an endogenous defense architecture that functions as a dynamic "law enforcement" mechanism during runtime, distinct from the static "constitutional" values established during training.

The Immune Tower and Harness Triad

ANIS introduces two core structural concepts for agent security:

  • The Immune Tower (L0-L5): A six-layer hierarchical defense framework. A critical component is Barrier Immunity (L1), which provides non-cognitive, physical-and-logical isolation to prevent unauthorized access before it reaches the agent's reasoning loop.
  • The Harness Triad: A meta-cognitive automation backbone consisting of Meta, Self, and Auto components. This system enables Continual Immune Learning (CIL), allowing the agent to dynamically update its "vaccines" (defense protocols) in response to novel, evolving threats.

Taxonomy and Future Challenges

The framework formalizes the distinction between superficial defenses and robust security measures:

  • Agent Viruses vs. Agent Vaccines: The authors distinguish between non-parametric, superficial defenses and parametric, robust vaccines that provide deep protection.
  • Evaluation Metrics: A major open challenge is the definition of new metrics, such as the Autoimmunity Rate, which measures the false-positive intervention rate of the defense system.

The authors emphasize that the future of agent security lies in understanding the co-evolutionary dynamics between pathogens and vaccines within collective intelligence ecosystems, necessitating standardized immune protocols for agentic workflows.