The Mechanics of AI Injection

Prompt injection occurs when an attacker embeds natural language instructions within user input to manipulate an AI model's behavior. Much like SQL injection, the core issue is the failure to separate instructions from data. Because LLMs process the entire context window as a unified stream, attackers can use "role overrides" (e.g., "You are now a data extraction assistant") or "delimiter breakouts" (e.g., using XML tags to escape the intended input block) to hijack the model.

Crucially, developers must defend against two vectors:

  • Direct Injection: Malicious input provided directly through forms or APIs.
  • Indirect Injection: Malicious instructions hidden in data sources (databases, documents, or web scrapes) that are later retrieved and fed into the AI context window. This is often the more dangerous vector as it can originate from outside the application boundary.

A Multi-Layered Defensive Pipeline

Static blocklists are insufficient because attackers constantly evolve their payloads. A robust defense requires a multi-layered engineering pipeline:

  1. Input Normalization: Always apply NormalizationForm.FormKC to incoming strings. This defeats "token smuggling" where attackers use Unicode homoglyphs (lookalike characters) to bypass simple string matching.
  2. Semantic Pattern Detection: Screen inputs for common override markers (e.g., "ignore previous instructions," "system override"). While basic keyword checking is a good starting point, high-consequence systems should eventually use vector similarity evaluation.
  3. Dynamic Boundary Tokens: Instead of static delimiters like --- USER INPUT ---, generate cryptographically random nonces (e.g., <untrusted_user_data_nonce_8a2f1b>) to wrap user content. This makes it significantly harder for an attacker to predict and break out of the intended context block.
  4. Output Scanning: Validate the model's response before it reaches the user. If the AI begins repeating its system prompt or directives, flag it as a potential security breach.

Implementation in .NET

For .NET developers, these defenses can be integrated into the application lifecycle without cluttering business logic. Using the Microsoft Semantic Kernel SDK, you can implement an IPromptRenderFilter to intercept and sanitize parameters before they are rendered into the final prompt.

Finally, treat security as an observable system: log every detection and implement per-user rate limiting. A single trigger might be a false positive (e.g., a user writing about a "slip and fall incident"), but multiple triggers from the same user within an hour indicate a targeted attack. These layers do not make injection impossible, but they raise the cost of attack and provide the visibility needed to respond effectively.