3-Layer Scanner Stops RAG Prompt Injections Pre-Ingestion

A CLI tool detects embedded prompt injections in documents via regex (40+ patterns, 7 categories), spaCy heuristics (6 signals), and an LLM judge (skipped for 89% of chunks), classifying chunks as CLEAN, SUSPICIOUS, or DANGEROUS with zero false positives on 42 test chunks.

Secure RAG Ingestion by Blocking Injections Early

Prompt injection ranks as the #1 OWASP LLM Top 10 vulnerability for 2025, enabling exploits such as code execution and API calls in AI agents. This Python CLI/library scans documents at ingestion time, chunking them into 512-char overlapping segments before applying its defenses. No prior pip-installable pre-ingestion scanner existed; this tool fills that gap, preventing RAG poisoning where payloads hide in PDFs or compliance docs.
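The fixed-size overlapping chunking described above can be sketched as follows (the function name and exact boundary handling are assumptions; the 512-char size and 50-char overlap are from the article):

```python
def chunk_text(text: str, size: int = 512, overlap: int = 50) -> list[str]:
    """Split text into `size`-char segments, each sharing `overlap` chars
    with the previous one so an injection straddling a boundary still
    appears whole in at least one chunk."""
    if len(text) <= size:
        return [text]
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text) - overlap, step)]
```

The overlap is what lets the scanner see payloads that would otherwise be split across two chunks.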

Risk classification combines the layers: CLEAN (no flags), SUSPICIOUS (Layer 1/2 flags or a low-confidence Layer 3 verdict), or DANGEROUS (Layer 3 INSTRUCTION verdict). A high-confidence Layer 3 DATA verdict (≥0.90) overrides Layer 1 flags to avoid false positives on security documentation. Exit codes support CI/CD gating: 0 (all clean), 1 (suspicious), 2 (dangerous).
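The exit-code contract maps directly to a small reduction over per-chunk verdicts; a sketch with a hypothetical helper name (the real CLI computes this internally):

```python
def scan_exit_code(verdicts: list[str]) -> int:
    """Map per-chunk verdicts to the documented exit codes:
    0 = all clean, 1 = at least one suspicious, 2 = at least one dangerous."""
    if "DANGEROUS" in verdicts:
        return 2
    if "SUSPICIOUS" in verdicts:
        return 1
    return 0
```

A CI pipeline can then fail the build on any nonzero status, or treat 1 as a warning and 2 as a hard block.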

Supports .txt/.md (native Python), .pdf (pdfplumber), and .html (BeautifulSoup4). Install via uv on Python 3.11+; Layer 3 requires a free Groq API key.

Layered Detection Minimizes Costs and False Positives

Layer 1 regex (~1ms/chunk) flags 40+ case-insensitive patterns across 7 categories: instruction overrides, role switching, system markers, imperatives, exfiltration, obfuscation, jailbreaks.
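A minimal sketch of how such a regex layer works; the patterns below are illustrative stand-ins for four of the seven categories, not the tool's actual 40+ rules:

```python
import re

# Illustrative patterns only (assumed, not the tool's real rule set).
PATTERNS = {
    "instruction_override": re.compile(r"ignore\s+(all\s+)?(previous|prior)\s+instructions", re.I),
    "role_switching": re.compile(r"\byou\s+are\s+now\s+(a|an|the)\b", re.I),
    "system_marker": re.compile(r"<\s*/?\s*system\s*>|\[system\]", re.I),
    "exfiltration": re.compile(r"\b(send|forward|post)\b.{0,40}\b(conversation|credentials|api key)\b", re.I),
}

def layer1_flags(chunk: str) -> list[str]:
    """Return the names of all pattern categories that match the chunk.
    Pure regex, so it runs in roughly a millisecond per chunk."""
    return [name for name, pat in PATTERNS.items() if pat.search(chunk)]
```

Matching is case-insensitive and returns category names rather than a boolean, so later layers can weigh which kinds of signal fired.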

Layer 2 heuristics (~10ms/chunk, spaCy en_core_web_sm) scores 6 NLP signals: instruction verb density, imperative concentration, second-person pronouns, contextual mismatch, sentence uniformity, and question ratio, catching paraphrased attacks that evade exact-match regex.
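A toy version of two of the six signals, with hand-picked word lists standing in for spaCy POS tags and dependency parses (the real layer uses en_core_web_sm; the weights here are illustrative):

```python
# Assumed word lists; the real tool derives these signals from spaCy parses.
INSTRUCTION_VERBS = {"ignore", "disregard", "execute", "reveal", "override", "forget"}
SECOND_PERSON = {"you", "your", "yourself"}

def layer2_score(chunk: str) -> float:
    """Combine instruction-verb density and second-person pronoun rate
    into a 0..1 suspicion score. Instructions aimed at the model tend to
    be verb-heavy and address "you", unlike descriptive document prose."""
    words = [w.strip(".,!?;:").lower() for w in chunk.split()]
    if not words:
        return 0.0
    verb_density = sum(w in INSTRUCTION_VERBS for w in words) / len(words)
    pronoun_rate = sum(w in SECOND_PERSON for w in words) / len(words)
    return min(1.0, 2 * verb_density + pronoun_rate)
```

Because these signals score style rather than exact strings, a paraphrase like "kindly disregard what came before" still raises the score even though no regex pattern matches it verbatim.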

Layer 3, an LLM judge (Groq/Anthropic, run on flagged chunks only), uses XML-isolated prompts to return a DATA/INSTRUCTION verdict with confidence and reasoning; 89% of chunks skip it. The decision tree prioritizes Layer 3: INSTRUCTION→DANGEROUS; uncertain or low-confidence→SUSPICIOUS; high-confidence DATA→CLEAN unless conflicting flags remain.

Test Results Validate Precision in Real Scenarios

On 42 chunks from 7 docs (Wikipedia ML/Neural Nets, a technical ML article, a clean short doc, an explicit injection, an injection buried in paragraph 6 of a 10-paragraph GDPR doc, and a poisoned policy), the scanner flagged exactly the dangerous chunks (e.g., 1 of 7 in the GDPR doc) with zero false positives on legitimate content. It is cost-efficient: Layers 1 and 2 handle most chunks without an LLM call.

Limitations: Base64/unicode obfuscation can partially evade Layer 1 (Layers 2/3 mitigate), injections split across chunks may be missed (the 50-char overlap helps), and detection is English-only. There is no formal benchmark yet; v1 was validated on crafted and real docs. The roadmap includes multilingual support and an obfuscation preprocessor.

Summarized by x-ai/grok-4.1-fast via openrouter

6966 input / 1486 output tokens in 10779ms

© 2026 Edge