Multi-Layer Validation Prevents Deadly LLM Medication Errors

Regex validation checks format but misses lethal doses; LLM self-validation repeats the model's own hallucinations; multi-layer checks against RxNorm, interaction databases, and patient data block unsafe recommendations before EHR entry.

Regex and Format Checks Fail Clinical Safety

Regex validation ensures LLM outputs match structures like "Warfarin 10mg daily"—parsing drug names, numeric doses (positive, under 1000mg), units (mg/mcg), and frequencies (daily/BID)—but ignores patient-specific risk. For a 78-year-old, 62kg male with CrCl 38mL/min who is already on amiodarone, 10mg warfarin passes regex yet risks an INR above 8 and a 15% chance of intracranial hemorrhage within 72 hours; the correct dose is 2-3mg. In an October 2025 incident at a 240-bed hospital, regex approved "Enoxaparin 40mg BID" for a 48kg elderly patient with CrCl 42mL/min, causing a retroperitoneal hematoma, a hemoglobin drop to 7.8g/dL, a transfusion, and a $180K settlement. Audits of seven deployments show 65% use this pattern, yielding 3-5 near-misses per 1,000 outputs. Regex catches format issues but misses interactions (amiodarone+warfarin), contraindications (renal impairment), allergies, duplicate therapies, and weight-based dosing.
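A minimal sketch of such a format-only check makes the failure concrete. The pattern and function name below are hypothetical (real deployments vary), but they implement the parsing rules described above—and the lethal warfarin order passes:

```python
import re

# Format-only order check: drug name, dose (positive, under 1000),
# unit (mg/mcg), frequency. Hypothetical pattern for illustration.
ORDER_RE = re.compile(
    r"^(?P<drug>[A-Za-z][A-Za-z\- ]*?)\s+"
    r"(?P<dose>\d+(?:\.\d+)?)(?P<unit>mg|mcg)\s+"
    r"(?P<freq>daily|BID|TID|QID)$",
    re.IGNORECASE,
)

def passes_format_check(order: str) -> bool:
    """Return True if the order string matches the expected structure."""
    m = ORDER_RE.match(order.strip())
    if not m:
        return False
    dose = float(m.group("dose"))
    return 0 < dose < 1000  # positive and under the 1000 mg/mcg cap

# The dangerous orders from the cases above sail through:
print(passes_format_check("Warfarin 10mg daily"))   # True
print(passes_format_check("Enoxaparin 40mg BID"))   # True
```

The check knows nothing about the patient: no CrCl, no co-medications, no weight. Any order that is syntactically well-formed is "valid."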

Studies quantify the gap: 1.47% hallucination and 3.45% omission rates in clinical notes (12,999 clinician-annotated sentences, 18 configurations) translate to roughly 7.35 daily hallucinations across 500 encounters, or about 220 per month—if 10% go undetected, that is 22 false recommendations monthly. Adversarial prompts spike hallucination rates to 50-82% across six LLMs, as models invent clinical significance for fake biomarkers like "fictitious-enzyme-marker."
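The arithmetic behind those projections is simple enough to reproduce (figures taken from the study cited above; rate and encounter counts are inputs, not outputs of any model):

```python
# Back-of-envelope scaling of the quoted hallucination rate.
hallucination_rate = 0.0147   # 1.47% per clinical note
daily_encounters = 500

daily_hallucinations = hallucination_rate * daily_encounters  # ≈ 7.35/day
monthly = daily_hallucinations * 30                           # ≈ 220/month
undetected = monthly * 0.10                                   # ≈ 22 slip through

print(f"{daily_hallucinations:.2f}/day, ~{monthly:.0f}/month, ~{undetected:.1f} undetected")
```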

LLM Self-Validation Inherits the Same Errors

Asking the generating LLM (or another instance of it) to review outputs fails because both share the same training gaps. For the warfarin case, Claude-sonnet-4 often outputs {"safe": true, "concerns": []}, missing the overdose. In a September 2025 academic center case, GPT-4 recommended sumatriptan+ketorolac+metoclopramide for migraine, then validated its own recommendation as safe—overlooking the patient's coronary disease (MI history) and propranolol, which contraindicate sumatriptan (vasoconstriction risk) and risk hypertensive crisis. Mitigation prompts drop hallucination rates from 66% to 44%, but GPT-4o still hits 23%. Correlated errors mean the validator confirms plausible-but-wrong logic, optimizing for fluent language over accuracy.
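The structural problem can be shown without an LLM at all: when generator and validator draw on the same knowledge, a gap in one is a gap in both. The sketch below uses a hypothetical shared knowledge base as a stand-in for shared training data:

```python
# Illustration of correlated errors: the "knowledge" is shared, so the
# validator cannot flag what the generator never knew. Hypothetical data.
SHARED_KNOWLEDGE = {
    # No mention of the amiodarone interaction or renal dose adjustment,
    # so any check built on this record cannot raise either concern.
    "warfarin": {"typical_dose_mg": (2, 10)},
}

def generate_order(drug: str, dose_mg: float) -> dict:
    """Stand-in for the generating model."""
    return {"drug": drug, "dose_mg": dose_mg}

def self_validate(order: dict) -> dict:
    """Stand-in for self-review: consults the SAME knowledge base."""
    lo, hi = SHARED_KNOWLEDGE.get(order["drug"], {}).get(
        "typical_dose_mg", (0, float("inf")))
    safe = lo <= order["dose_mg"] <= hi
    return {"safe": safe, "concerns": [] if safe else ["dose out of range"]}

order = generate_order("warfarin", 10)
print(self_validate(order))  # {'safe': True, 'concerns': []}
```

10mg sits inside the generic range, so self-review approves it; the patient-specific overdose is invisible to both sides. An independent validator needs data the generator does not have.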

Multi-Layer External Checks Ensure Safety

Validate independently via seven layers: (1) regex format; (2) RxNorm drug existence; (3) interaction APIs (e.g., amiodarone+warfarin flags high bleeding risk); (4) FHIR/SNOMED contraindications; (5) allergies; (6) patient-specific dosing (age/weight/CrCl); (7) renal adjustments (<60mL/min). Critical issues (interactions/contras/allergies) block EHR entry; warnings queue pharmacist review.
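The seven layers above can be sketched as a pipeline. External lookups (RxNorm, an interaction API, FHIR/SNOMED) are stubbed as in-memory sets here, and the `Order`/`Patient` shapes and function names are assumptions for illustration, not a real API:

```python
from dataclasses import dataclass, field

CRITICAL, WARNING = "CRITICAL", "WARNING"

@dataclass
class Order:
    drug: str
    dose_mg: float
    frequency: str

@dataclass
class Patient:
    age: int
    weight_kg: float
    crcl: float                              # creatinine clearance, mL/min
    meds: list = field(default_factory=list)
    allergies: list = field(default_factory=list)

def validate(order: Order, patient: Patient,
             known_drugs: set, interactions: set) -> dict:
    issues = []
    # Layer 2: drug existence (layer 1, format, assumed already passed).
    if order.drug not in known_drugs:
        issues.append((CRITICAL, f"unknown drug: {order.drug}"))
    # Layer 3: interactions against the patient's current med list.
    for med in patient.meds:
        if frozenset({order.drug, med}) in interactions:
            issues.append((CRITICAL, f"interaction: {order.drug}+{med}"))
    # Layer 5: allergies.
    if order.drug in patient.allergies:
        issues.append((CRITICAL, f"allergy: {order.drug}"))
    # Layer 7: renal adjustment, using the <60 mL/min threshold above.
    if patient.crcl < 60:
        issues.append((WARNING, f"renal impairment (CrCl {patient.crcl})"))
    critical = any(sev == CRITICAL for sev, _ in issues)
    return {"approved": not critical,
            "requires_review": bool(issues),
            "issues": issues}

# The warfarin case: the amiodarone interaction blocks, renal dosing warns.
result = validate(
    Order("warfarin", 10, "daily"),
    Patient(age=78, weight_kg=62, crcl=38, meds=["amiodarone"]),
    known_drugs={"warfarin", "amiodarone"},
    interactions={frozenset({"warfarin", "amiodarone"})},
)
print(result["approved"])  # False
```

Layers 4 and 6 (contraindications, patient-specific dosing) follow the same shape: an independent data source in, issues out. Critical issues flip `approved` to False; any issue at all sets `requires_review`.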

Implementation uses a PatientContext (age, weight, CrCl, medications, allergies, conditions, labs) and returns {'approved': bool, 'issues': [ValidationIssue(severity, category, desc, source, rec)], 'requires_review': bool}. For the warfarin example, the layers flag the interaction (CRITICAL), renal dosing (WARNING), and dose inappropriateness (WARNING), blocking approval. This architecture relies on no LLM-internal knowledge—it queries external sources to catch what format checks and self-checks miss, preventing incidents like the bleeding cases seen in audited deployments.
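The return shape described above can be written out directly. Field names follow the text; the concrete strings (and the reading of `rec` as "recommended action") are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class ValidationIssue:
    severity: str    # "CRITICAL" or "WARNING"
    category: str    # e.g. "interaction", "renal_dosing", "dose"
    desc: str        # human-readable description
    source: str      # external system that raised it
    rec: str         # recommended action (assumed meaning of 'rec')

# The warfarin example's issues, as the text describes them.
warfarin_issues = [
    ValidationIssue("CRITICAL", "interaction",
                    "warfarin + amiodarone: elevated bleeding risk",
                    "interaction-api", "reduce dose and monitor INR"),
    ValidationIssue("WARNING", "renal_dosing",
                    "CrCl 38 mL/min is below the 60 mL/min threshold",
                    "patient-labs", "adjust dose for renal function"),
    ValidationIssue("WARNING", "dose",
                    "10mg exceeds appropriate dose for this patient",
                    "dosing-rules", "start at 2-3mg"),
]

result = {
    "approved": not any(i.severity == "CRITICAL" for i in warfarin_issues),
    "issues": warfarin_issues,
    "requires_review": bool(warfarin_issues),
}
print(result["approved"])  # False — the CRITICAL interaction blocks EHR entry
```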


© 2026 Edge