Detecting LLM Epistemic Blind Spots via Cross-Model Attribution

The Challenge of Epistemic Uncertainty in Clinical AI

Large Language Models (LLMs) are frequently overconfident even when their internal knowledge is insufficient for a specific task. In high-stakes domains like clinical decision support, this 'epistemic blind spot', where a model lacks the training data needed to make an accurate prediction, poses a significant safety risk. Standard confidence scores (like softmax probabilities) are often poorly calibrated, failing to signal when a model is guessing based on noise rather than learned patterns.

Detecting Blind Spots via Cross-Model Attribution Divergence (CMAD)

The authors propose a novel framework, Cross-Model Attribution Divergence (CMAD), to detect these blind spots. Instead of relying on the model's output probability, the method analyzes the 'reasoning' path by comparing feature attribution maps across different models.

Attribution Divergence: The core insight is that when multiple models (or a model and a baseline) disagree significantly on which input features are driving a prediction, the model is likely operating in a region of high epistemic uncertainty.
Clinical Tabular Application: By applying this to clinical tabular data, the researchers demonstrate that divergence in feature importance scores serves as a reliable proxy for identifying samples where the model lacks sufficient training coverage. When models rely on different, non-causal, or noise-heavy features to reach a conclusion, the system flags the prediction as potentially unreliable.
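
To make the idea of attribution divergence concrete, here is a minimal sketch in Python. The summary above does not say which attribution method or divergence measure the authors use, so everything here is an assumption for illustration: attribution vectors are taken as already computed (one score per input feature, per model), their magnitudes are normalized into distributions, and disagreement is measured with the Jensen-Shannon divergence. The function names are hypothetical.

```python
import math


def normalize(attributions):
    """Turn raw attribution scores into a probability distribution over features."""
    magnitudes = [abs(a) for a in attributions]
    total = sum(magnitudes)
    if total == 0:
        # No feature received any credit: fall back to a uniform distribution.
        return [1.0 / len(magnitudes)] * len(magnitudes)
    return [m / total for m in magnitudes]


def kl(p, q):
    """KL divergence in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def attribution_divergence(attr_a, attr_b):
    """Jensen-Shannon divergence between two attribution vectors.

    Assumed stand-in for a CMAD-style score, not the authors' definition.
    Returns a value in [0, 1]: 0 means both models credit the same features
    in the same proportions, 1 means they credit disjoint sets of features.
    """
    if len(attr_a) != len(attr_b):
        raise ValueError("attribution vectors must cover the same features")
    p, q = normalize(attr_a), normalize(attr_b)
    # The mixture m is positive wherever p or q is, so kl() never divides by zero.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical attributions over four tabular features for one patient record.
models_agree = attribution_divergence([0.50, 0.30, 0.10, 0.10],
                                      [0.45, 0.35, 0.10, 0.10])
models_disagree = attribution_divergence([0.90, 0.05, 0.05, 0.00],
                                         [0.00, 0.05, 0.05, 0.90])
print(f"agree: {models_agree:.3f}  disagree: {models_disagree:.3f}")
```

Under these assumptions, the second pair scores much higher than the first because the two models place almost all of their credit on different features. Using magnitudes discards the sign of each attribution; a variant that keeps signs would treat two models that use the same feature in opposite directions as disagreeing.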

Practical Implications for Model Deployment

This approach shifts the focus from 'what the model predicts' to 'why the model predicts it.' By quantifying the disagreement in feature attribution, developers can implement a 'human-in-the-loop' trigger. When CMAD scores exceed a defined threshold, the system can automatically route the query to a human clinician rather than allowing the LLM to provide an unverified, potentially hallucinated recommendation. This gives safety-critical deployments a concrete mechanism for keeping models within the regions their training data actually covers.
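
The routing rule described here can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the threshold value, the `Decision` type, and the `route_prediction` function are all hypothetical, and the divergence score is assumed to have been computed upstream.

```python
import math
from dataclasses import dataclass
from typing import Optional

# Hypothetical cutoff; in practice it would be tuned on held-out data.
REVIEW_THRESHOLD = 0.3


@dataclass
class Decision:
    route: str                     # "model" or "clinician"
    cmad_score: float
    recommendation: Optional[str]  # withheld (None) when routed to a clinician


def route_prediction(recommendation, cmad_score, threshold=REVIEW_THRESHOLD):
    """Release the model's recommendation only when the score is at or below the threshold."""
    # A missing or invalid score is treated as maximal uncertainty (fail safe).
    if math.isnan(cmad_score) or cmad_score > threshold:
        return Decision("clinician", cmad_score, None)
    return Decision("model", cmad_score, recommendation)


print(route_prediction("start treatment A", 0.12))
print(route_prediction("start treatment A", 0.71))
```

The explicit NaN check matters: a comparison such as `nan > threshold` evaluates to false, so without it an invalid score would silently pass through as a low-uncertainty case.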
