The Failure of 'Convincingness' as a Metric
Modern role-playing language agents (RPLAs) are evaluated using benchmarks that prioritize personality fidelity, fluency, and stylistic naturalness. Models can score highly on these metrics (e.g., 80%+ alignment), yet the metrics fail to detect a critical flaw: the model's tendency to favor culturally dominant composites over the historical record. This is the 'Miranda distortion': the volume of cultural representation of a figure (a Broadway musical, for example) in the training corpus vastly outweighs the primary documentary record, so the model reasons from a modern, smoothed-out narrative rather than from the documented historical figure.
The Structural Mechanism: Miranda Distortion
This distortion is not a bug that can be fixed via standard RLHF or fine-tuning; it is a structural byproduct of autoregressive training.
- Corpus Saturation: For any culturally salient figure, the training data is saturated with derivative works, social media discourse, and popular media.
- Algorithmic Sycophancy: Reinforcement learning from human feedback (RLHF) exacerbates this. Human raters, themselves products of the same cultural environment, reward outputs that match the 'mythologized' version of a figure they already believe in.
- The Mask vs. The Mirror: Current evals measure the 'mask' (does this sound like the character?) rather than the 'mirror' (is this reasoning constrained by the character's actual record at a specific point in time?).
From Cognitive to Epistemic Simulation
To fix this, the field must shift from 'cognitive simulation' (modeling the persona's mind) to 'epistemic simulation' (modeling the persona's access to information). This requires three commitments:
- Corpus-Bounded: The persona's reasoning must be strictly licensed by a specific set of primary documents.
- Temporally Anchored: The persona must be instantiated at a specific moment in their life, with knowledge and language that postdate that moment treated as out-of-bounds.
- Expert-Loop Evaluated: Outputs must be audited by domain experts (historians, classicists, etc.) who can identify anachronisms that automated metrics miss.
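The first two commitments can be sketched in code. The following is a minimal illustration, not a reference implementation: the `SourceDocument` schema, the function names, and the `term_first_attested` lookup are all hypothetical, and the lookup is assumed to be curated by domain experts. The anachronism check is only a crude pre-screen that surfaces candidates for expert review; it does not replace the expert loop.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SourceDocument:
    """A primary document in the persona's licensed corpus (hypothetical schema)."""
    doc_id: str
    written_on: date
    text: str


def admissible_documents(corpus, anchor):
    """Corpus-bounded and temporally anchored: keep only documents
    dated on or before the anchor, sorted oldest first."""
    kept = [doc for doc in corpus if doc.written_on <= anchor]
    return sorted(kept, key=lambda doc: doc.written_on)


def flag_anachronisms(output, term_first_attested, anchor):
    """Return terms found in `output` whose first attestation is after the anchor.

    `term_first_attested` maps a lowercase term to the date it is first
    attested. It is an assumed, expert-curated lookup; nothing here
    derives it automatically. Matching is plain substring search.
    """
    lowered = output.lower()
    return sorted(
        term
        for term, first_seen in term_first_attested.items()
        if first_seen > anchor and term in lowered
    )
```

A caller would pass the result of `admissible_documents` to the model as its only licensed record, then hand any terms returned by `flag_anachronisms`, along with the full output, to the expert reviewer.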
Re-architecting the Encounter
Instead of treating the persona as a property of the model weights (which makes it uninspectable), builders should treat the persona as a 'configuration' of an encounter. This involves:
- Context Engineering: Using retrieval-augmented generation (RAG) architectures in which anchor documents are supplied in the context window, rather than fine-tuning the weights.
- Legibility: Keeping the persona in the configuration (prompt + documents + temporal anchor) makes the system versionable, auditable, and reproducible.
- Interpretive Custody: The human user or curator retains custody over the persona, using the model only as a voice to speak through the provided record.
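One way to picture the persona as a configuration is a small, immutable record that holds the prompt, the anchor documents, and the temporal anchor, and that can be hashed to give a version identifier. The sketch below is an assumption-laden illustration: `EncounterConfig`, its fields, and the layout of the assembled context are hypothetical names chosen for this example, and no particular model API is implied.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class EncounterConfig:
    """Hypothetical container: the persona lives here, not in model weights."""
    persona_name: str
    temporal_anchor: str      # ISO date string chosen by the curator
    system_prompt: str
    anchor_documents: tuple   # tuple of (doc_id, text) pairs

    def version(self):
        """Content hash of the whole configuration.

        Identical configurations give identical IDs; any edit to the
        prompt, the anchor, or a document gives a different one.
        """
        canonical = json.dumps(asdict(self), sort_keys=True, ensure_ascii=False)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]

    def build_context(self, question):
        """Assemble the text placed in the model's context window."""
        parts = [
            self.system_prompt,
            f"Persona: {self.persona_name}",
            f"Temporal anchor: {self.temporal_anchor}",
            "Licensed record:",
        ]
        for doc_id, text in self.anchor_documents:
            parts.append(f"[{doc_id}] {text}")
        parts.append(f"Question: {question}")
        return "\n".join(parts)
```

Because the curator owns this record, it can be stored in version control and logged next to every output by its `version()` value, so that a given response can be traced back to the exact prompt, documents, and anchor that produced it.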