The Limitations of Final Accuracy
Standard benchmarks for long-term memory in Large Language Models (LLMs) often rely on final accuracy—measuring whether a model can retrieve a specific piece of information after a delay. However, this metric is insufficient because it obscures the 'how' and 'why' of memory failure. A model might arrive at the correct answer through lucky guessing, memorization of patterns rather than facts, or transient activations that do not reflect true long-term storage. MemTrace addresses this by probing the internal states of the model to distinguish between robust knowledge retention and superficial performance.
Probing Memory Dynamics
MemTrace shifts the focus from static evaluation to dynamic tracing. By analyzing the model's internal representations during the interval between information ingestion and retrieval, the framework identifies the specific points where memory degrades. This allows researchers to differentiate between:
- Encoding Failures: Instances where the model never successfully integrated the information into its weights or context.
- Retrieval Failures: Instances where the information is present but the model fails to access it due to interference or poor query alignment.
- Decay Patterns: The rate at which information becomes inaccessible, providing a more granular view of memory stability than a binary pass/fail score.
By moving beyond the 'black box' approach of measuring only output, MemTrace provides a diagnostic tool for developers to understand if their RAG (Retrieval-Augmented Generation) pipelines or fine-tuning strategies are actually building durable knowledge or merely creating temporary, fragile associations.