Stateless LLMs and Vector RAG Create Amnesiac Agents

LLMs are stateless by design: each session starts with zero history, treating prior work as nonexistent, like Andrej Karpathy's "coworker with anterograde amnesia." Stuffing everything into the context window fails because models ignore most of it: the "Lost in the Middle" paper shows LLMs attend mainly to the beginning and end of long prompts, making effective use of only an estimated 10–20%, and Llama-3.1-70B's advertised 128K window reportedly shrinks to roughly 2,000 effective tokens in practice.

Vector RAG worsens this by retrieving semantically similar chunks without grasping relationships, missing links such as a Tuesday feature tying into a Thursday bug. Google DeepMind researchers have even proved the limit mathematically: fixed-dimensional embeddings cannot represent every combination of relevant documents, because the geometry of the embedding space caps how many distinct result sets a single query vector can retrieve. The result: agents contradict past decisions, re-ask answered questions, and fail tasks outright. In Answer.AI's evaluation, Devin succeeded on just 3 of 20 real engineering tasks, hallucinating on others for lack of structural context.

Mimic Brain Structure with Knowledge Graphs for True Recall

Human memory stores patterns of synaptic connections, not isolated facts (per MIT neuroscience research). Agents need the same: knowledge graphs encoding entities, typed relationships, and traversable paths (e.g., project → decisions → constraints → outcomes). This shifts retrieval from similarity search to graph traversal, letting agents reason: "What failed for authentication?" pulls multi-hop paths like Project X → authentication → tried approaches → failures.
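The shift from similarity search to traversal can be sketched with a toy in-memory graph. All node names, relation types, and the failure example below are illustrative assumptions, not BrainAPI's actual schema:

```python
from collections import deque

# Hypothetical mini knowledge graph as (subject, relation, object) triples.
TRIPLES = [
    ("Project X", "HAS_COMPONENT", "authentication"),
    ("authentication", "TRIED_APPROACH", "JWT in localStorage"),
    ("authentication", "TRIED_APPROACH", "httpOnly session cookies"),
    ("JWT in localStorage", "RESULTED_IN", "XSS token theft (failure)"),
    ("httpOnly session cookies", "RESULTED_IN", "shipped to production"),
]

def neighbors(node):
    """Outgoing typed edges from a node."""
    return [(rel, obj) for subj, rel, obj in TRIPLES if subj == node]

def multi_hop_paths(start, max_hops=3):
    """BFS over typed edges, returning every path up to max_hops.

    This is the step that replaces similarity search: 'What failed for
    authentication?' becomes path-following, not embedding math."""
    paths, queue = [], deque([[start]])
    while queue:
        path = queue.popleft()
        if len(path) // 2 >= max_hops:  # path alternates node, rel, node, ...
            continue
        for rel, obj in neighbors(path[-1]):
            new_path = path + [rel, obj]
            paths.append(new_path)
            queue.append(new_path)
    return paths

# "What failed for authentication?" as a filter over traversed paths:
failures = [p for p in multi_hop_paths("Project X") if "failure" in p[-1]]
```

A vector store would need the failure chunk to be semantically close to the question; the traversal finds it because the edges Project X → authentication → JWT in localStorage → failure exist, regardless of wording.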

Top agents show that graphs outperform: Augment Code's semantic dependency graph lifted its SWE-bench Pro score to 51.80% (vs. Cursor's 45.89% on the same Claude model). Cognition Labs (Devin) trains dedicated models for information preservation. Letta, which raised $10M, treats personalization and planning failures as memory problems and attacks them with tiered memory.

Deploy Graph Memory with Open-Source BrainAPI for Local Persistence

Build graph-native memory that is persistent, MCP-compatible, and self-hostable with BrainAPI (Lumen Labs, GitHub: Lumen-Labs/brainapi2). Its pipeline extracts entities and relationships (Scout → Architect → Janitor → knowledge graph in Neo4j) and exposes the result via Docker at localhost:8001/mcp for Claude Desktop or Cursor integration.
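Assuming the endpoint speaks standard MCP JSON-RPC 2.0 over HTTP (the MCP wire format; the specific tools BrainAPI exposes are not documented here), a minimal stdlib client sketch looks like this:

```python
import json
import urllib.request

MCP_URL = "http://localhost:8001/mcp"  # BrainAPI's Docker-exposed endpoint

def mcp_request(method, params=None, req_id=1):
    """Build a JSON-RPC 2.0 message as used by the MCP protocol.

    Which tools the server actually exposes is an assumption; inspect the
    real tools/list response from a running BrainAPI container."""
    payload = {"jsonrpc": "2.0", "id": req_id, "method": method}
    if params is not None:
        payload["params"] = params
    return payload

def post(payload):
    """POST a message to the server (requires BrainAPI running locally)."""
    req = urllib.request.Request(
        MCP_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json, text/event-stream",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Ask the server which memory tools it exposes:
list_tools = mcp_request("tools/list")
# post(list_tools)  # uncomment with the Docker container up
```

Claude Desktop and Cursor do this handshake for you once the server is registered; the sketch just makes the protocol visible.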

In practice, log decisions and constraints as nodes and edges; queries then traverse the graph for context such as past approaches or the reasons behind a decision. It runs fully local for sensitive projects, with a cloud option and custom schema plugins. A surprising side benefit: it boosts inference quality. Agents answer "Have we tried this?" or "Why this decision?" from structure rather than documents, turning an autocomplete engine into a contextual reasoner.
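The logging step might look like the following sketch, which emits Cypher for the underlying Neo4j store. The node labels, relationship types, and helper name are my assumptions for illustration, not BrainAPI's API, and real code should use parameterized queries rather than string interpolation:

```python
def log_decision(decision, reason, constraints):
    """Emit Cypher MERGE statements recording a decision, its reason,
    and the constraints that shaped it (illustrative schema)."""
    stmts = [
        f"MERGE (d:Decision {{name: '{decision}'}})",
        f"MERGE (r:Reason {{text: '{reason}'}})",
        "MERGE (d)-[:BECAUSE]->(r)",
    ]
    for i, c in enumerate(constraints):
        # Unique variable per constraint so one query can MERGE them all.
        stmts.append(f"MERGE (c{i}:Constraint {{name: '{c}'}})")
        stmts.append(f"MERGE (d)-[:UNDER]->(c{i})")
    return "\n".join(stmts)

cypher = log_decision(
    "use httpOnly session cookies",
    "JWT in localStorage was vulnerable to XSS",
    ["must support SSR", "no third-party auth provider"],
)

# Later, "Why this decision?" is a one-hop traversal, not a doc search:
WHY_QUERY = (
    "MATCH (d:Decision {name: 'use httpOnly session cookies'})"
    "-[:BECAUSE]->(r:Reason) RETURN r.text"
)
```

Because the reason and constraints are edges, not prose buried in a document, the "why" question resolves deterministically no matter how the agent phrases it.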

Graphs separate elite agents from goldfish. The models already suffice; architecture wins.