The CoALA Framework for Agentic Memory

To move from static chatbots to functional AI agents, developers must implement persistent memory architectures. The CoALA (Cognitive Architectures for Language Agents) framework categorizes these into four distinct types, each serving a specific role in how an agent processes information and improves over time.

The Four Memory Architectures

1. Working Memory (Context Window)

This is the agent's immediate, volatile scratchpad. It holds the current conversation, system instructions, and active data. Some modern context windows accept up to a million tokens, but capacity is not the same as reliable recall: stuffing too much information into the context window degrades performance and causes the model to lose track of information buried in the middle.
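One common way to keep working memory small is to trim older turns before each model call. The sketch below is illustrative only: the Message type, the fit_to_budget helper, and the word-count token estimate are assumptions made for this example, and a real system would use the model's own tokenizer.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str      # "system", "user", or "assistant"
    content: str


def estimate_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


def fit_to_budget(messages: list[Message], budget: int) -> list[Message]:
    """Keep every system message, then as many of the newest turns as fit."""
    system = [m for m in messages if m.role == "system"]
    turns = [m for m in messages if m.role != "system"]

    remaining = budget - sum(estimate_tokens(m.content) for m in system)
    kept: list[Message] = []
    # Walk backwards so the most recent turns are preserved first.
    for message in reversed(turns):
        cost = estimate_tokens(message.content)
        if cost > remaining:
            break
        kept.append(message)
        remaining -= cost

    # Restore chronological order for the kept turns.
    return system + list(reversed(kept))
```

Dropping the oldest turns is the simplest policy; summarizing them instead is a common alternative when early context still matters.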

2. Semantic Memory (Knowledge Base)

This stores persistent facts, rules, and conventions. While vector databases and knowledge graphs are common academic implementations, production systems often use simple Markdown files (e.g., CLAUDE.md) placed in project roots. These files provide the agent with architectural context and coding standards, preventing the agent from repeating basic errors across sessions.
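As a rough illustration of the Markdown-file approach, the sketch below reads a conventions file from the project root and appends it to the system prompt at session start. The function names, the "# Project conventions" heading, and the default filename are assumptions for this example, not a documented API of any particular tool.

```python
from pathlib import Path


def load_project_memory(project_root: Path, filename: str = "CLAUDE.md") -> str:
    """Return the conventions file's text, or an empty string if it is absent."""
    path = project_root / filename
    return path.read_text(encoding="utf-8") if path.is_file() else ""


def build_system_prompt(base_instructions: str, project_root: Path) -> str:
    """Append persistent project knowledge to the base instructions."""
    memory = load_project_memory(project_root)
    if not memory:
        return base_instructions
    return f"{base_instructions}\n\n# Project conventions\n{memory.strip()}"
```

Because the file is plain text under version control, humans can review and correct what the agent "knows" with an ordinary diff.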

3. Procedural Memory (Skills)

This dictates how an agent performs tasks. Using an open-standard approach, skills are stored as folders containing instructions and metadata. To avoid overwhelming the working memory, agents use progressive disclosure: they load only a lightweight index (name and description) of available skills. The full instructions and dependencies are only pulled into the context window when the agent identifies a task that requires that specific skill.
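Progressive disclosure can be sketched in a few lines. The folder layout here is hypothetical: each skill folder is assumed to hold a small meta.json (name and description) and a larger instructions.md. Real skill formats differ in file names and metadata syntax, so treat this as an outline of the idea rather than any standard's actual layout.

```python
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass
class SkillSummary:
    name: str
    description: str
    folder: Path


def build_skill_index(skills_root: Path) -> list[SkillSummary]:
    """Read only the small metadata file from each skill folder."""
    index = []
    for meta_path in sorted(skills_root.glob("*/meta.json")):
        meta = json.loads(meta_path.read_text(encoding="utf-8"))
        index.append(SkillSummary(meta["name"], meta["description"], meta_path.parent))
    return index


def render_index(index: list[SkillSummary]) -> str:
    """Compact listing that stays in the context window at all times."""
    return "\n".join(f"- {s.name}: {s.description}" for s in index)


def load_skill(index: list[SkillSummary], name: str) -> str:
    """Pull the full instructions into context only once the skill is selected."""
    for skill in index:
        if skill.name == name:
            return (skill.folder / "instructions.md").read_text(encoding="utf-8")
    raise KeyError(f"unknown skill: {name}")
```

The output of render_index is what the agent sees on every turn; load_skill runs only after the agent picks a skill by name, so the cost of a large skill library grows with the number of descriptions rather than the size of the instructions.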

4. Episodic Memory (Distilled Experience)

This is the agent's record of past interactions and decisions. A naive implementation that saves full transcripts is rarely useful. Effective systems use distillation, where the agent saves only high-value insights (e.g., "the auth issue was in the middleware layer") rather than raw logs. This allows the agent to learn and improve over time. However, episodic memory is the most difficult of the four types to implement, as it requires solving the "forgetting" problem: determining when information becomes obsolete or irrelevant.
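A minimal sketch of a distilled episodic store follows. It assumes the insight text has already been produced by a summarization step (not shown), and it uses a fixed time-to-live as a deliberately naive answer to the forgetting problem. The Insight and EpisodicStore names, the keyword match, and the 90-day default are all illustrative assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Insight:
    text: str
    recorded_at: datetime
    ttl: timedelta  # how long the insight is assumed to stay relevant

    def is_live(self, now: datetime) -> bool:
        return now - self.recorded_at <= self.ttl


@dataclass
class EpisodicStore:
    insights: list[Insight] = field(default_factory=list)

    def remember(self, text: str, now: datetime,
                 ttl: timedelta = timedelta(days=90)) -> None:
        """Store one distilled insight, not the transcript it came from."""
        self.insights.append(Insight(text, now, ttl))

    def recall(self, keyword: str, now: datetime) -> list[str]:
        """Return unexpired insights whose text mentions the keyword."""
        needle = keyword.lower()
        return [i.text for i in self.insights
                if i.is_live(now) and needle in i.text.lower()]

    def forget_expired(self, now: datetime) -> int:
        """Drop expired insights and report how many were removed."""
        before = len(self.insights)
        self.insights = [i for i in self.insights if i.is_live(now)]
        return before - len(self.insights)
```

A time-based expiry is only one possible policy. Insights tied to code that has since been rewritten can go stale much sooner than any fixed window, which is part of why this memory type is hard to get right.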

Matching Memory to Agent Complexity

Not every agent requires all four memory types. Complexity should scale with the task:

  • Reflex Agents (e.g., simple routers): Require only working memory.
  • Narrow Task Agents (e.g., password reset bots): Require working memory and procedural memory.
  • Complex Agents (e.g., coding assistants): Require all four types to manage product knowledge, specific skills, and long-term project history.
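The tiers above can be written down as a small lookup table, so that an agent's setup code enables only the memory stores its tier needs. The Memory flag, the tier keys, and the needs helper are hypothetical names chosen for this sketch.

```python
from enum import Flag, auto


class Memory(Flag):
    WORKING = auto()
    SEMANTIC = auto()
    PROCEDURAL = auto()
    EPISODIC = auto()


# Memory types each agent tier requires, following the list above.
REQUIRED_MEMORY = {
    "reflex": Memory.WORKING,
    "narrow_task": Memory.WORKING | Memory.PROCEDURAL,
    "complex": (Memory.WORKING | Memory.SEMANTIC
                | Memory.PROCEDURAL | Memory.EPISODIC),
}


def needs(agent_kind: str, memory: Memory) -> bool:
    """True if the given tier requires the given memory type."""
    return memory in REQUIRED_MEMORY[agent_kind]
```

For example, needs("narrow_task", Memory.SEMANTIC) is False, so a password reset bot built this way would skip loading a knowledge base entirely.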