Hermes Agent: Always-On Memory via Bounded Core Files

Hermes embeds persistent memory directly in the system prompt: MEMORY.md (2,200 characters max) holds agent notes and USER.md (1,375 characters max) holds the user profile. The hard caps force curation and enable prefix caching, while optional external providers add recall on top.

Retrieval Memory's Scalability Limits and Hermes' Curated Alternative

Traditional agent memory frameworks treat memory as a retrieval problem: store externally, then query and inject on demand. Examples include Letta (21.7K GitHub stars; OS-tiered memory with core always in context, recall as searchable history, and archival as cold storage), Zep/Graphiti (temporal entity graphs), Cognee (knowledge graphs built from 30+ connectors), Hindsight (entity graphs with reflect synthesis), and Mem0 (48K stars; LLM-extracted facts, with an ECAI 2025 paper, arXiv:2504.19413, benchmarking 10 approaches). Retrieval adds latency, noise, and token cost as context balloons toward LLM window limits. Databricks' research shows agent performance scales with accumulated experience, but retrieval dilutes focus.

Hermes flips this: memory is the agent, baked into the frozen system prompt at session start through two file-backed layers. MEMORY.md (~800 tokens) holds environment and project facts (e.g., "User's project is a Go microservice at ~/code/gateway using gRPC + PostgreSQL"); USER.md (~500 tokens) holds user details (e.g., "User prefers snake_case, uses Ubuntu 22.04, deploys via Terraform"). Together they stay under ~1,300 tokens. The bounded sizes enforce curation: when a file is full, the agent must consolidate or replace entries via the memory tool (actions: add/replace/remove; target='memory' or 'user'). Changes persist to disk instantly but take effect at the next session, so the static prompt prefix can be cached instead of reprocessed each turn, cutting latency and cost. Security scans block prompt injections and duplicate entries.
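A minimal sketch of how a bounded, file-backed core layer could work. The file names and character caps come from the post; the BoundedMemory class and its method names are hypothetical, chosen to mirror the memory tool's add/replace/remove actions:

```python
from pathlib import Path

# Character caps from the post: MEMORY.md 2,200 chars, USER.md 1,375 chars.
LIMITS = {"memory": ("MEMORY.md", 2200), "user": ("USER.md", 1375)}

class BoundedMemory:
    """Hypothetical sketch of a file-backed, size-capped memory layer."""

    def __init__(self, root: str, target: str):
        name, self.limit = LIMITS[target]
        self.path = Path(root) / name
        self.path.touch(exist_ok=True)

    def add(self, fact: str) -> bool:
        """Append a fact; refuse when the file would exceed its cap,
        forcing the agent to consolidate or replace instead."""
        current = self.path.read_text()
        if len(current) + len(fact) + 1 > self.limit:
            return False  # full: caller must use replace/remove
        self.path.write_text(current + fact + "\n")
        return True

    def replace(self, old: str, new: str) -> bool:
        """Swap one fact for another without exceeding the cap."""
        current = self.path.read_text()
        if old not in current:
            return False
        updated = current.replace(old, new, 1)
        if len(updated) > self.limit:
            return False
        self.path.write_text(updated)
        return True

    def remove(self, fact: str) -> None:
        """Drop a fact line entirely."""
        lines = [l for l in self.path.read_text().splitlines() if l != fact]
        self.path.write_text("\n".join(lines) + ("\n" if lines else ""))
```

The key design point is that `add` fails loudly at the cap rather than truncating, which is what pushes curation onto the agent.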

Two-Layer Runtime: Always-On Built-In Core Plus One External Plugin

Core flow: prefetch_all(query) pulls external context before the LLM call (no tool call needed for built-ins), the LLM responds, then sync_all(user, assistant) persists the exchange, either via passive extraction or explicit tools like honcho_conclude. The external layer supports one provider at a time (e.g., Honcho dialectic, Hindsight batch retention, Mem0 fact extraction) in one of three modes: auto-injection, tools-only (e.g., honcho_search), or hybrid. Session history is served by session_search (SQLite FTS5 plus a Gemini summary). Activate a provider in ~/.hermes/config.yaml (e.g., memory.provider: "hindsight") or via the CLI: hermes memory setup.
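The prefetch/sync loop above can be sketched as follows. Only prefetch_all and sync_all are named in the post; the MemoryProvider protocol, run_turn, and the llm callable are hypothetical stand-ins:

```python
from typing import Callable, Optional, Protocol

class MemoryProvider(Protocol):
    """Hypothetical interface for the single configured external provider."""
    def prefetch(self, query: str) -> str: ...
    def sync(self, user: str, assistant: str) -> None: ...

def run_turn(
    llm: Callable[[str, str], str],
    provider: Optional[MemoryProvider],
    core_prompt: str,
    query: str,
) -> str:
    # 1. Built-in core memory already sits in core_prompt (no tool call
    #    needed); prefetch pulls optional external context pre-LLM.
    context = provider.prefetch(query) if provider else ""
    # 2. The LLM answers with core memory plus prefetched context in view.
    answer = llm(core_prompt + "\n" + context, query)
    # 3. sync persists the exchange (e.g., passive extraction provider-side).
    if provider:
        provider.sync(query, answer)
    return answer
```

Because the core prompt is identical every turn, only the prefetched context and the query vary, which is what makes prefix caching pay off.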

This keeps the core fast and always active while externals handle volume, avoiding "stuffing" the full history into context.
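The session_search path mentioned above rests on SQLite FTS5. A minimal sketch of what that index might look like (the schema and function names are assumptions; the Gemini summarization step is omitted):

```python
import sqlite3

def open_session_index(path: str = ":memory:") -> sqlite3.Connection:
    # FTS5 virtual table holding past turns; the schema is a guess.
    db = sqlite3.connect(path)
    db.execute("CREATE VIRTUAL TABLE IF NOT EXISTS turns USING fts5(role, content)")
    return db

def record(db: sqlite3.Connection, role: str, content: str) -> None:
    db.execute("INSERT INTO turns VALUES (?, ?)", (role, content))

def session_search(db: sqlite3.Connection, query: str, k: int = 5) -> list:
    # bm25() is FTS5's built-in ranking function; lower scores rank first.
    rows = db.execute(
        "SELECT content FROM turns WHERE turns MATCH ? ORDER BY bm25(turns) LIMIT ?",
        (query, k),
    )
    return [r[0] for r in rows]
```

Full-text search over raw turns keeps history queryable without ever re-injecting it wholesale into the prompt.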

Proactive Triggers Force Selective Persistence

The agent saves without being prompted, using a decision tree: prioritize corrections and preferences (e.g., "User uses poetry, not pip"), environment facts (OS, tools), project conventions, lessons from complex workflows, and tool quirks; skip anything trivial, re-discoverable, or session-ephemeral. Recall is automatic (core memory sits in the prompt) or targeted (session_search for history; provider tools or prefetch for externals). External knowledge (e.g., ArXiv papers, Obsidian notes via the obsidian skill, the filesystem) gets distilled into core memory: a vast library lookup becomes a compact fact like "Memory scaling: performance rises with stored experience."
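The decision tree above could be sketched as a simple priority filter. The categories come from the post; the keyword heuristics and the should_persist function are purely illustrative (in Hermes the decision would be made by the LLM, not string matching):

```python
def should_persist(note: str) -> bool:
    """Illustrative filter: save durable facts, skip session ephemera."""
    note_l = note.lower()
    # Skip trivial / re-discoverable / session-scoped observations first.
    ephemeral = ("this session", "temporary", "already in docs")
    if any(tag in note_l for tag in ephemeral):
        return False
    # Priority categories from the post: corrections/preferences,
    # environment facts, project conventions, workflow lessons, tool quirks.
    durable = ("prefers", "not ", "uses", "convention", "lesson", "quirk")
    return any(tag in note_l for tag in durable)
```

The skip-first ordering matters: ephemera are rejected before the durable categories are even considered, which is what keeps the bounded files from filling with noise.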

Internal memory (the brain: preferences and lessons, always loaded) complements external memory (the library: docs and code, fetched on demand through tools). With no overlap between the two, agents can evolve personally without accumulating noise.

Summarized by x-ai/grok-4.1-fast via openrouter


© 2026 Edge