Hermes: Self-Improving Agent Builds Skills from Conversations

Hermes stores sessions in SQLite with FTS5 full-text search, compresses context at 50% of the window to save tokens, and auto-generates reusable skills every ~10 turns, recalling your style across sessions without re-uploads.

Memory System Enables Cross-Session Recall Without Token Burn

Hermes persists all conversations in an SQLite database with FTS5 full-text search, so queries like "recall yesterday's discussion" fetch exact matches without loading the full history. Memory loads as a pre-compacted snippet of roughly 3,500 characters (~700 tokens) per session, avoiding context overflow. At 50% context-window usage it compresses more aggressively than OpenClaw's 80% threshold: old tool-call outputs are stripped, the session head and tail are kept verbatim, and the middle is summarized. External memory processors such as Supermemory, Mem0, or OpenVikings can replace the default memory.md file. Every ~10 turns, Hermes nudges itself to extract and save key facts or skills, ensuring long-term retention for tasks like matching your exact tweet style (e.g., a pragmatic, developer-centric voice, ~400-character length, specific emojis, and avoiding hype words like "incredible").
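The recall mechanism can be sketched with Python's built-in sqlite3 module. The schema below (session_id, role, content) is an illustrative assumption, not Hermes's documented table layout; the point is that an FTS5 MATCH query returns only the relevant rows rather than replaying the whole history into context.

```python
import sqlite3

# Assumed schema for illustration; Hermes's actual layout is not
# shown in the source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE messages USING fts5(session_id, role, content)")
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?)",
    [
        ("s1", "user", "please match my tweet style: pragmatic, about 400 characters"),
        ("s1", "assistant", "noted: pragmatic voice, no hype words"),
        ("s2", "user", "what is the deployment plan for Friday?"),
    ],
)

# MATCH fetches only rows containing the query token, not full history.
rows = conn.execute(
    "SELECT session_id, content FROM messages WHERE messages MATCH ?",
    ("tweet",),
).fetchall()
# Only the s1 message containing the token "tweet" comes back.
```

A query like "recall yesterday's discussion" would be tokenized into such MATCH terms, keeping per-session token cost near the ~700-token snippet rather than the full transcript.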

Auto-Skill Creation Turns Feedback into Reusable Tools

Interact once, and Hermes generates persistent skills via its Skill Manager, with no manual coding needed. In a demo, it analyzed video scripts, internalized feedback (e.g., swap "breaking a sweat" for neutral phrasing, prefer "really good"), then built a "tweet generator" skill that outputs three or more options or full threads. Invoke it with /skill tweet in a new session and it recalls your preferences without re-prompting. You can also switch models mid-chat for speed or cost (e.g., from Gemma 2 to GLM-4-Turbo via /model glm-4-turbo). Skills evolve from experience, making Hermes self-improving: use it daily, and it handles repetitive tasks like content promotion autonomously.
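A minimal sketch of skill persistence, assuming skills are stored as small on-disk definitions that a fresh session can reload. The file layout, field names, and save/load helpers here are hypothetical; Hermes's real Skill Manager format is not shown in the source.

```python
import json
from pathlib import Path

def save_skill(skills_dir: Path, name: str, instructions: str, preferences: dict) -> Path:
    """Persist a skill definition so a fresh session can reuse it.
    (Illustrative format, not Hermes's actual one.)"""
    skills_dir.mkdir(parents=True, exist_ok=True)
    path = skills_dir / f"{name}.json"
    path.write_text(json.dumps(
        {"name": name, "instructions": instructions, "preferences": preferences},
        indent=2,
    ))
    return path

def load_skill(skills_dir: Path, name: str) -> dict:
    """Reload a saved skill, e.g. when the user types /skill tweet."""
    return json.loads((skills_dir / f"{name}.json").read_text())
```

The design point is that internalized feedback ("prefer 'really good'", "avoid hype") lives in the preferences payload, so invoking the skill in a new session needs no re-prompting.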

Practical Trade-offs vs. Mature Agents Like OpenClaw

Installation is a single command (pip install hermes-agent), and Hermes runs locally or on a VPS with any OpenAI-compatible model. In the demo, it generated tweet threads from YouTube scripts in one session and fully recalled them in a fresh one. Strengths: zero re-uploads and automatic evolution for personal workflows. Limits: it is less mature than OpenClaw (fewer channels, weaker sandboxing), sessions start fresh unless recall is requested, and context usage runs higher early on. Run a cheap model like GLM-4 for daily assistance, and test it for a month to build production habits, since it extrapolates from short interactions to complex recall.

Summarized by x-ai/grok-4.1-fast via openrouter

5366 input / 1663 output tokens in 9984ms

© 2026 Edge