Hermes Agent: Self-Improving Skills Beat Stateless Agents
Hermes creates reusable skills from complex task trajectories, compounding intelligence over time—unlike stateless agents that reset every interaction.
Stateless Agents Waste Learned Work—Hermes Compounds It
Most AI agent frameworks treat every interaction as isolated: you give a task, it executes via tools like APIs or code, then forgets everything. This mirrors early web browsers in 1995—plenty of options, no standards. Nick Puru argues this statelessness limits real value, as agents never improve. Hermes, an open-source framework from Nous Research (v0.6.0, 19k GitHub stars, MIT license), flips this by capturing full execution "trajectories" (every API call, decision, tool sequence) and distilling them into reusable "skills"—LLM-generated functions stored as code in /hermes/backend/skills.
For a first-time complex task like "Pull Stripe revenue, cross-reference HubSpot pipeline, analyze in Python, generate chart, send Slack summary with 3 insights," Hermes chains 15-20 steps across 40+ tools (web search, terminal, browser automation, code execution, image gen, TTS, vision). Post-execution:
- Records trajectory.
- Analyzes: "Can this be packaged as reusable?"
- If yes, generates/tests/stores skill.
- Next similar request: Runs/refines skill (faster, cleaner).
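The capture-distill-reuse loop above can be sketched in a few lines. This is an illustrative mock, not Hermes' real API: the class names, the reusability heuristic, and the idea of storing a tool sequence instead of LLM-generated code are all assumptions for clarity.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of Hermes' trajectory-to-skill loop; names and
# heuristics are illustrative, not the framework's actual API.

@dataclass
class Trajectory:
    task: str
    steps: list  # recorded (tool_name, args) pairs from execution

@dataclass
class SkillStore:
    skills: dict = field(default_factory=dict)

    def is_reusable(self, traj: Trajectory) -> bool:
        # Stand-in for the LLM's "can this be packaged?" judgment:
        # only multi-step trajectories are worth distilling.
        return len(traj.steps) >= 3

    def distill(self, traj: Trajectory) -> None:
        if self.is_reusable(traj):
            # In Hermes the LLM generates, tests, and stores real code;
            # here we simply store the recorded tool sequence.
            self.skills[traj.task] = [tool for tool, _ in traj.steps]

    def lookup(self, task: str):
        return self.skills.get(task)

store = SkillStore()
traj = Trajectory("weekly revenue analysis", [
    ("stripe_api", {"endpoint": "revenue"}),
    ("hubspot_api", {"endpoint": "pipeline"}),
    ("python_exec", {"script": "analyze.py"}),
    ("slack_send", {"channel": "#revenue"}),
])
store.distill(traj)
print(store.lookup("weekly revenue analysis"))
```

On the next similar request, `lookup` returns the stored skill instead of replanning the 15-20 step chain from scratch; that is the mechanical core of the compounding claim.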
Over time, it learns your preferences (e.g., vague queries get context-aware nudges: "You probably want skill X"). Skills evolve per user—your Hermes shapes itself to your workflows, unlike generic ChatGPT/Claude sessions, which reset. Puru: "Hermes, it is designed to be getting better. It gets smarter the longer that it actually works with you."
This enables compounding: in week 1 you walk it through a 20-step revenue analysis manually; by week 4 it runs the polished skill autonomously every Tuesday. Trajectories can also be exported in ShareGPT format to fine-tune your own models via an RL pipeline.
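A minimal sketch of what the ShareGPT export might look like. The field layout follows the commonly used ShareGPT schema (a `conversations` list of `from`/`value` turns); the trajectory structure fed into it is an assumption, since the article doesn't detail Hermes' internal record format.

```python
import json

# Illustrative trajectory-to-ShareGPT conversion; the input shape
# (task string plus (tool_call, result) pairs) is assumed, not Hermes' real one.

def to_sharegpt(task: str, steps: list[tuple[str, str]]) -> dict:
    conversations = [{"from": "human", "value": task}]
    for tool_call, result in steps:
        conversations.append({"from": "gpt", "value": tool_call})
        conversations.append({"from": "human", "value": result})
    return {"conversations": conversations}

record = to_sharegpt(
    "Pull Stripe revenue and summarize in Slack",
    [("call stripe_api(endpoint='revenue')", "$42,310 MRR"),
     ("call slack_send(channel='#revenue')", "message delivered")],
)
print(json.dumps(record, indent=2))
```

Records in this shape drop straight into common fine-tuning stacks that accept ShareGPT-style conversation data.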
"Most agent frameworks, they just throw that away completely. Like the task is done, the memory is gone, Hermes keeps it instead." — Nick Puru, explaining why trajectory capture changes agent architecture.
Deployment Matches Any Infrastructure—From Laptop to Serverless
Hermes avoids lock-in with 6 backends, suiting varied needs:
- Local terminal: Dev/testing (reachable from your phone via Telegram).
- Docker: Production isolation.
- SSH/VPS: Puru's choice ($5-10/mo, runs 24/7 without idle costs).
- Singularity: Research clusters (GPU).
- Daytona: Persistent cloud dev.
- Modal: Serverless (hibernates idle, wakes on message—pay-per-second).
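To make the trade-offs among the six backends concrete, here is a hedged selection sketch. The trait table and the chooser logic are illustrative assumptions drawn from the descriptions above, not official Hermes configuration.

```python
# Hypothetical backend chooser; the trait values summarize the article's
# descriptions and the selection rule is illustrative, not official guidance.

BACKENDS = {
    "local":       {"isolation": False, "always_on": False},  # dev/testing
    "docker":      {"isolation": True,  "always_on": False},  # prod isolation
    "ssh_vps":     {"isolation": True,  "always_on": True},   # $5-10/mo, 24/7
    "singularity": {"isolation": True,  "always_on": True},   # research clusters
    "daytona":     {"isolation": True,  "always_on": True},   # persistent cloud dev
    "modal":       {"isolation": True,  "always_on": True},   # serverless, $0 idle
}

def pick_backend(need_isolation: bool, need_always_on: bool) -> str:
    # Return the first backend satisfying both requirements (dict order
    # roughly follows the article's cost ordering); fall back to local.
    for name, traits in BACKENDS.items():
        if traits["isolation"] >= need_isolation and traits["always_on"] >= need_always_on:
            return name
    return "local"

print(pick_backend(need_isolation=True, need_always_on=True))
```

With both requirements set, the first match is the VPS option, which lines up with Puru's own choice for always-on jobs.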
Messaging gateways (12 platforms: Telegram, Discord, Slack, WhatsApp, Signal, email, etc.) ensure continuity—start in the CLI, resume on Discord, no context loss. The codebase is Python-heavy (92.5%), integrating cleanly with LLM APIs and transformers. Install is a single curl command (about 60 seconds) that handles Python/Node dependencies.
Puru runs on VPS for always-on jobs without babysitting hardware: "I just want to keep it simple and see if I like this first through a VPS."
Trade-offs: Learning Excels for Personal Tools, Not Speed Pipelines
Hermes prioritizes adaptation over raw speed/orchestration (slower than CrewAI/LangGraph for multi-step flows). Key caveats:
- OS: No native Windows (use WSL2; experimental PRs incoming)—blocks Windows-heavy enterprises.
- Learning scope: Only complex tasks trigger skills; simple ones ignored. LLM-generated code may fail (needs tweaks)—"not magic."
- Security: Local backend allows terminal access (runs as you); docs mandate Docker/Modal for prod (container boundary). Dangerous command checks exist, PR tightening.
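The article mentions dangerous-command checks without detailing them; a minimal sketch of such a gate might look like the following. The pattern list is an assumption and deliberately incomplete, which is exactly why the docs mandate a container boundary for production.

```python
import re

# Illustrative dangerous-command gate for a local backend; the patterns
# are assumptions, not Hermes' actual checks, and blocklists like this
# are easy to bypass -- a Docker/Modal container is the real defense.

DANGEROUS = [
    r"\brm\s+-rf\s+/",        # recursive delete from root
    r"\bmkfs\b",              # reformat a filesystem
    r"\bdd\s+if=.*of=/dev/",  # overwrite a raw device
    r">\s*/dev/sd",           # redirect output onto a disk
]

def is_dangerous(command: str) -> bool:
    return any(re.search(p, command) for p in DANGEROUS)

print(is_dangerous("rm -rf / --no-preserve-root"))   # True
print(is_dangerous("ls -la /hermes/backend/skills")) # False
```

Because such checks run as your user on the local backend, they are a speed bump rather than a sandbox; treat containerization as the actual security boundary.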
- Reliability: Can loop/ignore messages (v0.4+ fixes, but recurs). Gets stuck on interruptions.
"If your use case is run this pipeline as fast as possible... those frameworks are going to be better. ... If ... build an agent that gets smarter the longer I use it, that's where Hermes is going to actually be much much better."
"Hermes it is trading speed for learning and that's the bet that they are essentially making." — Nick Puru, on why Hermes optimizes for long-term intelligence over immediate execution.
Best for Mac/Linux shops building internal/personal agents (e.g., revenue analysis, custom workflows). Avoid for Windows-native, ultra-secure no-container, or latency-critical ops.
Hermes vs. OpenClaw: Learning Depth vs. Ecosystem Breadth
OpenClaw (Peter Steinberger's 2025 weekend project: 300k+ stars in 4 months, 13k ClawHub skills, 2M MAUs, 24 platforms) is a TypeScript messaging gateway/control plane—routes AI everywhere (iMessage to WeChat). Huge community/tutorials; human-maintained skills. No built-in learning: Memories via plugins (LanceDB vectors, lossless persistence)—manual config. No trajectory/RL export.
Hermes (Python, Nous Research): baked-in learning (autonomous skills, self-improvement). Fewer platforms (12), but cross-platform continuity shines. Infra edge: 6 backends vs. OpenClaw's Node/Docker. Killer feature: hermesclaw migrate, a one-command migration that pulls in OpenClaw's soul.md, memories, skills, API keys, and config.
Puru rejects "which is better?"—"fundamentally different architectural bets."
- OpenClaw: Ops/platform for customer-facing, multi-channel (e.g., 5 business channels Day 1).
- Hermes: Personal/research for compounding knowledge/exportable training data.
"They're just completely different games. Pick the one that actually matches what you're going to be building." — Nick Puru, contrasting Hermes' learning focus with OpenClaw's distribution strengths.
Run both: Keep OpenClaw for breadth, Hermes for depth.
Frictionless Onboarding Proves the Value
Setup demo: curl GitHub installer → auto-detects/installs deps → ready. Live walkthrough: Config APIs/tools, chat via terminal/Telegram. Phone access: Message anywhere, VPS handles compute. Ships 40+ skills; custom ones via agentskills.io (npm-like). Puru migrated OpenClaw setups seamlessly.
"One thing that I really like about Hermes is how simple this is to actually set up. So it's legitimately a few minute task." — Nick Puru, after live install, highlighting barrier-to-entry as a key adoption driver.
Key Takeaways
- Target complex, repeatable workflows (e.g., multi-API analysis) to trigger skill creation—simple queries won't learn.
- Start on VPS/Docker for always-on without hardware; Modal for cheap serverless ($0 idle).
- Expect LLM skill bugs: review/edit /hermes/backend/skills code manually.
- Migrate from OpenClaw with hermesclaw migrate to test learning alongside breadth.
- Use for personal compounding agents; pick CrewAI/LangGraph for speed, OpenClaw for multi-platform ops.
- Secure prod: Always containerize (Docker/Modal)—local terminal risks shell access.
- Track evolution: Full-text search past convos/tasks for memory; patterns nudge suggestions.
- Evaluate fit: Mac/Linux internal tools yes; Windows/enterprise security no.
- Compound via RL: Export trajectories to ShareGPT for custom model fine-tuning.