Dreaming Curates Memories to Eliminate Agent Drift
Anthropic's research-preview 'dreaming' feature runs as a scheduled background process that replays an agent's past sessions overnight. It extracts patterns from memory stores, prunes contradictions, and builds a refined memory bank. According to Anthropic's customer slide, this change alone, with no model upgrades or prompt tweaks, gave Harvey AI a 6x lift in task completion rates. The process targets repeat tasks where agents degrade across sessions because of inconsistent memories, turning raw session history into a reliable knowledge base that stops errors from accumulating.
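Dreaming's internals aren't public, but the description above maps to a familiar pattern: group remembered facts, detect conflicts, and keep one resolution per fact. Below is a minimal sketch of that curation step, assuming memories are stored as timestamped key-value records in a JSONL file; the file paths, field names, and "most recent wins" policy are all assumptions for illustration, not Anthropic's implementation.

```python
import json
from collections import defaultdict
from pathlib import Path

# Hypothetical memory format: one JSON object per line, e.g.
# {"key": "billing.tax_rounding", "value": "round half up", "ts": 1712000000}
RAW_MEMORIES = Path("sessions/memories.jsonl")
CURATED = Path("sessions/memories.curated.jsonl")

def curate(raw_path: Path, out_path: Path) -> None:
    """Group facts by key and drop contradictions, keeping the newest value."""
    by_key = defaultdict(list)
    with raw_path.open() as f:
        for line in f:
            if not line.strip():
                continue
            rec = json.loads(line)
            by_key[rec["key"]].append(rec)

    with out_path.open("w") as out:
        for records in by_key.values():
            # A contradiction is multiple distinct values for the same key.
            # The resolution policy here is "most recent wins"; the real
            # curation pass presumably does something smarter.
            newest = max(records, key=lambda r: r["ts"])
            out.write(json.dumps(newest) + "\n")

if __name__ == "__main__":
    curate(RAW_MEMORIES, CURATED)
```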
For production agents handling ongoing work like coding or legal research, dreaming addresses a core failure mode: without curation, memories bloat with conflicting entries, driving 80-90% failure rates on iterative tasks. Enable it through Claude's closed beta, point it at your session history, and let it run for about 4 hours to produce an optimized memory store.
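The beta's interface isn't documented publicly, but whatever shape it takes, it needs session history to work on. Here is a minimal sketch of logging agent turns to per-session JSONL files so there is something to curate later; the directory layout and record fields are assumptions, not Anthropic's format.

```python
import json
import time
import uuid
from pathlib import Path

SESSION_DIR = Path("sessions")  # hypothetical layout
SESSION_DIR.mkdir(exist_ok=True)

def log_turn(session_id: str, role: str, content: str, tokens: int) -> None:
    """Append one agent turn to that session's JSONL log."""
    record = {
        "session": session_id,
        "role": role,            # "user", "assistant", or "tool"
        "content": content,
        "tokens": tokens,
        "ts": time.time(),
    }
    with (SESSION_DIR / f"{session_id}.jsonl").open("a") as f:
        f.write(json.dumps(record) + "\n")

# Example: one short session
sid = uuid.uuid4().hex[:8]
log_turn(sid, "user", "Add prorated refunds to the billing service", 12)
log_turn(sid, "assistant", "Implemented ProrateRefund in invoice.go", 340)
```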
Replicating Gains: 5.4x Completion on 18 Go Billing Tasks
Run the same 18 coding tasks, all part of building a Go billing service, against pre- and post-dream agents using identical Claude Opus 4.7 instances, system prompts, and evaluation rubrics. The pre-dream agent struggled with repetition and low completion; the post-dream version achieved 5.4x higher completion rates by drawing on curated memories to avoid repeating past mistakes.
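The article's task list and rubric aren't published, so the harness below is a hypothetical one in the same shape: identical tasks and rubric checks run against both agents, with run_agent standing in for however you invoke your own agent. The two sample tasks and their string-match checks are illustrative placeholders.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Task:
    name: str
    prompt: str
    passes: Callable[[str], bool]  # rubric check against the agent's output

# Hypothetical stand-ins for the 18 Go billing tasks; the real list isn't public.
TASKS = [
    Task("create-invoice",
         "Add a CreateInvoice handler to the Go billing service",
         lambda out: "func CreateInvoice" in out),
    Task("prorate-refund",
         "Implement prorated refunds",
         lambda out: "Prorate" in out),
    # ... remaining tasks
]

def completion_rate(run_agent: Callable[[str], str]) -> float:
    """Run every task through the agent and score it against the rubric."""
    passed = sum(t.passes(run_agent(t.prompt)) for t in TASKS)
    return passed / len(TASKS)

# Usage, with your own agent entry points:
# pre = completion_rate(run_agent_with_raw_memory)
# post = completion_rate(run_agent_with_curated_memory)
# print(f"lift: {post / pre:.1f}x")
```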
Token spend dropped 3.1x as the agent stopped re-running redundant loops, tying memory quality directly to cost control. Run your own benchmark: log 18 or more sessions on a real project, enable dreaming for 4 hours, then re-run the evaluation. Holding everything else constant isolates dreaming's impact, demonstrating that it outperforms yesterday's agent without any additional engineering.
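To compare the two runs, compute the headline numbers, completion lift and token-spend ratio, from whatever you logged. This sketch assumes the per-turn JSONL format from earlier plus a hypothetical per-task "completed" flag; adapt the field names to your own logs.

```python
import json
from pathlib import Path

def totals(run_dir: str) -> tuple[int, int]:
    """Sum token usage and count completed tasks across a run's session logs."""
    tokens, completed = 0, 0
    for path in Path(run_dir).glob("*.jsonl"):
        for line in path.open():
            rec = json.loads(line)
            tokens += rec.get("tokens", 0)
            completed += rec.get("completed", False)  # hypothetical flag
    return tokens, completed

pre_tokens, pre_done = totals("runs/pre_dream")    # hypothetical directories
post_tokens, post_done = totals("runs/post_dream")

print(f"completion lift: {post_done / max(pre_done, 1):.1f}x")
print(f"token spend: {pre_tokens / max(post_tokens, 1):.1f}x lower post-dream")
```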
Trade-offs and Path to Production
Dreaming shines on repeat, memory-intensive tasks, but it needs accumulated session history; start with 10 or more runs before expecting meaningful pruning. It's beta-only for now, so join Anthropic's waitlist for access. Expect iteration: early versions focus on contradiction removal, and future ones may add pattern synthesis. For indie builders, pair it with structured evals to quantify gains before scaling agents.
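One way to make "quantify before scaling" concrete is a simple gate on the measured lift, reusing the completion-rate harness sketched above. The threshold and example numbers here are illustrative only.

```python
# Hypothetical threshold: set it above the run-to-run noise in your baseline.
MIN_LIFT = 1.5

def ready_to_scale(pre_rate: float, post_rate: float) -> bool:
    """Only widen the agent's scope once curated memory shows a real lift."""
    return pre_rate > 0 and (post_rate / pre_rate) >= MIN_LIFT

print(ready_to_scale(0.15, 0.81))  # illustrative rates consistent with a 5.4x lift
```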