AI Agents Won't Fix Productivity Without Better UIs
Decades of failed to-do apps led to AI agents like OpenClaw, but unreliable memory, bland models, and mismatched UIs (Discord/Telegram) cause chaos. The fix: custom UIs like Wolffer for predictable multi-agent orchestration. A future OS inverts prompting: you delegate, and the AI prompts you.
Productivity Evolution: From Checkboxes to Overloaded Apps
From age 10, the speaker tracked tasks in notebooks, then in text files, with Tasker firing contextual reminders (on connecting to Wi-Fi or arriving at a location). Todoist-like apps were rejected for lacking a "life OS." Toodo (2017) added tag-based priority scoring (e.g., "health" or "crisis" tags boost an item). The Better app expanded this with habits, a planner, and events. The line culminated in Benji (2022, named after the speaker's dog), which mashed together 60+ features: todos, routines, calendar, nutrition tracking via photo analysis, and a voice-to-API microphone feature (pre-MCP; it parsed speech and updated the calendar live). Friction killed adoption: form-based input caused oscillation between hyper-logging and abandonment. ChatGPT plugins sparked hype ("it's over for SaaS"), but models without a structured JSON output mode had to be bullied into it ("no markdown, please"). The voice feature went viral on Twitter, yet ADHD sidelined shipping, while others monetized single features (e.g., photo calorie tracking) for millions.
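The tag-based scoring idea can be sketched in a few lines. This is a minimal illustration, not Toodo's actual implementation: the `Todo` class, the `TAG_BOOSTS` weights, and the additive formula are all assumptions; the talk only says that tags such as "health" or "crisis" boost an item.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Hypothetical weights: the talk names the tags but not their values.
TAG_BOOSTS: Dict[str, float] = {"health": 2.0, "crisis": 5.0}


@dataclass
class Todo:
    title: str
    tags: Set[str] = field(default_factory=set)
    base_priority: float = 1.0


def score(todo: Todo, boosts: Dict[str, float] = TAG_BOOSTS) -> float:
    """Base priority plus the boost of every recognized tag (assumed additive)."""
    return todo.base_priority + sum(boosts.get(tag, 0.0) for tag in todo.tags)


def rank(todos: List[Todo]) -> List[Todo]:
    """Highest score first; the sort is stable, so ties keep their input order."""
    return sorted(todos, key=score, reverse=True)


todos = [
    Todo("Buy milk"),
    Todo("Book dentist", {"health"}),
    Todo("Pay overdue rent", {"crisis"}),
]
for todo in rank(todos):
    print(f"{score(todo):4.1f}  {todo.title}")
```

Under these assumed weights, the "crisis" item sorts first, the "health" item second, and the untagged item last.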
The speaker then shifted to agents: Claude skills for taxes, email, and todos, then Clawdbot via WhatsApp/Telegram. An OpenClaw obsession followed (joined the Discord when it had fewer than 100 people, wore lobster suits, made a logo) and enabled local self-hosting (NAS, Nextcloud, Image Local, Markdown). Piles of data (Google Drive, iCloud, high school photos) were prepared for agents. Weekly Tinkerer Club meetups revealed that 90% of use cases were doable with Claude/Cursor alone. Specialized agents beat a single one-to-one chat, mirroring how one delegates life domains (business, personal, family). The speaker created 20+ bots across 5 Discord servers with channels, threads, and forums, but life grew chaotic (late rent, mortgage, emails): a performative mess.
Agent Pitfalls: Unreliability and UI Mismatch
Hype faded: Tinkerer Club meetups shrank from explosive turnout to 5 people, like "OpenClaw Anonymous." The core issue is that agents are unreliable exactly where it matters: cron jobs, multi-agent handoffs, and remembering prior messages. Discord and Telegram are unfit for a life OS because they were never designed for it. Anthropic models (e.g., Claude) lost personality ("box of oats": obedient but bland, with endless confirmations like "Did you do it? No."). Custom agents (OpenClaw, Hermes) tire out tinkerers, like pinball-machine builders exhausted by endless tweaking. Cloud agents (Co-work, with OpenAI and Perplexity offerings upcoming) are nerfed to about 5% of OpenClaw's capability: mass-market but unsatisfying. The speaker juggles OpenClaw, Hermes, and Paperclip (a Kanban/Linear-style board for agents, credit-heavy), falling back to plain terminal Claude/Cursor at frustration peaks.
Wolffer: Predictable UI Fixes Multi-Agent Chaos
Wolffer is an abstraction over Claude/Cursor built for personal use (an "ADHD-squirrel" project): no Telegram/iMessage, no memory, no plugins, not extensible. Cons: locked to the app's UI, non-modular. The pros all serve reliability:
- Nested topics inject context: Child topics (e.g., Benji customer support) auto-load parent descriptions (work > Benji > support), bypassing flaky memory (beats Milo 'memory solve').
- Workspaces for switching contexts.
- Visible tool calls (collapse/expand, spinners, stop buttons), no slash commands.
- Predictable crons labeled with full history.
- Agent management UI: sidebar shows the agent (e.g., Chandler), its model, and its capabilities, all tweakable on the fly.
- Dynamic @mentions: knowledge base Markdown, docs, passwords, skills (e.g., @Benji landing page + Tinkerer Club doc).
The result is fluid multi-agent orchestration without nests of Discord channels and threads.
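The nested-topic idea, where a child topic auto-loads its parents' descriptions instead of relying on agent memory, might look roughly like this. It is a sketch under stated assumptions, not Wolffer's code: the `Topic` class, `build_context`, and the description strings are hypothetical; only the work > Benji > support hierarchy comes from the talk.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Topic:
    name: str
    description: str
    parent: Optional["Topic"] = None

    def path(self) -> List["Topic"]:
        """Return the chain of topics from the root down to this one."""
        chain: List["Topic"] = []
        node: Optional["Topic"] = self
        while node is not None:
            chain.append(node)
            node = node.parent
        return chain[::-1]


def build_context(topic: Topic) -> str:
    """Assemble a deterministic context preamble from every ancestor description."""
    chain = topic.path()
    lines = ["[" + " > ".join(t.name for t in chain) + "]"]
    lines += [f"{t.name}: {t.description}" for t in chain]
    return "\n".join(lines)


# Hypothetical descriptions, for illustration only.
work = Topic("work", "Day-job and business projects.")
benji = Topic("Benji", "Productivity app with todos and habits.", parent=work)
support = Topic("support", "Customer support replies.", parent=benji)

print(build_context(support))
```

Because the preamble is rebuilt from the topic tree on every message, the agent sees the same parent context each time, whether or not it recalls earlier conversations.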
Future OS: AI Inverts Prompting, Kills Most Apps
Today's computers are absurd: returning users are greeted by 17 app updates and stale tabs. A future AI ingests life data (notifications, emails, todos), prioritizes tasks according to how long the user was away, and sequences work and breaks ("next task: X, then break"). Prompting inverts: the user delegates 99% and the AI prompts the user (e.g., "Send passport pic?") through forms and questionnaires. Background agents handle the work; users make the decisions. No vim or code for grandma: task UIs are generated on the fly. Most consumer apps die; survivors serve specialists (color grading, music). Small apps persist in niches, but the life OS dominates.
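One way to picture inverted prompting is a queue in which background agents post small forms and the user only answers them. The sketch below is purely illustrative: `UserPrompt`, `Inbox`, the `urgency` field, and the example agents are assumptions; the talk describes the idea, not a design.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class UserPrompt:
    agent: str          # which background agent is asking (hypothetical names)
    question: str
    fields: List[str]   # form fields the user must fill in
    urgency: int = 0    # assumed ordering signal; higher is asked first


@dataclass
class Inbox:
    pending: List[UserPrompt] = field(default_factory=list)

    def ask(self, prompt: UserPrompt) -> None:
        """Called by an agent when it needs a decision from the user."""
        self.pending.append(prompt)

    def next_prompt(self) -> Optional[UserPrompt]:
        """The most urgent open question, or None if nothing is waiting."""
        if not self.pending:
            return None
        return max(self.pending, key=lambda p: p.urgency)

    def answer(self, prompt: UserPrompt, values: Dict[str, str]) -> Dict[str, str]:
        """Validate the filled form and hand the values back to the agent."""
        missing = [name for name in prompt.fields if name not in values]
        if missing:
            raise ValueError(f"missing fields: {missing}")
        self.pending.remove(prompt)
        return {name: values[name] for name in prompt.fields}


inbox = Inbox()
inbox.ask(UserPrompt("travel", "Send passport pic?", ["passport_photo"], urgency=2))
inbox.ask(UserPrompt("finance", "Approve rent payment?", ["approve"], urgency=5))

current = inbox.next_prompt()
print(current.question)
print(inbox.answer(current, {"approve": "yes"}))
```

In this sketch the user never writes a prompt: agents queue questions, and the interface only has to render the next form.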