Replace Dumb Loops with LLM-Judged Persistence
Cursor's /goal builds on rough-loop-style automation but replaces a fixed iteration count with an LLM judge that checks whether the goal is actually met after each agent run. Enable it with /features enable goal, then start with /goal "migrate JS to TS, verify visuals with Playwright". The agent works autonomously (e.g., nine hours overnight on a migration), can be paused or cleared with /goal pause or /goal clear, and between runs receives context like "Continuing toward goal: take next steps or explicitly state complete." This fixes agents declaring victory early on tasks like fixing all of a repo's tests, which they often abandon as "done" after 10-15 minutes. Hermes' persist goal mirrors the feature. Compared to rough-loop (fixed max iterations) or auto-research loops, /goal handles ambiguous goals like "cut Docker image size 60%" by exploring approaches incrementally. The key is the judge prompt, which demands "no proxy signals as completion—audit shows objective achieved, no work remains," so the agent can only mark itself complete when an audit confirms nothing is left to do.
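The pattern above can be sketched in a few lines of Python. This is not Cursor's implementation; `run_agent` and `judge_completion` are hypothetical stand-ins for the agent run and the LLM judge, with the judge stubbed to approve after three runs.

```python
# Sketch of the LLM-judged persistence pattern: run the agent, ask a
# judge whether the goal is truly met, and re-prompt until it is.
from dataclasses import dataclass, field

# The judge prompt from the article: reject proxy signals of completion.
JUDGE_PROMPT = (
    "Do not accept proxy signals as completion. Answer COMPLETE only if "
    "an audit shows the objective is achieved and no work remains."
)

@dataclass
class GoalRun:
    goal: str
    transcript: list = field(default_factory=list)

def run_agent(goal: str, context: str) -> str:
    # Placeholder: one autonomous agent run toward the goal.
    return f"worked on: {goal}"

def judge_completion(goal: str, transcript: list) -> bool:
    # Placeholder: an LLM judge applying JUDGE_PROMPT to the run so far.
    return len(transcript) >= 3  # stub: judge approves after three runs

def pursue_goal(goal: str, max_runs: int = 50) -> GoalRun:
    run = GoalRun(goal)
    context = goal
    for _ in range(max_runs):
        run.transcript.append(run_agent(goal, context))
        if judge_completion(goal, run.transcript):
            break
        # Re-prompt instead of stopping at the first "victory" claim.
        context = ("Continuing toward goal: take next steps "
                   "or explicitly state complete.")
    return run

result = pursue_goal("migrate JS to TS, verify visuals with Playwright")
```

The judge, not the agent, decides when to stop; swapping the stub for a real model call is the whole feature.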
Craft Prompts with Quantifiable 'Done' and Alignment
A good goal is bigger than one prompt but smaller than a backlog: specify the achievement, the constraints, how to validate, and when to stop. Examples: "Migrate the stack, keeping screens identical (verify with Playwright)"; "Optimize the prompts until the eval score hits the target, running evals after every change"; "Find 20 new issues: reproduce, fix, open a branch PR, and log each to the run/ folder." Avoid fuzzy goals like "fix everything"—agents either quit early or spiral. Before starting, chat with the agent for alignment (project context, known bad UX, past bugs); Vincent ran OpenClaw this way for three days, 30 rounds, and a gazillion tokens. For prototypes, reference a PRD.md, create milestone tests, and include reference screenshots. Quantify everything: 20 issues, a target score, visual matches.
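A hypothetical goal prompt following that structure (achievement, constraints, validation, stop conditions) might read like this; the file name and test details are illustrative, not prescribed:

```markdown
Goal: Migrate the web client from JavaScript to TypeScript.

Constraints:
- Every screen must look identical before and after (no UI regressions).
- No new runtime dependencies.

Validation:
- After each batch of files, run the Playwright visual tests and the
  type checker; both must pass before moving on.

Stop conditions:
- Complete when `tsc --noEmit` passes on the whole repo and all
  Playwright screenshots match the reference set.
- Stop and report if a screen cannot be matched after three attempts.
```

Note that every line gives the judge something checkable, which is what keeps the agent from declaring victory on vibes.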
Tools and Extensions for Reliable Execution
npx goal-buddy generates a goal.md (describing the request, constraints, and stop conditions) plus a state.yaml (tracking tasks); running /goal @goal.md has yielded complete games (e.g., a Rain-type game with image-generated assets). Side chats let you fork the conversation mid-goal. A workshop at aibuilderclub.com teaches more.
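As an illustration of the task-tracking side, a state.yaml for the game example might look like the sketch below; goal-buddy's actual schema is not documented here, so every field name is an assumption:

```yaml
# Hypothetical state.yaml sketch; goal-buddy's real schema may differ.
goal: "Build a Rain-type game with image-gen assets"
status: in_progress
tasks:
  - id: 1
    name: "Scaffold game loop and canvas"
    done: true
  - id: 2
    name: "Generate falling-object sprites with image-gen"
    done: true
  - id: 3
    name: "Add scoring and game-over screen"
    done: false
```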
Missions for Week+/Month+ Horizons
/goal tops out at hour-scale horizons; it fails on weeks-long work like SEO or ROAS optimization, where feedback is slow. For those, use /mission: a mission.md defines the metrics, and the agent hypothesizes and tests (e.g., to grow a Twitter account to 10k followers: post founder-voice threads, analyze performance, schedule the next batch—on cycles of hours to weeks). A human stays in the loop for big changes. Crewlet (crewlet.io) is in closed beta; it iterated tweets from average to high-engagement by doubling down on winners.
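A mission.md for the Twitter example might be structured as below. This is a hypothetical sketch—Crewlet's actual file format is not shown in this article—but it captures the pieces named above: metrics, a hypothesize-test loop, and a human-in-the-loop gate:

```markdown
Mission: Grow the Twitter account to 10k followers.

Metrics:
- Follower count (checked weekly)
- Engagement rate per thread

Loop:
1. Draft founder-voice threads and queue them.
2. Post on schedule; wait for performance data.
3. Analyze results; double down on winning formats, drop losers.

Human-in-loop:
- Any change to posting cadence or account settings needs approval.
```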