Ralph Loops: Repeat Tasks Till AI Ships Perfect Code
Dumb Ralph loops—repeating 'implement ticket' prompts until AI self-corrects—outperform complex agent orchestration, enabling reliable shipping with minimal debugging.
Ditch Complex Orchestration for Simple Iteration
Traditional AI workflows, like the speaker's n8n setup for newsletters, crumble under complexity: multi-step flows with article scraping, deduping, and summarization fail weekly due to brittle integrations. Despite tools like n8n simplifying API orchestration, maintenance outweighs value—'it was probably easier to just write the newsletter.' Modern LLMs shift this: Claude's 'skills' internally loop (read instructions, call tools, repeat) to produce coherent outputs without explicit wiring. This scales to coding: paste a complex n8n JSON into Claude, and it builds a self-contained skill that iterates implicitly. Key insight: any agent is a loop; simplify to explicit repetition for reliability.
'This ships much better, more coherent newsletters than the previous workflow... I haven't really touched this skill.'
Trade-off: early models (pre-Nov 2024) often hallucinated completion or left work unfinished; GPT-4o/4.1+ and Claude 3.5 Sonnet and later handle single passes better, but loops still catch the edge cases.
Core Mechanism: The 'Ralph' Repetition
Named after The Simpsons' Ralph Wiggum ('tries the same thing over and over until it works'), a Ralph loop prompts the AI to 'implement ticket X', lets it finish, then repeats the same prompt verbatim. The AI re-reads its own output, spots what it missed (e.g., an unupdated status flag), and fixes it iteratively. No planning graphs or multi-agent setups needed.
In the demo: start with a vibe-coded Pomodoro timer (`pomodoro start` saves a timestamp; one test). Tickets live in `doc/tickets/001.md`: 'Add status command showing time left.'
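A plausible shape for that ticket file (illustrative; the exact format is whatever your repo settles on, but a status line gives the loop something to mark done):

```markdown
# Ticket 001: Add status command

Add a `pomodoro status` command that shows the time left in the
current session.

Status: open
```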
- Prompt Claude: 'implement doc/tickets/001'.
- AI reads the ticket, adds a `status` CLI command (calculates time remaining via `datetime`), runs/saves tests (auto-adds a test).
- Repeat the prompt: AI notices 'status should mark ticket done', updates the file.
- Third repeat: Confirms completeness, suggests next ticket.
Clear context (repo + ticket file) is crucial; kill the session between loops to force fresh reads. Dumbest implementation: `while true; do claude 'implement ticket'; done`. Early plugins auto-reprompted on a stop-hook but lacked context; now manual or shell loops suffice.
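A minimal sketch of that loop, assuming the Claude Code CLI's `-p` (print/non-interactive) mode; each invocation is a fresh process, so the fresh context read comes for free:

```bash
#!/usr/bin/env bash
# Dumbest viable Ralph loop: same prompt, fresh session each pass.
# Bounded rather than `while true` so a forgotten terminal doesn't burn tokens.
for pass in 1 2 3; do
  claude -p "implement doc/tickets/001.md"  # fresh process = fresh read of repo + ticket
done
```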
'The AI would often review its code and realize it had missed something... it'd go, oh yeah, I should have fixed that bit.'
Principle: the AI's self-evaluation emerges from re-reading its own work; repetition exploits stochastic variance to converge on completeness. This avoids a common pitfall: single-shot incompleteness in older models.
Hands-On: Build a Ticket-Processing Loop
Workshop repo (github.com/chrismdp/pomodoro-workshop): Python CLI timer + tests + tickets folder. Prerequisites: Python, Claude desktop/Cursor, basic Vim/git familiarity (audience: coders using Claude for 50%+ of their code).
Steps to replicate:
- Clone the repo, `pip install -r requirements.txt` (minimal), test: `python -m pytest`.
- Run: `python pomodoro.py` → `pomodoro start`.
- Open Claude: 'Implement doc/tickets/001' (add status).
- Verify: `git diff`, `pytest`, run the CLI.
- Repeat the prompt 2-3x: watch the self-corrections.
- Extend: loop over tickets, e.g. in bash: `for t in doc/tickets/*.md; do claude "implement $t"; done` (expanded in the sketch below).
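Putting those steps together, a hedged sketch of the multi-ticket version (again assuming Claude Code's `-p` non-interactive mode; the three-pass bound and test gate mirror the quality criteria below):

```bash
#!/usr/bin/env bash
set -e  # abort the run if the test suite regresses
# Ralph-loop every ticket in order, then gate on the tests.
for t in doc/tickets/*.md; do
  for pass in 1 2 3; do
    claude -p "implement $t"  # repeated verbatim; later passes catch earlier misses
  done
  python -m pytest   # quality bar: tests pass, no regressions
  git diff --stat    # eyeball the change before the next ticket
done
```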
Quality criteria: tests pass, the CLI works end-to-end, no regressions. The AI often adds unprompted tests ('what is the world coming to?'). Fits mid-workflow: after ticket writing, before multi-ticket agents.
Pitfalls: WiFi drops mid-session (tether!); over-repetition wastes tokens (stop once the AI reports '100% done'). For non-code tasks (emails, calendars), the same loop applies.
'Dumb loops beat clever workflows. Most teams... reach for multi-agent orchestration... Then they spend months debugging them.'
Self-Improving Loops and Synthetic Feedback
Basic loops plateau; evolve with critique:
- Post-run eval: After task, prompt: 'Review output, update instructions with improvements.' Claude tweaks skill/prompt (e.g., 'add edge-case handling').
- Synthetic data: generate fake tickets/feedback locally, with no waiting on production. Loop: produce → critique (score 1-10, explain failures) → retry (sketched after this list).
- Full cycle: Process ticket → Test → Eval → Improve prompt → Next ticket. Ties to BMAD (Build-Measure-Analyze-Decide): Ralph iterates each stage.
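One way to wire that produce → critique → retry cycle in shell (illustrative, not the speaker's exact setup; the `SCORE:` convention, the ticket path, and the threshold of 8 are assumptions to tune):

```bash
#!/usr/bin/env bash
# Critique-gated retry: draft, self-score, and retry until the score clears the bar.
for attempt in 1 2 3; do
  claude -p "implement doc/tickets/002.md"
  review=$(claude -p "Review the latest change against doc/tickets/002.md. Explain any failures, then end with 'SCORE: <n>' where n is 1-10.")
  score=$(printf '%s' "$review" | grep -oE 'SCORE: [0-9]+' | tail -n1 | grep -oE '[0-9]+')
  if [ "${score:-0}" -ge 8 ]; then break; fi  # good enough: stop burning tokens
  claude -p "Apply the fixes suggested in this review: $review"
done
```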
Demo evolution: after each ticket, prompt 'update the skill with what you should have done differently.' This yields better drafts over successive runs. For production: 24/7 loops on real tickets (the speaker runs these for client work).
Models matter: o1/Claude 3.5+ for reasoning; older models need more loops. Cost: cheap compared with the dev time orchestration eats.
'A single loop that processes one ticket at a time, evaluates its own output, and improves on the next run will outperform all of it.'
Key Takeaways
- Start every AI task with a Ralph loop: repeat 'do X' 3x max; this catches 90% of incompletes.
- Use file-based tickets (Markdown) for context; repo root + `doc/tickets/NNN.md`.
- Verify via tests/diffs; prompt the AI to run them.
- Self-improve: End sessions with 'update instructions based on this run.'
- Local synth data: Generate tickets/feedback to iterate offline.
- Pick latest models (Claude 3.5 Sonnet+, GPT-4o+); loops ship where agents debug forever.
- Apply beyond code: Newsletters, emails—any promptable task.
- Bash-ify: `while ! grep -q 'done' output; do claude -p 'implement ticket' > output; done` (redirect the output so the `grep` check can actually see it).
- Trade-off: token burn vs. zero orchestration; scales to solo bootstraps.
- Practice: Fork Pomodoro repo, add 5 tickets, loop to MVP.