Codex CLI /goal Auto-Compacts Context, Continues Past Usage Limits

Enabling /goal and Key Behaviors

Set features.goals = true in your project config.toml to enable the experimental /goal command. Define clear success criteria upfront, such as automated tests that verify specific UI elements (e.g., "dashboard on top-left sidebar"), so the agent knows where the finish line is during autonomous runs that last minutes to hours.

While a goal is active, the UI shows "pursuing goal" with a dedicated timer in the bottom-right corner. Run /goal mid-execution for an instant status report: the objective plus the time and tokens used so far. Use /goal pause or /goal clear to intervene. On completion, the agent audits its work against the criteria, reports the total time (e.g., 5 min for the short task, 37 min for the long one), and marks "goal achieved."

For a short task (integrating a Filament design into a chat app), /goal used 11% of the 5-hour limit (GPT-5.5 high) versus 9% without it, a negligible difference for a single run. However, /goal generated more precise tests: it asserted that the dashboard sits inside #fi-sidebar rather than anywhere on the page, and it added an npm run build verification step. The final code was identical, but both runs left the frontend Tailwind styling skewed. The lesson: include recompilation and CSS checks in the criteria, not only backend tests.
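The difference between a generic and a scoped assertion can be sketched as follows. This is an illustration in Python using only the standard html.parser module, not the tests Codex actually generated (those belong to the Laravel project). The sample markup and the helper text_inside are hypothetical, and the parser assumes well-formed HTML.

```python
from html.parser import HTMLParser

# Elements with no closing tag; they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

class ScopedTextFinder(HTMLParser):
    """Collect text that appears inside the element with a given id."""

    def __init__(self, element_id: str):
        super().__init__()
        self.element_id = element_id
        self.depth = 0  # 0 means we are outside the target element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth += 1
        elif dict(attrs).get("id") == self.element_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)

def text_inside(html: str, element_id: str) -> str:
    """Return the whitespace-normalized text inside the element with that id."""
    finder = ScopedTextFinder(element_id)
    finder.feed(html)
    return " ".join(" ".join(finder.chunks).split())

# Hypothetical pages: the first has "Dashboard" only in the main area.
PAGE_WRONG = '<main><h1>Dashboard</h1></main><aside id="fi-sidebar"><a>Home</a></aside>'
PAGE_RIGHT = '<aside id="fi-sidebar"><nav><a>Dashboard</a></nav></aside><main></main>'

print("Dashboard" in PAGE_WRONG)                             # True: generic check passes
print("Dashboard" in text_inside(PAGE_WRONG, "fi-sidebar"))  # False: scoped check fails
print("Dashboard" in text_inside(PAGE_RIGHT, "fi-sidebar"))  # True
```

The generic check accepts a page where the dashboard entry is in the wrong place, which is exactly the kind of false pass the scoped assertion rules out.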

Long-Run Autonomy: Context and Usage Limit Handling

For ambitious tasks (here, an 8-phase Laravel project driven by a detailed Markdown phases document), instruct the agent to work phase by phase: implement, get the tests passing, then make a git commit for each phase. Monitor the status line (enable the context and weekly/5-hour percentages via config). The context percentage updates live, while the usage percentage is accurate only at the start.
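One way to turn a phases document into that kind of instruction is sketched below. Everything specific here is an assumption for illustration: the "## Phase N: Title" heading format, the docs/phases.md path, and the wording of the generated goal text are hypothetical, since the original phases document is not shown.

```python
import re

# Hypothetical phases document; the heading format is an assumption.
PHASES_MD = """\
# Project plan

## Phase 1: Scaffolding
Set up the app skeleton.

## Phase 2: Models
Add the core models and migrations.
"""

PHASE_HEADING = re.compile(r"^## Phase (\d+): (.+)$", re.MULTILINE)

def list_phases(markdown: str) -> list[tuple[int, str]]:
    """Extract (number, title) pairs from '## Phase N: Title' headings."""
    return [(int(num), title.strip())
            for num, title in PHASE_HEADING.findall(markdown)]

def build_goal_text(markdown: str, doc_path: str) -> str:
    """Compose a phase-by-phase instruction: implement, test, commit."""
    phases = list_phases(markdown)
    lines = [
        f"Work through {doc_path} one phase at a time ({len(phases)} phases).",
        "For each phase: implement it, make the tests pass, then create a git commit.",
    ]
    lines += [f"- Phase {num}: {title}" for num, title in phases]
    return "\n".join(lines)

print(build_goal_text(PHASES_MD, "docs/phases.md"))
```

Keeping the plan in a file and referencing it by path, rather than pasting it into the prompt, means the agent can re-read it after losing conversation history.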

Context hit its limit (258k tokens by default, with the 1M window not enabled) mid-phase 6, after phase 5 had finished at 23.5 min. Codex auto-compacted the context to 0% without warning. The conversation history was lost, but the agent restarted sensibly by re-listing files and checking git status. Keeping the phases in an external Markdown document preserved quality across the compaction. Longer runs may trigger multiple compactions.

The 5-hour limit ($20 plan) dropped to 0% at 37 min, with all 8 phases complete and all tests passing. There was no terminal error; the prompt finished with its audit. A follow-up /goal started after the limit was hit (e.g., seed the DB so the homepage shows books, with a test verifying more than 0 books) still ran, but LLM-dependent auto-approvals were blocked: requests to search docs and to run db:seed were denied with a usage limit error. The goal was marked "not complete yet," and the agent suggested running the command manually. Unlike Claude Code, which stops hard, Codex allows partial continuation.

Readings along the way: phase 1 (5 min): 29% context; phase 5 (23.5 min): 78% context, 39% of the usage limit remaining; phase 6: compaction (94% → 0% context); end: 6% usage remaining just before the finish, 0% after.
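These readings allow a rough back-of-the-envelope estimate of how fast context fills and usage drains. The sketch below assumes the usage figures are percent remaining (they fall from 39% to 0% over the run) and that change between readings is linear, which is a simplification given that the usage percentage is not updated live.

```python
# Readings reported above; usage figures are ASSUMED to be percent remaining.
PHASE1_MIN, PHASE1_CONTEXT = 5.0, 29.0
PHASE5_MIN, PHASE5_CONTEXT, PHASE5_USAGE_LEFT = 23.5, 78.0, 39.0
END_MIN, END_USAGE_LEFT, TOTAL_PHASES = 37.0, 0.0, 8

def per_minute(start_value, end_value, start_min, end_min):
    """Average change per minute between two readings (linear assumption)."""
    return (end_value - start_value) / (end_min - start_min)

# Context growth between the end of phase 1 and the end of phase 5.
context_growth = per_minute(PHASE1_CONTEXT, PHASE5_CONTEXT, PHASE1_MIN, PHASE5_MIN)
# Usage drained between the end of phase 5 and the end of the run.
usage_burn = -per_minute(PHASE5_USAGE_LEFT, END_USAGE_LEFT, PHASE5_MIN, END_MIN)
# Average wall-clock time per phase over the whole run.
minutes_per_phase = END_MIN / TOTAL_PHASES

print(f"context growth: {context_growth:.2f} %/min")   # 2.65
print(f"usage burn:     {usage_burn:.2f} %/min")       # 2.89
print(f"avg phase time: {minutes_per_phase:.3f} min")  # 4.625
```

At roughly 2.6 percentage points of context per minute, a fresh context window would fill in well under an hour, so a run of several hours should be expected to compact more than once.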

Trade-offs and When to Use

/goal suits predictable tasks that fit within your limits. Avoid running past the limit: auto-review fails there and manual intervention is needed. Compared with plain prompts, /goal is more thorough (more precise tests, build checks) and enables hands-off, Ralph-loop-style autonomy (for hours, possibly days). Test longer runs yourself, and consider upgrading to a $100-200/mo plan for headroom.

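Time is hard to predict: a rough estimate of ~7 min/phase scales poorly once context compaction kicks in. The token count in the /goal status (e.g., 128k at 8 min) is less useful than the usage percentage. For production work, combine backend tests with browser tests (Playwright) rather than relying on backend tests alone.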