Codex /goal Autonomously Shipped 14/18 Features Overnight

OpenAI's Codex /goal CLI implemented 14 of 18 backlog features solo in 18 hours for $4.20 ($0.30/feature), running without human approvals by using soft stops and self-summarization.

Breakthrough in Hands-Off Feature Delivery

OpenAI's Codex CLI 0.128.0 /goal command enables fully autonomous execution of complex tasks. Typing '/goal ship the 18 features in BACKLOG.md before standup' triggered the agent to plan, implement 14 of 18 features, pass CI builds, open PRs, and self-review them using GPT-5.5 sub-agents, all without human intervention over 18 hours. This cost $4.20 in credits via ChatGPT Plus, equating to $0.30 per shipped feature. The result: production-ready code waiting for merge, transforming backlog clearance into a fire-and-forget process.

To replicate, reference a clear backlog file like BACKLOG.md and set a deadline like 'before standup.' The agent's planning phase sets up the work, then it iterates independently, proving the approach viable for real workloads where prior agents have failed.
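The pattern above boils down to two ingredients: an explicit backlog file and a deadline phrase in the goal text. A minimal Python sketch of that pattern follows; parse_backlog and build_goal are illustrative helpers, not Codex APIs, and the checklist format is an assumption about how BACKLOG.md might be laid out.

```python
import re

def parse_backlog(text: str) -> list[str]:
    """Extract unchecked '- [ ] feature' items from a BACKLOG.md-style file."""
    return re.findall(r"^- \[ \] (.+)$", text, flags=re.MULTILINE)

def build_goal(backlog_path: str, n_features: int, deadline: str) -> str:
    """Compose a /goal command with an explicit file reference and a deadline."""
    return f"/goal ship the {n_features} features in {backlog_path} before {deadline}"

# Hypothetical backlog contents; completed items ('[x]') are skipped.
backlog = "- [ ] dark mode\n- [x] login\n- [ ] export to CSV\n"
features = parse_backlog(backlog)
goal = build_goal("BACKLOG.md", 18, "standup")
```

The point of the sketch is that the goal string names a concrete file and a concrete stopping condition, which is what gives the planning phase something to work against.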

Why /goal Outperforms Other Coding Agents

Unlike Claude Code with Sonnet 4.6, Cursor Composer 2, Aider with DeepSeek V4, or Grok 4.3 long-horizon, which pause for dependency-install approvals or stall on context limits, /goal operates at 'soft stop' boundaries. It self-summarizes to manage context, avoiding hard stops, and continues without pinging a human. This long-horizon autonomy is more than an extended prompt: the agent is designed for uninterrupted runs, making it the first that 'genuinely doesn't need you.'
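The soft-stop idea can be sketched in a few lines: when the transcript nears a context budget, the agent replaces its own history with a summary and keeps going instead of hard-stopping. Everything below is a hypothetical illustration of that mechanism; the names, the word-count tokenizer, and the threshold are assumptions, not Codex internals.

```python
SOFT_STOP_TOKENS = 100  # assumed context budget for this sketch

def count_tokens(messages: list[str]) -> int:
    # Crude stand-in for a real tokenizer: whitespace word count.
    return sum(len(m.split()) for m in messages)

def summarize(messages: list[str]) -> str:
    # Stand-in for a model-written summary of the work so far.
    return f"[summary of {len(messages)} earlier steps]"

def step(history: list[str], new_message: str) -> list[str]:
    """Append a step; at the soft-stop boundary, self-summarize and roll on."""
    history = history + [new_message]
    if count_tokens(history) > SOFT_STOP_TOKENS:
        # Compress everything but the latest step into one summary message.
        history = [summarize(history[:-1]), history[-1]]
    return history

history: list[str] = []
for i in range(60):
    history = step(history, f"implemented feature {i}")
# The transcript stays bounded, so the run never hits a hard context stop.
```

The design choice worth noting is that summarization happens at a boundary the agent picks for itself, which is what lets a run continue for many hours without fragmenting into approval requests.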

Benchmarks against 2024-era agents confirm this edge: the others demand frequent human input, fragmenting workflows, while /goal sustains momentum through internal checkpoints.

Reshaping Daily Engineering Workflows

Integrate /goal to offload routine shipping: assign backlogs overnight and reclaim time for high-level planning. It shifts the workday from micromanaging agents to strategic oversight, with green CI and open PRs waiting when you open your laptop. The trade-off: results depend on precise goal phrasing and backlog clarity, and unmerged PRs still need a final human review for edge cases. For AI engineers, this validates Codex as a production-grade shift, prioritizing autonomy over hype.

Summarized by x-ai/grok-4.1-fast via openrouter


© 2026 Edge