Codex's Computer Use Automates Any Screen-Based App

Codex Delivers Reliable Background Automation

Codex operates any Mac app by observing the screen, clicking, and typing like a human, but runs multiple agents in parallel without hijacking your cursor or focus. This enables workflows like mass-clearing Slack inboxes (triaging hundreds of bot messages), building Spotify playlists from descriptions, catching UI regressions in front-end apps, reproducing browser bugs with screenshots pasted into PRs, running and self-fixing end-to-end tests, or automating legacy dashboards without APIs. Users report real adoption: daily Git/commit/issue/calendar recaps written to Notion with to-dos in Apple Reminders; background login routines; webcam-based slouch detection triggering stretch videos.

Compared to Claude, Codex finishes tasks in 2 minutes versus Claude's 5-6 minutes, moves at near-human speed in familiar software, and handles any desktop app (not just Chrome). GPT-5.4's native computer use scores mid-70s on OS World benchmark, surpassing human baseline for GUI control. Reliability stems from backing up on unexpected modals without fumbling, enabling hands-off execution—you queue 3-4 tasks, walk away, and return to completions.

Chronicle (research preview for ChatGPT Pro on Mac) enhances this by periodically capturing screens, processing on OpenAI servers, and generating local Markdown memories for context. This trains agents on your workflows, app preferences, and muscle memory, though it sends unencrypted captures (unavailable in EU/UK/Switzerland).

OpenAI Builds Universal Bodies, Anthropic Bets on Ecosystems

OpenAI views models as brains and prioritizes "bodies" for real-world action. Codex's body uses graphical interfaces directly—no APIs needed—covering all screen-based software, including legacy enterprise tools, unmaintained internal apps, and vendor portals. Agents auto-select tools (files, plugins, browser, code) based on outcomes, minimizing mode-switching friction.

Anthropic's Claude focuses on knowledge work (synthesis, research, analysis) via structured interfaces: co-work (point at folder for multi-step tasks), MCP servers, 30k+ cloud integrations, plugins, and Conway (leaked always-on environment with sidebar UI, webhooks, extensions, browser control). This excels where ecosystems provide agent-ready hooks but falters on long-tail software without them.

Trade-offs: Anthropic's explicit scopes/permissions ensure deliberate control but add friction; OpenAI's implicit approach assumes users describe outcomes, letting agents escape to computer use. OpenAI doesn't require vendor cooperation; Anthropic needs MCP adoption to scale.

Acquisitions drive edges: OpenAI bought 12-person Software Applications Inc. (creators of unreleased Sky Mac AI interface) in Oct 2025—team from Workflow (now Apple Shortcuts) and Apple vets (Safari, WebKit, etc.)—enabling seamless Mac integration like non-robotic motion paths and permission handling. Anthropic's Recept buy sped Windows control.

Future Bets and Practical Choices

Both converge on persistent, ambient, event-driven agents across devices. OpenAI's path: agentic platform, computer work, personal AGI; cuts like Sora/drug discovery to focus. Monetizes compute via super-apps (ChatGPT for chat, Codex for agents). Anthropic pushes MCP for standards.

Watch: Conway announcement (validates ecosystem bet or signals pivot); MCP velocity (e.g., Salesforce integrations—if thin wrappers fail, UI-driving wins).

Use Codex for cross-app ops, legacy tools, parallel long-runs (Slack/email triage, bug repro, visual testing)—gap widens with Chronicle. Lean Claude for scoped knowledge work with integrations or dev-friendly coding (multi-agent deploys). Run both; Codex defaults for interface-friction bottlenecks, now automating anything with a screen.