Codex: Build Full SE Systems with Agents & Plugins
Turn Codex from a code assistant into a complete software engineering agent using frontier models, plugins for tools like Playwright and ImageGen, automations for Slack and Gmail, and subagents for parallel code review and debugging. The demos show Codex building a game and syncing data autonomously.
Codex Architecture: Models Power a Unified Agent Harness
Codex operates as a full software engineering agent, not just a code writer: it explores codebases, runs commands and tests, and handles the rest of an engineer's workflow. It is built on frontier models: GPT-5.3 (previous generation), Spark (a fast variant), GPT-5.4 (state of the art), and GPT-5.4 Mini (new, for short tasks and subagents). Serving improvements include websockets, which deliver tokens 1.75x faster, and Fast Mode, which adds 2x more speed on top. A unified agent harness wraps the models and handles tool execution, environment setup, behavior evaluation, and embedded safety.
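Because Fast Mode is described as coming "on top" of the websocket gain, a reasonable reading is that the two factors multiply. The sketch below assumes exactly that (the talk does not state the combined figure) and shows the arithmetic: 1.75 × 2 = 3.5x, so a response that took 70 seconds at baseline would take about 20 seconds. The 70-second baseline is a hypothetical number chosen only for illustration.

```python
# Combined serving speedups, ASSUMING independent gains multiply.
WEBSOCKET_SPEEDUP = 1.75  # figure from the talk
FAST_MODE_SPEEDUP = 2.0   # figure from the talk ("2x more speed on top")


def combined_speedup(*factors: float) -> float:
    """Multiply independent speedup factors together."""
    result = 1.0
    for factor in factors:
        result *= factor
    return result


def wall_clock_seconds(baseline_seconds: float, *factors: float) -> float:
    """Time for the same output once all speedups apply."""
    return baseline_seconds / combined_speedup(*factors)


if __name__ == "__main__":
    total = combined_speedup(WEBSOCKET_SPEEDUP, FAST_MODE_SPEEDUP)
    print(f"combined speedup: {total}x")  # prints "combined speedup: 3.5x"
    # Hypothetical 70-second baseline response, for illustration only.
    print(wall_clock_seconds(70.0, WEBSOCKET_SPEEDUP, FAST_MODE_SPEEDUP))  # prints 20.0
```

If the gains overlap (for example, both shave the same network latency), the real combined figure would be lower than 3.5x; measure before relying on it.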
You can interact with Codex through the Codex app (projects and work trees for multitasking without context switches, native Git support, Mac and Windows sandboxes), the CLI, IDE extensions, and Slack or GitHub. Work trees let one project hold separate branches at the same time, e.g., one for a feature, one for a bug fix, and one for Q&A. Recent features: better automations, mini models for cost-efficient subagents, and plugins that bundle skills, apps, and MCP servers.
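The Codex app manages work trees for you; the sketch below only illustrates the underlying Git mechanism, assuming a plain Git repository. It builds one `git worktree add -b <branch> <path> <base>` command per task so each branch gets its own checkout directory. The branch names, the `/work/myrepo` path, and the sibling-directory naming scheme are hypothetical.

```python
import shlex


def worktree_add_command(repo_root: str, branch: str, base: str = "main") -> list[str]:
    """Build a `git worktree add` command that creates `branch` from `base`
    and checks it out in a sibling directory next to the main checkout."""
    # Sibling directory named after the branch, e.g. /repo -> /repo-feature-login
    path = f"{repo_root}-{branch.replace('/', '-')}"
    # `git -C` runs git as if started in repo_root; `-b` creates the new branch.
    return ["git", "-C", repo_root, "worktree", "add", "-b", branch, path, base]


# Hypothetical task branches mirroring the feature / bug / Q&A split.
TASK_BRANCHES = ["feature/login", "bugfix/crash-on-save", "qa/explore-parser"]

if __name__ == "__main__":
    for branch in TASK_BRANCHES:
        # Print the shell-quoted command instead of running it.
        print(shlex.join(worktree_add_command("/work/myrepo", branch)))
```

Pass an absolute `repo_root` without a trailing slash. To actually create the checkouts, run each command with `subprocess.run(cmd, check=True)`; `git worktree list` shows them and `git worktree remove <path>` cleans one up.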
"Codex is our open software engineering agent. So it's not just a coding agent. It can do much more than that. It can run commands. It can run tests. It can explore code bases. It can really do everything that a software engineer would do."
Key principle: the model-harness flywheel. Better models and faster serving directly improve every surface. Trade-off: larger models excel at long, complex tasks; mini models suit quick or parallel ones. Prerequisite: basic familiarity with the OpenAI API; the workshop assumes you have a laptop to follow the demos.
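The large-versus-mini trade-off can be expressed as a simple routing rule. This is a hypothetical sketch: the model names are used only as labels (not as API model identifiers), and the step-count cutoff and the `Task` fields are assumptions, not anything Codex exposes.

```python
from dataclasses import dataclass


@dataclass
class Task:
    description: str
    estimated_steps: int     # rough count of expected tool calls or edits (hypothetical)
    parallel_subtask: bool   # True if this runs as one of many subagents


# Labels from the talk; the cutoff below is a HYPOTHETICAL choice.
LARGE_MODEL = "GPT-5.4"
MINI_MODEL = "GPT-5.4 Mini"
SHORT_TASK_MAX_STEPS = 5


def pick_model(task: Task) -> str:
    """Route short or parallel work to the mini model, long/complex work to the large one."""
    if task.parallel_subtask or task.estimated_steps <= SHORT_TASK_MAX_STEPS:
        return MINI_MODEL
    return LARGE_MODEL


if __name__ == "__main__":
    print(pick_model(Task("rename a variable", 2, False)))          # GPT-5.4 Mini
    print(pick_model(Task("migrate the ORM layer", 40, False)))     # GPT-5.4
```

In practice the "size" of a task is a judgment call; a fixed threshold is just the simplest way to make the trade-off explicit.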
Plugins: Bundle Skills, Apps, and MCPs for Reusable Workflows
Plugins package skills (reusable instructions, scripts, and resources for repetitive processes), apps (connections to services like Notion, Linear, and Figma), and MCP servers (which expose external tools) into installable bundles, giving the model a richer set of capabilities to match to each task. Instead of setting up each piece manually, you add one plugin and get everything it bundles.
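The "install one plugin, get everything" idea is essentially a bundle that registers several kinds of components at once. The sketch below models that as plain data; the `Plugin` and `Workspace` classes and the component names are hypothetical and do not reflect Codex's actual plugin manifest format.

```python
from dataclasses import dataclass, field


@dataclass
class Plugin:
    """HYPOTHETICAL bundle of the three component kinds described above."""
    name: str
    skills: list[str] = field(default_factory=list)
    apps: list[str] = field(default_factory=list)
    mcp_servers: list[str] = field(default_factory=list)


@dataclass
class Workspace:
    skills: set[str] = field(default_factory=set)
    apps: set[str] = field(default_factory=set)
    mcp_servers: set[str] = field(default_factory=set)

    def install(self, plugin: Plugin) -> None:
        """One install registers every component the plugin bundles.
        Using sets makes re-installing the same plugin a no-op."""
        self.skills.update(plugin.skills)
        self.apps.update(plugin.apps)
        self.mcp_servers.update(plugin.mcp_servers)


if __name__ == "__main__":
    # Hypothetical split of components; the real Game Studio layout was not shown.
    game_studio = Plugin("game-studio", skills=["sprite-workflow"], mcp_servers=["browser"])
    ws = Workspace()
    ws.install(game_studio)
    print(ws)
```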
Demos:
- Game Studio Plugin: bundles Playwright Interactive (a headless browser for clicking, navigating, and analyzing screenshots) and ImageGen (asset generation). Prompt: "Build platformer game with brick platforms." Codex generated sprites (e.g., 5 character variants), assembled the game, and debugged it visually. The run took about 1 hour autonomously and produced a playable game with custom assets. You can iterate by feeding in your own images.
- Google Drive Plugin: gives Codex access to Drive spreadsheets. Codex analyzed a YAML file in the codebase listing 57 Codex events and updated a sheet with each event's name, date, and city in about 2 minutes.
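The Drive demo boils down to "read structured records from the repo, reshape them into sheet rows." The sketch below shows that reshaping step only, under several assumptions: the record fields (`name`, `date`, `city`) follow the columns mentioned in the demo, the three sample events are made up, and CSV output stands in for the sheet the plugin actually wrote to. Parsing the real YAML would need a library such as PyYAML (`yaml.safe_load`), omitted here to keep the sketch stdlib-only.

```python
import csv
import io

# HYPOTHETICAL sample records; the demo's YAML schema was not shown.
EVENTS = [
    {"name": "Event B", "date": "2025-03-10", "city": "Paris"},
    {"name": "Event A", "date": "2025-01-20", "city": "Tokyo"},
    {"name": "Event C", "date": "2025-06-05", "city": "Berlin"},
]


def to_rows(events: list[dict]) -> list[list[str]]:
    """Header row plus one row per event, sorted by ISO date (sorts lexicographically)."""
    header = ["name", "date", "city"]
    body = sorted(([e["name"], e["date"], e["city"]] for e in events), key=lambda row: row[1])
    return [header] + body


def to_csv(rows: list[list[str]]) -> str:
    """Serialize rows as CSV text, e.g. for import into a spreadsheet."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv(to_rows(EVENTS)), end="")
```

A record missing one of the three keys raises `KeyError`, which is arguably the right behavior for a sync job: fail loudly rather than write a partial row.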
Create skills on the fly: ask Codex to package a workflow as a skill. For web and game development, pre-built plugins save you from repeating setup. Principle: visual tools like Playwright fix blind code changes, because the agent can see and interact with the UI. Common mistake: relying on text prompts alone without visuals; use the interactive browser to verify results.
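When Codex packages a workflow as a skill, the result is a reusable instruction document. The sketch below renders a workflow in a `SKILL.md`-style layout (YAML front matter with `name` and `description`, then numbered steps). That layout is an assumption based on the common Agent Skills convention; check the Codex documentation for where skills live and which fields are required. The `render_skill` helper and the example workflow are hypothetical.

```python
def render_skill(name: str, description: str, steps: list[str]) -> str:
    """Render a workflow as a SKILL.md-style document (ASSUMED layout):
    YAML front matter, a title, then one numbered line per step."""
    lines = ["---", f"name: {name}", f"description: {description}", "---", "", f"# {name}", ""]
    lines += [f"{i}. {step}" for i, step in enumerate(steps, start=1)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical repetitive workflow worth packaging.
    print(render_skill(
        "release-notes",
        "Draft release notes from merged pull requests",
        ["List PRs merged since the last tag", "Group them by label", "Write a summary per group"],
    ), end="")
```

Quote the description if it contains a colon followed by a space, since an unquoted `key: a: b` is invalid YAML.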
Quality criteria: plugins should reduce setup time and enable end-to-end workflows (e.g., gen → debug → deploy). Exercise: install Game Studio, prompt for a simple app or game, and inspect the work tree.
"Skills are essentially reusable instructions packaged for specific processes... every time you have a sort of neat workflow that is always the same, you can package that into a skill."
Automations: Background Cron Jobs with App/Plugin Integration
Automations run non-interactive tasks in the background on a schedule. To set one up, connect apps or plugins, define the instructions, choose a frequency (e.g., daily at 9 AM), and pick a project. Codex then executes the task autonomously.
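The four setup fields (apps, instructions, frequency, project) map naturally onto a small record, and "daily at 9 AM" implies a next-run calculation. This is a hypothetical sketch of that scheduling logic, not Codex's internal scheduler: the `Automation` class is made up, and it uses naive local datetimes, so time zones and daylight-saving shifts are ignored.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta


@dataclass
class Automation:
    """HYPOTHETICAL record of the fields chosen during automation setup."""
    name: str
    apps: list[str]
    instructions: str
    run_at: time      # daily run time, naive local time
    project: str


def next_run(automation: Automation, now: datetime) -> datetime:
    """Next daily occurrence of run_at strictly after `now`."""
    candidate = datetime.combine(now.date(), automation.run_at)
    if candidate <= now:
        candidate += timedelta(days=1)  # today's slot has passed; use tomorrow's
    return candidate


if __name__ == "__main__":
    digest = Automation(
        name="slack-digest",
        apps=["Slack"],
        instructions="Check messages I should reply to; bucket per topic; flag urgent ones.",
        run_at=time(9, 0),
        project="my-project",
    )
    print(next_run(digest, datetime(2025, 1, 6, 8, 30)))  # 2025-01-06 09:00:00
```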
Demos:
- Slack: a daily summary of messages needing a reply since yesterday, bucketed by topic, with time-sensitive or urgent items flagged and alerts for important channels. Prompt: "Check messages I should reply to... bucket per topic."
- Gmail: scan a high-volume inbox for legitimate, time-sensitive messages that need replies, saving hours per day.
- Custom: "Create automation to scan Slack for Codex use cases, list for website." Codex proposes the automation in a popup for you to approve and schedule.
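The Slack digest combines three rules: keep only messages needing a reply, bucket them by topic, and flag urgent ones (including anything in an important channel). Codex applies model judgment here; the sketch below swaps in fixed keyword matching purely to make the rules concrete. The keyword lists, the `#incidents` channel, and the `Message` shape are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Message:
    channel: str
    text: str
    needs_reply: bool


# HYPOTHETICAL keyword rules standing in for model judgment.
TOPIC_KEYWORDS = {
    "release": ("release", "deploy", "ship"),
    "bugs": ("bug", "crash", "error"),
}
URGENT_MARKERS = ("urgent", "asap", "today", "eod")
IMPORTANT_CHANNELS = {"#incidents"}


def topic_of(msg: Message) -> str:
    """First topic whose keyword appears in the text (substring match), else 'other'."""
    lowered = msg.text.lower()
    for topic, words in TOPIC_KEYWORDS.items():
        if any(word in lowered for word in words):
            return topic
    return "other"


def is_urgent(msg: Message) -> bool:
    lowered = msg.text.lower()
    return msg.channel in IMPORTANT_CHANNELS or any(m in lowered for m in URGENT_MARKERS)


def daily_digest(messages: list[Message]) -> dict[str, list[str]]:
    """Bucket reply-worthy messages by topic, prefixing urgent ones."""
    buckets: dict[str, list[str]] = defaultdict(list)
    for msg in messages:
        if msg.needs_reply:
            flag = "[URGENT] " if is_urgent(msg) else ""
            buckets[topic_of(msg)].append(flag + msg.text)
    return dict(buckets)
```

Spelling out the bucketing and urgency rules like this is also what the automation's instructions should do; vague instructions leave the model guessing.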
Manual setup: select the apps (e.g., Slack), the instructions, the frequency, and the project. The automation runs in the app sandbox. Principle: offload repetitive monitoring and data tasks, and combine them with codebase access for syncs (e.g., repo → Drive). Trade-off: live demos can be chatty, so use Spark for speed.
Common mistake: vague instructions; specify how to bucket and prioritize. Where it fits: early in your workflow, automating intake before manual review.
"Automations is again something that you can just set up using apps... set it to run on a scheduled time. So for example... every day at a certain time and it's just an instruction that Codex will run in the background."
Subagents and Parallel Execution: Custom Personas for Speed/Safety
Subagents parallelize tasks, each with its own model, permissions, tools, and persona. Use mini models for cost and speed on short runs and the main models for complex ones. For example, spawn subagents for review, research, and debugging while the main agent oversees them.
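The fan-out/fan-in pattern behind subagents can be sketched with `asyncio`: the main agent launches several subagents concurrently, then collects their findings. Here `run_subagent` is a hypothetical stand-in that only sleeps; a real subagent would call a model and tools. The roles, target file, and model label are assumptions for illustration.

```python
import asyncio


async def run_subagent(role: str, target: str, model: str) -> str:
    """HYPOTHETICAL stand-in for a subagent run; sleeps instead of calling a model."""
    await asyncio.sleep(0.01)  # simulate work
    return f"{role} ({model}) finished on {target}"


async def main_agent(target: str) -> list[str]:
    roles = ["reviewer", "researcher", "debugger"]
    # Fan out: all subagents run concurrently on the mini model.
    findings = await asyncio.gather(
        *(run_subagent(role, target, "GPT-5.4 Mini") for role in roles)
    )
    # Fan in: gather returns results in launch order for the main agent to review.
    return list(findings)


if __name__ == "__main__":
    for line in asyncio.run(main_agent("src/parser.py")):
        print(line)
```

Because the subagents run concurrently, total wall-clock time tracks the slowest subagent rather than the sum of all of them, which is where the speed win comes from.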
Demos: reviewing persona files, with subagents handling checks in parallel. Custom creation: define the model (e.g., Mini), tools, and permissions. Bleeding-edge features: Guardian approvals (a human gate on actions), hooks (custom triggers), and personality settings.
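A human approval gate plus hooks can be sketched as a wrapper around each action: risky actions go to an approver first, and hooks fire before anything approved runs. This is a hypothetical illustration of the concept, not the Guardian or hooks API; `run_gated`, the `risky` flag, and the callback signatures are made up.

```python
from typing import Callable, Iterable

Approver = Callable[[str], bool]  # returns True if the human approves the action
Hook = Callable[[str], None]      # custom trigger run before an action executes


def run_gated(action: str, risky: bool, approve: Approver, hooks: Iterable[Hook] = ()) -> str:
    """Run an action, asking the approver first if it is risky (HYPOTHETICAL gate)."""
    if risky and not approve(action):
        return f"blocked: {action}"
    for hook in hooks:
        hook(action)  # e.g. logging or notifying a channel
    return f"executed: {action}"


if __name__ == "__main__":
    audit_log: list[str] = []
    deny_all: Approver = lambda action: False
    print(run_gated("pytest -q", risky=False, approve=deny_all, hooks=[audit_log.append]))
    print(run_gated("rm -rf build", risky=True, approve=deny_all, hooks=[audit_log.append]))
    print(audit_log)  # only the safe action reached the hook
```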
Code review: through the GitHub integration, Codex explores and pulls code and suggests fixes. Security: the Cloud Code plugin and native sandboxes (Windows first). Adoption: 3M weekly users, tripled since January.
Principle: parallelism scales solo work, and personas enforce safety (e.g., read-only subagents). Mistake: skipping permissions, which risks unsafe command execution. Quality: look for measurable speed and cost wins, and evaluate results in work trees.
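A read-only persona is easiest to enforce with a deny-by-default allowlist: each persona gets a set of permitted tools, and anything outside it, including tools nobody has classified, is refused. The tool names, the read/write split, and the `Persona` class below are hypothetical; they illustrate the permission idea, not Codex's configuration format.

```python
from dataclasses import dataclass

# HYPOTHETICAL tool classification.
READ_TOOLS = frozenset({"read_file", "search", "list_dir"})
WRITE_TOOLS = frozenset({"edit_file", "shell", "git_push"})


@dataclass(frozen=True)
class Persona:
    name: str
    model: str
    read_only: bool

    def allowed_tools(self) -> frozenset[str]:
        """Read-only personas get read tools only; others get both sets."""
        return READ_TOOLS if self.read_only else READ_TOOLS | WRITE_TOOLS


def authorize(persona: Persona, tool: str) -> bool:
    """Deny anything outside the persona's allowlist, including unknown tools."""
    return tool in persona.allowed_tools()


if __name__ == "__main__":
    reviewer = Persona("reviewer", "GPT-5.4 Mini", read_only=True)
    print(authorize(reviewer, "read_file"))  # True
    print(authorize(reviewer, "shell"))      # False
```

Denying by default means a newly added tool stays unavailable to every persona until someone deliberately classifies it.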
Exercise: in the app, spawn a subagent to hunt down a bug, and approve its actions via Guardian.
"Subagents... allow you to parallelize a particular feature or bug request... at a faster rate all whilst making sure that you don't pay as much cost."
Key Takeaways
- Start with the Codex app for multi-project and work tree support; use the CLI or IDE for targeted work. This reduces context switches.
- Install plugins like Game Studio or Google Drive to bundle visual and data tools; prompt end to end (gen → test → sync).
- Build automations for daily drudgery (Slack/Gmail summaries), and specify priorities and frequency for reliability.
- Use subagents with Mini models for parallel review and debugging; set custom personas and permissions for control.
- Use Fast Mode or Spark for speed; always embed safety via the harness and Guardian approvals, and test in a sandbox.
- For games and web apps: combine ImageGen with Playwright Interactive, and iterate visually, not just in code.
- Scale with GitHub and Slack integrations; monitor quality via work trees.
- Experiment: recreate the demos on your own repo and measure the time saved versus doing it manually.