Equivalent Speed but Varied Code Quality in Generation
Both Codex (on GPT-4o?) and Claude Code were given the same commit-specific prompt, based on unreleased functionality, to build a fresh Laravel app implementing a 'teams' feature: categories are hidden across teams and visible only within a team. Claude finished in 8 minutes, Codex in 9. The core logic worked in both: Team 2 could not see Team 1's categories. Codex edged ahead on UI with grouped menu items, cards, and borders; Claude's was plainer. One tip: Claude auto-shortens long prompts in its display, but you can edit them precisely via the Ctrl+G Vim mode.
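As a rough illustration of the feature both tools built, here is a minimal sketch of team-scoped categories in Laravel. The model, column, and relationship names (`Category`, `team_id`, `currentTeam`) are assumptions for illustration, not the actual generated code.

```php
<?php

namespace App\Models;

use Illuminate\Database\Eloquent\Builder;
use Illuminate\Database\Eloquent\Model;

class Category extends Model
{
    protected $fillable = ['name', 'slug'];

    protected static function booted(): void
    {
        // Global scope: every query only sees the current team's rows,
        // so Team 2 can never list Team 1's categories.
        static::addGlobalScope('team', function (Builder $query) {
            if ($team = auth()->user()?->currentTeam) {
                $query->where('team_id', $team->id);
            }
        });
    }
}
```

A global scope applies the team filter to every Eloquent query automatically, which is one common way to keep per-team isolation out of individual controllers.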
Cross-Reviews Expose Asymmetric Bugs and Security Gaps
Claude, reviewing Codex's code, flagged 12 issues:
- Critical: Category deletion silently cascades to delete all posts; no delete confirmation.
- Performance: Excessive DB queries for team data (e.g., repeated checks for access).
- UX/Features: No pagination (debatable preference).
- Security: team_id is mass-assignable (in fillable) with no validation, risky if request input is mishandled.
- Best practices: Mix of Flux UI and Livewire components; unused import; potential slug uniqueness gaps.
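The critical cascade finding above could be addressed at the database level. This is a hypothetical sketch, assuming the generated app's table names; `restrictOnDelete()` makes the database refuse to delete a category that still has posts, instead of silently cascading, so the UI can then prompt for confirmation.

```php
<?php

use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

return new class extends Migration
{
    public function up(): void
    {
        Schema::create('posts', function (Blueprint $table) {
            $table->id();
            $table->string('title');
            // restrictOnDelete() raises a constraint violation rather
            // than cascading, so posts can't vanish silently.
            $table->foreignId('category_id')
                ->constrained()
                ->restrictOnDelete();
            $table->timestamps();
        });
    }
};
```

The repeated-query finding is typically fixed separately, by eager loading team data (e.g., `with('team')`) instead of checking access per row.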
Codex, reviewing Claude's code, found 6:
- Critical: Posts accept category IDs from any team via direct POST (UI hides them, but backend lacks validation—test by switching teams and POSTing).
- Reliability: Team detection inconsistent (URL vs. user session), risking 404s.
- Validation: Weak category uniqueness; factory issues.
- Other: Missing tests, questionable assumptions, suggested refactors.
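Codex's critical finding (backend accepting category IDs from any team) is the kind of gap a scoped validation rule closes. A minimal sketch, assuming hypothetical names for the request class and team relationship:

```php
<?php

use Illuminate\Foundation\Http\FormRequest;
use Illuminate\Validation\Rule;

class StorePostRequest extends FormRequest
{
    public function rules(): array
    {
        return [
            'title' => ['required', 'string', 'max:255'],
            'category_id' => [
                'required',
                // exists + team scoping: rejects category IDs from
                // other teams even when the UI never offered them.
                Rule::exists('categories', 'id')
                    ->where('team_id', $this->user()->currentTeam->id),
            ],
        ];
    }
}
```

Hiding options in the UI is not enforcement; the validation layer has to repeat the team check, which is exactly what a direct POST after switching teams would expose.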
Claude spotted more issues (12 vs. 6), but each reviewer's findings were largely distinct: cascade deletes and missing confirmations on one side, cross-team exploits on the other.
Second LLM Opinions Mimic Pair Programming for Better Code
Don't rely on a single model: different LLMs have different training and approaches, so they catch each other's blind spots. Running each tool's plan mode before implementing helps align on an approach first. Like human code review, a second pass roughly doubles the time (Claude's review took 1 minute 13 seconds; Codex's took about twice as long) and the API cost; use separate agents or a multi-model tool like OpenCode. A second opinion pays off: it surfaces diverse fixes, enforces missing validations, and prevents silent disasters. Try it in your own projects: which model reviews best?