Anthropic's Mythos Leak Reveals Cyber AI Risks
Anthropic accidentally exposed internal docs on Claude Mythos (Capybara), billed as its most powerful model yet, with leading cyber capabilities and unprecedented risks, via a misconfigured CMS that left 3,000 staging assets publicly accessible.
Mythos Capabilities Outpace Current Models
Anthropic confirmed it is testing Claude Mythos, internally codenamed Capybara, as a new tier above Opus: its most capable model to date and a "step change" in performance. Leaked draft docs claim it scores "dramatically higher" than Claude Opus 4.6 on coding, academic reasoning, and cybersecurity benchmarks, positioning it as "by far the most powerful AI we've ever developed" and "far ahead of any other AI model in cyber capabilities." Early access is limited to cybersecurity and defense customers at a higher price point, giving them time to prepare defenses before a wider release. This continues a pattern in which each Claude generation improves at cyber tasks: Opus 4.6 already surfaces previously unknown vulnerabilities in production codebases, a capability that is dual-use for attackers and defenders alike.
Real-World Misuse Highlights Urgent Risks
Mythos "poses unprecedented cybersecurity risks" and "presages an upcoming wave of models that can exploit vulnerabilities in ways that far outpace the efforts of defenders," per leaked safety comms. Context amplifies this: a Chinese state-sponsored group used public Claude Code to infiltrate 30 organizations (tech firms, banks, agencies) before Anthropic banned accounts and notified victims after 10 days of investigation. Restricting Mythos to defensive users buys time to harden systems, as offensive potential narrows the defender-attacker gap faster than regulations or controls can adapt. Rhetorical urgency in drafts motivates partners, but claims align with observed escalations in model misuse.
OpSec Failures Undermine Frontier AI Security
The leak stemmed from a CMS default: 3,000 staging assets (drafts, PDFs) were public unless explicitly marked private, making them searchable. The exposure was discovered by AI security researcher Roy Paz (LayerX) and Alexandre Pauwels (Cambridge) during broad audits, not targeted hacking. Anthropic called it "human error" and restricted access after being notified, but the root cause, a process gap in which staging isn't secure by default, remains unaddressed. This mirrors ROME's sandbox escape: the assumed boundary (internal drafts are safe) didn't match reality (a public data store). For AI firms, information security must scale with model risk: default configurations are liabilities, and processes need audits so sensitive announcements don't leak through basic misconfigurations. Engineering takeaway: treat staging as hostile, enforce private-by-default, and conduct boundary reviews for high-stakes information, as sketched below.
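The reporting doesn't name Anthropic's CMS or storage backend, so the following is only a minimal sketch of the private-by-default pattern, assuming an S3-backed staging bucket with a hypothetical name: enumerate staging assets, flag anything world-readable, then block public access at the storage layer so a mislabeled draft fails closed.

```python
"""Minimal private-by-default audit for a staging asset store.

Sketch only: assumes staging assets live in an S3 bucket (hypothetical
name below). The pattern, not the vendor, is the point: enumerate
assets, flag world-readable ones, enforce a bucket-wide public block.
"""
import boto3

STAGING_BUCKET = "example-cms-staging"  # hypothetical bucket name
ALL_USERS = "http://acs.amazonaws.com/groups/global/AllUsers"

s3 = boto3.client("s3")


def public_objects(bucket: str):
    """Yield keys whose ACL grants read access to anonymous users."""
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        for obj in page.get("Contents", []):
            acl = s3.get_object_acl(Bucket=bucket, Key=obj["Key"])
            if any(g["Grantee"].get("URI") == ALL_USERS for g in acl["Grants"]):
                yield obj["Key"]


def enforce_private_by_default(bucket: str) -> None:
    """Block all public ACLs and policies at the bucket level (fail closed)."""
    s3.put_public_access_block(
        Bucket=bucket,
        PublicAccessBlockConfiguration={
            "BlockPublicAcls": True,
            "IgnorePublicAcls": True,
            "BlockPublicPolicy": True,
            "RestrictPublicBuckets": True,
        },
    )


if __name__ == "__main__":
    exposed = list(public_objects(STAGING_BUCKET))
    for key in exposed:
        print(f"world-readable: s3://{STAGING_BUCKET}/{key}")
    # Flip the default: nothing in staging is reachable anonymously;
    # genuinely public assets get promoted through a separate pipeline.
    enforce_private_by_default(STAGING_BUCKET)
    print(f"{len(exposed)} exposed objects found; public access now blocked")
```

The important inversion is that the control sits at the storage layer rather than in the CMS UI: even if an editor forgets to mark a draft private, the bucket-level block keeps it from ever becoming publicly searchable, and anything meant for the public would be promoted through a separate, reviewed publishing path.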