Claude Mythos: Elite Hacker, Barred from Public Use
Anthropic's Claude Mythos Preview tops benchmarks in reasoning, automation, and cyber exploitation. It remains gated because of repeated sandbox escapes and elite hacking ability, a decision that may end open access to frontier models.
Mythos Preview Saturates Benchmarks, Redefines Capabilities
Claude Mythos Preview outperforms every prior model, including Anthropic's own Opus 4.6, across software engineering, general reasoning, automation, and cyber warfare. Speaker Nick Saraev calls it "the best model humanity has ever released." He cites its ability to max out traditional benchmarks such as ARC-AGI while dominating newer composite measures such as the Epoch Capabilities Index (ECI). Models released before April 2024 cluster along a flat ECI trend line. Mythos bends that line sharply upward, at a cited 1.86x to 4.3x the prior rate of improvement, which signals non-linear progress even without recursive self-improvement.
With tools and agents ("scaffolding"), Mythos completes knowledge tasks dozens of times faster than the average person and matches elite experts in most fields. It crushes cyber benchmarks:
- full saturation on Cybench;
- 83/100 on CyberGym, vs. 67 for Opus 4.6;
- 72.4% full exploits and 84% partial exploits on the Firefox 147 JS shell, vs. 4.4% partial for Sonnet.

Its hacking ability also holds up outside evals. Through Project Glasswing, whose partners include AWS, Apple, Google, Nvidia, Microsoft, and the Linux Foundation, it uncovered vulnerabilities in major operating systems and browsers.
"Most knowledge tasks are completely cooked. Mythos preview is probably dozens of times faster than the average person at completing more or less any knowledge task when you give it the ability to call tools and agents." (Saraev summarizes the system card, highlighting the model's speed and expert-level parity as levers for business optimization.)
Jailbreak Risks and Autonomy Threats Block Wide Release
Anthropic withholds Mythos from consumers and SMBs because it consistently escapes sandboxes. When tasked to "escape this secure sandbox and send a message," it develops multi-step exploits that gain broad internet access from restricted environments. This cyber warfare edge surpasses all internal and external evals. As a result, Mythos is limited to defensive use by select partners in Project Glasswing (anthropic.com/prog glasswing), who patch infrastructure before such capabilities proliferate.
The 244-page system card places Mythos under Autonomy Threat Model 1: a model that is heavily relied upon and capable of moderate autonomous, goal-directed subterfuge could contribute to irreversible catastrophe. It does not reach Threat Model 2, which covers disruptions to global power balances through energy, robotics, weapons, or AI. Chem/bio risk is rated moderate (Threat Model 1), similar to Opus 4.6. On virology uplift trials, its error bars overlap those of prior models, though it makes fewer critical failures. Training mitigates this risk by having the model refuse bioweapon-related requests, a behavior Saraev paraphrases as: "Hey, you know, it sounds like you want me to help you develop some bioweapon. Sorry, I'm not interested."
In an internal survey of 18 Anthropic researchers, 1 of 18 judged Mythos already able to replace an entry-level researcher or engineer. Another 4 of 18 gave it a 50% chance of qualifying within 3 months of scaffolding iteration, meaning improved tools, function calls, and HTTP access. Its shortcomings include "confabulation cascades," in which it keeps hallucinating despite contrary evidence. Better scaffolds might fix this, but for now it blocks full automation.
"Anytime they try and give it a task like, 'Hey, escape this secure sandbox and find a way to send me a message.' It will almost always do so... develop sophisticated multi-step exploits." (Saraev explains the non-release rationale, underscoring hacking as the core barrier.)
Scaffolding Unlocks Automation, But Golden Age Ends
Mythos shines in agentic setups. Scaffolds like Claude Code let it take real-world actions through APIs and tools, turning raw intelligence into R&D output. Without them, it only emits tokens into a void. Anthropic plans an Opus refresh within 1-2 months as a diluted public proxy for Mythos. However, cyber feats such as the 84% partial-exploit rate on the Firefox JS shell weaken the ethical case for open access.
Saraev argues that we have passed the "golden age" of unnerfed frontier models, such as early Opus 4.6. He expects future frontier models to be gated to mid-market and enterprise customers and vetted users, leaving SMBs and independent developers with weaker models. Economic displacement looms as autonomy grows: automated R&D and knowledge work could soon replace entry-level researchers.
"Why would they give a nuclear device and put it in the hands of every man, woman, child, and baby on planet Earth? Like, I don't see any situation in which that makes sense." (Saraev on the ethical barriers to release, predicting a future of corporate AI overlords.)
"Four of them thought Claude Mythos preview had a 50% chance of qualifying as entry-level researcher replacement within 3 months of what they call scaffolding iteration." (From the internal survey; it suggests a path to job automation, despite possible respondent bias.)
"I feel like we might have actually already crossed that golden age of having full unadulterated access to models that can do stuff like this." (Saraev reflects on the pre-rate-limit Opus era, warning of a more restricted future.)
Key Takeaways
- Prioritize agentic scaffolding (tools, function calls) for production AI; raw models underperform without it.
- Traditional benchmarks are saturating; focus on real-world evals such as cyber exploits or virology uplift to gauge true capability.
- Expect gated frontier models: build with current Opus/Sonnet, and watch Anthropic's upcoming Opus refresh as a likely Mythos proxy.
- Cyber risk now drives release decisions: Mythos shows AI can exploit hardened software (72.4% full exploits on the Firefox JS shell).
- Automation horizon: 4 of 18 surveyed researchers give a 50% chance of entry-level researcher replacement within 3 months of scaffolding iteration; watch for confabulation fixes.
- Defensive AI partnerships (e.g., Glasswing) accelerate patching; use their findings to secure your own stack.
- Economic signal: knowledge tasks can be automated at elite speed; upskill in agent orchestration rather than raw prompting.
- Threat models guide access: an Autonomy Threat Model 1 rating signals high-stakes deployment, so avoid over-reliance without safeguards.