In short: Anthropic's Claude Mythos Preview scores 93.9% on SWE-bench Verified, beating rivals by 13+ points, but access is restricted to partners such as Apple because of the risks posed by its ability to discover zero-day vulnerabilities.
Claude Mythos Tops Benchmarks But Stays Locked for Security
Filed by Department of Product
Video description
Anthropic has revealed Claude Mythos Preview, a new frontier model it considers too powerful for public release. Instead, it is being made available exclusively to a select group of partners, including Apple, Google, Microsoft, and NVIDIA, under an initiative called Project Glasswing.
We also cover Meta's internal "Claudeonomics" leaderboard turning token usage into office status, new data on GitHub commits exploding 14x year-on-year, Perplexity's ARR surging past $450M, and Google's Product Director making the case that Go-to-Market is becoming the essential skill in the AI age.
➡️ Subscribe for weekly product briefings and more analysis: https://departmentofproduct.substack.com
Follow on Substack Notes: https://substack.com/@richholmes
🔗LINKS
Project Glasswing announcement — https://www.anthropic.com/glasswing
Claude Mythos Preview system card — https://www-cdn.anthropic.com/8b8380204f74670be75e81c820ca8dda846ab289.pdf
Felix Rieseberg on Mythos being a "step function change" — https://x.com/felixrieseberg/status/2041586309966524919
Simon Willison on why the pause "sounds necessary" — https://simonwillison.net/2026/Apr/7/project-glasswing/
Ethan Mollick on security risks — https://x.com/emollick/status/2041578945531830695
Meta's internal AI token leaderboard — https://www.theinformation.com/articles/meta-employees-vie-ai-token-legend-status?rc=77sebk
Jensen Huang on token spending — https://embed.businessinsider.com/jensen-huang-500k-engineers-250k-ai-tokens-nvidia-compute-2026-3
Zapier's AI fluency framework — https://x.com/wadefoster/status/2038979630590509553
Linear's COO on token-maxxing — https://x.com/cjc/status/2041299419845599489
Google's Product Director on GTM as the essential skill — https://x.com/jacalulu/status/2041160452672004189
The SaaS chat bar trend — https://x.com/rabi_guha/status/2040082295563169852
Simon Willison on GitHub commits — https://simonwillison.net/2026/Apr/4/kyle-daigle/
Ramp: monthly AI spend grew 4x — https://ramp.com/3-steps-to-manage-ai-spend
Perplexity ARR tops $450M — https://ca.finance.yahoo.com/news/perplexity-arr-tops-450m-pricing-132500539.html
AI and software engineering jobs — https://www.businessinsider.com/ai-isnt-killing-software-coding-jobs-booming-trueup-2026-4
Substack article on new product development processes - https://departmentofproduct.substack.com/p/the-new-product-development-operating
Claude Mythos Preview scores 93.9% on SWE-bench Verified (vs. 80.8% for Claude Opus 4.6 and 80.6% for Gemini 3.1 Pro) and 77.8% on the tougher SWE-bench Pro (a 24-point lead over GPT 5.4/Opus 4.5). That capability has let it find thousands of zero-day vulnerabilities across operating systems and browsers, including a 27-year-old remote-crash flaw in OpenBSD, a 16-year-old FFmpeg bug that 5M automated tests had missed, and a Linux privilege escalation. Anthropic's Project Glasswing, backed by $100M worth of tokens, limits access to Apple, Google, Microsoft, and NVIDIA for defensive patching, prioritizing safety over a public release. Experts such as Simon Willison call the pause necessary, and Ethan Mollick predicts more such restrictions. For product teams, this is a prompt to audit codebases aggressively now; once access widens, expect AI adoption to accelerate and security audits to move up CTOs' priority lists.
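The "13+ points" headline margin follows directly from the SWE-bench Verified scores quoted above. Here is a minimal Python sketch of that arithmetic, using only the figures in this summary. The dictionary and function names are illustrative, not taken from any benchmark tooling.

```python
# SWE-bench Verified scores quoted in this summary (percent of tasks resolved).
VERIFIED_SCORES = {
    "Claude Mythos Preview": 93.9,
    "Claude Opus 4.6": 80.8,
    "Gemini 3.1 Pro": 80.6,
}


def lead_over_rivals(scores: dict, leader: str) -> dict:
    """Return the leader's margin, in percentage points, over every other model."""
    top = scores[leader]
    # Round to one decimal to match the precision of the published scores and
    # hide floating-point noise (93.9 - 80.8 is not exactly 13.1 in binary).
    return {
        name: round(top - score, 1)
        for name, score in scores.items()
        if name != leader
    }


margins = lead_over_rivals(VERIFIED_SCORES, "Claude Mythos Preview")
for name, gap in margins.items():
    print(f"{name}: +{gap} points")
print(f"Smallest lead: {min(margins.values())} points")
```

The smallest lead (13.1 points, over Claude Opus 4.6) is what the "13+ points" claim rests on.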
Meta's Claudeonomics leaderboard ranks 85K employees by token use and awards 'token legend' and 'session immortal' badges to the heaviest users, turning consumption into prestige. Nvidia's Jensen Huang says he would be alarmed if a $500K engineer did not burn through $250K of tokens a year, arguing that upfront AI investment cuts long-term costs. Zapier assesses hires on token use and AI fluency; Linear's COO counters that this is like ranking marketers by spend. Use token-maxxing to justify AI budgets and track ROI via saved dev time, but pair it with output metrics to avoid waste, since Mythos could push usage higher still.
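One way to pair token spend with output, as suggested above, is to compare the dollar value of developer time saved against the token bill and then normalize that bill by shipped work. The sketch below is a minimal illustration under stated assumptions. The field names, the idea of using merged pull requests as the output metric, and the example numbers are all hypothetical. None of it is a method described by Zapier, Linear, Meta, or Nvidia. It also expresses Jensen Huang's benchmark as a simple ratio of token spend to salary.

```python
from dataclasses import dataclass


@dataclass
class TeamAIUsage:
    """Hypothetical monthly usage record for one team (all fields are assumptions)."""
    token_spend_usd: float          # what the team paid for model tokens
    dev_hours_saved: float          # estimated engineering hours saved by AI tooling
    loaded_hourly_rate_usd: float   # fully loaded cost of one engineering hour
    merged_prs: int                 # output metric: pull requests actually merged


def token_roi(usage: TeamAIUsage) -> float:
    """Return (value of time saved - token spend) / token spend."""
    value_saved = usage.dev_hours_saved * usage.loaded_hourly_rate_usd
    return (value_saved - usage.token_spend_usd) / usage.token_spend_usd


def spend_per_merged_pr(usage: TeamAIUsage) -> float:
    """Token dollars per merged PR; a rising value flags spend without output."""
    if usage.merged_prs == 0:
        return float("inf")
    return usage.token_spend_usd / usage.merged_prs


def token_to_salary_ratio(token_spend_usd: float, salary_usd: float) -> float:
    """Huang's benchmark as a ratio: $250K of tokens for a $500K engineer is 0.5."""
    return token_spend_usd / salary_usd


# Made-up illustrative inputs, not data from any company mentioned above.
team = TeamAIUsage(
    token_spend_usd=20_000,
    dev_hours_saved=400,
    loaded_hourly_rate_usd=150,
    merged_prs=80,
)
print(f"ROI: {token_roi(team):.0%}")                              # ROI: 200%
print(f"Token spend per merged PR: ${spend_per_merged_pr(team):,.0f}")  # $250
print(f"Huang ratio: {token_to_salary_ratio(250_000, 500_000):.2f}")   # 0.50
```

A leaderboard that only ranks `token_spend_usd` would reward a team whose spend per merged PR keeps climbing. Tracking both numbers together is the guard against waste that the paragraph above recommends.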
Google's Product Director argues that AI makes building easier, shifting the focus to 'should you build it?' and to vertical-specific GTM: using generative AI to tailor landing pages, onboarding, defaults, and suggestions into personalized experiences. A related SaaS trend: chat bars (Linear, PostHog, Tier) are replacing static homepages, a tacit admission that one-size-fits-all UIs fail diverse users; the next step is agents composing interfaces. Builders should prioritize GTM roadmaps with AI personalization to cut acquisition costs 2-3x relative to generic funnels.
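At its simplest, the vertical-specific GTM idea above means choosing landing-page copy, onboarding steps, and product defaults from what you know about a visitor's industry, with a generic fallback. Here is a minimal Python sketch, assuming a hypothetical hand-written table of verticals. Every headline, step, and default in it is invented for illustration. In the approach described above, generative AI would draft these variants rather than a person hard-coding them.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Experience:
    """What a visitor sees first; every field here is illustrative."""
    headline: str
    onboarding_steps: Tuple[str, ...]
    defaults: Dict[str, str] = field(default_factory=dict)


GENERIC = Experience(
    headline="Build better products, faster",
    onboarding_steps=("Create a project", "Invite your team"),
)

# Hypothetical per-vertical variants. In practice these could be drafted by a
# generative model and reviewed by a person, rather than written by hand as here.
VERTICALS: Dict[str, Experience] = {
    "healthcare": Experience(
        headline="Ship compliant patient-facing features",
        onboarding_steps=("Connect a test data source", "Pick a compliance template"),
        defaults={"data_retention": "strict"},
    ),
    "fintech": Experience(
        headline="Launch payment flows with audit trails built in",
        onboarding_steps=("Link a test bank account", "Enable audit logging"),
        defaults={"audit_log": "on"},
    ),
}


def experience_for(vertical: Optional[str]) -> Experience:
    """Return the tailored experience for a vertical, or the generic one."""
    if vertical is None:
        return GENERIC
    return VERTICALS.get(vertical.strip().lower(), GENERIC)


print(experience_for("Fintech").headline)
print(experience_for("retail").headline)  # unknown vertical falls back to generic
```

The generic fallback matters: a visitor whose industry is unknown still gets a working funnel, and personalization only changes what is shown when there is a signal to act on.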
GitHub commits hit 275M/week (14x YoY, on pace for 14B a year vs. 1B in 2025); AI PRs grew 4x to 17M in six months, and Claude commits grew 25x to 2.5M/week. Ramp data shows AI spend up 4x YoY, now 15% of software budgets. Perplexity's ARR jumped to $450M+ (from $305M), driven by its 'computer' feature, which orchestrates multiple models to complete projects. Despite 52K AI-linked layoffs in Q1, 67K software jobs are open (+30% YoY, the highest in 3+ years). Ship faster by integrating agents into your repos; Perplexity's growth is evidence that multi-model coordination drives PMF at scale.
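The annualized figures above can be checked with simple arithmetic. Here is a minimal Python sketch using only the numbers quoted in this summary, assuming a 52-week year.

```python
WEEKS_PER_YEAR = 52  # assumption: ignores the extra day(s) in a calendar year

commits_per_week = 275e6       # GitHub commits per week
commits_2025 = 1e9             # approximate total commits in 2025
perplexity_arr_before = 305e6  # previous Perplexity ARR, USD
perplexity_arr_now = 450e6     # current Perplexity ARR, USD (a lower bound: "$450M+")

# 275M commits/week sustained for a year gives the "on pace for 14B" figure.
annualized_commits = commits_per_week * WEEKS_PER_YEAR
commit_multiple = annualized_commits / commits_2025
# Relative ARR growth: new / old - 1.
arr_growth = perplexity_arr_now / perplexity_arr_before - 1

print(f"Annualized commits: {annualized_commits / 1e9:.1f}B")  # 14.3B
print(f"Multiple of 2025 total: {commit_multiple:.1f}x")       # 14.3x
print(f"Perplexity ARR growth: {arr_growth:.0%}")              # 48%
```

The 14.3B result matches the "on pace for 14B" claim, and the jump from $305M to $450M works out to roughly 48% growth. Because $450M is a floor, the true figure may be slightly higher.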