Cybersecurity: Spend More Tokens Than Attackers

AI turns security into proof-of-work: to harden a system, defenders must spend more tokens finding exploits (e.g., 100M tokens, or $12.5k, per Mythos run) than attackers spend looking for them.

AI Exploit Finders Create Token Proof-of-Work

Anthropic's Mythos LLM excels at cybersecurity tasks: it completed a 32-step corporate network attack simulation, estimated to take a human 20 hours, in 3 of 10 runs, something other frontier models did not do. Each attempt used 100M tokens (about $12,500 per Mythos run, or $125k for all 10 runs), and no diminishing returns were observed: models kept improving as token budgets increased. This shifts security economics toward raw compute spend, akin to cryptocurrency's proof-of-work or a low-temperature lottery, where success depends on buying more attempts than the other side. To harden a system, allocate more tokens to red-teaming than adversaries will spend attacking it; under this model, cleverness yields no edge.
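The figures above can be turned into a few back-of-the-envelope numbers. This is a minimal sketch, not part of the original analysis: the function names are hypothetical, and treating each run as an independent trial with a fixed 3-in-10 success rate is an assumption resting on a very small sample.

```python
# Figures quoted in the text above.
TOKENS_PER_RUN = 100_000_000
COST_PER_RUN_USD = 12_500
RUNS = 10
SUCCESSES = 3


def cost_per_million_tokens(cost_usd: float, tokens: int) -> float:
    """Implied price per million tokens for one run."""
    return cost_usd / (tokens / 1_000_000)


def expected_cost_per_success(cost_per_run: float, successes: int, runs: int) -> float:
    """Expected spend until the first success.

    Assumption: runs are independent with a constant success probability,
    so the expected number of runs needed is 1 / p.
    """
    p = successes / runs
    return cost_per_run / p


def prob_at_least_one_success(p: float, n: int) -> float:
    """Chance that at least one of n independent runs succeeds."""
    return 1 - (1 - p) ** n


price = cost_per_million_tokens(COST_PER_RUN_USD, TOKENS_PER_RUN)
total = COST_PER_RUN_USD * RUNS
per_success = expected_cost_per_success(COST_PER_RUN_USD, SUCCESSES, RUNS)
coverage = prob_at_least_one_success(SUCCESSES / RUNS, RUNS)

print(f"${price:.0f} per million tokens")
print(f"${total:,} for {RUNS} runs")
print(f"about ${per_success:,.0f} expected per successful run")
print(f"{coverage:.1%} chance of at least one success in {RUNS} runs")
```

Under these assumptions, the lottery framing becomes concrete: each additional run is another ticket, so whoever can afford more runs is more likely to find the exploit first.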

Open Source Outpaces Custom Reimplementations

AI maximalists like Karpathy advocate using LLMs to reimplement dependencies in order to avoid supply chain risks (e.g., the LiteLLM and Axios incidents), but open source remains the better bet. Linus's Law expands: given enough eyeballs, plus corporate token budgets spent on OSS libraries, bugs become shallow and security robust. Custom "yoinked" code can't match that collective investment. Attackers do prioritize high-value OSS targets, but defenders' pooled resources can still outspend them.

Add Autonomous Hardening to Dev Cycles

Coding evolves into three phases, distinguished by whether the limiting resource is human time or money: (1) Development, for fast iteration guided by intuition and feedback; (2) Review, for docs, refactors, and best practices (e.g., Anthropic's $15-20 Claude tool); (3) Hardening, where autonomous exploit hunting runs until the budget is exhausted. This makes security continuous and budget-optimized, unlike rare manual audits. Code stays cheap until it must be secure: the cost of hardening is set by the market value of an exploit, and defenders must spend more tokens than their foes regardless of inference optimizations.
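The hardening phase described above amounts to a loop that keeps launching exploit-hunting attempts until the token budget runs out. The sketch below is only an illustration of that control flow: `harden`, `HardeningReport`, and the `attempt` callback are hypothetical names, and the stub stands in for a real agent run, which the source does not specify.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Hypothetical callback: given the tokens still available, run one
# exploit-hunting attempt and return (tokens_used, finding or None).
Attempt = Callable[[int], Tuple[int, Optional[str]]]


@dataclass
class HardeningReport:
    tokens_spent: int = 0
    findings: List[str] = field(default_factory=list)


def harden(attempt: Attempt, token_budget: int) -> HardeningReport:
    """Run attempts until the token budget is exhausted."""
    report = HardeningReport()
    while report.tokens_spent < token_budget:
        remaining = token_budget - report.tokens_spent
        tokens_used, finding = attempt(remaining)
        if tokens_used <= 0:
            raise ValueError("an attempt must consume at least one token")
        # Never account for more than the budget allows.
        report.tokens_spent += min(tokens_used, remaining)
        if finding is not None:
            report.findings.append(finding)
    return report


# Usage with a deterministic stub in place of a real agent (an assumption).
calls = {"n": 0}


def fake_attempt(max_tokens: int) -> Tuple[int, Optional[str]]:
    calls["n"] += 1
    finding = "issue-2" if calls["n"] == 2 else None
    return 40, finding


report = harden(fake_attempt, token_budget=100)
print(report.tokens_spent, report.findings)
```

The stopping condition is the budget, not a human decision that the code is "done", which is what separates this phase from development and review.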

Summarized by x-ai/grok-4.1-fast via openrouter

© 2026 Edge