Anthropic Leak Reveals Mythos, Claude's Top Tier, Amid Cyber Risks
Anthropic's leaked Mythos model tops Opus in reasoning, coding, and cyber; Meta's Tribe V2 predicts brain activity from media; Gwen Claw self-evolves for reliable task execution; Alibaba's C950 CPU boosts agent inference 30%+.
Mythos Exposes Claude's Next Tier and Cyber Threats
Anthropic's accidental leak of ~3,000 assets revealed Claude Mythos (internal codename: Capiara), a new model tier above Opus, Sonnet, and Haiku. Already trained, it's in early access for select organizations, delivering step-change gains in reasoning, coding, and cybersecurity, making it Anthropic's most capable system yet. High compute costs delay public release. Key risk: its cyber skills let attackers exploit vulnerabilities faster than defenders can patch, a concern grounded in real incidents like a Chinese state-linked group using Claude to hit 30 organizations (tech, finance, government) over 10 days. Strategy: limit access to cyber teams for defense prep, plus enterprise events like a UK CEO retreat with policymakers. Trade-off: the same power that aids defense boosts attacks, demanding controlled rollout over broad access.
Tribe V2 Predicts Brain Responses Across Modalities
Meta's Tribe V2 unifies video, audio, and text to forecast fMRI brain activity, trained on 451.6 hours of movies, podcasts, and videos from 25 people and evaluated on 1,117.7 hours from 720 subjects. It models 20,484 cortical points and 882 subcortical voxels over 100-second windows, combining Llama 3.2 3B (text), V-JEPA 2 Giant (video), and Wav2Vec 2.0 (audio) embeddings via transformers. It beats prior methods: zero-shot on new subjects, it hits group correlation ~0.4 on Human Connectome 7T data (2x the median of real data). A one-hour fine-tune per new user yields predictions 2-4x better than linear models. Applications: in-silico experiments recover known brain landmarks (e.g., the fusiform face area for faces, PPA for places, Broca's area for language); the final layer self-organizes into auditory, language, motion, default-mode, and visual networks. Impact: simulating experiments cheaper and faster than real fMRI.
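The fusion step described above can be sketched in a toy form: per-modality encoders produce embeddings for a window, which are projected into a shared width, fused, and read out to one activity estimate per cortical point and subcortical voxel. This is a minimal NumPy illustration, not Tribe V2's actual code; all dimensions except the 20,484 + 882 output targets are invented stand-ins, and the real model uses transformer heads rather than a linear readout.

```python
import numpy as np

# Toy sketch of multimodal fusion -> fMRI readout (assumed structure,
# not Tribe V2's real implementation). Per-modality features stand in
# for Llama 3.2 (text), V-JEPA 2 (video), and Wav2Vec 2.0 (audio).

rng = np.random.default_rng(0)
T = 50                    # timesteps in one window (toy value)
D = 64                    # shared embedding width (toy value)
N_TARGETS = 20484 + 882   # cortical points + subcortical voxels (from the article)

# Toy per-modality features with different native widths.
feats = {m: rng.standard_normal((T, d))
         for m, d in [("text", 32), ("video", 48), ("audio", 16)]}
# Randomly initialized projections into the shared width.
projs = {m: rng.standard_normal((f.shape[1], D)) / np.sqrt(f.shape[1])
         for m, f in feats.items()}

# Fuse by summing aligned modality embeddings; a linear readout stands
# in for the transformer head predicting activity at every target.
fused = sum(feats[m] @ projs[m] for m in feats)              # (T, D)
readout = rng.standard_normal((D, N_TARGETS)) / np.sqrt(D)
predicted_bold = fused @ readout                             # (T, 21366)
print(predicted_bold.shape)
```

In the real system the projections and readout are learned, and the ~0.4 group correlation is measured between such predicted time series and held-out fMRI recordings.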
Gwen Claw Evolves for Reliable Task Execution
The Gwen Claw agent targets agent failures in dynamic tasks (e.g., iterative Excel edits) via a 3-layer memory: stable identity (broad context), long-term background (history), and dynamic trajectory (live state). Context slimming prunes stale entries to cut token costs and stabilize long runs. It operates in real local browsers (cookies and logins intact) rather than isolated demos. Self-evolution loop: log failures and feedback, analyze root causes, optimize for retries, so the agent improves with use rather than staying fixed post-launch. Integrates with Huawei Celia, Telegram, and WhatsApp; supports private deployments. Outcome: handles pauses, reorders, and inserts without resets, bridging chat smarts to production execution.
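The 3-layer memory and context-slimming ideas can be sketched as a small data structure: a fixed identity layer, an append-only background layer, and a live trajectory layer whose older steps are periodically compacted into the background. This is an illustrative sketch of the described design, not Gwen Claw's actual API; all class and method names are invented.

```python
from dataclasses import dataclass, field

@dataclass
class AgentMemory:
    """Hypothetical 3-layer agent memory (names are illustrative)."""
    identity: str                                     # stable: role + broad context
    background: list = field(default_factory=list)    # long-term history
    trajectory: list = field(default_factory=list)    # dynamic live task state

    def record_step(self, step: str) -> None:
        self.trajectory.append(step)

    def slim_context(self, keep_last: int = 3) -> None:
        """Context slimming: archive all but the newest trajectory steps."""
        stale = self.trajectory[:-keep_last]
        self.trajectory = self.trajectory[-keep_last:]
        if stale:
            # A real system would summarize; here we store a placeholder.
            self.background.append(f"summary of {len(stale)} earlier steps")

    def build_prompt(self) -> str:
        """Assemble the layers into the context sent to the model."""
        return "\n".join([self.identity, *self.background, *self.trajectory])

mem = AgentMemory(identity="You edit spreadsheets step by step.")
for i in range(6):
    mem.record_step(f"edited cell A{i}")
mem.slim_context(keep_last=3)       # token cost stays bounded on long runs
print(len(mem.trajectory), len(mem.background))  # 3 1
```

Because old steps are compacted rather than dropped, an inserted or reordered subtask only touches the trajectory layer, which is what lets long runs continue without a full reset.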
Alibaba C950 CPU Targets Agent Inference
Alibaba's RISC-V-based SchwanC950 CPU optimizes multi-step agent inference (sequential workloads) rather than GPU-style training, claiming 30%+ gains over mainstream chips via workload-specific customization. It targets Alibaba's own data centers, not direct sales, bolstering Alibaba Cloud amid US chip curbs. Building on T-Head's XuanTie C910, it strengthens supply-chain control and cost resilience without ARM-style royalties. Value: sustaining agent services under hardware constraints.