Anthropic's Mythos: Major LLM Leap Confirmed
Anthropic's Claude Mythos delivers dramatic gains in coding, reasoning, and cybersecurity over Opus, but prioritizes cautious rollout via early access for risk assessment.
Mythos Marks Capability Jump with Safety-First Release
Anthropic confirmed a leaked draft blog post revealing Claude Mythos, described as their most powerful model to date—a 'step change' over Claude 3 Opus. It scores dramatically higher on software coding, academic reasoning, and cybersecurity benchmarks. Named to evoke interconnected knowledge, Mythos is larger and more compute-intensive, making it expensive to run. Instead of rapid release, Anthropic starts with early access for a small group of customers focused on cybersecurity applications, gathering real-world risk data before broader rollout. This gradual approach addresses potential near-term cyber threats and efficiency challenges. The leak exposed nearly 3,000 unpublished Anthropic assets, fueling speculation—Mythos may have been codenamed 'Capiara' earlier.
Voice AI Evolves to Human-Like Conversations
Google's Gemini 3.1 Flash Live introduces continuous, real-time dialogue to voice models, fixing turn-based stilted interactions and poor interruption handling. It excels on audio benchmarks, including multi-step function calling for agentic actions like processing alphanumeric product codes in noisy environments. Deployed by customers like Home Depot, it boosts complex command handling. Implications include superior mobile voice agents—potentially powering Apple's upgraded Siri—and ending frustrating experiences with devices like current Siri.
Practical AI Tools and Strategic Shifts
Shopify launched Tinker, a free mobile app with 100+ AI tools for e-commerce merchants, generating logos, product photos, and ad videos. Organized by outcome with example previews, it uses natural language and reference images to auto-generate prompts, slashing learning curves and friction. This could normalize AI positively among small businesses by boosting income—e.g., 30% monthly rises—amid rising entrepreneurship.
OpenAI upgraded Codex with plugins for pre/post-coding tasks like planning and workflows, resetting usage limits across plans to counter Anthropic's tighter 5-hour session caps during peak hours (5-11am PT weekdays). OpenAI shelved 'adult mode' indefinitely due to 12% age detection failure, staff concerns over emotional dependence, and advisory council opposition—focusing on coding/enterprise instead. This avoids high costs for marginal upside in a crowded adult AI space. Rumors swirl of Anthropic's Q4 IPO push, pressuring OpenAI to go public first; killing underperforming projects like Sora shows smart sunk-cost avoidance amid heating competition.