OpenAI's GPT-OSS: Open-Weight MoE Models for Local Agents
OpenAI releases Apache 2.0 gpt-oss-120B/20B MoE models (2.1M H100 GPU-hours of training), runnable on 60GB desktop or 12GB phone-class GPUs with o4-mini-level reasoning; Anthropic's Claude 4.1 Opus tops coding benchmarks; DeepMind's Genie 3 simulates real-time worlds with consistency beyond one minute.
GPT-OSS Delivers Agentic Reasoning to Consumer Hardware
OpenAI's first open-weight models since GPT-2, gpt-oss-120B and gpt-oss-20B, use a Mixture-of-Experts (MoE) architecture with a wide-rather-than-deep design, bias units in attention, and a custom SwiGLU variant. Trained on 2.1 million H100 GPU-hours, they match o4-mini-level reasoning on agentic tasks and ship under the Apache 2.0 license. The 120B runs on 60GB GPUs (desktops) and the 20B on 12GB (phone-class hardware); community tests report 30 tokens/s on gpt-oss-120B, 17 tokens/s quantized, and >45 tokens/s via llama.cpp's --n-cpu-moe offload on GGUF quants. Try them in the gpt-oss.com playground. The new Harmony format (open-sourced on GitHub) updates ChatML with message channels for structured outputs.
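The gated feed-forward inside each expert is worth seeing concretely. Below is a minimal sketch of a standard SwiGLU feed-forward in NumPy; gpt-oss uses a custom variant (the exact clamping and scaling differ from this textbook form), so treat the details as illustrative, not the released architecture.

```python
import numpy as np

def swish(x):
    # Swish/SiLU activation: x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

def swiglu_ffn(x, w_gate, w_up, w_down):
    # Standard SwiGLU feed-forward: the gate branch passes through swish,
    # is multiplied elementwise with the linear "up" branch, then projected
    # back to model width. gpt-oss's released variant modifies this form.
    return (swish(x @ w_gate) * (x @ w_up)) @ w_down

rng = np.random.default_rng(0)
d_model, d_ff = 8, 32          # toy sizes; real expert widths are far larger
x = rng.standard_normal((4, d_model))
y = swiglu_ffn(x,
               rng.standard_normal((d_model, d_ff)),
               rng.standard_normal((d_model, d_ff)),
               rng.standard_normal((d_ff, d_model)))
print(y.shape)  # (4, 8)
```

In an MoE block, a router selects a few experts per token and each expert is one such feed-forward, which is why only a small fraction of the total parameters is active per token.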
Trade-offs: strong on reasoning but prone to hallucination, per evaluations from @sama, @rasbt, and @SebastienBubeck. The model card lists ~117B total parameters for the 120B variant, with only a small fraction (~5.1B) active per token.
Claude 4.1 Opus and Genie 3 Push Coding and Simulation Limits
Anthropic's Claude 4.1 Opus, leaked ahead of its announcement, claims the world's best coding performance, with community benchmarks confirming its lead (e.g., LMArena discussions). DeepMind's Genie 3 generates real-time world simulations with navigation and consistency beyond one minute, though demos may be cherry-picked.
Builders gain local coding agents without API costs and simulation tools for game/embodiment prototypes, but should run their own evaluations before relying on either in production.
Community Benchmarks: Run Local, Eval Hard
Reddit/Discord recaps (r/LocalLlama, Unsloth, LM Studio, etc.) focus on quantization (GGUF Q5_K_M), inference speeds (45 s/image for Qwen-Image at 1024 resolution, CFG 4, 20 steps), and integrations (llama.cpp MoE offload at 45 tokens/s). Discussions flag OpenAI OSS limits vs. closed models, GPU needs (NVMe offload, multi-GPU CUDA), and tooling such as LM Studio 0.3.21, which supports gpt-oss. EleutherAI notes scaling insights; Hugging Face covers HF CLI accelerations. Key takeaway: prioritize GGUF for desktop deployment and benchmark your own agent workflows; for example, --n-cpu-moe 2 boosted CPU MoE offload to 45 tokens/s in community tests.
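For readers wanting to reproduce the MoE-offload setup above, here is a hedged config sketch of a llama.cpp server invocation: the flag spellings match recent llama.cpp builds as discussed in the community threads, while the model filename, context size, and offload count are placeholders to tune for your hardware.

```shell
# Serve a GGUF quant of gpt-oss-120B, keeping some MoE expert tensors on CPU.
# --n-cpu-moe N leaves the expert weights of the first N layers in system RAM,
# freeing VRAM for attention and KV cache; raise N if you run out of VRAM.
llama-server \
  -m gpt-oss-120b-Q5_K_M.gguf \   # placeholder filename for your quant
  --n-gpu-layers 999 \            # offload all non-expert layers to GPU
  --n-cpu-moe 2 \                 # value cited in community benchmarks
  -c 8192                         # context size; adjust to taste
```

The interesting trade-off here is that expert weights dominate an MoE model's memory footprint but only a few experts fire per token, so paging them through CPU RAM costs far less throughput than offloading dense layers would.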