Internal Adoption as a Leading Indicator
OpenAI's internal data shows a sharp rise in Codex usage across the organization. Between November 2025 and June 2026, median Codex output tokens increased 56x in Research, 32x in Customer Support, 27x in Engineering, and 13x in Legal. This surge suggests that the most effective path to production adoption is to support persistent workflows, human-in-the-loop review cycles, and specialized tooling rather than to rely on general-purpose chat interfaces.
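For a rough sense of scale, the multiples above can be converted into an implied compound monthly growth factor. The sketch below assumes seven monthly intervals between November 2025 and June 2026 and smooth compounding. The source states neither assumption, and the function name is illustrative.

```python
# Assumption (not from the source): 7 monthly intervals, Nov 2025 -> Jun 2026,
# with growth compounding at a constant rate.
MONTHS = 7

# Reported growth in median Codex output tokens, by function.
multiples = {
    "Research": 56,
    "Customer Support": 32,
    "Engineering": 27,
    "Legal": 13,
}


def implied_monthly_growth(multiple: float, months: int = MONTHS) -> float:
    """Return the constant month-over-month factor that compounds to `multiple`."""
    return multiple ** (1 / months)


for team, multiple in multiples.items():
    factor = implied_monthly_growth(multiple)
    print(f"{team}: {multiple}x overall ~ {factor:.2f}x per month")
```

Under these assumptions even the slowest-growing function would be compounding by well over 40% per month. Actual month-to-month growth was probably uneven.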
The Shift to Long-Horizon Agent Infrastructure
Engineering efforts are increasingly focused on agentic persistence rather than just minimizing interactive latency. New infrastructure providers like Sail ($80M raised) and Hyperagent are optimizing for workloads that run for days or weeks, providing dedicated cloud environments for browser and code execution. This aligns with a broader industry trend: using general-purpose chat for one-off answers and specialized, persistent agents for repeatable, multi-step tasks.
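To make "agentic persistence" concrete, here is a minimal sketch of a task loop that checkpoints its progress so a multi-day job can resume after an interruption. It does not describe how Sail or Hyperagent work. `run_task`, the JSON checkpoint layout, and the `max_steps_this_session` parameter are all hypothetical names chosen for illustration.

```python
import json
import tempfile
from pathlib import Path


def run_task(steps, checkpoint: Path, max_steps_this_session=None):
    """Run `steps` in order, persisting progress after each one.

    Hypothetical sketch: `steps` is a list of zero-argument callables that
    return JSON-serializable results.
    """
    # Resume from an earlier session if a checkpoint already exists.
    if checkpoint.exists():
        state = json.loads(checkpoint.read_text())
    else:
        state = {"next_step": 0, "results": []}

    done_this_session = 0
    while state["next_step"] < len(steps):
        if max_steps_this_session is not None and done_this_session >= max_steps_this_session:
            break  # simulate the process stopping mid-task
        result = steps[state["next_step"]]()
        state["results"].append(result)
        state["next_step"] += 1
        # Write to a temporary file, then rename it over the checkpoint, to
        # reduce the chance of leaving a half-written checkpoint behind.
        tmp = checkpoint.with_suffix(".tmp")
        tmp.write_text(json.dumps(state))
        tmp.replace(checkpoint)
        done_this_session += 1
    return state


# Usage: the first session is cut off after two steps, the second resumes.
steps = [lambda: "fetched", lambda: "analyzed", lambda: "reported"]
with tempfile.TemporaryDirectory() as workdir:
    ckpt = Path(workdir) / "task.json"
    first = run_task(steps, ckpt, max_steps_this_session=2)
    second = run_task(steps, ckpt)
```

The design point is that progress lives outside the process. Interactive chat can keep state in memory, but a workload that runs for days has to assume it will be restarted.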
Evolving Evaluation and Synthetic Data
Public benchmarks are increasingly viewed as compromised by data leakage, where models retrieve solutions from the internet or from git history. Research is shifting toward:
- No-internet evaluation harnesses: Stricter environments to prevent benchmark hacking.
- Autodata loops: Using agentic loops to generate, analyze, and meta-optimize synthetic training data, which has improved pass rates in legal and math tasks from 62.1% to 79.6% (a gain of 17.5 percentage points).
- Test-time compute via curation: Techniques like those from Datology demonstrate that data curation can make models 35x more efficient by inducing concision, without sacrificing performance.
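The no-internet idea in the list above can be sketched as an in-process guard that makes any outbound connection attempt fail loudly while a solution is being evaluated. This is an illustration only: `no_network` and `NetworkBlocked` are hypothetical names, and the guard covers only `socket.socket.connect` inside the current Python process. A real harness would more plausibly isolate the network at the container or OS level.

```python
import socket
from contextlib import contextmanager


class NetworkBlocked(RuntimeError):
    """Raised when evaluated code tries to open a network connection."""


@contextmanager
def no_network():
    """Temporarily replace socket.socket.connect with a function that raises.

    Limitation: subprocesses and other connection paths (for example
    connect_ex) are not covered by this sketch.
    """
    original_connect = socket.socket.connect

    def blocked_connect(self, address):
        raise NetworkBlocked(f"network access attempted: {address!r}")

    socket.socket.connect = blocked_connect
    try:
        yield
    finally:
        # Always restore the real method, even if the evaluated code raised.
        socket.socket.connect = original_connect


def evaluate_offline(solution, task_input):
    """Run a hypothetical `solution` callable with network access blocked."""
    with no_network():
        return solution(task_input)


# Usage: a purely local solution passes through unchanged.
print(evaluate_offline(lambda x: x * 2, 21))
```

A solution that tried to fetch an answer over the network would raise `NetworkBlocked` instead of silently succeeding, which is the property a leakage-resistant benchmark needs.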
Open Model Ecosystem
Open models continue to close the gap with models from frontier labs. Notable releases include:
- GLM-5.2 Max: Reached 1595 on Code Arena (Frontend) and demonstrated high agentic reliability with zero failed runs across 84 tests.
- Ornith-1.0: A family of models (9B to 397B MoE) utilizing self-improving RL to optimize task-specific scaffolds.
- Liquid AI LFM2.5-230M: An ultra-small model optimized for low-latency tool use, achieving ~1400 tok/s locally via WebGPU.
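A note on reading the "zero failed runs across 84 tests" figure for GLM-5.2 Max: zero observed failures does not imply a zero failure rate. The sketch below computes the standard exact one-sided upper bound for that case. It assumes the 84 runs are independent and share one failure probability, which the source does not state, and the function name is illustrative.

```python
def failure_rate_upper_bound(n_runs: int, confidence: float = 0.95) -> float:
    """Upper bound on the true failure rate after n_runs with zero failures.

    Solves (1 - p) ** n_runs = 1 - confidence for p.
    Assumes independent runs with a common failure probability.
    """
    if n_runs <= 0:
        raise ValueError("n_runs must be positive")
    return 1 - (1 - confidence) ** (1 / n_runs)


bound = failure_rate_upper_bound(84)
print(f"95% upper bound on failure rate: {bound:.2%}")
```

Under these assumptions, 84 clean runs are consistent with a true failure rate of up to roughly 3.5%, close to the "rule of three" approximation of 3/84.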