Internal AI Adoption & The Rise of Agentic Workflows

Internal Adoption as a Leading Indicator

OpenAI's internal data shows a significant shift in how AI is used across the organization. Between November 2025 and June 2026, median Codex output tokens grew 56x in Research, 32x in Customer Support, 27x in Engineering, and 13x in Legal. This surge suggests that the most effective path to production adoption runs through persistent workflows, human-in-the-loop review cycles, and specialized tooling rather than general-purpose chat interfaces.

The Shift to Long-Horizon Agent Infrastructure

Engineering efforts are increasingly focused on agentic persistence rather than just minimizing interactive latency. New infrastructure providers like Sail ($80M raised) and Hyperagent are optimizing for workloads that run for days or weeks, providing dedicated cloud environments for browser and code execution. This aligns with a broader industry trend: using general-purpose chat for one-off answers and specialized, persistent agents for repeatable, multi-step tasks.
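Workloads that run for days or weeks need durable state, so that an agent interrupted by a crash, preemption, or redeploy can resume instead of starting over. The sketch below shows one generic way to do this: checkpoint after every step and reload on start. It is not how Sail or Hyperagent work, since the source does not describe their internals. `AgentState`, `run_task`, and the `do_step` callback are hypothetical names, and the JSON-file checkpoint is an assumption chosen for simplicity.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass, field
from typing import Callable, List, Optional


@dataclass
class AgentState:
    step: int = 0                                     # index of the next step to execute
    history: List[str] = field(default_factory=list)  # results of completed steps


def save_checkpoint(state: AgentState, path: str) -> None:
    # Write to a temporary file in the same directory, then atomically replace the
    # checkpoint, so a crash mid-write never leaves a corrupted file behind.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(asdict(state), f)
    os.replace(tmp_path, path)


def load_checkpoint(path: str) -> AgentState:
    # Start fresh if no checkpoint exists yet; otherwise resume from disk.
    if not os.path.exists(path):
        return AgentState()
    with open(path) as f:
        return AgentState(**json.load(f))


def run_task(path: str, total_steps: int, do_step: Callable[[int], str],
             max_steps_this_session: Optional[int] = None) -> AgentState:
    """Run a multi-step task, persisting state after each step (hypothetical helper)."""
    state = load_checkpoint(path)
    executed = 0
    while state.step < total_steps:
        # max_steps_this_session simulates the worker being preempted or restarted.
        if max_steps_this_session is not None and executed >= max_steps_this_session:
            break
        state.history.append(do_step(state.step))
        state.step += 1
        save_checkpoint(state, path)  # persist after every completed step
        executed += 1
    return state
```

If the first call to `run_task` stops after 3 of 5 steps, a second call with the same `path` resumes at step 3 rather than repeating completed work. A crash between `do_step` and `save_checkpoint` would re-run one step on resume, so in this design each step should be idempotent.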

Evolving Evaluation and Synthetic Data

Public benchmarks are increasingly viewed as compromised due to data leakage (models retrieving solutions from the internet or git history). Research is shifting toward:

No-internet evaluation harnesses: Stricter, network-isolated environments that keep models from looking up solutions during evaluation.
Autodata loops: Agentic loops that generate, analyze, and meta-optimize synthetic training data, which has raised pass rates on legal and math tasks from 62.1% to 79.6%.
Test-time compute via curation: Work from Datology shows that curating training data can improve test-time efficiency by 35x, inducing more concise outputs without sacrificing performance.
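The no-internet harness idea can be illustrated with a small in-process guard that makes any network call fail while a candidate solution is being scored. This is a sketch under stated assumptions. `no_internet`, `NetworkBlockedError`, and `evaluate` are hypothetical names. Patching Python's `socket` module does not stop subprocesses or native extensions, so real harnesses would more likely enforce isolation at the container or OS level. Leakage through git history would also need separate handling, such as removing the `.git` directory from the sandbox, which this guard does not do.

```python
import contextlib
import socket


class NetworkBlockedError(RuntimeError):
    """Raised when code under evaluation attempts network access."""


@contextlib.contextmanager
def no_internet():
    # Save the originals so they are restored even if evaluated code raises.
    orig_connect = socket.socket.connect
    orig_connect_ex = socket.socket.connect_ex
    orig_getaddrinfo = socket.getaddrinfo

    def blocked(*args, **kwargs):
        raise NetworkBlockedError("network access is disabled during evaluation")

    # Block DNS lookups and outgoing connections at the Python socket layer.
    socket.socket.connect = blocked
    socket.socket.connect_ex = blocked
    socket.getaddrinfo = blocked
    try:
        yield
    finally:
        socket.socket.connect = orig_connect
        socket.socket.connect_ex = orig_connect_ex
        socket.getaddrinfo = orig_getaddrinfo


def evaluate(candidate, test_cases):
    """Score a candidate on (input, expected) pairs with networking disabled; return the pass rate."""
    passed = 0
    with no_internet():
        for x, expected in test_cases:
            try:
                if candidate(x) == expected:
                    passed += 1
            except Exception:
                pass  # a crash, including a blocked network call, counts as a failure
    return passed / len(test_cases)
```

A solution that tries to fetch an answer (for example via `socket.create_connection`) fails with `NetworkBlockedError` and scores zero on that case, while a self-contained solution is unaffected.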

Open Model Ecosystem

Open models continue to close the gap with models from frontier labs. Notable releases include:

GLM-5.2 Max: Scored 1595 on Code Arena (Frontend) and showed high agentic reliability, with zero failed runs across 84 tests.
Ornith-1.0: A family of models (9B to 397B MoE) utilizing self-improving RL to optimize task-specific scaffolds.
Liquid AI LFM2.5-230M: An ultra-small model optimized for low-latency tool use, achieving ~1400 tok/s locally via WebGPU.
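Zero failures in 84 runs is a strong result, but a finite sample cannot show a failure rate of exactly zero. A standard way to quantify this is the one-sided upper confidence bound for zero observed failures. Solve (1 - p)^n = 1 - confidence for p, which is the exact Clopper-Pearson bound in this case, and compare it with the "rule of three" approximation 3/n. The sketch assumes the 84 runs are independent trials with the same failure probability, which the source does not state.

```python
def failure_rate_upper_bound(n_trials: int, confidence: float = 0.95) -> float:
    """Exact one-sided upper bound on the failure probability when 0 of n_trials failed.

    With zero observed failures, the largest p still consistent with the data at the
    given confidence solves (1 - p) ** n = 1 - confidence.
    """
    alpha = 1.0 - confidence
    return 1.0 - alpha ** (1.0 / n_trials)


def rule_of_three(n_trials: int) -> float:
    """Common approximation to the 95% bound: 3 / n."""
    return 3.0 / n_trials


print(f"exact 95% upper bound: {failure_rate_upper_bound(84):.4f}")  # 0.0350
print(f"rule of three:         {rule_of_three(84):.4f}")             # 0.0357
```

Under these assumptions, 0 failures in 84 runs is still consistent with a true failure rate of up to about 3.5% at 95% confidence.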
