Vertical Models Beat Frontier LLMs via Experience Data

Post-training open-weight models on proprietary interaction data, as with Intercom's Apex for customer service or Cursor's Composer 2 for coding, can outperform frontier LLMs on speed, cost, and accuracy, signaling durable moats at the model layer.

Domain Post-Training Closes Performance Gaps

Specialized post-training on high-quality, proprietary interaction data vaults open-weight base models past frontier LLMs on vertical tasks. Intercom's Apex, built for customer service, achieves 28% higher resolution rates, 65% fewer hallucinations, faster inference, and lower costs than GPT-4.5 or Claude-3.5 Opus, leveraging billions of human-agent interactions for evals and fine-tuning. Cursor's Composer 2 starts from the open-source Qwen 2.5, applies reinforcement learning with 75% of compute spent on post-training, and beats Opus 4.6 on coding benchmarks while costing less to run. Decagon routes 80% of traffic through a network of in-house specialized models for detection, orchestration, response, and evaluation, optimizing each layer independently for speed and quality. This flywheel, in which usage data refines evals, evals improve models, and better models generate more data, creates compounding advantages unavailable to generalists.

Bitter Lesson Evolves: Experience Trumps Brute-Force Scale

Rich Sutton's 2019 Bitter Lesson holds that general methods leveraging compute and data beat hand-encoded human knowledge across chess, Go, vision, speech, and language. BloombergGPT (50B parameters, finance-tuned from scratch) lost to larger generalist models, showing that specialization alone fails without scale. But as pre-training data runs out, the emphasis shifts to post-training, where vertical firms' 'last-mile' experience data (millions of real interactions) acts as scalable fuel that is not hand-encoded human knowledge. Sutton himself predicted this on Dwarkesh Patel's podcast: systems that learn from experience will supersede human-injected knowledge, extending the lesson rather than refuting it. Unlike early domain hacks, this approach applies brute-force learning to experiential data, aligning with Sutton's thesis while enabling smaller, task-speciated models (per Andrej Karpathy's analogy to the diversity of animal brains).

Disruption Forces Full-Stack AI and Model Moats

Reliance on frontier APIs erodes as vertical SaaS firms (Pinterest, Airbnb, Notion, Cursor, Intercom, and hundreds more) train in-house open models that are better, faster, and cheaper than vendor APIs, echoing the earlier shift away from marked-up cloud services. Durable differentiation migrates down the stack: the app layer is easily cloned, but proprietary evals and data lock in model superiority. Frontier labs (OpenAI, Anthropic) over-serve niches like coding or customer service with general intelligence those niches don't need; open-weight models suffice as bases. Labs must counter with specialized models, data partnerships, or M&A for evals. Not every firm will succeed, since post-training demands real expertise, but those with scale (e.g., Cursor's financial pressure to cut API burn) will experiment aggressively. The result: hyperspecific providers compete head-on with the labs, and the majority of workflows move to in-house or open-source models rather than APIs.

Video description
Analysis contrasts Sutton's Bitter Lesson with a rising era of vertical AI models trained on last-mile interaction data, exemplified by Intercom's Apex and Cursor's Composer 2. Post-training on proprietary interaction datasets and reinforcement learning on curated quality data can elevate open-weight base models to meet or exceed frontier-model performance for specific tasks. Resulting effects include model speciation, a shift to in-house fine-tuning on open models, erosion of API-based moats, and a renewed premium on proprietary evaluation data. The AI Daily Brief helps you understand the most important news and discussions in AI. Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614 Get it ad free at http://patreon.com/aidailybrief Learn more about the show https://aidailybrief.ai/

Summarized by x-ai/grok-4.1-fast via openrouter
© 2026 Edge