2026 Thesis: Coding Agents Break Containment

swyx predicts 2026 as the year coding agents expand beyond code to dominate workflows, amid stabilizing agent infra, domain-specific models, and open hardware shifts—while mid-size startups face pressure from labs.

Agent Infra Stabilizes on Skills and Harnesses

swyx, drawing from curating AIE Europe tracks, identifies OpenClaw, harness engineering, and context engineering as top AI builder concerns, alongside evals, observability, GPUs, and multimodality. Harrison Chase's recent claim of AI infra stability resonates partially: LangChain reinvented itself yearly—from chains to LangGraph to agents—but now "skills" emerge as the minimal viable packaging. A skill is just a Markdown file with attached scripts, enabling agents (LLMs with tools, file systems, and retrieval) to integrate simply.

Jacob Effron notes that infra firms play whack-a-mole chasing shifting building patterns, unlike sticky app companies like Sierra or LlamaIndex, which act as outsourced AI teams for enterprises. swyx agrees: vertical apps thrive by applying state-of-the-art methods robust to model shifts, while horizontals reinvent themselves as AI-era cloud primitives, like sandboxes for massive workloads. The trade-off: developers switch hot tools easily, eroding defensibility even for databases.

"It feels like we’ve landed at skills, which is like the minimal viable format. I don’t see how it can be more simple than that," says swyx, cautioning against overcommitting to stability theses amid potential real-time, subagent, or memory adaptations.

Agent Labs Playbook: From Frontier Models to Domain Specialization

Application companies like Cursor and Cognition follow an "agent lab" playbook: start on frontier models, specialize via domain tweaks, then train in-house once user data justifies the cost/latency savings. Users opt into these models (e.g., Composer 2, Devin 1.6) in a free market, proving real value beyond marketing; domain-specific fine-tuning shines especially for search.

swyx highlights infrastructure easing this path: Thinking Machines' Tinker, Primary's lab tools. It's a Bitter Lesson reversal: bootstrap on the giants, then distill for defined, high-volume, low-variance workloads. Jacob probes DIY RL; swyx ties it to quality-cost trade-offs, with every 10x inference speedup unlocking new product experiences.
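The "distill once user data justifies it" step can be sketched in miniature. The toy below is purely illustrative (all numbers and names hypothetical, no real model or training framework involved): it compresses temperature-smoothed "teacher" decisions from logged traffic into a small "student" by gradient descent on KL divergence, the core move behind training an in-house model on your own traffic.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw scores into a probability distribution."""
    exps = [math.exp(x / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distill(teacher_logits_batch, steps=2000, lr=0.5, temperature=2.0):
    """Fit one student logit vector so softmax(student) matches the
    average teacher distribution, via gradient descent on KL(p || q).
    For this objective the gradient w.r.t. student logits is q - p."""
    dim = len(teacher_logits_batch[0])
    # Soft targets: temperature-smoothed teacher outputs, averaged.
    targets = [softmax(t, temperature) for t in teacher_logits_batch]
    mean_p = [sum(t[i] for t in targets) / len(targets) for i in range(dim)]
    student = [0.0] * dim
    for _ in range(steps):
        q = softmax(student)
        student = [s - lr * (qi - pi) for s, qi, pi in zip(student, q, mean_p)]
    return student

# Toy "teacher traces": logged decisions that strongly prefer action 0.
teacher_batch = [[4.0, 1.0, 0.0], [3.5, 1.2, 0.1]]
student_probs = softmax(distill(teacher_batch))
print(student_probs)  # the student now also prefers action 0
```

On a defined, low-variance workload the averaged teacher distribution is a stable target, which is why the playbook waits for volume before training in-house.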

Custom chips (Cerebras, Groq's LPUs, MatX) fuel bullishness on open models: Cognition and OpenAI deploy them for thousands of tokens per second vs. under 100 on NVIDIA, eroding the quality-speed trade-off. "Thousands of tokens per second instead of less than a hundred," swyx notes, explaining the non-NVIDIA hype.

Coding Wars: Parabolic Growth, Sticky Products, and Endgame

Coding is AI's largest, fastest category: Anthropic (Claude Code), OpenAI (Codex), Cursor, Cognition ride the wave. swyx calls it "capability exploration" and "token-maxing," rewarding spenders in an experiment-heavy phase. First magical experiences drive stickiness—Claude Code vs. Codex shows products matter more than raw capability.

End state: two majors (e.g., Cursor and Anthropic) plus a niche tail, with disruption possible from Microsoft, Mistral, xAI, or Chinese labs. Apps retain workflow and last-mile edges vs. labs eyeing verticals like finance and healthcare. Coding previews every AI market: foundation-vs-app collisions, parabolic scaling.

Consumer AI plateaus on frequency and design (ChatGPT), but coding feels daily-essential. Valuations are unbound: billion-dollar ARR within a year and trillion-dollar caps break startup norms. "Coding has become one of the largest and fastest-growing categories in AI," swyx states.

Selling to Agents, Memory as Wedge, and Startup Pressures

Agent-first world prioritizes APIs/docs over UX—"agent experience may mostly just be good developer experience by another name." Pretraining incumbents compound edges. Memory/personalization next: Current models favor mention frequency; future product choice hinges on personalized recall.

Foundation labs threaten mid-size startups and SaaS most; early-stage founders are safer, as ambitious builds attract lab hires, while traditional SaaS faces "vibe coding" pressure. swyx, having spent six figures on event software, is tempted by cheap AI rebuilds, though teams debate fragility vs. AI-native speed.

"Mid-size startups. Yes," swyx replies when Jacob asks who worries him most from lab encroachment.

Bottlenecks, Biosafety, and Next Frontiers

10T+ models are temporary rationing ahead of bigger clusters; labs distill private models, and scale alone is insufficient. Memory is the slowest scaler: context windows lag workflow needs despite million-token claims.

swyx changed his mind on several fronts: bullish on open models, top agent startups diverging from the median, fine-tuning viable after all. Post-coding frontiers: consumer agents, computer use, "coding agents breaking containment." Dark factories (zero-human-review code) demand new testing regimes. RL and post-training (synthetic rubrics, Dr. GRPO, multi-turn) enable deep domain specialization.

World models come next, for "lived understanding": Fei-Fei Li's spatial intelligence, plus the Good Will Hunting analogy: LLMs know everything via reading but lack experience. On biosafety and security, swyx raised biosafety with Anthropic's Mike Krieger, who countered that security is the bigger risk; restricted releases differentiate the labs.

"The same way that 2025 was a year coding agents 2026 is coding agents breaking containments to do everything else," swyx's core thesis.

Key Takeaways

  • Curate agent stacks around skills (Markdown + scripts) for minimal viable tooling; expect memory/subagent evolutions.
  • Follow agent lab playbook: Frontier → domain specialize → distill own models on user data for cost/latency wins.
  • Bet on custom chips (Cerebras/Groq) + open models for 10x+ inference speedups unlocking apps.
  • Build sticky coding products first—magical UX trumps raw capability; coding templates vertical AI markets.
  • Prioritize agent-sellable APIs/docs; personalize memory to differentiate in frequency-biased models.
  • Target vertical apps as outsourced AI teams; horizontals as AI-cloud (sandboxes); avoid mid-size limbo.
  • Rebuild SaaS with AI cautiously—balance speed vs. fragility in internal culture wars.
  • Push world models/memory for next intelligence leap beyond token-maxing.

Notable quotes:

"Skills are the minimal viable format... a markdown file with some scripts attached." — swyx on agent packaging

"You are the outsource AI team... if they didn’t have you, they would’ve to hire in house." — swyx on vertical apps

"Every 10x speedup can unlock new product experiences." — swyx on inference hardware

"Agent experience may mostly just be good developer experience by another name." — swyx on APIs for agents

"Today’s LLMs may know everything by reading it all, but still lack the lived experience." — swyx on world models (Good Will Hunting analogy)

Summarized by x-ai/grok-4.1-fast via openrouter


© 2026 Edge