Today in AI engineering, design & research.
A reading room of curated AI summaries. The signal, distilled. One short brief when something good lands; the rest waits here for you.
Today's reading — editor's picks
GPT-Realtime-2 Brings GPT-5 Reasoning to Voice Agents
OpenAI's GPT-Realtime-2 delivers 128K context, parallel tool calls, adjustable reasoning (minimal to xhigh), and tops benchmarks at 96.6% Big Bench Audio, enabling responsive voice agents that handle interruptions and long sessions.
$254/Month for AI VPs Handling 70% of Ops
SaaStr's custom AI VPs for Marketing (10K, $95/mo) and CS (Qbee, $160/mo) on Replit replace 70% of human operational work costing $500K-800K/year, with full stack at $2,300/mo driving 47% YoY revenue growth.
Build Production AI Agents Live at SaaStr AI 2026
SaaStr AI Annual 2026 (May 12-14) features live builds of AI VPs for marketing/CS costing $95/mo with 70% hour reductions, plus hands-on Replit workshops to ship your own agents in 30 mins—no code needed.
The stream — chronological
Net New Customers: B2B's Truest Health Metric
Track quarterly net new customer counts rather than revenue or NRR: growth in that count is decelerating in app SaaS (e.g., Atlassian) but accelerating in AI infra (Cloudflare +40% YoY, Twilio +42%), exposing the AI bifurcation.
Production AI Agents: Block Bad Pitches, Isolate DBs, Specialize SDRs
SaaStr runs 20+ agents and took revenue growth from -19% to +47% YoY; audit pitches by asking 'would you buy?', use contained platforms like Replit to prevent DB deletions, and hire marketers to execute AI VP ideas.
AI Agent QBee Cuts SaaStr CS Hours 70% Internally + Externally
SaaStr's custom AI agent QBee handles repetitive CS tasks for 150+ sponsors, saving 65% internal hours and 75% external sponsor hours—total 70% reduction, 3x human productivity boost, with happier customers.
DAU/MAU Tops ARR as B2B AI Success Metric
In B2B AI, DAU/MAU and hours per user predict renewal/expansion better than ARR; Harvey's 50% DAU/MAU and 12 hours/month/user fuel 6x YoY net new ARR while exposing stealth churn.
Mag7's $700B AI Capex Bet Powers Palantir's 145% Rule of 40
Mag7 reported $540B revenue and $700B 2026 AI capex in capitalism's most aggressive quarter; Palantir's RPO surged 134% to $4.45B with 145% Rule of 40 by enabling $20-100M enterprise AI overhauls; SaaS reaccelerates via AI base monetization + new customers.
Think 2026: AI Maturity, CEO Trust & Governance Shift
Panelists at IBM Think 2026 highlight AI's enterprise maturity via end-to-end agents like Bob, 64% CEO trust in AI decisions per IBV study, and urgent need for governance learned from cloud era.
Mozilla's Agentic AI Pipeline Uncovers 271 Firefox Vulns
Using Claude Mythos Preview in an agentic pipeline that self-verifies via custom test cases, Mozilla found 271 previously unknown vulnerabilities in Firefox 150, some of them 20 years old, bringing total fixes in April to 423 versus a prior record of 76.
NEO Automates Full ML Pipelines in VS Code from One Prompt
Install the NEO VS Code extension to generate synthetic datasets, train models, deploy APIs, and build UIs autonomously for ML tasks like chat moderation; it works on local files for privacy, with optional cloud integrations.
OpenAI Realtime API GA: 128K Voice Agents + Translate/STT
Build production voice apps now with GA Realtime API: GPT-Realtime-2 handles multi-step reasoning (128K context, 5 effort levels, 96.6% Big Bench Audio), GPT-Realtime-Translate for 70+ languages ($0.034/min), GPT-Realtime-Whisper for streaming STT ($0.017/min).
Weekend AI Agent Powers HR, Finance, Marketing Unexpectedly
Ship minimal AI tools fast: Pulsar, a weekend scraper for dev trends, surfaced market insights that reshaped strategy and integrated into finance comp analysis, HR onboarding, and marketing calendars.
Bun's Fast Runtime Risks AI Agent Pivot
Bun shines as a speedy JS runtime, package manager, and server tool, but Anthropic's ownership signals evolution toward AI agent features like sandboxing, potentially alienating web devs.
Freebuff: Free AI Coder 3x Faster Than Claude Code
Freebuff delivers a zero-config, ad-supported AI coding agent using GLM 5.1 and free models like DeepSeek v4 Pro, achieving 83% Evol score—3x faster and more reliable than Claude Code without rate limits.
Sell Custom AI Agents to Local Biz: Claude + Poppy Stack
Build AI chat widgets for local businesses using Poppy for knowledge hubs and Claude Code for scraping—deploy via API, charge $1,000–$1,500 setup + monthly subs for updates.
AI Clears Healthcare Referral Backlogs with Instant Scheduling
Specialty practices process thousands of faxed referrals manually, causing delays; Basata's AI extracts data from faxes, uses voice agents to call and book patients instantly, handling 500k referrals to date.
Bun Shifts to Anthropic-Optimized AI Agent Toolkit
After Anthropic's acquisition, Bun adds AI-friendly APIs like headless web view and image manipulation, expanding beyond Node.js compatibility into agent tools while retaining performance edge.
Copy This Lean AI Stack + Frameworks to Beat Overwhelm
Stick to S-tier daily drivers (Claude Code in VS Code + Glido); use tiered stack and decision framework—test new tools only if they solve real pain points in real scenarios, accepting a 20% productivity dip only if it leads to net gains.
Use Claude Code + Codex Together for Best AI Coding
Reject AI tool tribalism: Run Claude Code inside Codex's desktop app terminal for seamless dual-agent coding—plan in one, review/build in the other, leveraging both models' strengths without loyalty to any vendor.
Stealth CloakBrowser Automation in Colab with Persistence
Run Playwright-style stealth Chromium automation in Google Colab by isolating sync APIs in a worker thread; customize contexts with viewport=1365x768, persist localStorage via storage_state.json or profile dirs, and inspect undetectable signals like webdriver=false.
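A rough illustration of the worker-thread idea in the item above, as a minimal Python sketch. It deliberately uses neither CloakBrowser nor Playwright: `run_in_worker` and `fake_browser_session` are hypothetical names, and the stand-in function only reports whether an asyncio event loop is running in its own thread. A running loop, as notebooks usually have, is what sync-style browser APIs typically refuse to work under.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def run_in_worker(fn, *args, **kwargs):
    """Run a blocking callable in a fresh worker thread and return its result.

    The worker thread has no running asyncio loop of its own, so sync-style
    code that checks for one should not see the notebook's loop.
    """
    with ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(fn, *args, **kwargs).result()


def fake_browser_session(viewport):
    """Hypothetical stand-in for sync browser automation code.

    A real session would launch the browser here; this one only records
    the requested viewport and whether a loop is running in this thread.
    """
    try:
        asyncio.get_running_loop()
        loop_running = True
    except RuntimeError:
        loop_running = False
    return {"viewport": viewport, "loop_running": loop_running}


async def notebook_like_caller():
    # Simulates calling from a cell where an event loop is already running.
    return run_in_worker(fake_browser_session, {"width": 1365, "height": 768})


if __name__ == "__main__":
    print(asyncio.run(notebook_like_caller()))
```

Note that `run_in_worker` blocks the calling thread until the session finishes. That is acceptable for a single notebook cell, but it would stall other tasks on a shared event loop.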
SaaStr's 20+ AI Agents: Train Hard, Replace Mediocre Humans
SaaStr went from AI laggards to leaders with 21 production agents by rigorously training off-the-shelf tools, outperforming vendors' top users and replacing underperforming staff—proving consistent iteration beats massive datasets.
Data-Centric Design Rules for Complex Apps
Center interaction design on data landscapes: learn Python and users' jobs, let data structure UIs, strip chrome, design empty states, and bridge mental/data models to align interfaces with real-world tasks.
Codex Chrome Extension Automates Browsers via Natural Language
Install OpenAI's Codex extension on Chromium browsers like Brave to control web tasks—navigate sites, post queries—with plain English commands, as demoed debugging an LLM Council app.
TokenSpeed Beats TensorRT-LLM 9-11% on Agentic Coding Inference
TokenSpeed open-source engine optimizes agentic workloads with long contexts (>50K tokens) and multi-turn convos, delivering 9% lower latency and 11% higher throughput than TensorRT-LLM at 70-100 TPS/user on NVIDIA B200.
DeepSeek-TUI: Viral Open-Source Claude Code Rival
DeepSeek-TUI, a Rust-based terminal AI coding agent powered by DeepSeek V4's 1M-token context, hit 10k+ GitHub stars in days as a cheap, customizable alternative to Claude Code, built by a music/law student using AI-assisted coding.
Pit: Ex-Voi Founders' $16M AI for Enterprise Automation
Pit builds custom AI software to automate enterprise back-office processes like telecom and healthcare ops, using Pit Studio for process guidance and Pit Cloud for secure deployment; raised $16M seed led by a16z.
Anthropic's Compute Deal and Agents Challenge OpenAI
Anthropic secures all xAI/SpaceX Colossus compute to end constraints, doubles Claude usage limits, launches enhanced Managed Agents—positioning Claude Code/Co-work as coding OS and cloud agents as scalable team infra vs. OpenAI.
AI Agents Expose IDP Flaws Built for Humans
Internal Developer Platforms (IDPs) assume human interpreters for ambiguities like unclear errors and tribal knowledge; AI agents fail because they execute exactly as interfaces allow, demanding explicit, machine-readable contracts to avoid disasters like deleting entire databases.
Marketing Brain: AI Vault for 18k Keyword SEO Strategies
Marketing Brain uses Claude Code and DataForSEO to mine 18,000+ unique keywords from top 10 competitors, generating compounding 30/60/90-day white-hat SEO plans in an Obsidian vault via the FLOW framework.