TOPIC · 504 summaries

AI & LLMs

The deepest channel on Edge. Foundation models, agent architectures, retrieval systems, evals, and the moving line between research and production.

This pillar covers the work that determines what AI products can actually do. New model releases get filed here when they shift capability or cost in a meaningful way, alongside the harder material from the labs and the practitioners who turn it into shipping software. Read it for primary sources rather than recap blogs: lab papers and notes, retrieval benchmarks, agent traces, eval methodology, and the long-form essays that hold up six months later.

Two threads run through everything filed here. The first is what is genuinely new at the model layer: capability cliffs, training recipes, alignment work, the shape of the next deployment cycle. The second is what works in production: which patterns of context engineering and tool use compound across teams, where retrieval beats fine-tuning and where it loses, what the operational tax of running an agentic system actually looks like.

The summaries below are sorted by recency. The pillar refreshes as new entries land.

Google Cloud Tech · 2026-05-07

Fix AI Agent Forgetting with 3 Memory Patterns

Gemini File Search 2.0 Cuts Multimodal RAG to 4 API Calls

IBM Granite Speech 4.1: 3 ASR Models for Accuracy, Features, Speed

Martell's AI Tier List: Tools That 10x Business ROI

Teach AI Values' Why Before What for Stronger Alignment

Guarantee LLM Outputs Match Exact Taxonomies with Tries

Groq-Powered Research Agent with LangGraph Sub-Agents

AI Agents Blur Vibe Coding into Pro Engineering

Customize VS Code Copilot Agents for Repeatable Workflows

MCP Apps: Interactive Branded UI in AI Chats

Bulletproof Taste: Rejections Beat AI Gingerbread

Gemma 4 MTP Drafters: 3x Faster Inference, No Quality Loss

AI Coders Default to Hardcoded Keyword Rules

GPU Bandwidth Limits LLM Speed, Not FLOPS

Inworld TTS-2 Uses User Audio for Adaptive Conversations

Agent 365: Govern Sprawling AI Agents Securely

Modular LLM Agent: Skills, Registry, Dynamic Routing

637MB LLM Runs Offline on Base MacBook Air, Works Surprisingly Well

Secure AI Agents via MCP Toolbox Custom Tools

Claude's Agentic OS Chains Skills into Full Workflows

Run Gemma 4 Agents On-Device with LiteRT Stack

CopilotKit's AG-UI Enables Dynamic AI Agent UIs in Apps

Consumer AI's Anticipation Gap Blocks True Assistants

Claude Code as Second Brain, Video Editor, and More

Build Knowledge Bases from Agent Failures

Gemini API Webhooks Replace Polling for Long-Running AI Jobs

Local AI Agent Stack: Ollama as LLM, MCP as Libraries

Databricks RAG: Low-Dim Qwen3 + Rerank for 89% Recall@10

Persist RAG Memory Across Turns with Lakebase PostgresSaver

Train GPT-2 LLM from Scratch on Laptop

7 Signs to Switch Browser AI to Desktop Agents

Top Search/Fetch APIs for AI Agents: Tools & Tradeoffs

Scale GenAI to Billions of Rows in BigQuery at 94% Less Cost

Fix Prompt Fragility by Decomposing Agents into Microservices

Verifier Agent Crushes AI Coding Review Bottleneck

CLI for Simple Tasks, MCP for Complex Gaps in AI Agents

LangGraph Builds Resilient Multi-Agent LLM Debate for Drift Tests

High Reasoning Trumps Newer Models for Precise Code

DeepSeek V4 + Claude Code Proxy for 76% Cheaper Coding

Codex /goal Autonomously Shipped 14/18 Features Overnight

5 LLM Agent Patterns for Reliable, Bloat-Free Workflows

Tiny LLMs and On-Device Agents via LiteRT-LM on Edge Hardware

Agentic Commerce Hands Power to Buyer Agents

Yin-Yang LLM Pipeline Cuts Noise in Code Scanning

Context Engines: Fix Agent Context to Cut Tokens 50%

Cut AI Agent Costs 70% with Manifest Router

Free NVIDIA NIM API Unlocks Kimi K2.6 for Agentic Coding

AI Agent Memory: 4 Dimensions, Benchmarks, Tool Tiers

SageMaker Fine-Tuning: LoRA Beats QLoRA on Cost-Perf Balance

Fix Tokenization Drift by Matching SFT Token Patterns

Frontier LLMs Split: Claude Deontological, Grok Consequentialist

Mistral Vibe Remote Agents Run Coding Tasks in Cloud at 77.6% SWE-Bench

10 New OSS Tools to Supercharge Claude Code

Multi-Agent AI Pipeline for Systems Biology Analysis

Codex CLI Beats Claude Code on Cost and Autonomy

DeepSeek's Visual Primitives: 10x KV Cache Efficiency

Parse, Analyze, Visualize Hermes Agent Traces for Fine-Tuning

H2E: Deterministic Safety via Riemannian Multimodal Fusion

Free Claude Code Proxy: 80-90% Quality at 2-5% Cost

Replit Stays Independent with 300% NRR and Secure AI Coding

Autodata: Agents Create Superior Synthetic Training Data

TRL Code Guide: SFT to GRPO LLM Alignment on T4 GPU

Reward Queries to Fix RAG Agent Failures

6 Agentic Patterns from Claude Design for Vertical Apps

Fairies: AI Agents as Canvas Collaborators

Codex Beats Claude Code: 4x Efficiency, Desktop Wins

RTX 5090 vs Mac Studio vs DGX Spark: Local AI Stack Guide

Ship Reliable AI Agents: Braintrust Hands-On

Build AI Workflows, Not Just Prompts

Composable Specialists Beat Monoliths for Enterprise AI

Qwen-Scope SAEs Unlock Actionable LLM Internals

AI Coding: From Flow State to Review Mode

AI Subsidy End Forces Usage Pricing and Cost Audits

Agent Harness: 9 Components Beyond Frameworks

Claude Code's 90-Day Sprint: 35 Updates to Autonomous OS

AI Token Spend Surges 10x: Measure ROI Before Cutting

Gemma Chat: Offline Vibe Coding with Gemma 4 on Mac

GPT-5.5 + Codex Beats Claude with 3-5x Coding Efficiency

Gemini Exports Editable Slides, Docs, Sheets, PDFs, Word, Excel

VOID Erases Video Objects While Rewriting Physics