#devops-cloud
Every summary, chronological. Filter by category, tag, or source from the rail.
Replay Logs Fail Agents: Use VM Snapshots Instead
Replay durability constrains agent code with growing logs; split into context logs (DB durable) and execution snapshots (14MB Firecracker VMs, <1s save/100ms restore) for multi-day sessions.
AI EngineerOpenAI's Real-Time Voice AI Powers Agents, Backed by MRC Networking
OpenAI's GPT-Realtime-2 enables live voice agents with GPT-4o reasoning, 128k context, parallel tools, and 96.6% audio accuracy; MRC networking spreads data across paths for 131k-GPU clusters with microsecond failure recovery.
AI RevolutionToken Bucket Fails at Window Boundaries—Use Sliding Window
Token bucket rate limiting lets clients burst 40 requests across a minute boundary despite 100/min limit; sliding window counters prevent this by tracking requests in the last N seconds from now, enforcing even distribution.
AI Agents Expose IDP Flaws Built for Humans
Internal Developer Platforms (IDPs) assume human interpreters for ambiguities like unclear errors and tribal knowledge; AI agents fail because they execute exactly as interfaces allow, demanding explicit, machine-readable contracts to avoid disasters like deleting entire databases.
Manual Deployment Unlocks Foundry Hosted Agents
Deploy Foundry hosted agents by building container images in ACR, setting up Foundry Project with RBAC, creating via Azure SDK with env vars and resources (cpu=0.25, mem=0.5Gi), then assigning Azure AI User RBAC to Agent ID—avoids azd preview failures.
Migrate MongoDB to Firestore Serverless Seamlessly
Firestore's MongoDB-compatible API lets you reuse existing code, drivers, and aggregation pipelines on a serverless DB with real-time queries for AI agents and five-nines availability.
Agent 365: Govern Sprawling AI Agents Securely
Microsoft Agent 365 acts as a control plane to observe, govern, and secure AI agents across Microsoft tools, local devices, multi-cloud platforms, and SaaS partners, addressing agent sprawl with discovery, policy controls, and runtime blocking—now generally available at $15/user/month.
Claude Managed Agents: Infra-Free Deployment at $0.08/Hour
Anthropic's Claude Managed Agents offloads agent infra, security, and scaling to their cloud for $0.08 per session-hour + tokens, letting you build via API—but vendor lock-in and costs demand ROI checks.
KodeKloudScale GenAI to Billions of Rows in BigQuery at 94% Less Cost
BigQuery's optimized mode distills LLMs into lightweight models using embeddings, slashing token use by 94% (55M to 3M) and query time from 16min to 2min on 34k images or 50k voice commands, scaling to billions of rows.
Google Cloud TechEngineer AI Context Like Code: Full Lifecycle
Treat AI agent context as code with a Context Development Lifecycle—Generate, Evaluate, Distribute, Observe—to create reliable, scalable prompts that drive better agent outputs via testing, sharing, and feedback loops.
AI EngineerFlink Treats Batch as Streaming for Unified Low-Latency Processing
Apache Flink processes unbounded streams and bounded batches with one engine using operators, state, windows, and exactly-once guarantees, eliminating dual codebases for real-time apps like recommendation engines handling millions of events.
Harness-as-a-Service Fuels Reliable AI Agents
Big tech earnings reveal explosive AI cloud growth amid compute shortages. Harness-as-a-Service platforms like Cursor SDK and managed agents provide sandboxed runtimes, shifting agent building from DIY harnesses to scalable infrastructure.
Vercel Sandbox Firewall Enables Postgres Connections
Vercel Sandbox now supports outbound Postgres connections to hosted DBs like Neon and Supabase by detecting TLS upgrades during negotiation—no code changes required, just add DB host to allowed domains.
Scale 60M req/mo solo on Cloud Run for $180
Solo builder scales feature flag SaaS RocketFlag to 60M requests/month across regions using Go on Cloud Run, batch DB writes to Firestore/BigQuery, and Cloud Armor—total Dec bill $180 USD (252 AUD) with zero SRE time.
Google Cloud TechScaling LLM Inference: KV Cache, Batching, Spec Decoding & Multi-LoRA
Production LLM serving shifts from training's throughput focus to inference's memory-bound latency challenges, solved by PagedAttention (96% util), continuous batching, EAGLE-3 (up to 6.5x speedup), and FastLibra for multi-LoRA (63% TTFT cut).
Healthcare LLM Rate Limits: 2 Fail, 1 Works
Simple per-user rate limits on LLM APIs fail to stop credential stuffing attacks (causing $47K bills) and block critical clinical workflows; context-aware throttling with priority and anomaly detection is the only production-ready solution.
Zrok: Open-Source ngrok Fix for Secure Localhost Sharing
Zrok enables one-command sharing of localhost apps, files, TCP/UDP services publicly or privately via tokens—zero-trust on OpenZiti beats ngrok's limits, random URLs, and public exposure without port forwarding.
Better StackSnowflake-Native Fraud ML Pipeline: Train to Monitor
Build end-to-end fraud detection with XGBoost in Snowflake ML—data loading to drift monitoring—avoiding data gravity, handling 0.5-2% imbalance via scale_pos_weight=27.6, achieving ROC-AUC=0.7275 and optimal F1=0.5874 at threshold=0.58.
Run S3-Compatible MinIO Locally to Cut Dev Costs
Deploy MinIO via Docker on your laptop for S3-compatible object storage using unchanged boto3 Python code, solving AWS S3 cost, latency, and lock-in issues for local dev and AI/RAG pipelines.
Better StackEnterprise Registry Unifies MCP & A2A Agents at Scale
Build private MCP and A2A registries enriched with enterprise metadata to enable discovery, governance, lineage, and standardized deployment across global teams building AI agents.
AI EngineerScale RAG to Production: Fix 8 Anti-Patterns with 5 Pillars
RAG fails in production due to 8 anti-patterns like vector-only retrieval and stateful pods; counter them with 5 pillars—governance, core hardening, retrieval smarts, agent actions/memory, and security/FinOps—for reliable, observable systems.
AI Chokepoints: Chips, Power Reshape Global Race
Frontier AI shifts from diffusible software to physical chokepoints in chips, helium, HBM/DRAM, power delivery, concentrating capability in few geographies like the US.
Fixing ML Pipelines for Databricks Constraints
Databricks free workspaces block public DBFS, continuous triggers, and large models—use Unity Catalog volumes, micro-batch streaming, vector_to_array for probs, and top-50k user subsets to ship reliably.
Scaled SaaS to 25K Users/$8K MRR: Social + Tech Fixes
Grew Yorby.ai to 25K users/$8K MRR using 40 organic creator accounts; fixed Supabase connection pooling with a dedicated GCE server; pivoted ICP from prosumer creators to low-churn agencies via PostHog analysis.
Sandbox AI-Generated Code with Capability Security
Run untrusted LLM-generated code in isolates or containers using capability-based security: explicitly allow only needed access to block hallucinations, leaks, and injections.
Secure MCP Servers for Production with 5 Principles
Design MCP servers for agents using 5 principles to shrink attack surface and block OASP top 10 threats; deploy remotely via HTTP with OAuth 2.1, preferring CIMD over DCR for dynamic client auth.
Gemini CLI: Context to CI/CD for Production AI Agents
Gemini CLI turns natural language 'vibe coding' into full ADK agents with context engineering, skills, hooks, tests, and automated Cloud Run deployment—proving AI can handle end-to-end dev without manual coding.
Google Cloud TechSecure Agentic AI with Tokens & Delegation
Prevent credential replay, rogue agents, and overpermissioning in agentic flows using verifiable agent identities, delegation tokens, token exchanges at each hop, scoped permissions, and secure vaults for last-mile access.
Copilot Injects Ads into 11K GitHub PRs
Microsoft's GitHub Copilot added ad-like promotions for Raycast to 11,400 pull requests, prioritizing AI usage over fixing GitHub's 90 incidents in 90 days and 90.84% uptime.
The PrimeTimeBuild Graph RAG Multi-Agents for Multimodal Data
Step-by-step workshop to ingest images/videos/text into Cloud Spanner graph DB, add embeddings for Graph RAG search, orchestrate multi-agents with ADK, and enable long-term memory—all using Google Cloud for real-time survivor matching.
Google Cloud TechShowing 30 of 40