#cloud
Ditch preferred_username for Azure AD Guest Auth
Using preferred_username as the identity anchor worked for employees but failed silently for all B2B guests, causing 403 errors post-launch. Anchor on the oid claim instead for reliable identification.
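A minimal sketch of the fix, assuming the token has already been validated and its claims arrive as a dict; the user_key helper and the tenant-qualified key format are illustrative, not from the article:

```python
# Derive a stable user key from validated Entra ID (Azure AD) claims.
# preferred_username can be absent or mutable for B2B guests; oid is the
# immutable object ID of the user within the tenant.
def user_key(claims: dict) -> str:
    oid = claims.get("oid")
    if not oid:
        raise ValueError("token is missing the oid claim")
    # Qualify with tid so the key stays unique across tenants (illustrative).
    return f"{claims.get('tid', 'unknown')}:{oid}"
```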
Secure AI Agents via MCP Toolbox Custom Tools
MCP Toolbox prevents confused-deputy attacks by letting developers pre-write constrained SQL tools with bound parameters, keeping agent flexibility separate from app-controlled security for runtime agents.
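The Toolbox itself is configured declaratively, but the underlying pattern is plain parameterized SQL. A hedged Python illustration of the idea (table and column names invented, not the Toolbox API):

```python
import sqlite3

# The app author fixes the SQL shape; the agent may only supply values.
# Bound parameters mean agent input can never rewrite the query itself.
def get_order_status(conn: sqlite3.Connection, user_id: str, order_id: int):
    sql = "SELECT status FROM orders WHERE user_id = ? AND order_id = ?"
    return conn.execute(sql, (user_id, order_id)).fetchone()
```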
SageMaker Fine-Tuning: LoRA Beats QLoRA on Cost-Perf Balance
LoRA cuts trainable params by 96% vs full fine-tuning, balancing cost savings and accuracy on Llama2-7B/Mistral7B; QLoRA cuts memory a further 8x but trains slower due to dequantization overhead.
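For reference, attaching LoRA adapters with Hugging Face PEFT looks roughly like this; rank, alpha, and target modules are illustrative, not the article's exact hyperparameters:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
lora = LoraConfig(
    r=8,                                  # low-rank dimension
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # attention projections only
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # prints the small trainable fraction
```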
Bigtable Scales Petabytes for Real-Time NoSQL Workloads
Bigtable auto-scales to hundreds of petabytes and millions of ops/sec with low latency, powering Google Search/YouTube/Maps; ideal for time series, ML features, and streaming via Flink/Kafka integrations.
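A hedged sketch of the time-series pattern with the official Python client; project, instance, table, and column-family names are placeholders:

```python
import time
from google.cloud import bigtable

client = bigtable.Client(project="my-project")
table = client.instance("my-instance").table("metrics")

def write_point(sensor_id: str, value: bytes) -> None:
    # Entity-first row key avoids hotspotting on time; the reversed
    # timestamp makes the newest samples sort first within each sensor.
    reversed_ts = (1 << 63) - time.time_ns()
    row = table.direct_row(f"{sensor_id}#{reversed_ts:020d}".encode())
    row.set_cell("cf1", b"value", value)
    row.commit()
```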
Scale PyTorch DDP Multi-Node on AWS EC2: Infra-First Guide
Multi-node DDP demands identical environments, shared data access, and open security groups across EC2 instances; use the torchrun launcher with DDPManager for minimal code changes and reliable gradient sync via NCCL.
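The guide's DDPManager wraps this boilerplate; underneath, a minimal multi-node script launched with torchrun looks like the sketch below (model and rendezvous endpoint are placeholders):

```python
# Launch on every node, e.g.:
#   torchrun --nnodes=2 --nproc_per_node=8 \
#            --rdzv_backend=c10d --rdzv_endpoint=$HEAD_IP:29500 train.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")  # torchrun sets RANK/WORLD_SIZE
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    model = DDP(torch.nn.Linear(10, 1).to(local_rank), device_ids=[local_rank])
    # ... training loop; backward() all-reduces gradients over NCCL ...
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```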
TPUs Dominate at Infrastructure Scale Over Per-Chip GPU Wins
Google's TPU v8t (training) and v8i (inference) lag Nvidia GPUs per chip but deliver superior performance at scale: 9,600-chip superpods hit 121 exaFLOPS of FP4 via cube topology and Virgo networking, optimized for AI's bandwidth-heavy workloads.
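Taking the quoted superpod figures at face value, the per-chip implication is easy to check:

```python
# Back-of-envelope from the quoted figures (illustrative only).
chips = 9600
pod_exaflops_fp4 = 121
per_chip_pflops = pod_exaflops_fp4 * 1e18 / chips / 1e15
print(f"~{per_chip_pflops:.1f} PFLOPS FP4 per chip")  # ~12.6
```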
Next '26: Build Agents with ADK, Skills, and Gemini
Google Cloud Next '26 demos production multi-agent systems using the open-source ADK (any language or model), modular skills for efficient context, and tools like MCP servers; the marathon-planning demo is open-sourced as the Race Condition repo.
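A hedged sketch of the tool-using agent pattern in the ADK's Python API; the agent name, model id, and pace tool are invented for illustration, not taken from the Race Condition repo:

```python
from google.adk.agents import Agent

def pace_per_km(distance_km: float, target_minutes: float) -> str:
    """Return the required pace in min/km for a target finish time."""
    return f"{target_minutes / distance_km:.2f} min/km"

race_planner = Agent(
    name="race_planner",
    model="gemini-2.0-flash",
    instruction="Help plan a marathon; use tools for any pace arithmetic.",
    tools=[pace_per_km],  # plain Python functions become callable tools
)
```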
Batch Size Unlocks 1000x LLM Inference Efficiency
Reiner Pope deduces frontier LLM training and serving mechanics from roofline analysis, revealing batch size as the core driver of latency-cost tradeoffs: optimal batches of roughly 2,000 tokens amortize weight loads for massive gains.
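The roofline logic behind the batch-size claim can be sketched generically; the accelerator numbers below are illustrative (H100-class), and the exact crossover batch depends on hardware, precision, and interconnect:

```python
# Decode-time matmuls do ~2*B FLOPs per weight parameter while loading each
# parameter once, so arithmetic intensity grows linearly with batch size B.
peak_flops = 989e12       # dense BF16 FLOP/s (illustrative)
mem_bw = 3.35e12          # HBM bytes/s (illustrative)
bytes_per_param = 2       # BF16 weights

ridge = peak_flops / mem_bw                   # FLOPs/byte to saturate compute
critical_batch = ridge * bytes_per_param / 2  # tokens needed to amortize weights
print(f"compute-bound above ~{critical_batch:.0f} tokens per batch")
```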