Resilient LLM Streaming: Jitter, Breakers, 90s Checks
Lessons from 50k AI page generations: streaming success rose from 92% to 99%+ by treating the network as an adversary. Jittered backoff stops thundering herds, 90s health checks catch silent stalls, and circuit breakers prevent self-DOS.
Dual Transports Share One Resilience Layer for Any Request
Match the transport to the request: use native EventSource for one-shot GET-style streams like initial AI page generation (audience research, scraping, copywriting, builder phases), which needs no client input during multi-minute output. Switch to fetch + ReadableStream for POST-heavy edits (user prompts up to 5MB with images), parsing SSE manually. Layer identical defenses on both: a 90s no-data timeout, a 5-failure circuit breaker, and jittered exponential backoff (base * 2^attempt, capped at 2s/3s/5s on successive attempts, plus 0-50% random jitter). These handle corporate proxies that kill idle connections at 60s, 5G handoffs, and hotel WiFi header rewrites that break SSE.
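A minimal TypeScript sketch of the two transports behind one shared config; the names (ResilienceOpts, streamViaEventSource, streamViaFetch) are illustrative, not the original implementation:

```ts
// Sketch: one resilience config consumed by both transports.
interface ResilienceOpts {
  noDataTimeoutMs: number;   // reconnect if no bytes arrive for this long
  breakerThreshold: number;  // consecutive failures before the breaker opens
  backoffCapsMs: number[];   // per-attempt caps on exponential backoff
}

const SHARED: ResilienceOpts = {
  noDataTimeoutMs: 90_000,             // above the slowest legitimate gap (~60s)
  breakerThreshold: 5,
  backoffCapsMs: [2_000, 3_000, 5_000],
};

// One-shot GET-style stream: native EventSource handles SSE framing.
function streamViaEventSource(url: string, onData: (chunk: string) => void): () => void {
  const es = new EventSource(url);
  es.onmessage = (e) => onData(;
  return () => es.close(); // caller closes on completion or watchdog timeout
}

// POST-heavy edits: fetch + ReadableStream, SSE parsed manually by the caller.
async function streamViaFetch(url: string, body: string, onData: (chunk: string) => void) {
  const res = await fetch(url, { method: "POST", body });
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  for (;;) {
    const { done, value } = await;
    if (done) break;
    onData(decoder.decode(value, { stream: true })); // raw SSE text; split on "\n\n" upstream
  }
}
```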
Jitter desynchronizes retries: without it, 50 tabs on flaky enterprise WiFi retry in synchronized waves (200ms, 400ms, etc.), DDoSing your backend; with jitter, retries spread across the window and peak load drops roughly in proportion to client count. Code it as exponentialDelay + Math.random() * exponentialDelay * 0.5. This absorbs the storms that pure backoff turns into synchronized failures.
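The delay calculation as a sketch (the 200ms base is an assumption; the per-attempt caps come from the text above):

```ts
// Jittered exponential backoff: base * 2^attempt, capped per attempt, plus 0-50% jitter.
const CAPS_MS = [2_000, 3_000, 5_000];

function backoffDelay(attempt: number, baseMs = 200): number {
  const cap = CAPS_MS[Math.min(attempt, CAPS_MS.length - 1)];
  const exponentialDelay = Math.min(baseMs * 2 ** attempt, cap);
  return exponentialDelay + Math.random() * exponentialDelay * 0.5; // desynchronize clients
}
```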
Heartbeat Health Checks Catch 'Open but Silent' Connections
EventSource stays 'OPEN' for 5+ minutes with zero bytes, fooling dev tools while users stare at frozen cursors; the causes are proxy buffering, crashed servers leaving sockets open, or silent TCP drops. Counter with a client-side heartbeat: update lastHeartbeat on every byte received, check every 10s, and reconnect if 90s elapses without data (tuned above the slowest legitimate gap of 60s seen in research scraping; cargo-cult 30s and you restart valid jobs).
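A sketch of that watchdog; reconnect is an assumed callback that tears down and re-establishes the stream:

```ts
const NO_DATA_MS = 90_000;     // reconnect threshold
const CHECK_EVERY_MS = 10_000; // sweep interval

function watchStream(reconnect: () => void) {
  let lastHeartbeat =;
  const timer = setInterval(() => {
    if ( - lastHeartbeat > NO_DATA_MS) {
      clearInterval(timer);
      reconnect(); // connection is "open but silent": kill it and retry
    }
  }, CHECK_EVERY_MS);
  return {
    onBytes: () => { lastHeartbeat =; }, // call on every received chunk
    stop: () => clearInterval(timer),               // call on clean completion
  };
}
```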
Surface stalls honestly: show 'Our AI is crafting the next step' during a stall (no typing, but not done) to keep users patient instead of hiding brokenness. Worst-case detection is 100s after a stall starts (the 90s threshold plus the 10s check interval), balancing over-eager reconnections that would annoy slow generations against undetected death.
Circuit Breakers and Error Matrices Prevent Endless Loops
Local retries alone self-DOS during backend outages; add a global circuit breaker: after 5 consecutive failures, pause 60s before allowing one probe (reset the counter on success). Borrowed from Release It!, this treats repeated failures as evidence of a systemic issue rather than looping the page into a crash.
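A minimal sketch of that breaker, assuming the caller records each attempt's outcome:

```ts
class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;

  constructor(private threshold = 5, private cooldownMs = 60_000) {}

  canAttempt(): boolean {
    if (this.failures < this.threshold) return true;
    // Open: allow a single probe once the 60s cooldown has elapsed.
    return - this.openedAt >= this.cooldownMs;
  }

  recordFailure(): void {
    this.failures++;
    if (this.failures >= this.threshold) this.openedAt =; // (re)start cooldown
  }

  recordSuccess(): void {
    this.failures = 0; // one success fully closes the breaker
  }
}
```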
Filter retries through a matrix: never retry 4xx (400/401/403/404/422: auth and validation fail identically on every attempt); always retry 429 and 5xx (transient, including 503); default to retrying unknown errors, AbortError, and 'TypeError: Failed to fetch' (the client can't distinguish a network blip from a backend 502). Costly lesson: retrying expired-JWT 401s stacked 50 error toasts.
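The matrix as a decision function, a sketch where a missing status means fetch threw before a response arrived:

```ts
function shouldRetry(status?: number): boolean {
  if (status !== undefined) {
    if ([400, 401, 403, 404, 422].includes(status)) return false; // fails identically: never retry
    if (status === 429 || status >= 500) return true;             // transient: always retry
  }
  // No status (AbortError, "TypeError: Failed to fetch", unknown): a network
  // blip and a dead backend look identical from the client, so retry by default.
  return true;
}
```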
Wire the browser 'online'/'offline' events: on offline, clean up and stop retrying; on online, reset counters and attempts, then reconnect once (network failures ≠ backend faults, so preserve the retry budget).
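Sketch of the wiring; cleanup, resetRetryState, and reconnect are assumed hooks into the stream layer:

```ts
function wireConnectivity(cleanup: () => void, resetRetryState: () => void, reconnect: () => void) {
  window.addEventListener("offline", () => {
    cleanup(); // no retries against a dead network interface
  });
  window.addEventListener("online", () => {
    resetRetryState(); // network failure ≠ backend fault: restore the retry budget
    reconnect();       // one immediate attempt, no backoff
  });
}
```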
Outcomes: 92% → 99%+ Success, Demo-Proof Reliability
These patterns (a shared resilience layer, jitter, 90s health checks, breakers, error matrices, connectivity events) make pages finish on the first try over uncontrolled networks, where the prompt is rarely the bottleneck. Users notice completion, not 'better AI'. They lift to any blinking-cursor LLM UI; networks kill more demos than models do.