Resilient LLM Streaming: Jitter, Breakers, 90s Checks
Lessons from 50k AI page generations: streaming success rose from 92% to 99%+ by treating the network as an adversary. Jittered backoff stops thundering herds, 90s health checks catch silent stalls, and circuit breakers prevent self-DOS.
Dual Transports Share One Resilience Layer for Any Request
Match the transport to the request: use native EventSource for one-shot GET-style streams like initial AI page generation (audience research, scraping, copywriting, builder phases), which needs no client input during multi-minute output. Switch to fetch + ReadableStream for POST-heavy edits (user prompts up to 5MB with images), parsing SSE manually. Layer identical defenses on both: a 90s no-data timeout, a 5-failure circuit breaker, and jittered exponential backoff (base * 2^attempt, capped at 2s/3s/5s on successive attempts, plus 0-50% random jitter). These handle corporate proxies that kill idle connections at 60s, 5G handoffs, and hotel WiFi header rewrites that break SSE.
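A minimal TypeScript sketch of the two transports behind one shared config; the names (ResilienceOpts, streamViaEventSource, streamViaFetch) are illustrative, not the original implementation:

```ts
// Sketch: one resilience config consumed by both transports.
interface ResilienceOpts {
  noDataTimeoutMs: number;   // reconnect if no bytes arrive for this long
  breakerThreshold: number;  // consecutive failures before the breaker opens
  backoffCapsMs: number[];   // per-attempt caps on exponential backoff
}

const SHARED: ResilienceOpts = {
  noDataTimeoutMs: 90_000,             // above the slowest legitimate gap (~60s)
  breakerThreshold: 5,
  backoffCapsMs: [2_000, 3_000, 5_000],
};

// One-shot GET-style stream: native EventSource handles SSE framing.
function streamViaEventSource(url: string, onData: (chunk: string) => void): () => void {
  const es = new EventSource(url);
  es.onmessage = (e) => onData(;
  return () => es.close(); // caller closes on completion or watchdog timeout
}

// POST-heavy edits: fetch + ReadableStream, SSE parsed manually by the caller.
async function streamViaFetch(url: string, body: string, onData: (chunk: string) => void) {
  const res = await fetch(url, { method: "POST", body });
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  for (;;) {
    const { done, value } = await;
    if (done) break;
    onData(decoder.decode(value, { stream: true })); // raw SSE text; split on "\n\n" upstream
  }
}
```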
Jitter desynchronizes retries: without it, 50 tabs on flaky enterprise WiFi retry in synchronized waves (200ms, 400ms, etc.), DDoSing your backend; with jitter, retries spread across the window and peak load drops roughly in proportion to client count. Code it as exponentialDelay + Math.random() * exponentialDelay * 0.5. This absorbs the storms that pure backoff turns into synchronized failures.
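The delay calculation as a sketch (the 200ms base is an assumption; the per-attempt caps come from the text above):

```ts
// Jittered exponential backoff: base * 2^attempt, capped per attempt, plus 0-50% jitter.
const CAPS_MS = [2_000, 3_000, 5_000];

function backoffDelay(attempt: number, baseMs = 200): number {
  const cap = CAPS_MS[Math.min(attempt, CAPS_MS.length - 1)];
  const exponentialDelay = Math.min(baseMs * 2 ** attempt, cap);
  return exponentialDelay + Math.random() * exponentialDelay * 0.5; // desynchronize clients
}
```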
Heartbeat Health Checks Catch 'Open but Silent' Connections
EventSource stays 'OPEN' for 5+ minutes with zero bytes, fooling dev tools while users stare at frozen cursors; the causes are proxy buffering, crashed servers leaving sockets open, or silent TCP drops. Counter with a client-side heartbeat: update lastHeartbeat on every byte received, check every 10s, and reconnect if 90s elapses without data (tuned above the slowest legitimate gap of 60s seen in research scraping; cargo-cult 30s and you restart valid jobs).
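A sketch of that watchdog; reconnect is an assumed callback that tears down and re-establishes the stream:

```ts
const NO_DATA_MS = 90_000;     // reconnect threshold
const CHECK_EVERY_MS = 10_000; // sweep interval

function watchStream(reconnect: () => void) {
  let lastHeartbeat =;
  const timer = setInterval(() => {
    if ( - lastHeartbeat > NO_DATA_MS) {
      clearInterval(timer);
      reconnect(); // connection is "open but silent": kill it and retry
    }
  }, CHECK_EVERY_MS);
  return {
    onBytes: () => { lastHeartbeat =; }, // call on every received chunk
    stop: () => clearInterval(timer),               // call on clean completion
  };
}
```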
Surface stalls honestly: show 'Our AI is crafting the next step' during a stall (no typing, but not done) to keep users patient instead of hiding brokenness. Worst-case detection is 100s after a stall starts (the 90s threshold plus the 10s check interval), balancing over-eager reconnections that would annoy slow generations against undetected death.
Circuit Breakers and Error Matrices Prevent Endless Loops
Local retries alone self-DOS during backend outages; add a global circuit breaker: after 5 consecutive failures, pause 60s before allowing one probe (reset the counter on success). Borrowed from Release It!, this treats repeated failures as evidence of a systemic issue rather than looping the page into a crash.
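A minimal sketch of that breaker, assuming the caller records each attempt's outcome:

```ts
class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;

  constructor(private threshold = 5, private cooldownMs = 60_000) {}

  canAttempt(): boolean {
    if (this.failures < this.threshold) return true;
    // Open: allow a single probe once the 60s cooldown has elapsed.
    return - this.openedAt >= this.cooldownMs;
  }

  recordFailure(): void {
    this.failures++;
    if (this.failures >= this.threshold) this.openedAt =; // (re)start cooldown
  }

  recordSuccess(): void {
    this.failures = 0; // one success fully closes the breaker
  }
}
```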
Filter retries through a matrix: never retry 4xx (400/401/403/404/422: auth and validation fail identically on every attempt); always retry 429 and 5xx (transient, including 503); default to retrying unknown errors, AbortError, and 'TypeError: Failed to fetch' (the client can't distinguish a network blip from a backend 502). Costly lesson: retrying expired-JWT 401s stacked 50 error toasts.
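The matrix as a decision function, a sketch where a missing status means fetch threw before a response arrived:

```ts
function shouldRetry(status?: number): boolean {
  if (status !== undefined) {
    if ([400, 401, 403, 404, 422].includes(status)) return false; // fails identically: never retry
    if (status === 429 || status >= 500) return true;             // transient: always retry
  }
  // No status (AbortError, "TypeError: Failed to fetch", unknown): a network
  // blip and a dead backend look identical from the client, so retry by default.
  return true;
}
```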
Wire the browser 'online'/'offline' events: on offline, clean up and stop retrying; on online, reset counters and attempts, then reconnect once (network failures ≠ backend faults, so preserve the retry budget).
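Sketch of the wiring; cleanup, resetRetryState, and reconnect are assumed hooks into the stream layer:

```ts
function wireConnectivity(cleanup: () => void, resetRetryState: () => void, reconnect: () => void) {
  window.addEventListener("offline", () => {
    cleanup(); // no retries against a dead network interface
  });
  window.addEventListener("online", () => {
    resetRetryState(); // network failure ≠ backend fault: restore the retry budget
    reconnect();       // one immediate attempt, no backoff
  });
}
```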
Outcomes: 92% → 99%+ Success, Demo-Proof Reliability
These patterns (a shared resilience layer, jitter, 90s health checks, breakers, error matrices, connectivity events) make pages finish on the first try over uncontrolled networks, where the prompt is rarely the bottleneck. Users notice completion, not 'better AI'. They lift to any blinking-cursor LLM UI; networks kill more demos than models do.