PostHog's Playbook to Fix LLM Codegen Failures

Feed Fresh Context to Counter Model Rot

LLMs snapshot the world 6-18 months ago, causing 'model rot' in fast-moving projects where APIs and patterns change. Stuffing current markdown docs directly into context outperforms basic RAG due to large context windows. PostHog Wizard detects integration type (e.g., framework/language), fetches hot-off-the-press docs from posthog.com, and slides them in. This fixed early failures like invented APIs/keys, turning primitive agents into reliable integrators serving 15,000 monthly runs and earning unprompted praise on Bluesky/Twitter.

Shape Integrations with Token-Efficient Model Airplanes

Trained on messy repos, LLMs propose workable but suboptimal architectures. Maintain 'model airplanes'—minimal, non-production apps with correct PostHog patterns across frameworks/languages (e.g., login tracking). These are token-cheap facsimiles (UI 'O'-shaped but dummy-functional) that agents reference to complete integrations consistently, avoiding 15,000 unique setups and support nightmares. Flatten them into markdown via a context service for skill files, ensuring agents see the exact shape without full app bloat.

Agents improvise wildly if given the full plan upfront, creating claw-code holes then polishing randomly. Sequence prompts narrowly: (1) Locate business-value files (login/Stripe/churn signals—easy via code shadows); (2) List events/descriptions without coding; (3) Implement PostHog using prior lists + docs. This breadcrumbs to thoughtful, uniform modifications, scaling reliably without sorcerer's apprentice variance.

Interrogate Agents and Lock Tools for Reliability

Human frailties (fragmentary context, contradictory tools, lang mismatches like JS-on-Python) sabotage agents—e.g., missing tools halted hundreds of runs. At run-end, prompt: 'What could we improve for success?' to uncover issues cheaply. For shenanigans on user machines, ban raw .env reads (no cloud leaks); build tools checking/writing keys only. Wizard uses Claude agent SDK in a CLI with free PostHog inference, wrapping securely.

Build with 90% Prompts, Not Code

Code depreciates (new models ignore it), but prompts amplify with better LLMs. Wizard is 90% markdown (docs/skills), 8% markdown tools, rest agent harness—letting the 'octopus' wriggle freely. Step back: sequence info via prompts instead of over-scaffolding code, yielding happy users from 5,000+ monthly.

Feed Fresh Context to Counter Model Rot

Shape Integrations with Token-Efficient Model Airplanes

Breadcrumb Tasks to Prevent Erratic Paths

Interrogate Agents and Lock Tools for Reliability

Build with 90% Prompts, Not Code

More on Edge

Malleable Evals: Adaptive Testing for Changing AI Agents

Agentic Search Powers 80% of LLM Context Engineering

3 Steps to Custom Claude Code Agentic OS

Fix Prompt Fragility by Decomposing Agents into Microservices