PostHog's Playbook to Fix LLM Codegen Failures
Fight model rot with fresh docs, teach patterns with model airplanes, limit erratic paths with task breadcrumbing, surface failures by interrogating agents, stay safe with locked tools, and favor prompts over code (90% markdown). Together these power 15,000 monthly integrations.
Feed Fresh Context to Counter Model Rot
LLMs snapshot the world as it was 6-18 months ago, causing 'model rot' in fast-moving projects whose APIs and patterns have since changed. With today's large context windows, stuffing current markdown docs directly into context outperforms basic RAG. PostHog Wizard detects the integration type (framework and language), fetches hot-off-the-press docs from posthog.com, and slides them into the prompt. This fixed early failures such as invented APIs and keys, turning a primitive agent into a reliable integrator that serves 15,000 monthly runs and earns unprompted praise on Bluesky and Twitter.
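A minimal sketch of the doc-stuffing step. The names (`DOCS_BY_STACK`, `detect_stack`, `build_prompt`) and the detection heuristics are illustrative assumptions, not PostHog's actual implementation; in production the docs would be fetched fresh from posthog.com rather than hardcoded:

```python
# Hypothetical sketch: detect the integration type, then paste current docs
# straight into the prompt instead of RAG-retrieving fragments.

DOCS_BY_STACK = {
    # Stand-ins: in practice these are fetched fresh from posthog.com docs.
    "nextjs": "# PostHog + Next.js\nInstall posthog-js, call posthog.init(...)",
    "django": "# PostHog + Django\nInstall posthog, call posthog.capture(...)",
}

def detect_stack(package_files: dict[str, str]) -> str:
    """Crude framework detection from manifest file contents (assumption)."""
    if "next" in package_files.get("package.json", ""):
        return "nextjs"
    if "django" in package_files.get("requirements.txt", ""):
        return "django"
    return "unknown"

def build_prompt(package_files: dict[str, str], task: str) -> str:
    stack = detect_stack(package_files)
    docs = DOCS_BY_STACK.get(stack, "")
    # Large context windows let us paste whole docs verbatim, so the model
    # works from current APIs rather than from its training-time snapshot.
    return f"{docs}\n\n---\nTask: {task}"

prompt = build_prompt(
    {"package.json": '{"dependencies": {"next": "14.0"}}'},
    "Add PostHog pageview tracking",
)
```

The point of the sketch: the "retrieval" step is a lookup plus a paste, with no embedding index to fall out of date.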
Shape Integrations with Token-Efficient Model Airplanes
Trained on messy real-world repos, LLMs propose workable but suboptimal architectures. Maintain 'model airplanes': minimal, non-production apps showing correct PostHog patterns across frameworks and languages (e.g., login tracking). These are token-cheap facsimiles, with UIs shaped like the real thing but only dummy functionality, that agents reference to complete integrations consistently, avoiding 15,000 unique setups and the support nightmares they would bring. A context service flattens them into markdown skill files, so agents see the exact shape without full-app bloat.
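A minimal sketch of the flattening step, assuming the "context service" is essentially a concatenator. The function name `flatten_to_markdown` and the output layout are hypothetical:

```python
# Hypothetical sketch of a context service that flattens a model-airplane
# repo into one markdown skill file the agent can read cheaply.
from pathlib import Path

def flatten_to_markdown(app_dir: Path, note: str) -> str:
    """Concatenate a minimal example app into fenced markdown blocks,
    one section per file, prefixed with a short note on what it shows."""
    parts = [f"# Reference integration\n{note}\n"]
    for path in sorted(app_dir.rglob("*")):
        if path.is_file():
            lang = path.suffix.lstrip(".")
            parts.append(
                f"## {path.relative_to(app_dir)}\n"
                f"```{lang}\n{path.read_text()}\n```"
            )
    return "\n\n".join(parts)
```

Because the airplane is tiny by design, the flattened file fits comfortably in context where a real production app would not.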
Breadcrumb Tasks to Prevent Erratic Paths
Given the full plan upfront, agents improvise wildly, hacking holes into the code and then polishing at random. Sequence prompts narrowly instead: (1) locate the business-value files (login, Stripe, churn signals, which are easy to find from their code shadows); (2) list the events to capture, with descriptions, without writing any code; (3) implement PostHog using the prior lists plus fresh docs. This breadcrumbing yields thoughtful, uniform modifications that scale reliably without sorcerer's-apprentice variance.
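The three-step sequence above can be sketched as a pipeline where each narrow prompt feeds the next. `run_agent` is a stand-in for whatever LLM call the harness makes; the prompt wording is illustrative:

```python
# Hypothetical sketch of breadcrumbing: three narrow prompts in sequence,
# each one's output feeding the next, instead of one sprawling instruction.

def run_pipeline(run_agent, repo_summary: str, docs: str) -> str:
    # Step 1: locate business-value files only. No code changes yet.
    files = run_agent(
        "List the files handling login, Stripe billing, or churn signals "
        f"in this repo:\n{repo_summary}")
    # Step 2: name the events worth capturing, still without writing code.
    events = run_agent(
        f"For these files:\n{files}\nList the analytics events worth "
        "capturing, with one-line descriptions. Do not write code.")
    # Step 3: only now implement, armed with the lists and fresh docs.
    return run_agent(
        f"Using these events:\n{events}\nand these docs:\n{docs}\n"
        "Implement PostHog capture calls.")
```

Each step constrains the next, so the agent never has enough rope to wander off-plan.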
Interrogate Agents and Lock Tools for Reliability
Agents suffer human frailties: fragmentary context, contradictory tools, and language mismatches (e.g., JS tooling pointed at a Python repo). A single missing tool once halted hundreds of runs. At the end of each run, prompt the agent, 'What could we improve for success?', to uncover such issues cheaply. And because the Wizard runs on users' machines, ban raw .env reads so no secrets leak to the cloud; instead build tools that only check for or write keys. The Wizard wraps the Claude agent SDK in a CLI with free PostHog-provided inference, wrapping everything securely.
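A minimal sketch of such a locked tool, under the assumption that the key lives in a `.env` file and uses a `POSTHOG_API_KEY` variable (both assumptions; the real tool names and key format are not given here). The agent can ask whether a key exists or request one be written, but it can never see the file's raw contents:

```python
# Hypothetical sketch of a locked tool: the agent can check for or write a
# PostHog key, but never reads raw .env contents, so neighboring secrets
# cannot leak into cloud-bound context.
from pathlib import Path

def check_posthog_key(env_path: Path) -> bool:
    """Report only whether a key exists, never its value or its neighbors."""
    if not env_path.exists():
        return False
    return any(line.startswith("POSTHOG_API_KEY=")
               for line in env_path.read_text().splitlines())

def write_posthog_key(env_path: Path, key: str) -> None:
    """Append the key if missing; the agent supplies it, never reads it back."""
    if not check_posthog_key(env_path):
        with env_path.open("a") as f:
            f.write(f"\nPOSTHOG_API_KEY={key}\n")
```

The design choice is that both tools return the minimum possible information: a boolean or nothing, never file contents.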
Build with 90% Prompts, Not Code
Code depreciates (new models route around it), but prompts appreciate as LLMs improve. The Wizard is roughly 90% markdown (docs and skills), 8% markdown-defined tools, and a thin agent harness for the rest, letting the 'octopus' wriggle freely. Step back: sequence information via prompts instead of over-scaffolding in code, and the payoff is happy users across 15,000 monthly runs.
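A minimal sketch of what "90% markdown" implies for the harness, assuming skill files are just markdown documents loaded at startup (the function name and layout are illustrative). The code is glue; the behavior lives in the prose:

```python
# Hypothetical sketch of a prompt-heavy harness: the "code" only loads skill
# files and hands them to the agent; improving the product means editing the
# markdown, not the harness.
from pathlib import Path

def assemble_system_prompt(skills_dir: Path) -> str:
    """Concatenate markdown skill files in name order into one system prompt."""
    skills = sorted(skills_dir.glob("*.md"))
    return "\n\n".join(p.read_text() for p in skills)
```

When a better model ships, this harness needs no changes: the same markdown simply gets executed more capably.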