PostHog's Playbook to Fix LLM Codegen Failures

Use fresh docs to fight model rot, model airplanes for reference patterns, task breadcrumbing to limit paths, run-end agent interrogation to surface errors, locked-down tools for safety, and 90% prompts over code for reliability, powering 15,000 monthly integrations.

Feed Fresh Context to Counter Model Rot

LLMs snapshot the world as it was 6-18 months before release, causing 'model rot' in fast-moving projects whose APIs and patterns have since changed. Because context windows are now large, stuffing current markdown docs directly into the context outperforms basic RAG. PostHog Wizard detects the integration type (framework and language), fetches up-to-date docs from posthog.com, and slides them into the prompt. This fixed early failures such as invented APIs and keys, turning a primitive agent into a reliable integrator serving 15,000 monthly runs and earning unprompted praise on Bluesky and Twitter.
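A minimal sketch of this doc-stuffing step, assuming a local cache of markdown docs and a simple dependency heuristic; `detect_integration`, `build_context`, and the doc contents are hypothetical, not the Wizard's actual code:

```python
# Sketch: detect the integration type, then stuff current docs whole into
# the prompt instead of retrieving fragments via RAG. All names and doc
# text here are illustrative assumptions.

def detect_integration(package_json: dict) -> str:
    """Guess the framework from dependencies (a simplified heuristic)."""
    deps = package_json.get("dependencies", {})
    if "next" in deps:
        return "nextjs"
    if "react" in deps:
        return "react"
    return "javascript"

def build_context(integration: str, docs: dict) -> str:
    """Place the full, current docs for the detected integration directly
    in context, and forbid inventing APIs or keys outside them."""
    doc = docs.get(integration, docs["javascript"])
    return (
        "You are integrating PostHog.\n"
        "Use ONLY the APIs shown in the docs below; do not invent keys.\n\n"
        f"<docs>\n{doc}\n</docs>"
    )

# Hypothetical doc cache, as if freshly fetched from posthog.com.
docs = {
    "nextjs": "# PostHog Next.js\nInstall posthog-js and wrap your app...",
    "javascript": "# PostHog JS\nposthog.init(...)",
}
integration = detect_integration({"dependencies": {"next": "14.0.0"}})
prompt = build_context(integration, docs)
```

Because the whole doc travels with the request, the model cannot fall back on its stale training snapshot for the API surface.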

Shape Integrations with Token-Efficient Model Airplanes

Trained on messy repos, LLMs propose workable but suboptimal architectures. To counter this, maintain 'model airplanes': minimal, non-production apps demonstrating correct PostHog patterns across frameworks and languages (e.g., login tracking). These are token-cheap facsimiles, UI-shaped but only dummy-functional, that agents reference to complete integrations consistently, avoiding 15,000 unique setups and the support nightmares they would bring. A context service flattens them into markdown for skill files, so agents see the exact shape without full-app bloat.
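The flattening step might look like the sketch below, assuming a simple directory-walk context service; the function name, app layout, and file contents are hypothetical:

```python
# Sketch: flatten a "model airplane" example app into one markdown skill
# file, so the agent sees the exact integration shape without carrying
# the whole app in context. Layout and names are illustrative.
import tempfile
from pathlib import Path

def flatten_to_markdown(app_dir: Path) -> str:
    """Concatenate every source file under app_dir into headed, fenced
    sections of a single markdown document."""
    parts = [f"# Example integration: {app_dir.name}"]
    for path in sorted(app_dir.rglob("*")):
        if path.is_file():
            rel = path.relative_to(app_dir)
            parts.append(f"## {rel}\n```\n{path.read_text()}\n```")
    return "\n\n".join(parts)

# Demo with a throwaway "airplane": a minimal login-tracking app.
app = Path(tempfile.mkdtemp()) / "login-app"
(app / "src").mkdir(parents=True)
(app / "src" / "login.js").write_text("posthog.capture('user_logged_in')\n")
skill_md = flatten_to_markdown(app)
```

The resulting markdown is cheap to include in every run, which is the point: the agent copies a known-good shape instead of improvising one.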

Breadcrumb Tasks to Limit Paths

Agents improvise wildly when handed the full plan upfront, carving holes in the code and then polishing at random. Instead, sequence narrow prompts: (1) locate the business-value files (login, Stripe, churn signals, easy to spot from code shadows); (2) list events and descriptions without writing any code; (3) implement PostHog using the prior lists plus the docs. This breadcrumbing yields thoughtful, uniform modifications that scale reliably, without sorcerer's-apprentice variance.
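The three-stage sequencing above can be sketched as a chain where each prompt consumes the previous stage's output; `run_agent` is a hypothetical stand-in for a real model call:

```python
# Sketch: breadcrumbed prompting in three narrow stages. run_agent is a
# placeholder for an LLM call; it echoes its prompt so the chaining is
# visible without network access.

def run_agent(prompt: str) -> str:
    return f"[agent output for: {prompt[:40]}...]"

def integrate(repo_summary: str) -> list:
    stages = []
    # Stage 1: locate business-value files only; no code changes yet.
    files = run_agent(
        f"List files handling login/Stripe/churn signals in:\n{repo_summary}"
    )
    stages.append(files)
    # Stage 2: name the events to capture, still without writing code.
    events = run_agent(
        f"For these files:\n{files}\nList PostHog events with descriptions."
    )
    stages.append(events)
    # Stage 3: only now implement, constrained by the prior lists + docs.
    patch = run_agent(f"Implement these events using the docs:\n{events}")
    stages.append(patch)
    return stages

stages = integrate("src/auth/login.js, src/billing/stripe.js")
```

Each stage narrows the space of possible outputs, which is what keeps 15,000 runs producing roughly the same integration rather than 15,000 different ones.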

Interrogate Agents and Lock Tools for Reliability

Human frailties sabotage agents too: fragmentary context, contradictory tools, and language mismatches (e.g., JavaScript tooling pointed at a Python repo); a single missing tool once halted hundreds of runs. At the end of each run, prompt the agent with 'What could we improve for success?' to uncover such issues cheaply. To prevent shenanigans on user machines, ban raw .env reads so secrets never leak to the cloud; instead build tools that only check for and write keys. Wizard wraps the Claude agent SDK in a CLI with free PostHog-hosted inference, keeping the loop secure.
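A locked-down key tool might look like this sketch: the agent can learn whether a key exists and can write one, but never reads .env contents back. Function names and the key variable name are assumptions, not the Wizard's actual tool surface:

```python
# Sketch: a .env tool locked down so no secrets ever reach the model.
# has_posthog_key returns only a boolean; write_posthog_key appends but
# never returns file contents. Names here are hypothetical.
import tempfile
from pathlib import Path

def has_posthog_key(env_path: Path) -> bool:
    """Report presence only, never the key or any other .env line."""
    if not env_path.exists():
        return False
    return any(
        line.startswith("POSTHOG_API_KEY=")
        for line in env_path.read_text().splitlines()
    )

def write_posthog_key(env_path: Path, key: str) -> None:
    """Append the key if absent; existing secrets stay on disk, unseen."""
    if not has_posthog_key(env_path):
        with env_path.open("a") as f:
            f.write(f"POSTHOG_API_KEY={key}\n")

# Demo on a throwaway .env file.
env = Path(tempfile.mkdtemp()) / ".env"
write_posthog_key(env, "phc_example_key")
```

The agent calls these tools instead of `cat .env`, so a compromised or confused model has no path to exfiltrating user secrets.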

Build with 90% Prompts, Not Code

Code depreciates, since new models route around it, but prompts appreciate as LLMs improve. Wizard is roughly 90% markdown (docs and skills), 8% markdown-defined tools, and a thin agent harness for the rest, letting the 'octopus' wriggle freely. The lesson: step back and sequence information via prompts instead of over-scaffolding in code, a pattern that keeps 5,000+ monthly users happy.
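In a prompt-heavy harness like this, nearly everything the agent "is" lives in markdown loaded at runtime. A minimal sketch, assuming a hypothetical skills directory and loader:

```python
# Sketch: the harness is a thin loader; behavior lives in markdown skill
# files. Shipping a new behavior means editing markdown, not redeploying
# code. Directory layout and file names are illustrative.
import tempfile
from pathlib import Path

def load_skills(skills_dir: Path) -> str:
    """Concatenate markdown skill files, in name order, into one system
    prompt for the agent."""
    return "\n\n".join(p.read_text() for p in sorted(skills_dir.glob("*.md")))

# Demo with two tiny skill files.
skills = Path(tempfile.mkdtemp())
(skills / "01-docs.md").write_text("# PostHog docs\nCurrent API surface...")
(skills / "02-style.md").write_text("# Integration style\nMatch the airplane...")
system_prompt = load_skills(skills)
```

Because better models read the same markdown more capably, this layout appreciates with each model generation instead of becoming scaffolding the model fights against.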

Summarized by x-ai/grok-4.1-fast via openrouter
Summarized by x-ai/grok-4.1-fast via openrouter

© 2026 Edge