SPDD: Governable LLM Coding for Teams

Thoughtworks' Structured Prompt-Driven Development (SPDD) treats prompts as versioned artifacts, using the REASONS Canvas and a CLI workflow to scale AI assistants from solo speedups to team-safe, reusable code generation.

Scaling AI Coding Beyond Individuals

Individual developers gain speed from LLM assistants, but teams face amplified issues: ambiguous requirements turn into bugs at scale, reviews drown in large diffs, integration fails even when generation succeeds, and production risk rises with change volume. Thoughtworks' Global IT teams developed SPDD to make AI-generated changes governable, reviewable, and reusable. Instead of ad-hoc chats, SPDD elevates prompts to first-class artifacts in version control, capturing intent, design, and constraints upfront. This shifts the focus from "generate more code" to aligning business needs with predictable outputs.

"It's like buying a Ferrari and driving it on muddy roads: the engine is powerful, but your arrival time is determined by road conditions and traffic." This analogy from authors Wei Zhang and Jessie Xia highlights why local productivity doesn't yield system throughput without process fixes.

SPDD creates a closed loop: business input → abstraction → execution → validation → release, with prompts and code evolving together. Divergences trigger prompt-first fixes, turning reviews into intent checks rather than bug hunts. Over time, prompts accumulate domain knowledge into reusable libraries, reducing team variability.

REASONS Canvas: Prompt Structure for Predictability

The REASONS Canvas structures prompts into seven parts, forcing clarity before code generation:

  • R: Requirements – Problem, Definition of Done (DoD).
  • E: Entities – Domain model, relationships.
  • A: Approach – Solution strategy.
  • S: Structure – System fit, components, dependencies.
  • O: Operations – Concrete, testable steps.
  • N: Norms – Naming, observability, defensive coding.
  • S: Safeguards – Invariants, perf limits, security.
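
To make the seven parts concrete, an illustrative canvas fragment for the billing enhancement covered below might read as follows. The article does not show openspdd's actual canvas format, so this layout and the entity and operation names are hypothetical:

    R – Requirements: bill token usage per model and per plan; DoD: the Standard and Premium acceptance scenarios pass.
    E – Entities: Usage (modelId, prompt tokens, completion tokens); Plan; Rate.
    A – Approach: route billing by plan through a Strategy selected by a Factory.
    S – Structure: new strategies plug into the existing /api/usage flow; no CRUD or subscription changes.
    O – Operations: (1) extend /api/usage with modelId; (2) implement per-plan strategies; (3) add rate lookup.
    N – Norms: descriptive naming; log every charge; code defensively on inputs.
    S – Safeguards: reject negative token counts; charges are never negative; rates come from configuration, not constants.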

The abstract parts (R, E, A, S) define intent and design; Operations executes; Norms and Safeguards govern. Reviewers validate one artifact, not chat transcripts or partial code. This anchors LLM outputs, curbing non-determinism. Compared to spec-driven development, SPDD evolves prompts as living specs alongside code, fitting Birgitta Böckeler's "spec-anchored" category.

"The canvas aligns intent and boundaries before code is generated, moving uncertainty to the left." Prompts compound expertise across iterations, starting new work from governed baselines.

SPDD Workflow: Versioned Prompts Meet Code Discipline

Implemented via the openspdd CLI (https://github.com/gszhangwei/open-spdd), the workflow mirrors code practices: commits, reviews, gates. Key commands:

  • /spdd-story – Splits requirements into INVEST user stories (optional).
  • /spdd-analysis – Scans codebase for domain context, risks, strategy.
  • /spdd-reasons-canvas – Builds full REASONS prompt.
  • /spdd-generate – Produces code per operations/norms/safeguards.
  • /spdd-api-test – Generates cURL tests (optional).
  • /spdd-prompt-update – Updates prompt on requirement changes.
  • /spdd-sync – Syncs code changes back to prompt.
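
In a typical enhancement these commands chain in roughly the order the case study below follows (optional steps in parentheses), with the last two keeping prompt and code aligned as requirements drift:

    (/spdd-story) → /spdd-analysis → /spdd-reasons-canvas → /spdd-generate → (/spdd-api-test) → /spdd-prompt-update / /spdd-sync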

Rule: Fix prompts first on divergence. This enforces alignment, with prompts as collaboration hubs for devs and product owners.

Billing Engine Enhancement: End-to-End SPDD in Action

Starting from a simple token-usage biller (https://github.com/gszhangwei/token-billing/tree/iteration-1-end), the team used SPDD to enhance it with model-aware, multi-plan pricing:

  • Enhancement needs: add modelId to /api/usage; dynamic per-model rates (e.g., fast-model $0.01/1K tokens, reasoning-model $0.03/1K); Standard plan (quota plus overage billing); Premium plan (no quota, split prompt/completion billing); extensible via Strategy/Factory patterns (a sketch follows this list).
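
The plan routing described in that bullet lends itself to a compact Strategy-plus-Factory sketch. Everything below is illustration, not the repo's actual code: class and method names are invented, and the Premium split rates (prompt $0.03/1K, completion $0.06/1K) are assumptions chosen so the $1.50 acceptance figure below works out.

    import java.math.BigDecimal;
    import java.util.Map;

    // Token usage reported to /api/usage, now carrying the modelId field.
    record Usage(String modelId, long promptTokens, long completionTokens) {
        long total() { return promptTokens + completionTokens; }
    }

    // Split per-1K-token rates; Premium bills prompt and completion separately.
    record Rate(BigDecimal promptPer1k, BigDecimal completionPer1k) {}

    interface BillingStrategy {
        BigDecimal charge(Usage usage);

        static BigDecimal per1k(BigDecimal ratePer1k, long tokens) {
            return ratePer1k.multiply(BigDecimal.valueOf(tokens))
                            .divide(BigDecimal.valueOf(1000));
        }
    }

    // Standard plan: only tokens beyond the remaining quota are billed,
    // at a single (blended) per-model rate.
    final class StandardStrategy implements BillingStrategy {
        private final Map<String, BigDecimal> blendedPer1k;
        private final long quotaRemaining;

        StandardStrategy(Map<String, BigDecimal> blendedPer1k, long quotaRemaining) {
            this.blendedPer1k = blendedPer1k;
            this.quotaRemaining = quotaRemaining;
        }

        public BigDecimal charge(Usage u) {
            // Safeguard surfaced by the analysis step: reject negative token counts.
            if (u.promptTokens() < 0 || u.completionTokens() < 0)
                throw new IllegalArgumentException("negative token count");
            long overage = Math.max(0, u.total() - quotaRemaining);
            return BillingStrategy.per1k(blendedPer1k.get(u.modelId()), overage);
        }
    }

    // Premium plan: no quota; prompt and completion tokens billed at separate rates.
    final class PremiumStrategy implements BillingStrategy {
        private final Map<String, Rate> rates;

        PremiumStrategy(Map<String, Rate> rates) { this.rates = rates; }

        public BigDecimal charge(Usage u) {
            Rate r = rates.get(u.modelId());
            return BillingStrategy.per1k(r.promptPer1k(), u.promptTokens())
                    .add(BillingStrategy.per1k(r.completionPer1k(), u.completionTokens()));
        }
    }

    // Factory keeps plan routing in one place, so new plans extend rather than
    // modify callers.
    final class BillingStrategyFactory {
        static BillingStrategy forPlan(String plan, Map<String, BigDecimal> blended,
                                       Map<String, Rate> split, long quotaRemaining) {
            return switch (plan) {
                case "standard" -> new StandardStrategy(blended, quotaRemaining);
                case "premium"  -> new PremiumStrategy(split);
                default -> throw new IllegalArgumentException("unknown plan: " + plan);
            };
        }
    }

Keeping the rate tables as injected data rather than constants is what lets the "dynamic rates" requirement and the extensibility goal coexist: adding a model touches configuration, and adding a plan touches only the factory.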

Step synthesis:

  1. Running /spdd-story on the enhancement idea yields two stories (Standard + Premium), consolidated into one with Given/When/Then acceptance criteria (e.g., Standard: 100K quota, 90K used, 30K fast-model tokens → $0.20 overage; Premium: 10K prompt + 20K completion reasoning-model tokens → $1.50).
  2. Clarify scope: core logic (routing by plan), boundaries (no CRUD or subscription management), DoD scenarios.
  3. /spdd-analysis scans the code and outputs domain concepts (e.g., quota, overage), a strategy (Strategy pattern respecting ISP/SRP), and risks/edge cases (e.g., negative token counts).
  4. Review the analysis: it aligns on the OOP approach and surfaces edge cases; accepted as-is.
  5. /spdd-reasons-canvas generates the prompt.
  6. /spdd-generate produces the code: API updates, plan strategies, rate lookups.
  7. Generate and review tests (see the sketch after this list); deploy.
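
For step 7, the two acceptance scenarios can be asserted directly against the Strategy sketch above. The article's /spdd-api-test emits cURL tests; this JUnit 5 variant is an illustrative unit-level stand-in using the same assumed rates:

    import java.math.BigDecimal;
    import java.util.Map;
    import org.junit.jupiter.api.Test;
    import static org.junit.jupiter.api.Assertions.assertEquals;

    class BillingAcceptanceTest {

        @Test
        void standardPlanBillsOnlyOverage() {
            // 100K quota with 90K used leaves 10K; 30K fast-model tokens
            // -> 20K overage at $0.01/1K = $0.20. (Prompt/completion split is arbitrary here.)
            BillingStrategy s = BillingStrategyFactory.forPlan(
                    "standard", Map.of("fast-model", new BigDecimal("0.01")), Map.of(), 10_000);
            BigDecimal charge = s.charge(new Usage("fast-model", 15_000, 15_000));
            assertEquals(0, new BigDecimal("0.20").compareTo(charge)); // compareTo ignores scale
        }

        @Test
        void premiumPlanSplitsPromptAndCompletion() {
            // 10K prompt @ $0.03/1K + 20K completion @ $0.06/1K = $1.50 (assumed split rates).
            BillingStrategy p = BillingStrategyFactory.forPlan(
                    "premium", Map.of(), Map.of("reasoning-model",
                            new Rate(new BigDecimal("0.03"), new BigDecimal("0.06"))), 0);
            BigDecimal charge = p.charge(new Usage("reasoning-model", 10_000, 20_000));
            assertEquals(0, new BigDecimal("1.50").compareTo(charge));
        }
    }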

Result: a production-ready feature matching the acceptance criteria, with versioned prompts and code available for reuse. The example repo shows the full artifacts (https://github.com/gszhangwei/token-billing/compare/iteration-1-start...iteration-1-end).

Trade-offs surface: prompts need tweaks to handle LLM non-determinism, and abstraction-first work delays implementation details but uncovers issues early.

"When reality diverges, fix the prompt first — then update the code." This rule prevents drift, ensuring prompts record current reality.

Three Core Skills for SPDD Success

Developers need:

  • Alignment: Review analyses and prompts against business understanding; pair product owner and developer on stories.
  • Abstraction-first: Define intent and design before operations; simulate via AI to spot issues early.
  • Iterative review: Check prompts before code; sync divergences; refine for reuse.

These skills break "expert-only" barriers, letting junior developers contribute through governed patterns.

Fitness and Trade-offs

SPDD fits enhancements to brownfield codebases with a clear domain and models. Assess: high ambiguity? Use it. Simple CRUD? Skip it in favor of direct generation. Trade-offs: upfront prompt-writing time (offset by reuse); LLM variance still requires reviews; it scales best with shared prompt libraries.

"By following the same structure, every prompt becomes governable in the same way."

Key Takeaways

  • Treat prompts as versioned first-class artifacts to scale AI from solo to team.
  • Use REASONS Canvas for structured prompts: abstract intent first, then execute with governance.
  • Implement via CLI like openspdd: analysis → canvas → generate → sync.
  • Always fix prompts before code on divergence; review intent over diffs.
  • Build skills in alignment (business sync), abstraction-first (design simulation), iterative review.
  • Start on enhancements: clarifies domain, accumulates reusable patterns.
  • Expect non-determinism: tweak prompts, but gains compound over iterations.
  • Measure success by team throughput, not lines generated: safer reviews, less rework.
