GSD vs Superpowers vs Claude Code: Real Build-Off

Baseline Claude Code built a full agency site fastest (15min, 200k tokens) with decent output; Superpowers added visual planning (1hr, 250k tokens); GSD was the most thorough but slowest and most expensive (1.75hr, 1.2M tokens), and still shipped with bugs.

Tool Differences: Planning Depth vs Agility

GSD and Superpowers are orchestration layers on top of Claude Code (Anthropic's coding agent) that tackle context rot in complex projects through sub-agent decomposition and up-front planning. Superpowers emphasizes test-driven development (TDD) with red-green-refactor cycles (no production code without a failing test first) and ships a visual companion for iterative design previews across four aesthetic options (e.g., 'warm editorial' vs 'electric lime'). It uses git worktrees, auto-loads 14+ skills based on context, and lets you choose between inline and sub-agent execution.
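The red-green-refactor cycle Superpowers enforces can be sketched in a few lines. This is a hypothetical illustration, not Superpowers' actual output; `excerpt` and the test name are invented for the example:

```typescript
// Step 1 (red): the test is written before the implementation and fails first,
// because excerpt() does not exist yet.
function testExcerptTruncatesAtWordBoundary(): void {
  if (excerpt("the quick brown fox", 10) !== "the quick…") {
    throw new Error("excerpt() did not truncate at a word boundary");
  }
}

// Step 2 (green): the minimal implementation that makes the test pass.
function excerpt(text: string, limit: number): string {
  if (text.length <= limit) return text;
  // Cut at the limit, then drop the trailing partial word.
  return text.slice(0, limit).split(" ").slice(0, -1).join(" ") + "…";
}

// Step 3 (refactor): clean up freely, with the passing test as a safety net.
testExcerptTruncatesAtWordBoundary();
```

The point of the discipline is the ordering, not the code: every behavior exists first as a failing assertion, so regressions surface immediately during the refactor step.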

GSD prioritizes explicit state via markdown files (project.md, requirements.md, roadmap.md, state.md, phases) that act as a 'north star' across sub-agent context resets. It spawns parallel researcher agents for stack, features, architecture, and pitfalls (up to 75k tokens apiece), synthesizes their findings with cheaper models like Sonnet, and is driven by rigid /gsd slash commands (e.g., /gsd new, /gsd next).
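The state files GSD maintains suggest a project layout along these lines (reconstructed from the files named above; the exact paths and annotations are assumptions):

```
project/
├── project.md        # vision and constraints: the 'north star'
├── requirements.md   # what must be true at the end (65 items in this run)
├── roadmap.md        # phase sequencing (8 phases here)
├── state.md          # where the project stands now; survives sub-agent resets
└── phases/           # per-phase plans and execution notes
```

Because each sub-agent starts with a fresh context, these files are the only memory that persists between /gsd commands.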

'No production code without a failing test first.' (Superpowers TDD skill, highlighting its strict process to minimize bugs.)

'With so much sub-agent execution... we always want some sort of northstar telling us where we are.' (GSD's state emphasis, explaining markdown-heavy approach for complex coordination.)

Baseline Claude Code skips orchestration, executing plans directly—fast but prone to context overflow on big tasks.

Test: Agency Site with Blog Generator

Task: Build the Chase AI agency site in an implied Next.js-style stack. Features: (1) landing page (hero, about, services, lead form); (2) blog list and post views; (3) a hidden /studio page that takes YouTube or article URLs, extracts transcripts and thumbnails, and generates posts via the Anthropic SDK in an 'ex-Marine pilot turned AI consultant' voice. No auth for the demo. Open decisions: transcript fetching (yt-dlp?), thumbnails, the services list, design taste, and error handling.

The prompt left wiggle room to test initiative: GSD proposed services (consulting options) and a YouTube strategy; Superpowers offered three URL-fetch options with pros and cons (recommending Puppeteer) plus thumbnail plans.

Planning Phase: Time and Token Explosion

Claude Code planned in ~10min, 50k tokens—straight to execution.

Superpowers: 40min, 200k tokens. It brainstormed, wrote a spec (with a key-judgment-calls section), then a 2,500-line implementation plan covering 28 tasks. The visual companion spun up dev servers for side-by-side hero/about previews and let the user pick (e.g., Option C: centered hero). The fluid chat interface auto-invokes skills.

GSD: 40min, ~600k tokens (459k+ tracked). Four parallel researchers (stack: 75k, features: 33k, architecture: 51k, pitfalls: 61k tokens), then multi-doc outputs (8 phases, 65 requirements). Sonnet handled synthesis. Overkill for a 'straightforward' site, but scales to novel work.

Tradeoff: Depth costs 4-12x tokens/time vs baseline. Superpowers lighter than GSD but adds visuals users love.

'This is one of my favorite parts of superpowers... you can see everything all at once.' (Visual companion praise, showing interactive design edge over text-only planning.)

Execution: Hands-On vs Fire-and-Forget

Claude Code: Total 15min, 200k tokens. Direct plan-to-code.

Superpowers: Inline execution (skipped sub-agents for speed on 'straightforward' tasks). +15min, +50k tokens (total ~1hr, 250k). It verified working features, flagged manual follow-ups (e.g., an API key update), and summarized judgment calls (e.g., a /writing nav entry for the studio; security by obscurity).

GSD: Phased (/gsd next per phase), user input/discussion each step. >1hr execution, 600k tokens (total 1.75hr, 1.2M). Hands-on alignment but 'annoying' for simple tasks.

Superpowers fluid (chat-driven); GSD rigid (slash commands). Baseline: Pure speed.

First-Pass Outputs and Fixes

All three produced functional bases, but with generic AI-default designs ('AI slop').

  • GSD: Plain black/orange, basic blog. /studio 404—blog generator broken. Required fix.
  • Superpowers: Matched visual companion (warm editorial). Blog with images. Working but unremarkable.
  • Claude Code: Similar AI-generic frontend; blog previews shown (truncated).

No outputs 'blow you away' without taste guidance. GSD's depth didn't prevent bugs; Superpowers' TDD/visuals aided iteration.

Tradeoffs surfaced: orchestration shines for complexity (sub-agents prevent context rot) but was overkill here; baseline is competitive on output and crushes on efficiency. For production, add skills (e.g., frontend design) to the baseline.

'Claude Code as a rule kind of sucks at front end design if you don't give it really really good instructions.' (Core limitation all shared; tools mitigate via planning but not taste.)

When Each Wins

  • Baseline Claude Code: Simple/known tasks. 75% cheaper/faster. Scale with plugins/skills.
  • Superpowers: Balanced for web/apps needing design/iteration. Visuals + TDD justify 25% premium.
  • GSD: Novel/complex (e.g., custom arch). Research pays off long-term despite 6x cost.

Unexpected: Baseline viable contender—don't default to heavy layers.

Key Takeaways

  • Start with vanilla Claude Code + targeted skills; add orchestration only for context-heavy projects.
  • Use Superpowers' visual companion for frontend—side-by-side previews beat descriptions.
  • GSD's researcher agents for unknowns, but cap at 10% budget to avoid token bloat.
  • Always explicit taste/aesthetic prompts; AI defaults to generic 'slop.'
  • Track tokens/time: Inline > sub-agents for <30 tasks; phased for precision.
  • Install: Superpowers via /plugin; GSD one-command setup.
  • Test TDD in tools: Failing test-first minimizes regressions.
  • For blog generators: Puppeteer/yt-dlp for scrape; Anthropic SDK for voice-gen.

Summarized by x-ai/grok-4.1-fast via openrouter

9056 input / 2891 output tokens in 24992ms

© 2026 Edge