Automate YouTube Thumbnails with Claude Code Agents
Build agentic workflows in Claude Code that use the YouTube Data API for trend research, Ideogram for custom poses, and NanoBanana for compositing thumbnails, replacing manual Figma work for 5 weekly videos.
Agentic Workflows Replace Manual Thumbnail Creation
Agentic workflows let AI agents reason, plan, select tools, and iterate toward a goal with minimal human input, unlike rigid scripts that follow a fixed path. The cycle runs in five stages: a human gives a high-level goal (e.g., "generate optimized YouTube thumbnail"), the agent reasons to form a plan, it calls tools (APIs such as search or image generation), it corrects errors by replanning, and it delivers the output. For YouTube creators producing 5 videos weekly, this automates thumbnails previously made manually in Figma: the agent researches trending videos in a niche (e.g., "agentic workflows"), analyzes the top 5-15 results for views, titles, and thumbnails, incorporates the video title, description, and brand assets (pose photos such as happy, sad, and neutral), generates custom images matching the trends, and composites everything into final thumbnails with logos, text, and colors.
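The goal → plan → tool use → error correction → output cycle can be sketched in a few lines of Python. This is a minimal illustration, not the generated Claude Code skill: the class name `ThumbnailAgent`, the fixed four-step plan, and the stub tools are all assumptions, and a plain retry stands in for real replanning (where the model would choose a different approach).

```python
from dataclasses import dataclass, field
from typing import Callable

Tool = Callable[[dict], dict]  # takes the current state, returns new facts


@dataclass
class ThumbnailAgent:
    tools: dict[str, Tool]
    max_retries: int = 2
    log: list[str] = field(default_factory=list)

    def plan(self, goal: str) -> list[str]:
        # Assumption: a real agent asks the model for a plan; this fixed
        # order is only a stand-in for that reasoning step.
        return ["research", "analyze", "generate", "composite"]

    def run(self, goal: str) -> dict:
        state: dict = {"goal": goal}
        for step in self.plan(goal):
            for _attempt in range(self.max_retries + 1):
                try:
                    state.update(self.tools[step](state))
                    self.log.append(f"{step}: ok")
                    break
                except Exception as exc:
                    # Simplest form of error correction: record and retry.
                    self.log.append(f"{step}: failed ({exc})")
            else:
                raise RuntimeError(f"step {step!r} failed after retries")
        return state


# Stub tools so the sketch runs offline; "generate" fails once to show a retry.
attempts = {"generate": 0}


def flaky_generate(state: dict) -> dict:
    attempts["generate"] += 1
    if attempts["generate"] == 1:
        raise TimeoutError("image API timeout")
    return {"pose": "pose.png"}


agent = ThumbnailAgent(tools={
    "research": lambda s: {"videos": ["v1", "v2"]},
    "analyze": lambda s: {"layout": "bold text left, face right"},
    "generate": flaky_generate,
    "composite": lambda s: {"thumbnail": "thumb_01.png"},
})
result = agent.run("generate optimized YouTube thumbnail")
```

After the run, `agent.log` records one failed and one successful `generate` attempt, which is the behavior a rigid script without a retry path would not have.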
Start by sketching the workflow (human goal → research → analysis → generation → compositing) and taking a screenshot of it. Then prompt Claude or ChatGPT: "Generate Claude Code skill for this agent using pasted sketch + agent structure article." Download the generated files (Python scripts, tools), create a project folder, open it in the Cursor IDE, and install the Claude Code extension.
API Setup Drives Autonomous Research and Generation
Configure four APIs in .env:
- YouTube Data API v3: Enable in Google Cloud Console, copy key. Agent queries recent videos (past week/month) by keyword, fetches 5-15 top results with views/titles/descriptions/thumbnails, downloads images, analyzes why they perform (e.g., Jeff Su's video: high views due to bold text/contrasting face).
- Ideogram API: Requires a $20 minimum credit; generates new poses/faces that reference brand photos (e.g., replicating a trending pose like hand-under-chin while matching hair/eyes/wristband).
- NanoBanana (Gemini): Specify "nanobanana from Gemini" in prompts; composites elements (backgrounds, text, logos, poses) into thumbnails.
- Anthropic (Claude): Powers agent reasoning in Claude Code.
Prompt the agent: "Research 10-15 top + 5 recent videos for keyword, analyze thumbnails, use my title/description/poses folder, generate via Ideogram if needed, composite in NanoBanana." It outputs 5+ thumbnails that mimic trends but are personalized (e.g., your face in an excited or praying pose over trending layouts). Iterate with follow-ups such as "Change logos" or "Refine poses"; the agent replans and calls tools autonomously.
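The "top + recent" selection in that prompt can be illustrated with a small helper. This is a sketch under assumptions: the function name `pick_references` and the record shape (`id`, `views`, `published` as an ISO 8601 UTC string) are hypothetical, standing in for whatever the agent assembles from the YouTube API responses.

```python
def pick_references(videos: list[dict], top_n: int = 10,
                    recent_n: int = 5) -> tuple[list[dict], list[dict]]:
    """Split fetched videos into top performers and fresh uploads.

    Each video dict is assumed to have "id", "views" (int) and
    "published" (ISO 8601 UTC string, which sorts chronologically as text).
    """
    by_views = sorted(videos, key=lambda v: v["views"], reverse=True)
    top = by_views[:top_n]
    top_ids = {v["id"] for v in top}

    # Newest first, skipping anything already chosen as a top performer.
    by_date = sorted(videos, key=lambda v: v["published"], reverse=True)
    recent = [v for v in by_date if v["id"] not in top_ids][:recent_n]
    return top, recent
```

Keeping the two groups separate lets the analysis step compare what has already performed well against what is currently being published.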
Trade-offs: Initial Ideogram faces may not match yours (e.g., brown eyes instead of your eye color); refine the prompts with reference photos. The YouTube API setup is the hardest step, but it enables data-driven optimization over guesswork.
Local Frontend Enables Iterative Visual Refinement
Prompt Claude Code: "Build clean localhost frontend (white/black, simple) to run agent: inputs for description/trending keyword/channel URL/scan past 5 videos/transcript, preview poses, generate/refine." Key features:
- Inputs: Keyword search (e.g., "framer mcp"), channel scan, title/context/transcript, pose selection from assets.
- Generation: Produces thumbnails (e.g., dark studio, sad face right, text left: "Claude Code did this").
- Refine Tools: Upload images (Google Claude logo URL), highlight/mask areas ("remove YT letters/add piercing"), text changes ("move dotted lines behind head/turn 'agent workflow' orange"), clone stamp (alt-click source, paint/apply to match backgrounds—erases mismatches seamlessly).
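The clone stamp idea (pick a source point, then paint its pixels over a destination) can be shown on a tiny grid. This is only an illustration of the technique, not the frontend's actual code: the function name `clone_stamp`, the square brush, and the list-of-lists image are assumptions chosen to keep the sketch dependency-free.

```python
def clone_stamp(image: list[list[int]], source: tuple[int, int],
                dest: tuple[int, int], radius: int) -> list[list[int]]:
    """Copy a square patch centred on `source` over the patch centred on `dest`.

    `image` is a grid of pixel values indexed as image[row][col].
    Pixels are read from the original image, so overlapping source and
    destination patches do not smear. Out-of-bounds pixels are skipped.
    """
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]  # leave the input untouched
    src_row, src_col = source
    dst_row, dst_col = dest
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            sr, sc = src_row + dy, src_col + dx
            dr, dc = dst_row + dy, dst_col + dx
            if 0 <= sr < height and 0 <= sc < width \
                    and 0 <= dr < height and 0 <= dc < width:
                out[dr][dc] = image[sr][sc]
    return out
```

A real editor would typically add a soft brush edge and repeat this along the painted stroke; the core operation is still a copy at a fixed offset between source and destination.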
Demo outcomes: Starting from the Jeff Su trend, the agent generates an image of you praying at a desktop with the exact wristband detail, adds logos (Ideogram/Google/Gemini/YouTube), and edits text so it flows around hair. Download the project, files, and prompts from Gumroad; join the Discord for collaboration. This cuts thumbnail time from hours to minutes and scales to niches like AI/Framer, with easy extensions (inbox triage, autodrafts).