Fully Automate Video from Script Using Claude + HeyGen
Nate Herk built an overnight video production pipeline: Claude orchestrates ElevenLabs voice cloning, HeyGen Avatar V5 avatars, and Remotion editing, turning 5 hours of manual work per video into automated clips generated overnight from raw scripts.
Why Automate Video Production Now
Manual video creation bottlenecks at recording, editing, and motion graphics, eating 5 hours per video. HeyGen's new Avatar V5 crossed the uncanny valley with natural gestures and lip-sync trained on 10M+ facial-expression data points, enabling digital twins from 15 seconds of webcam footage or a 10GB upload. Paired with ElevenLabs' professional voice cloning (requiring 30min-2hrs of audio) and Claude's orchestration, it shifts the bottleneck from production to scripting and ideas. Nate tested hundreds of avatars and scripts; the result is course lessons indistinguishable from real recordings except for minor glitches like eye darts or arm artifacts, which vanish in facecam crops.
"The avatar has crossed the uncanny valley. HeyGen Avatar 5 is trained on 10 million plus data points for facial expressions and it creates you a digital twin from just 15 seconds of a webcam clip." (Nate on why V5 changes everything—eliminates lighting, noise, or scheduling issues like fire trucks interrupting recordings.)
Building Realistic Avatars and Voice Clones
Start with HeyGen: record a 15-second script or upload footage (Nate used 10GB for his seated avatar). HeyGen's auto voice clone is poor, so import an ElevenLabs clone instead. In ElevenLabs, create a professional clone from 30+ minutes of clean audio (Nate used 2 hours), then tune stability, similarity, and style (Nate found his sweet spot only after many iterations). Generate audio in 45-60-second chunks, since quality degrades past 1 minute or ~5,000 characters. Upload to HeyGen AI Studio and select Avatar V5 for generation (30s-1min per clip, capped at 3 minutes). Result: natural head tilts and swallows, with occasional exaggerated gestures or artifacts.
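The chunking rule above (45-60 seconds per chunk, under ~5,000 characters, breaking only at sentence ends) can be sketched as a small helper. The words-per-minute rate and the function name here are illustrative assumptions, not Nate's actual code:

```python
import re

# Assumption: ~150 words per minute of narration, i.e. 2.5 words/second.
WORDS_PER_SECOND = 2.5
TARGET_SECONDS = 60   # keep each audio chunk around a minute or less
CHAR_LIMIT = 5000     # ElevenLabs quality reportedly degrades past ~5k chars

def chunk_script(script: str) -> list[str]:
    """Split a script into chunks that end on sentence boundaries and
    stay under both the duration and character limits."""
    sentences = re.split(r"(?<=[.!?])\s+", script.strip())
    chunks, current = [], ""
    for sentence in sentences:
        candidate = (current + " " + sentence).strip()
        words = len(candidate.split())
        # Close the current chunk when adding this sentence would push it
        # past either the estimated-duration cap or the character cap.
        if current and (words / WORDS_PER_SECOND > TARGET_SECONDS
                        or len(candidate) > CHAR_LIMIT):
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each resulting chunk then becomes one ElevenLabs generation and one HeyGen clip, keeping every piece under both services' quality ceilings.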
V5 vs. prior versions: older Avatars (3/4) had robotic lips and gestures; V5 learns your personal movements. Because the HeyGen API doesn't yet support V5, a workaround is needed: Claude generates Avatar 4 videos via the API, then a Playwright script switches each video to V5 in the dashboard and downloads the result.
"In ElevenLabs it sounds phenomenal... I went through tons and tons of different iterations of the best settings, and I got to a place that I feel like sounds the most like me with my inflection." (Nate on voice tuning—ElevenLabs alone excels, HeyGen import degrades it.)
Claude as Orchestration Layer
Claude Code handles the full pipeline: it scans Google Drive for scripts, chunks them into 45-60s segments at sentence boundaries (e.g., Lesson 5.0 → 4 parts), generates ElevenLabs audio, feeds HeyGen (via API for V4, plus the Playwright upgrade to V5), then runs FFmpeg/Remotion for stitching, transcription, and timed graphics. Input: "Process lessons 5.0-5.4." Output: an overnight-edited video with synced pops (e.g., text appearing at exact timestamps). Nate keeps separate projects for HeyGen Studio (chunking/API) and Remotion (styling/backgrounds) so each can iterate independently; merging them into a single "skill" is next.
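As a sketch of the audio step, a request to ElevenLabs' text-to-speech REST endpoint might be assembled like this. The endpoint shape reflects ElevenLabs' public API at time of writing, but the voice-settings values are placeholders rather than Nate's tuned ones, and the network call is shown only in a comment:

```python
ELEVENLABS_TTS_URL = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"

def build_tts_request(voice_id: str, text: str, api_key: str):
    """Assemble the URL, headers, and JSON body for one audio chunk."""
    url = ELEVENLABS_TTS_URL.format(voice_id=voice_id)
    headers = {"xi-api-key": api_key, "Content-Type": "application/json"}
    payload = {
        "text": text,
        "model_id": "eleven_multilingual_v2",
        "voice_settings": {
            # Illustrative values; Nate iterated to his own sweet spot.
            "stability": 0.5,          # lower = more expressive delivery
            "similarity_boost": 0.75,  # how closely output tracks the clone
            "style": 0.0,              # style exaggeration
        },
    }
    return url, headers, payload

# To actually synthesize, POST the payload and save the returned audio bytes:
# resp = requests.post(url, headers=headers, json=payload)
# open("chunk_01.mp3", "wb").write(resp.content)
```

The orchestrator would call this once per script chunk, then hand the saved audio files to HeyGen for avatar generation.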
Nate rejected manual chunking and pasting as too tedious for 10-minute scripts. Claude researched the APIs itself, automating roles that once required a camera operator, AV tech, editor, and script reader. Every run improves via conversation history.
"AI can orchestrate the entire production pipeline... it turns a 5-hour pipeline into an overnight job that I didn't even need to be awake for." (Nate on Claude's agentic power—connects tools end-to-end without clicks.)
Costs, Limitations, and Workarounds
HeyGen: Avatar V5 clips are capped at 3min (chunk long scripts), and the API lacks V5 support (hence the Playwright workaround). ElevenLabs: audio quality degrades past 1min. Total cost: far cheaper than a studio and gear (24% of creators cite expense as a barrier; 91% of businesses use video). Nate shares the exact HeyGen Studio project and docs free in his Skool community for replication.
Tradeoffs: minor imperfections remain (eye darts, red-triangle artifacts on arms); Nate avoids it for his main YouTube channel (keeping the personal touch) but finds it ideal for shorts, courses, and ads. The bottleneck shifts to ideas: the best content wins amid the AI flood.
Addressing Common Objections
"Fake/authenticity": Script/voice/face are yours; skip chair/waiting. Fine for non-personal channels (e.g., TikTok news).
"AI slop flood": Exists already (LinkedIn/X); quality ideas filter through—bad content stays bad.
"Kills editor jobs": Evolves roles—apply expertise (e.g., SEO specialist couldn't build own automation without domain knowledge).
"The script is yours, the voice is yours, the face is yours. The only thing missing is that you don't have to sit in a chair or you don't have to wait for the stars to align." (Nate rebutting authenticity concerns—retains human core where it counts.)
Key Takeaways
- Train HeyGen Avatar V5 with 10GB+ footage for best personal mimicry; use webcam for quick starts.
- ElevenLabs professional clone (30min+ audio) + tweaks beats HeyGen's auto-voice; chunk 45-60s.
- Claude Code: Prompt to research APIs, chunk scripts at sentence ends, orchestrate ElevenLabs → HeyGen → Remotion.
- Workaround API gaps with Playwright for dashboard edits; plan for V5 API release.
- Limit clips to 1min for quality; stitch via FFmpeg/Remotion for pro graphics synced to transcription timestamps.
- Test 100+ iterations per project; separate orchestration/editing initially, merge later.
- Use for courses/shorts/ads, not core personal brand; frees time for strategy/ideas.
- Costs beat traditional production; share pipelines in communities for collab.
- Objection-proof: Humans own ideas—AI handles production drudgery.