AI Pipeline Clips Videos to Viral Shorts in 10 Minutes
Use Whisper for transcription, Claude Opus to select viral moments, YOLO for face tracking, and Remotion for edits to automate long-form video to shorts pipeline, processing 89-min podcasts into styled clips with uploads via Surf Agent in 5-15 minutes.
Pipeline Turns Hour-Long Videos into Retention-Optimized Shorts
Start with any source video (podcasts, interviews, reactions), extract audio via FFmpeg, and transcribe with local Whisper for timestamped text. Feed transcripts to Claude Opus (version 4.7) to identify 3 high-engagement moments—scoring for virality from 1-hour content like an 89-minute Diary of a CEO episode. Apply YOLO for face detection and light Active Speaker Detection (ASD) to track who speaks, enabling dynamic reframing from 16:9 to vertical shorts that follow the active face. Finish with Remotion code-based edits: auto-captions, zooms, flashes, meme sound effects, and music overlays boost retention without manual tweaks. Total runtime: 5-15 minutes per batch, yielding ready MP4s that rival hand-edited clips.
This setup improves every 6 months as models advance—creator's channel hit 20k subs and 8.9M views on prior automated shorts, proving production viability.
Tool Stack Enables Hands-Off Processing
Core models handle decisions autonomously: Claude reads full transcripts, skims for 'strong moments,' and outputs clip timestamps; YOLO excels at multi-face detection in podcasts; ASD ensures seamless speaker switches in interviews. Remotion shines for programmatic effects, avoiding brittle no-code tools. For deployment, self-host n8n on Hostinger VPS (KVM 2: 2 vCPUs, 8GB RAM, 100GB disk, 8TB bandwidth; 10% off with ALLABOUTAI code atop 70% discount) via one-click install—triggers workflows like schedule-to-Gmail in seconds. Upload via Surf Agent: browser automation picks title (e.g., "A doctor just exposed what's happening to male fertility"), sets private visibility, and publishes without APIs, runnable on Mac Mini.
Trade-offs: Works best on face-heavy content; struggles with prop-focused segments (e.g., vials demo) or low-face reactions, but still produces usable clips. No manual intervention post-URL input.
Results Across Video Types Validate Scalability
Processed Diary of a CEO (fertility trajectory clip with clear visuals, effects); Charlie Moist Penguin reaction (precise head tracking in rants about Twitch mods); multi-speaker interviews (smooth YOLO/ASD switches on trivia questions); traditional talks ($10 coffee compounding to $150 in 40 years at 7% returns); gambling stream reactions (context switches on addiction admissions). All output engaging 1-minute shorts with pro styling—e.g., fertility clip captures drama despite minor framing hiccups. Creator plans mass posting to test views, iterating on Whisper+YOLO+ASD+Remotion combo. Run locally or scale via n8n/Hostinger for channels needing clip volume without editors.