Prompt Gemini 3.1 Flash TTS for Expressive Voices
Access Gemini 3.1 Flash TTS via the `gemini-3.1-flash-tts-preview` model ID, and use structured prompts with a scene, director's notes, and accent specifications to generate custom, energetic audio.
Model Access Delivers Prompt-Controlled Audio
Google's Gemini 3.1 Flash TTS is available through the standard Gemini API under the model ID `gemini-3.1-flash-tts-preview` and generates audio entirely from text prompts. Because a prompt can carry scene context and stylistic directions alongside the words to be spoken, it gives finer control over delivery than basic TTS, which makes the model suitable for production-ready voiceovers such as radio promos.
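A minimal sketch of a single request, assuming the `google-genai` Python SDK (`pip install google-genai`) and an API key in the `GEMINI_API_KEY` environment variable. The calls follow Google's documented pattern for earlier Gemini TTS models; whether the 3.1 preview accepts the same configuration, and whether it returns 24 kHz, 16-bit mono PCM, are assumptions to check against the current API docs. `Puck` is one of the prebuilt voice names.

```python
import wave

MODEL_ID = "gemini-3.1-flash-tts-preview"
SAMPLE_RATE_HZ = 24_000  # assumption: same output rate as earlier Gemini TTS models
SAMPLE_WIDTH_BYTES = 2   # assumption: 16-bit PCM samples
CHANNELS = 1             # assumption: mono output


def save_wav(path, pcm: bytes) -> None:
    """Wrap raw PCM bytes in a WAV container so ordinary players can open it."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(CHANNELS)
        wf.setsampwidth(SAMPLE_WIDTH_BYTES)
        wf.setframerate(SAMPLE_RATE_HZ)
        wf.writeframes(pcm)


def synthesize(prompt: str, voice_name: str = "Puck") -> bytes:
    """Request audio-only output for `prompt` and return the raw PCM bytes."""
    # Imported lazily so save_wav works even without the SDK installed.
    from google import genai
    from google.genai import types

    client = genai.Client()  # reads the API key from the environment
    response = client.models.generate_content(
        model=MODEL_ID,
        contents=prompt,
        config=types.GenerateContentConfig(
            response_modalities=["AUDIO"],
            speech_config=types.SpeechConfig(
                voice_config=types.VoiceConfig(
                    prebuilt_voice_config=types.PrebuiltVoiceConfig(
                        voice_name=voice_name
                    )
                )
            ),
        ),
    )
    # The audio arrives as inline binary data on the first part of the first candidate.
    return response.candidates[0].content.parts[0].inline_data.data
```

Usage: `save_wav("promo.wav", synthesize("Say excitedly: Good morning, London!"))`. Writing the header with the standard-library `wave` module keeps the SDK dependency confined to `synthesize`.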
Structured Prompts Shape Voice, Pace, and Accent
Build prompts with these sections for vivid results:
- AUDIO PROFILE: Name and scenario summary, e.g., 'Jaz R. "The Morning Hype"'.
- THE SCENE: Vivid environmental details to set energy, like a 'glass-walled studio overlooking the moonlit London skyline' with 'blindingly bright' lights and 'ON AIR' tally.
- DIRECTOR'S NOTES: Specify style ('Vocal Smile' for bright tone), dynamics (high projection, punchy consonants), pace (energetic, bouncing cadence), and accent (e.g., Brixton, London Estuary).
- SAMPLE CONTEXT: Positions the voice, e.g., for 'Top 40 radio' with '11/10 infectious energy'.
- TRANSCRIPT: The lines to be spoken, with inline tags such as [excitedly] or [shouting] as delivery cues.
In the radio example, this format produces grinning, high-energy speech paced like a presenter talking over fast music, with no dead air.
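The section structure above can be captured in a small prompt builder. This is a sketch: the `TTSPrompt` class, its field names, and the `## SECTION` layout of the rendered text are hypothetical choices for illustration, not a format the API requires. The transcript line is invented; the other values come from the example. Keeping the accent in its own field makes accent variants a one-field change.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TTSPrompt:
    """Hypothetical container for the prompt sections; the rendered layout is an assumption."""

    audio_profile: str
    scene: str
    style: str
    dynamics: str
    pace: str
    accent: str
    sample_context: str
    transcript: str

    def render(self) -> str:
        """Join the sections into one text prompt, with the transcript last."""
        notes = "\n".join([
            f"Style: {self.style}",
            f"Dynamics: {self.dynamics}",
            f"Pace: {self.pace}",
            f"Accent: {self.accent}",
        ])
        sections = [
            ("AUDIO PROFILE", self.audio_profile),
            ("THE SCENE", self.scene),
            ("DIRECTOR'S NOTES", notes),
            ("SAMPLE CONTEXT", self.sample_context),
            ("TRANSCRIPT", self.transcript),
        ]
        return "\n\n".join(f"## {title}\n{body}" for title, body in sections)


# Values from the example above; the transcript line is hypothetical.
promo = TTSPrompt(
    audio_profile='Jaz R. "The Morning Hype"',
    scene=(
        "A glass-walled studio overlooking the moonlit London skyline. "
        "The lights are blindingly bright and the ON AIR tally is lit."
    ),
    style="Vocal Smile: a bright tone you can hear the grin in",
    dynamics="High projection with punchy consonants",
    pace="Energetic, bouncing cadence",
    accent="Brixton, London",
    sample_context="Top 40 radio with 11/10 infectious energy",
    transcript="[excitedly] Good morning, London! You're listening to The Morning Hype!",
)

# Accent variants: only the accent field changes; the rest of the prompt is reused.
variants = {
    accent: replace(promo, accent=accent)
    for accent in ("Brixton, London", "Newcastle", "Exeter, Devon")
}
```

The string returned by `render()` is what gets sent as the text prompt; freezing the dataclass means each variant is a new object rather than a mutation of the original.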
Accent Tweaks and Testing Tools Yield Instant Variations
Changing 'Brixton, London' to 'Newcastle' or 'Exeter, Devon' in the prompt reliably shifts the accent while preserving the energy; tested outputs confirmed fluid, localized delivery.
For rapid iteration, use the vibe-coded UI at https://tools.simonwillison.net/gemini-flash-tts: enter an API key, switch to multi-speaker mode and assign a voice to each speaker (e.g., 'Puck (Upbeat)' for Joe, 'Kore (Firm)' for Jane), write the script using exactly those speaker names, then generate and download the result as a WAV file. For example, the script 'Joe: How's it going today Jane? Jane: yawn Not too bad, how about you?' produced a 6-second conversation.
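The same multi-speaker setup can be sketched in code. This is not the tool's implementation: it assumes the `google-genai` SDK's multi-speaker speech configuration, and the `VOICES` mapping and helper names are hypothetical. It formats turns as `Speaker: line`, checks that every speaker label in the script has a voice assigned under exactly that name, and builds the speech configuration.

```python
import re

# Hypothetical mapping mirroring the UI example: speaker name -> prebuilt voice.
VOICES = {"Joe": "Puck", "Jane": "Kore"}

# A speaker label is everything before the first colon on a line.
SPEAKER_LINE = re.compile(r"^([^:\n]+):\s*(.+)$")


def format_script(turns: list[tuple[str, str]]) -> str:
    """Render (speaker, line) pairs as 'Speaker: line', one turn per line."""
    return "\n".join(f"{speaker}: {line}" for speaker, line in turns)


def unknown_speakers(script: str, voices: dict[str, str]) -> set[str]:
    """Return speaker labels that have no voice assigned (case-sensitive match)."""
    matches = (SPEAKER_LINE.match(line) for line in script.splitlines())
    labels = {m.group(1).strip() for m in matches if m}
    return labels - set(voices)


def multi_speaker_config(voices: dict[str, str]):
    """Build a SpeechConfig assigning one prebuilt voice per named speaker."""
    from google.genai import types  # lazy import; requires google-genai

    return types.SpeechConfig(
        multi_speaker_voice_config=types.MultiSpeakerVoiceConfig(
            speaker_voice_configs=[
                types.SpeakerVoiceConfig(
                    speaker=name,
                    voice_config=types.VoiceConfig(
                        prebuilt_voice_config=types.PrebuiltVoiceConfig(
                            voice_name=voice
                        )
                    ),
                )
                for name, voice in voices.items()
            ]
        )
    )


script = format_script([
    ("Joe", "How's it going today Jane?"),
    ("Jane", "Not too bad, how about you?"),
])
```

To generate audio, pass `script` as `contents` and `multi_speaker_config(VOICES)` as the `speech_config` of a `types.GenerateContentConfig` with `response_modalities=["AUDIO"]`. Running `unknown_speakers` first catches a mislabeled line (say, `jane:`) before it costs an API call.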