This tutorial walks you through the voice generation and dubbing features, which you can use to create professional narration and multi-character dialogue for your videos.
1. What is Voice Generation?
Voice Generation (text-to-speech, or TTS) converts your script text into natural-sounding narration, so you can add professional-quality audio to your stories without recording anything yourself.
1.1 When to Use Voice Generation
Voice generation is ideal for:
- Video narration: Add voiceover to your story or video content
- Multi-character dialogue: Assign unique voices to different characters
- Quick iteration: Rapidly regenerate audio when updating scripts
- Professional quality: Produce broadcast-quality narration without recording equipment
2. Two Ways to Generate Voice
You can generate voice in two locations, each suited for different workflows:
2.1 Editor Page
Quick voice generation with basic controls for rapid iteration.
2.2 Dubbing Page
Advanced voice editing with fine-grained control over sentences, pauses, and voice selection.
3. Quick Start: Generate Voice from Editor
The Editor provides fast access to voice generation for your entire project or selected shots.

3.1 How to Regenerate Voice in Editor
- Click Regenerate All Audio in the top header
- Choose narrator voice and character voices from dropdowns
- Preview voice samples by clicking the play button
- Click Regenerate to apply voices to all shots
3.2 Per-Shot Voice Regeneration
For quick A/B testing of different voices:
- Select a shot in the timeline
- Find the audio section in the right panel
- Click Regenerate Audio for that specific shot
Use case: Perfect for quickly testing different voice options without affecting the entire project.
4. Dubbing Page: Your Advanced Voice Studio
The Dubbing Page is your main workspace for detailed voice editing and multi-character dialogue management.

4.1 Page Layout Overview
The Dubbing Page consists of two main areas:
Left Area
- Narration List: Shows the narration text for every shot; click an entry to select that shot
- Timeline Controls: Playback controls and audio splitting timeline
Right Panel (Three Tabs)
- Narration Settings: Adjust narrator voice and parameters
- Character Voices: Assign voices to different characters
- Lip Sync: Generate lip-synced videos
5. Understanding Roles and Voice Assignment
Roles help you organize voices for different characters and narrators, ensuring consistency throughout your project.

5.1 How Roles Work
- Narrator: Default voice for narration text
- Characters: Assign different voices to speaking characters
- Role-to-voice mapping: Bind a specific voice to each role
- Global application: Set a voice for a role and apply to all matching shots
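The role-to-voice binding described above behaves like a lookup table: each shot names a role, and the role decides the voice. The Python sketch below only illustrates that idea. The Shot class, the resolve_voices helper, and the voice IDs are hypothetical and are not part of the product.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    shot_id: int
    role: str   # "Narrator" or a character name
    text: str   # narration or dialogue for this shot


def resolve_voices(shots, role_to_voice):
    """Look up each shot's voice through its role (hypothetical helper)."""
    return {shot.shot_id: role_to_voice[shot.role] for shot in shots}


shots = [
    Shot(1, "Narrator", "The city woke slowly."),
    Shot(2, "Mia", "Did you hear that?"),
    Shot(3, "Mia", "It came from the roof."),
]

# Made-up voice IDs, one per role.
role_to_voice = {"Narrator": "calm_01", "Mia": "bright_07"}
before = resolve_voices(shots, role_to_voice)

# Rebinding the role changes every shot that uses it, with no per-shot edits.
role_to_voice["Mia"] = "warm_03"
after = resolve_voices(shots, role_to_voice)
```

Because shots refer to roles and not to voices directly, changing one binding is enough to update every shot for that character, which is what keeps a character's voice consistent across the project.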
5.2 Assigning Voices to Roles
- Click the Character Voices tab in the right panel
- Select a role from the role list (Narrator or character name)
- Click the Select Voice button to open the voice library
- Choose a voice from the library
- The voice automatically applies to all shots for that role
Pro tip: Assign voices to all roles before generating audio to ensure consistency.
6. Browse and Select Voices
The voice library provides hundreds of voices to choose from, each with unique characteristics and styles.

6.1 Voice Library Features
Powerful Filtering:
- By style: Narrative, dramatic, promotional, casual
- By language: English, Chinese, Japanese, and more
- By gender: Male, female, neutral
Preview & Favorites:
- Preview voices: Click the play button to hear samples
- Favorites: Star your favorite voices for quick access
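Combining filters narrows the list: a voice is shown only if it matches every filter you have set. The Python sketch below models that behavior under assumptions. The Voice class, the filter_voices helper, and the catalog entries are invented for illustration and do not describe the actual voice library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Voice:
    name: str
    style: str
    language: str
    gender: str
    favorite: bool = False


def filter_voices(voices, style=None, language=None, gender=None, favorites_only=False):
    """Return voices matching every filter that is set; None means "any"."""
    matches = []
    for voice in voices:
        if style is not None and voice.style != style:
            continue
        if language is not None and voice.language != language:
            continue
        if gender is not None and voice.gender != gender:
            continue
        if favorites_only and not voice.favorite:
            continue
        matches.append(voice)
    return matches


# Made-up catalog entries for illustration only.
library = [
    Voice("Ava", "narrative", "English", "female", favorite=True),
    Voice("Ken", "dramatic", "Japanese", "male"),
    Voice("Leo", "narrative", "English", "male"),
    Voice("Rin", "casual", "Japanese", "neutral", favorite=True),
]

english_narrators = [v.name for v in filter_voices(library, style="narrative", language="English")]
starred = [v.name for v in filter_voices(library, favorites_only=True)]
```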
6.2 How to Select a Voice
- In the Character Voices panel, click Select Voice
- In the voice library modal, use filters to narrow down options
- Click the play button on voice cards to preview
- Click Select to assign the voice to the current role
Selection tips:
- Listen to multiple voices before deciding
- Consider the character’s personality and age
- Test voices with actual script text when possible
7. Multi-Character Voice Generation
Generate audio for all shots with one click, applying the appropriate voice to each character automatically.
7.1 How to Use Multi-Character Voice Generation
- Ensure all roles have assigned voices
- Click the Multi-Character Voice button above the timeline
- The system generates audio for all shots with narration text
- Wait for generation to complete (may take a few minutes)
- Preview all audio on the timeline when complete
Generation time: Depends on project length—typically 2-5 minutes for most projects.
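The first step, making sure every role has a voice, is the one most likely to be missed before a batch run. As a rough mental model, the check could look like the Python sketch below. The shot dictionaries, the roles_missing_voice helper, and the voice IDs are hypothetical, and the product performs its own checks.

```python
def roles_missing_voice(shots, role_to_voice):
    """Return roles used by shots with text that have no voice, in first-seen order."""
    missing = []
    for shot in shots:
        if not shot["text"].strip():
            continue  # shots without narration text are not generated
        role = shot["role"]
        if not role_to_voice.get(role) and role not in missing:
            missing.append(role)
    return missing


shots = [
    {"role": "Narrator", "text": "Long ago, in a quiet town..."},
    {"role": "Mia", "text": "Who's there?"},
    {"role": "Tom", "text": "Only me."},
    {"role": "Guard", "text": ""},  # no text, so nothing to generate
]
role_to_voice = {"Narrator": "calm_01", "Mia": "bright_07", "Tom": None}

unassigned = roles_missing_voice(shots, role_to_voice)
```

In this example only Tom is reported: the Guard shot has no narration text, so it would not be generated in the first place.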
8. Upload Custom Audio
Have your own recorded audio? Upload it directly and use timeline tools to split it across shots.
8.1 Upload Workflow
- Click the Upload Audio button
- Select your audio file (supports common formats: MP3, WAV, M4A)
- Audio is automatically added to the timeline
- Use timeline markers to split audio into different shots
Use cases:
- Professional voice recordings
- Pre-recorded narration
- Audio from external sources
9. Timeline Controls and Audio Splitting
The timeline is the core feature of the Dubbing Page, providing precise control over audio playback and splitting.
9.1 Timeline Features
Playback Controls:
- Play/Pause: Control audio playback
- Jump buttons: Quickly navigate to previous or next shot
- Playback speed: Adjust speed (0.5x to 2x)
- Volume control: Adjust playback volume
Visual Timeline:
- Shows audio segments for all shots
- Displays waveforms for visual reference
- Indicates shot boundaries
9.2 Audio Splitting Feature
Split a complete audio file into individual shots using timeline markers:
- Drag markers on the timeline to appropriate positions
- Markers automatically align to shot boundaries
- After adjusting, click the Apply Split button
- The system splits the complete audio by marker positions into each shot
Pro tip: Use the waveform visualization to identify natural break points in your audio.
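Conceptually, splitting by markers means cutting the full recording at each marker position so that every shot receives the stretch of audio between two neighboring cuts. The Python sketch below shows that arithmetic only. The split_by_markers helper and the use of seconds as the unit are assumptions, not the product's implementation.

```python
def split_by_markers(total_seconds, markers):
    """Turn marker positions (in seconds) into one (start, end) pair per shot."""
    # Ignore duplicates and markers outside the audio, then order the rest.
    cuts = sorted({float(m) for m in markers if 0 < m < total_seconds})
    bounds = [0.0] + cuts + [float(total_seconds)]
    return list(zip(bounds[:-1], bounds[1:]))


# A 12-second recording with two markers yields three shot segments.
segments = split_by_markers(12.0, [7.5, 3.0])
```

N markers always produce N + 1 segments, so a recording meant for five shots needs four markers.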
10. Adjust Voice Parameters
Fine-tune voice generation with the stability parameter to match your content style.

10.1 Stability Parameter
Click the stability button above the timeline to adjust voice stability:
Stability Levels:
- High stability (1.0): More consistent and steady delivery
  - Best for: Professional narration, news, educational content
- Medium stability (0.5): Balanced delivery
  - Best for: Storytelling, general videos
- Low stability (0.0): More dynamic and expressive
  - Best for: Dramatic dialogue, emotional moments
10.2 When to Adjust Stability
Increase stability when:
- Voice sounds too variable or inconsistent
- You need professional, steady narration
- Producing educational or instructional content
Decrease stability when:
- Voice sounds too robotic or stiff
- You want more emotional expression
- Creating dramatic or theatrical content
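One way to think about these rules is as a starting preset per content type plus a small correction after each listen. The Python sketch below captures that reasoning. The preset names, the adjust_stability helper, and the 0.25 step are assumptions chosen for illustration; only the 0.0, 0.5, and 1.0 levels come from this tutorial.

```python
# Values taken from the stability levels listed above.
STABILITY_PRESETS = {
    "professional narration": 1.0,
    "storytelling": 0.5,
    "dramatic dialogue": 0.0,
}


def adjust_stability(current, feedback, step=0.25):
    """Nudge stability after listening; the step size is an arbitrary assumption."""
    if feedback == "too variable":
        current += step
    elif feedback == "too stiff":
        current -= step
    # Stay within the 0.0 to 1.0 range described in the tutorial.
    return min(1.0, max(0.0, current))


start = STABILITY_PRESETS["storytelling"]
looser = adjust_stability(start, "too stiff")
capped = adjust_stability(1.0, "too variable")
```

Moving in small steps and listening after each change makes it easier to find the point where the voice is expressive without becoming inconsistent.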
11. Preview and Export
After generating audio, preview it with your video to ensure quality and timing.

11.1 Preview Workflow
- Click a shot in the narration list
- Use timeline playback controls to listen to audio
- Observe audio waveforms and shot segments on the timeline
- Adjust voice or regenerate as needed
11.2 Export Tips
Quality Checklist:
- All shots have generated audio (no missing audio)
- No audio glitches or unnatural pauses
- Voice matches character personality
- Audio levels are consistent
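Two of these checks, missing audio and inconsistent levels, are mechanical enough to express as a small routine. The Python sketch below is a hypothetical illustration: the level_db field, the checklist_issues helper, and the 3 dB tolerance are all assumptions, and the remaining checklist items still need a listening pass.

```python
def checklist_issues(shots, max_spread_db=3.0):
    """Flag missing audio and uneven levels; the 3 dB tolerance is an assumption."""
    issues = []
    missing = [s["id"] for s in shots if s["level_db"] is None]
    if missing:
        issues.append(f"missing audio in shots {missing}")
    levels = [s["level_db"] for s in shots if s["level_db"] is not None]
    if levels and max(levels) - min(levels) > max_spread_db:
        issues.append("audio levels differ by more than the tolerance")
    return issues


# level_db is a hypothetical per-shot loudness reading; None means no audio yet.
shots = [
    {"id": 1, "level_db": -16.0},
    {"id": 2, "level_db": None},
    {"id": 3, "level_db": -22.5},
]
issues = checklist_issues(shots)
```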
Next Steps:
- Preview the full video before final export
- Return to the Editor page for final video editing
- Export your completed video with professional audio
12. Common Workflow Example
Here’s a typical dubbing workflow from start to finish:
12.1 Complete Dubbing Workflow
- Prepare script: Write narration text for each shot in the Editor
- Enter Dubbing Page: Access from the project navigation menu
- Assign character voices: Select appropriate voices for narrator and characters
- Generate audio: Click “Multi-Character Voice” to generate for all shots
- Preview and adjust: Review audio and make adjustments as needed
- Adjust stability: Fine-tune parameters for specific shots if needed
- Apply lip sync: Generate lip sync for applicable shots (optional)
- Final check: Play through entire timeline to verify quality
- Return to Editor: Go back for final video editing
- Export video: Render final video with professional audio
13. Tips for Better Voice Quality
13.1 Voice Selection
- Choose unique voices: Select distinct voices for different characters to avoid confusion
- Maintain consistency: Keep the same voice for each character throughout
- Match personality: Choose voices that fit character age, personality, and role
13.2 Technical Quality
- Adjust stability: Fine-tune based on content type
- Preview often: Listen frequently during editing to catch issues early
- Use timeline visualization: Check that each audio segment lines up with its shot and has a plausible length
13.3 Workflow Efficiency
- Assign all voices first: Complete voice assignment before generating
- Batch generate: Use multi-character generation for efficiency
- Save favorites: Star frequently used voices for quick access
14. Troubleshooting FAQ
14.1 Voice Sounds Unstable or Inconsistent
Solution:
- Increase the stability parameter, up to 1.0
- If the voice then sounds too stiff, try a different voice with more natural variation
- Check if the script has unusual formatting or characters
14.2 Audio Generation Failed
Solution:
- Verify the shot has narration text
- Ensure voices are assigned to all roles
- Try regenerating audio for that specific shot
- Check your internet connection
14.3 Characters Sound Too Similar
Solution:
- Assign more distinct voices to each role
- Choose voices with clear differences in gender, age, and style
- Use the voice library filters to find contrasting voices
14.4 Lip Sync Results Not Ideal
Solution:
- Ensure the shot shows a single front-facing person with a clearly visible face
- Verify audio quality is good
- Try adjusting audio and regenerating lip sync
- Check that the shot meets lip sync requirements
15. Lip Sync Feature
Lip sync lets you match a character's mouth movements closely to the audio, creating natural and realistic videos.

15.1 What is Lip Sync?
Lip Sync is an AI technology that automatically adjusts mouth movements in your video to precisely match audio. This makes it look like the person is actually speaking the words.
15.2 Best Use Cases
Lip sync works best in these scenarios:
- Single front-facing person: Shot contains only one person facing the camera
- Clear facial features: Person’s face is clearly visible without obstructions
- Dialogue scenes: Interviews, monologues, educational videos
15.3 Limitations
Important constraints:
- Best for single front-facing person shots (multiple people or side profiles may not work well)
- Requires both video and audio in the shot
- Face must be clearly visible without obstructions
- Works best with human faces (not animated characters)
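These constraints can be read as a quick pre-flight check to run on a shot before spending processing time on it. The Python sketch below restates them as code. The ShotInfo fields and the lip_sync_concerns helper are hypothetical, and the values are ones you would judge by eye, not data the product exposes.

```python
from dataclasses import dataclass


@dataclass
class ShotInfo:
    has_video: bool
    has_audio: bool
    face_count: int
    front_facing: bool
    face_unobstructed: bool
    animated: bool = False


def lip_sync_concerns(shot):
    """List the reasons a shot may give poor lip sync results."""
    concerns = []
    if not (shot.has_video and shot.has_audio):
        concerns.append("needs both video and audio")
    if shot.face_count != 1:
        concerns.append("should contain exactly one person")
    if not shot.front_facing:
        concerns.append("person should face the camera")
    if not shot.face_unobstructed:
        concerns.append("face should be clearly visible")
    if shot.animated:
        concerns.append("works best with human faces")
    return concerns


interview = ShotInfo(True, True, face_count=1, front_facing=True, face_unobstructed=True)
crowd = ShotInfo(True, False, face_count=3, front_facing=True, face_unobstructed=True)
```

An empty list, as for the interview shot here, suggests the shot is a good candidate; the crowd shot would be flagged for missing audio and for containing more than one person.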
15.4 How to Use Lip Sync
- In the Dubbing Page, select a shot with both video and audio
- Click the Lip Sync tab in the right panel
- Click the Match lip sync to video button
- Wait for AI processing (usually takes a few minutes)
- Preview the lip-synced video when processing completes
- Apply to final video if satisfied
15.5 Lip Sync Workflow
Typical workflow:
- Generate audio first: Use dubbing features to generate audio for the shot
- Check shot suitability: Ensure the shot shows a single front-facing person with a clearly visible face
- Apply lip sync: Click Match lip sync to video in the Lip Sync tab
- Preview results: Review the lip-synced effect
- Fine-tune if needed: Adjust audio and regenerate if results aren’t ideal
- Export video: Export final video when satisfied
15.6 Tips and Tricks
- Complete audio first: Ensure audio is generated and satisfactory before applying lip sync
- Choose suitable shots: Not all shots are suitable—select clear front-facing shots for best results
- Be patient: Lip sync requires AI processing and may take several minutes
- Try multiple times: If first attempt isn’t ideal, adjust audio and regenerate
16. Summary and Next Steps
With these tools, you can create professional-quality narration from script to final video:
Key Takeaways:
- Use Editor for quick voice generation
- Use Dubbing Page for advanced multi-character dialogue
- Assign voices to roles for consistency
- Adjust stability parameters to match content style
- Use lip sync for realistic dialogue scenes
Build Your Voice Library:
- Save favorite voices for future projects
- Document successful voice and parameter combinations
- Create reusable templates for consistent results
Ready to create? Start experimenting with voice generation and discover how professional narration can transform your videos!