Minimax Speech 02 Turbo
Real-time Text-to-Audio synthesis with emotional expression and multilingual support.
Model Overview
A powerful Text-to-Audio (T2A) model designed for real-time applications, offering high-quality voice synthesis, a wide range of emotional expressions, and extensive multilingual capabilities.
Best At
This model excels at generating speech for real-time applications where low latency is crucial. It also produces varied emotional tones and supports over 30 languages with native accents.
Limitations / Not Good At
While optimized for speed, the 'turbo' version may not match the fidelity of specialized high-definition models for applications such as audiobooks. Long inputs (the maximum is 5000 characters) may add some latency.
Ideal Use Cases
- Real-time voice assistants and chatbots 🤖
- Dynamic character voices for games 🎮
- Instantaneous audio feedback in applications
- Live narration for streams or events
- Multilingual customer support audio
Input & Output Format
Text prompt → Audio file (URI)
Performance Notes
The model is designed for low latency, which suits real-time interactions. It offers controls for speed, pitch, volume, and emotion to fine-tune the output.
Text
String. Text to convert to speech. Every character is 1 token. Maximum 5000 characters. Use <#x#> between words to control pause duration (0.01-99.99s).
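The pause syntax can be easier to get right with a small helper. The sketch below is illustrative only: the function names `pause_marker` and `join_with_pauses` are hypothetical, the two-decimal formatting is an assumption based on the stated 0.01-99.99 range, and it assumes that markers count toward the 5000-character limit.

```python
MAX_TEXT_CHARS = 5000  # limit stated for the Text input


def pause_marker(seconds: float) -> str:
    """Build a <#x#> pause marker for a duration in seconds."""
    if not 0.01 <= seconds <= 99.99:
        raise ValueError("pause duration must be within 0.01-99.99 seconds")
    # Assumption: two decimals, matching the precision of the stated range.
    return f"<#{seconds:.2f}#>"


def join_with_pauses(phrases: list[str], seconds: float) -> str:
    """Join phrases, placing the same pause marker between each pair."""
    text = pause_marker(seconds).join(phrases)
    # Assumption: markers count toward the limit, so check the full string.
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError(f"text is {len(text)} characters; maximum is {MAX_TEXT_CHARS}")
    return text


print(join_with_pauses(["Hello", "world"], 0.5))  # Hello<#0.50#>world
```

Checking the length after the markers are inserted is the conservative choice here; if markers turn out not to count toward the limit, the check only rejects slightly earlier than necessary.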
Pitch
Number. Speech pitch.
Default: 0
Speed
Number. Speech speed.
Default: 1
Volume
Number. Speech volume.
Default: 1
Bitrate
Number. Bitrate for the generated speech.
Default: 128000
Channel
String. Number of audio channels.
Default: mono
Emotion
String. Speech emotion.
Default: auto
Voice Id
String. Desired voice ID. Use a voice ID you have trained (https://replicate.com/minimax/voice-cloning), or one of the following system voice IDs: Wise_Woman, Friendly_Person, Inspirational_girl, Deep_Voice_Man, Calm_Woman, Casual_Guy, Lively_Girl, Patient_Man, Young_Knight, Determined_Man, Lovely_Girl, Decent_Boy, Imposing_Manner, Elegant_Man, Abbess, Sweet_Girl_2, Exuberant_Girl
Default: Wise_Woman
Sample Rate
Number. Sample rate for the generated speech.
Default: 32000
Language Boost
String. Enhance recognition of specific languages and dialects.
Default: None
English Normalization
Boolean. Enable English text normalization for better number reading (slightly increases latency).
Default: false
Output
Inferred. Output.
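As a rough illustration of how the inputs and defaults listed above fit together, the following sketch collects them into a single payload. The class name `SpeechInput`, the `to_payload` method, and the snake_case key names are assumptions made for this example, not a confirmed schema; the string "None" for Language Boost simply mirrors the default shown above.

```python
from dataclasses import asdict, dataclass

MAX_TEXT_CHARS = 5000  # limit stated for the Text input


@dataclass
class SpeechInput:
    """Hypothetical container; key names are assumed from the input labels."""
    text: str
    pitch: float = 0
    speed: float = 1
    volume: float = 1
    bitrate: int = 128000
    channel: str = "mono"
    emotion: str = "auto"
    voice_id: str = "Wise_Woman"
    sample_rate: int = 32000
    language_boost: str = "None"  # assumption: passed as the literal string
    english_normalization: bool = False

    def to_payload(self) -> dict:
        """Validate the text and return all inputs as a plain dict."""
        if not self.text:
            raise ValueError("text must not be empty")
        if len(self.text) > MAX_TEXT_CHARS:
            raise ValueError(f"text exceeds {MAX_TEXT_CHARS} characters")
        return asdict(self)


payload = SpeechInput(text="Hello there", voice_id="Calm_Woman", speed=1.2).to_payload()
print(payload["voice_id"], payload["sample_rate"])  # Calm_Woman 32000
```

Only the fields that differ from the defaults need to be passed, which keeps call sites short while still producing a complete set of inputs.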
Type
Node
Status
Official
Package
Nodespell AI
Category
AI / Audio / Minimax