ElevenLabs TTS V3
Generate high-quality text-to-speech audio using ElevenLabs' advanced Eleven-v3 model. Customize voice, stability, speed, and more.
Model Overview
Text-to-audio converter that generates realistic and expressive speech from text using the ElevenLabs Eleven-v3 model.
Best At
Creating natural-sounding voiceovers for audiobooks, narrations, and interactive applications. Offers extensive voice customization options to match desired tones.
Limitations / Not Good At
Limited language support (only English currently). May struggle with highly non-standard pronunciation or complex emotional tones beyond predefined voices.
Ideal Use Cases
Website voiceovers, video narration, interactive voice assistants, and custom audio content creation.
Input & Output Format
Input: Text string and optional voice parameters. Output: Audio file in MP3 format and optional word-level timestamps.
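The node's internal transport is not documented here, so the following is only a minimal sketch, assuming the node wraps ElevenLabs' public text-to-speech REST endpoint. It builds (but does not send) a POST request that asks for MP3 audio. The function name `build_tts_request` is hypothetical, and the model ID `eleven_v3` and output format `mp3_44100_128` are assumed values rather than settings confirmed by this page.

```python
import json
import urllib.request

# Assumed public ElevenLabs endpoint; the voice ID is appended as a path segment.
API_BASE = "https://api.elevenlabs.io/v1/text-to-speech"


def build_tts_request(text: str, voice_id: str, api_key: str,
                      model_id: str = "eleven_v3",
                      output_format: str = "mp3_44100_128") -> urllib.request.Request:
    """Build (but do not send) a POST request that asks for MP3 audio."""
    if not text:
        raise ValueError("text must be a non-empty string")
    body = {"text": text, "model_id": model_id}
    url = f"{API_BASE}/{voice_id}?output_format={output_format}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "xi-api-key": api_key,           # account API key (placeholder here)
            "Content-Type": "application/json",
            "Accept": "audio/mpeg",          # ask for MP3 bytes back
        },
        method="POST",
    )


req = build_tts_request("Hello there.", "21m00Tcm4TlvDq8ikWAM", "YOUR_API_KEY")
print(req.get_method(), req.full_url)
```

To actually send it, `urllib.request.urlopen(req).read()` would return the response body, which for this endpoint is expected to be MP3 bytes that can be written straight to a `.mp3` file.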
Performance Notes
Audio is generated on demand, and response time scales with text length. The continuity fields (previous_text, next_text) improve consistency when long audio is produced across several requests.
Text
String
The text to convert to speech.
Previous Text (Optional)
String
The text that came before the text of the current request. It can be used to improve the speech's continuity when concatenating multiple generations, or to influence the speech's continuity in the current generation.
Next Text (Optional)
String
The text that comes after the text of the current request. It can be used to improve the speech's continuity when concatenating multiple generations, or to influence the speech's continuity in the current generation.
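When a long script is generated in pieces, each piece can carry its neighbours as context. Below is a minimal sketch, assuming one request per chunk: it splits text at sentence boundaries, packs sentences into chunks, and attaches the adjacent chunks as `previous_text` and `next_text`. The helper names `split_sentences` and `chunk_requests` and the 200-character default are hypothetical, and the regex splitter is deliberately naive (it will split after abbreviations such as "Dr.").

```python
import re
from typing import Dict, List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def chunk_requests(text: str, max_chars: int = 200) -> List[Dict[str, str]]:
    """Pack sentences into chunks of at most max_chars (a single longer
    sentence becomes its own chunk) and attach neighbouring chunks as context."""
    chunks: List[str] = []
    current = ""
    for sentence in split_sentences(text):
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)   # current chunk is full; start a new one
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)

    payloads: List[Dict[str, str]] = []
    for i, chunk in enumerate(chunks):
        payload = {"text": chunk}
        if i > 0:
            payload["previous_text"] = chunks[i - 1]   # what was just spoken
        if i < len(chunks) - 1:
            payload["next_text"] = chunks[i + 1]       # what comes next
        payloads.append(payload)
    return payloads


story = "The door creaked. Nobody was there. She stepped inside anyway."
for p in chunk_requests(story, max_chars=40):
    print(p)
```

The first and last chunks simply omit the field that has no neighbour, which matches both fields being optional.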
Speed
Number
Speech speed (0.7-1.2). Values below 1.0 slow the speech down; values above 1.0 speed it up. Extreme values may reduce quality.
Default: 1
Style
Number
Style exaggeration (0-1): Amplifies the distinctive speaking style of the original voice. It adds extra computation and latency, and can make the output slightly less stable, so it is best kept at 0 unless a dramatic effect is needed.
Default: 0
Stability
Number
Voice stability (0-1): Controls how consistent the voice is. Lower values give a wider emotional range and more varied pacing, but can sound erratic. Higher values produce a steadier, more monotone delivery that usually requires fewer iterations to hit the desired tone.
Default: 0.5
Similarity Boost
Number
Similarity boost (0-1): Controls how closely the AI adheres to the original voice when replicating it. If the original recording is of poor quality and this value is set too high, the AI may also reproduce artifacts or background noise present in that recording.
Default: 0.75
Voice
String
The voice to use for speech generation (a voice ID).
Default: 21m00Tcm4TlvDq8ikWAM
Language Code
String
Language code (ISO 639-1) used to enforce a language for the model. Currently, only Turbo v2.5 and Flash v2.5 support language enforcement; for other models, an error is returned if a language code is provided.
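The parameter list above can be turned into a client-side check that fails before any request is sent. This is a minimal sketch, assuming the settings are forwarded in an ElevenLabs-style `voice_settings` object: defaults and ranges are taken from the fields above, while the class `VoiceSettings`, the function `build_body`, and the model IDs `eleven_v3`, `eleven_turbo_v2_5` and `eleven_flash_v2_5` are assumptions for illustration, not the node's confirmed internals.

```python
from dataclasses import dataclass, asdict
from typing import Optional

# Documented (minimum, maximum) for each numeric voice setting.
RANGES = {
    "speed": (0.7, 1.2),
    "style": (0.0, 1.0),
    "stability": (0.0, 1.0),
    "similarity_boost": (0.0, 1.0),
}

# Assumed model IDs for the two models said to accept a language code.
LANGUAGE_ENFORCING_MODELS = {"eleven_turbo_v2_5", "eleven_flash_v2_5"}


@dataclass
class VoiceSettings:
    # Defaults mirror the values listed in the parameter panel.
    speed: float = 1.0
    style: float = 0.0
    stability: float = 0.5
    similarity_boost: float = 0.75

    def __post_init__(self) -> None:
        for name, (low, high) in RANGES.items():
            value = getattr(self, name)
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside [{low}, {high}]")


def build_body(text: str, settings: Optional[VoiceSettings] = None,
               model_id: str = "eleven_v3",
               language_code: Optional[str] = None) -> dict:
    """Assemble a JSON-ready body; the voice ID is sent separately (e.g. in the URL)."""
    chosen = settings if settings is not None else VoiceSettings()
    body = {"text": text, "model_id": model_id, "voice_settings": asdict(chosen)}
    if language_code is not None:
        # Reject early instead of waiting for the API to return an error.
        if model_id not in LANGUAGE_ENFORCING_MODELS:
            raise ValueError(f"{model_id} does not accept language_code")
        if len(language_code) != 2 or not language_code.isalpha():
            raise ValueError("expected a two-letter ISO 639-1 code such as 'en'")
        body["language_code"] = language_code.lower()
    return body


print(build_body("Welcome back.", VoiceSettings(stability=0.3)))
```

Validating in a dataclass keeps every out-of-range value (for example a speed of 1.5) from ever reaching the network, and leaving `language_code` out entirely when it is `None` avoids the error the API would otherwise raise for models that do not support it.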
Voice Control
String
Advanced
String
Output
Inferred
Type: Node
Status: Official
Package: Nodespell AI
Category: AI / Audio / Elevenlabs