Adds lip-sync to any video using an audio file or text, changing the speech of the person in the video to match the provided input.

Model Overview

A lip-sync video generation tool that alters a person's speech in a video to match a provided audio file or text. It converts the audio or text into lip movements for the video's subject.

Best At

Well suited to creating talking-head videos with speech synced to new audio tracks or text-to-speech. Works best with clear facial shots of people speaking, clips of 2-10 seconds, and high-quality audio.

Limitations / Not Good At

Input videos must be 2-10 seconds long and 720p to 1080p in resolution. Audio files must be under 5MB and in a compatible format (mp3, wav, m4a, or aac). A request cannot include both video_url and video_id. Text input also requires a voice_id.
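
The constraints above can be checked before a request is sent. The following is a minimal sketch, not part of the model's own tooling: the function name `check_lipsync_request` and its parameters are hypothetical, "720p to 1080p" is assumed to mean a frame height of 720 to 1080 pixels, and "5MB" is assumed to mean 5 * 1024 * 1024 bytes.

```python
from typing import List, Optional

# Assumption: the "under 5MB" limit is read as 5 MiB.
MAX_AUDIO_BYTES = 5 * 1024 * 1024


def check_lipsync_request(
    duration_s: float,
    height_px: int,
    video_url: Optional[str] = None,
    video_id: Optional[str] = None,
    audio_bytes: Optional[int] = None,
    text: Optional[str] = None,
    voice_id: Optional[str] = None,
) -> List[str]:
    """Return a list of problems; an empty list means no documented limit is violated."""
    problems: List[str] = []
    if not 2 <= duration_s <= 10:
        problems.append("video must be 2-10 seconds long")
    # Assumption: resolution is judged by frame height in pixels.
    if not 720 <= height_px <= 1080:
        problems.append("video must be 720p-1080p")
    if video_url and video_id:
        problems.append("use video_url or video_id, not both")
    if audio_bytes is not None and audio_bytes >= MAX_AUDIO_BYTES:
        problems.append("audio must be under 5MB")
    if text and not voice_id:
        problems.append("text input requires a voice_id")
    return problems
```

Returning every problem at once, rather than raising on the first, lets a caller report all of them in a single pass.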

Ideal Use Cases

Blog intros with custom voiceovers, animated character lip-syncing, product demos, multilingual video translations.

Input & Output Format

Input:

video_url: A URL to a video file (mp4 or mov), 2-10 seconds long, at 720p-1080p resolution.
audio_file: An audio file (mp3, wav, m4a, or aac) under 5MB.
text: Free text for lip-sync (requires voice_id).
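
As an illustration of how these fields combine, here is a rough sketch that assembles the input for a single request. The helper `build_lipsync_input` is hypothetical. Treating audio and text as mutually exclusive, and passing `audio_file` as a string reference, are assumptions rather than documented behavior. How the resulting dictionary is submitted to the model is deliberately left out.

```python
from typing import Dict, Optional


def build_lipsync_input(
    video_url: str,
    audio_file: Optional[str] = None,
    text: Optional[str] = None,
    voice_id: Optional[str] = None,
) -> Dict[str, str]:
    """Assemble input fields for one lip-sync request driven by audio OR text."""
    # Assumption: exactly one speech source (audio or text) is supplied.
    if (audio_file is None) == (text is None):
        raise ValueError("provide exactly one of audio_file or text")
    payload: Dict[str, str] = {"video_url": video_url}
    if audio_file is not None:
        payload["audio_file"] = audio_file
    else:
        # Text-driven requests need a voice to synthesize the speech.
        if not voice_id:
            raise ValueError("text input requires a voice_id")
        payload["text"] = text
        payload["voice_id"] = voice_id
    return payload
```

For example, `build_lipsync_input("https://example.com/clip.mp4", text="Hello", voice_id="voice-1")` yields a dictionary with the `video_url`, `text`, and `voice_id` keys, where the URL and voice identifier are placeholders.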

Output:
A video file (mp4) with lip-synced content, provided as a URI.

Performance Notes

Designed for short videos (2-10 seconds). Processing speed depends on Replicate's infrastructure. The output video resolution matches the input.

Kling Lip Sync
