Veo 3.1
Improved video generation model with higher fidelity, context-aware audio, and support for image references and frame interpolation.
Model Overview
A text-to-video generator that creates high-fidelity videos with context-aware audio. It builds on Veo 3 with improved quality.
Best At
- Generating high-quality videos from text descriptions.
- Maintaining subject consistency when using reference images.
- Producing smooth video transitions via last-frame interpolation.
- Creating natural audio that matches the generated video content.
Limitations / Not Good At
- Reference images work only with the 16:9 aspect ratio and an 8-second duration.
- The last frame is ignored when reference images are provided.
- Output is always a video file; no separate image or audio output.
- Input images should match the chosen aspect ratio: ideally 1280x720 for 16:9 or 720x1280 for 9:16.
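The limitations above can be checked before a job is submitted. The following is a minimal sketch, not part of the node itself: the function name `check_veo_inputs` and its argument names are hypothetical, and it only encodes the rules stated on this page.

```python
from typing import List, Optional


def check_veo_inputs(
    aspect_ratio: str = "16:9",
    duration: int = 8,
    reference_images: Optional[List[str]] = None,
    last_frame: Optional[str] = None,
) -> List[str]:
    """Return notes about documented limitations that the inputs run into.

    Hypothetical helper: names are illustrative, rules come from this page.
    """
    notes: List[str] = []
    refs = reference_images or []
    if not refs:
        # The limitations checked here only apply when reference images are used.
        return notes
    if len(refs) > 3:
        notes.append("at most 3 reference images are supported")
    if aspect_ratio != "16:9":
        notes.append("reference images only work with the 16:9 aspect ratio")
    if duration != 8:
        notes.append("reference images only work with an 8-second duration")
    if last_frame is not None:
        notes.append("last frame is ignored when reference images are provided")
    return notes


if __name__ == "__main__":
    for note in check_veo_inputs(
        aspect_ratio="9:16",
        duration=4,
        reference_images=["subject.png"],
        last_frame="end.png",
    ):
        print(note)
```

An empty list means none of the reference-image limitations apply to the given inputs.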
Ideal Use Cases
- Marketing and product demos with consistent branding.
- Social media video creation (short vertical or horizontal videos).
- Smooth transitions from images to video content.
- Videos with contextually relevant background vocals or sounds.
Input & Output Format
- Input: Text prompt (required) combined with optional parameters: aspect ratio, duration, starting image, last frame, reference images, negative prompt, resolution, and audio generation flag.
- Output: URI pointing to the generated MP4 video file.
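To illustrate how the required prompt combines with the optional parameters, here is a rough sketch that assembles them into a plain dictionary. The function name `build_veo_request` and the snake_case key names are assumptions made for this example; the default values are the ones listed on this page, and the actual field names used by the node or the underlying API may differ.

```python
from typing import Any, Dict, List, Optional


def build_veo_request(
    prompt: str,
    *,
    aspect_ratio: str = "16:9",
    duration: int = 8,
    resolution: str = "720p",
    generate_audio: bool = True,
    negative_prompt: Optional[str] = None,
    image: Optional[str] = None,
    last_frame: Optional[str] = None,
    reference_images: Optional[List[str]] = None,
    seed: Optional[int] = None,
) -> Dict[str, Any]:
    """Assemble the inputs into a dict (hypothetical key names)."""
    if not prompt.strip():
        raise ValueError("prompt is required")

    # Parameters that always carry a value (defaults taken from this page).
    request: Dict[str, Any] = {
        "prompt": prompt,
        "aspect_ratio": aspect_ratio,
        "duration": duration,
        "resolution": resolution,
        "generate_audio": generate_audio,
    }

    # Optional parameters are included only when the caller supplies them.
    optional = {
        "negative_prompt": negative_prompt,
        "image": image,
        "last_frame": last_frame,
        "reference_images": reference_images,
        "seed": seed,
    }
    request.update({key: value for key, value in optional.items() if value is not None})
    return request


if __name__ == "__main__":
    print(build_veo_request("A red kite over a windy beach", negative_prompt="blurry"))
```

Leaving `seed` unset keeps it out of the request, which matches the page's advice to omit the seed for random generations.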
Performance Notes
- High-quality video generation requires substantial compute resources.
- Generation may take longer than for other modalities because of the video synthesis step.
Input
- Prompt (String): Text prompt for video generation.
- Image (String): Input image to start generating from. Ideal images are 16:9 or 9:16 and 1280x720 or 720x1280, depending on the aspect ratio you choose.
- Last Frame (String): Ending image for interpolation. When provided with an input image, creates a transition between the two images.
- Reference Images (String): 1 to 3 reference images for subject-consistent generation (reference-to-video, or R2V). Reference images only work with 16:9 aspect ratio and 8-second duration. Last frame is ignored if reference images are provided.
- Seed (Number, default -1): Random seed. Omit for random generations.
- Duration (Number, default 8): Video duration in seconds.
- Resolution (String, default 720p): Resolution of the generated video.
- Aspect Ratio (String, default 16:9): Video aspect ratio.
- Generate Audio (Boolean, default true): Generate audio with the video.
- Negative Prompt (String): Description of what to exclude from the generated video.
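The input descriptions above imply a few distinct ways of driving the model, depending on which image inputs are supplied. The sketch below makes that reading explicit. The function names and the mode labels other than "reference-to-video" are hypothetical, and the case of a last frame without a starting image is not described on this page, so treating it as plain text-to-video is an assumption.

```python
from typing import List, Optional, Tuple

# Ideal input image sizes per aspect ratio, as stated in the Image description.
IDEAL_IMAGE_SIZE = {"16:9": (1280, 720), "9:16": (720, 1280)}


def generation_mode(
    image: Optional[str] = None,
    last_frame: Optional[str] = None,
    reference_images: Optional[List[str]] = None,
) -> str:
    """Name the mode implied by the supplied image inputs (illustrative labels)."""
    if reference_images:
        # Last frame is ignored whenever reference images are provided.
        return "reference-to-video"
    if image and last_frame:
        # A start image plus an end image yields a transition between the two.
        return "interpolation"
    if image:
        return "image-to-video"
    # Assumption: a last frame on its own is treated like no image input.
    return "text-to-video"


def ideal_image_size(aspect_ratio: str) -> Tuple[int, int]:
    """Return the ideal (width, height) for an input image."""
    if aspect_ratio not in IDEAL_IMAGE_SIZE:
        raise ValueError(f"no ideal size listed for aspect ratio {aspect_ratio!r}")
    return IDEAL_IMAGE_SIZE[aspect_ratio]


if __name__ == "__main__":
    print(generation_mode(image="start.png", last_frame="end.png"))
    print(ideal_image_size("9:16"))
```

Checking reference images first mirrors the rule that they take precedence over the last frame.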
Output
- Output (Inferred): Output
Type: Node
Status: Official
Package: Nodespell AI
Category: AI / Video / Google