Grok Imagine Reference To Video

Official

Generate videos guided by one or more reference images using xAI’s Grok Imagine video model.

Nodespell AI
AI / Video / Xai

Model Overview

Grok Imagine R2V is a reference-to-video workflow. Instead of treating an image as the first frame, it uses one or more reference images as visual direction for style, subjects, and composition while the prompt describes the motion and scene.

Best At

  • Character and style consistency across a new generated clip.
  • Combining multiple visual references into one coherent moving scene.
  • Prompt-guided video generation where reference images should influence the outcome without locking the first frame.

Limitations / Not Good At

  • It is not image-to-video and does not accept source videos.
  • Resolution currently tops out at 720p.
  • Pricing scales linearly with output duration, so longer clips cost more even when the references stay the same.

Ideal Use Cases

  • Giving a generated video the look of specific character sheets or mood boards.
  • Combining several reference stills into one motion concept.
  • Style-directed short-form video ideation.

Input & Output Format

  • Input: required prompt plus required reference_images; optional aspect_ratio, duration, and resolution.
  • Output: a generated video asset, returned in the response.
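
The input shape above can be sketched as a small payload builder. This is a minimal illustration, not the node's or any SDK's actual API: the helper name build_r2v_input is hypothetical, and treating each reference image as a URL string is an assumption. Only the field names (prompt, reference_images, aspect_ratio, duration, resolution) come from this page.

```python
from typing import Any, Optional, Sequence


def build_r2v_input(
    prompt: str,
    reference_images: Sequence[str],
    aspect_ratio: Optional[str] = None,
    duration: Optional[float] = None,
    resolution: Optional[str] = None,
) -> dict[str, Any]:
    """Assemble an input mapping (hypothetical helper, not part of any SDK)."""
    if not prompt.strip():
        raise ValueError("prompt is required")
    if not reference_images:
        raise ValueError("at least one reference image is required")

    payload: dict[str, Any] = {
        "prompt": prompt,
        "reference_images": list(reference_images),
    }
    # Optional fields are included only when the caller sets them,
    # so the node's own defaults apply otherwise.
    optional = {
        "aspect_ratio": aspect_ratio,
        "duration": duration,
        "resolution": resolution,
    }
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload
```

Leaving unset options out of the mapping, rather than filling them in client-side, keeps the request aligned with whatever defaults the node applies.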

Performance Notes

  • Replicate bills this model per second of output video.
  • Shorter clips are the fastest way to iterate when refining references and motion prompts.
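
Because billing is per second of output, the cost of a run can be estimated as duration times rate. The sketch below is illustrative only: estimate_cost is a hypothetical helper, and the per-second rate is a caller-supplied placeholder because this page does not state a price.

```python
def estimate_cost(duration_seconds: float, price_per_second: float) -> float:
    """Estimate the cost of one clip under linear per-second billing.

    price_per_second is a placeholder supplied by the caller; the actual
    rate is not given on this page.
    """
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    if price_per_second < 0:
        raise ValueError("price_per_second must not be negative")
    return duration_seconds * price_per_second
```

At any fixed rate, a 4-second draft costs half as much as an 8-second clip, which is why short clips suit the iteration phase.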

Inputs (2)

Prompt

String

Text prompt describing the video to generate.

Required, Multi Input, Min: 0, Max: 100

Reference Images

String

Reference images used as style and content guidance.

Required, Multi Input, Min: 1, Max: 7
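
The Min: 1 and Max: 7 bounds on Reference Images can be checked before a run is submitted. A minimal sketch, assuming each reference is passed as a string (for example a URL); check_reference_images is a hypothetical helper, not part of the node.

```python
from typing import Sequence

MIN_REFERENCE_IMAGES = 1  # documented minimum for Reference Images
MAX_REFERENCE_IMAGES = 7  # documented maximum for Reference Images


def check_reference_images(images: Sequence[str]) -> list[str]:
    """Return the references as a list, or raise if the count is out of range."""
    if isinstance(images, str):
        # A bare string would otherwise be counted character by character.
        raise TypeError("pass a list of references, not a single string")
    count = len(images)
    if not MIN_REFERENCE_IMAGES <= count <= MAX_REFERENCE_IMAGES:
        raise ValueError(
            f"expected {MIN_REFERENCE_IMAGES} to {MAX_REFERENCE_IMAGES} "
            f"reference images, got {count}"
        )
    return list(images)
```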

Parameters (4)

Prompt

String

Text prompt describing the video to generate.

Required
Default: (empty)

Aspect Ratio

String

Aspect ratio of the generated video.

Default: 16:9

Duration

Number

Duration of the video in seconds.

Default: 8

Resolution

String

Resolution of the generated video.

Default: 480p
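
The documented defaults can be captured in one place and merged with per-run overrides. This is a sketch under stated assumptions: resolve_parameters is a hypothetical helper, and the parameter keys are assumed to match the input names listed under Input & Output Format. Only the default values (16:9, 8, 480p) come from this page.

```python
from typing import Any, Mapping, Optional

# Defaults as listed in the Parameters section of this page.
DEFAULTS: dict[str, Any] = {
    "aspect_ratio": "16:9",
    "duration": 8,
    "resolution": "480p",
}


def resolve_parameters(overrides: Optional[Mapping[str, Any]] = None) -> dict[str, Any]:
    """Merge caller overrides onto the documented defaults."""
    overrides = dict(overrides or {})
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown parameter(s): {sorted(unknown)}")
    # Later keys win, so overrides replace the matching defaults.
    return {**DEFAULTS, **overrides}
```

For example, resolve_parameters({"resolution": "720p"}) keeps the default aspect ratio and duration while requesting the highest resolution the page lists.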

Outputs (1)

Output

Inferred

Generated video output.

Nodespell Team

Type

Node

Status

Official

Package

Nodespell AI

Category

AI / Video / Xai

Input

Text, Image

Output

Video

Keywords

Video Generation, Prompt Conditioning, Conditional Generation, Style Control