Kling Text-to-Video

Model: Kling Text-to-Video (GMI API)
Description: Generates a video from a text prompt using Kling models (Standard/Pro, multiple versions). Supports duration control, CFG scale, and negative prompts.

Inputs

Required

prompt (STRING)
Text prompt describing the video content. Max 2500 characters.
negative_prompt (STRING)
Text describing what should be avoided in the generated video. Max 2500 characters.
cfg_scale (FLOAT)
Controls how strongly the model follows the prompt.
Options: 0.0, 0.5, 1.0
aspect_ratio (STRING)
Defines output video shape. Options: 16:9, 9:16, 1:1
duration (STRING)
Video length in seconds. Options: 5, 10
model (STRING)
Kling model variant used for generation.

Hidden

unique_id (STRING)
Internal execution tracking ID for async requests.

Outputs

VIDEO (VIDEO)
Generated video tensor output.
VIDEO_URL (STRING)
Public URL where generated video is hosted.
FILE_PATH (STRING)
Local saved file path of generated video.

Kling Image2Video

Model: Kling Image-to-Video (GMI API)
Description: Generates a video from a reference image and text prompt.

Inputs

Required

prompt (STRING)
Text prompt describing motion and scene. Max 2500 characters.
negative_prompt (STRING)
Defines unwanted elements in output video. Max 2500 characters.
cfg_scale (FLOAT)
Controls prompt adherence strength.
duration (STRING)
Video length in seconds. Options: 5, 10
model (STRING)
Kling model version used for generation.

Optional

start_frame_image (IMAGE)
Image tensor used as the starting frame of the video.
start_frame (STRING)
URL of reference image used as starting frame.

Hidden

unique_id (STRING)
Internal request tracking identifier.

Outputs

VIDEO (VIDEO)
Generated video tensor output.
VIDEO_URL (STRING)
Public URL to generated video.
FILE_PATH (STRING)
Local file path where video is saved.

Kling 2_6 Motion Control

Model: Kling 2.6 Motion Control
Description: Transfers motion from a reference video to a character image.

Inputs

prompt (STRING)
Optional text guiding style, lighting, or environment. Motion is derived from reference video.
video_url (STRING)
Reference video used for motion extraction. Must contain human subject motion.
image (IMAGE)
Required reference image used for character appearance.
image_url (STRING)
URL version of reference image.
character_orientation (STRING)
Controls motion alignment mode. Options: video, image
mode (STRING)
Quality mode selection. Options: std, pro
keep_original_sound (STRING)
Determines whether original audio is preserved. Options: yes, no

Outputs

VIDEO (VIDEO)
Generated motion-transfer video.
VIDEO_URL (STRING)
Public URL of generated video.
FILE_PATH (STRING)
Local storage path of output video.

Kling Reference To Video

Model: Kling Reference-to-Video
Description: Generates video using reference video and optional image conditioning.

Inputs

prompt (STRING)
Optional guidance prompt for style or scene control.
video_url (STRING)
Reference motion video input.
aspect_ratio (STRING)
Output video shape. Options: 16:9, 9:16, 1:1
image (IMAGE)
Optional reference image(s) for conditioning.
image_url (STRING)
URL version of reference images.
duration (STRING)
Output video duration in seconds.
mode (STRING)
Quality mode. Options: std, pro
keep_original_sound (STRING)
Preserves audio from reference video. Options: yes, no

Outputs

VIDEO (VIDEO)
Generated video tensor.
VIDEO_URL (STRING)
Hosted video URL.
FILE_PATH (STRING)
Local saved file path.

Kling V3 Image To Video

Model: Kling V3 Image-to-Video
Description: Generates video from image and prompt with optional tail frame support.

Inputs

prompt (STRING)
Main motion and scene description.
image (IMAGE)
Required input image used as starting frame.
image_url (STRING)
URL version of input image.
negative_prompt (STRING)
Elements to exclude from generated video.
image_tail (IMAGE)
Optional end-frame image for motion guidance.
image_tail_url (STRING)
URL version of end-frame image.
duration (STRING)
Video length in seconds. Range: 3–15.
sound (STRING)
Controls audio generation. Options: on, off

Outputs

VIDEO (VIDEO)
Generated video output.
VIDEO_URL (STRING)
Public URL of generated video.
FILE_PATH (STRING)
Local saved video path.

Kling V3 Text To Video

Model: Kling V3 Text-to-Video
Description: Generates video purely from text prompt.

Inputs

prompt (STRING)
Primary text prompt describing video content.
negative_prompt (STRING)
Elements to exclude from video.
duration (STRING)
Video length in seconds. Range: 3–15.
aspect_ratio (STRING)
Output video shape. Options: 16:9, 9:16, 1:1
sound (STRING)
Audio generation toggle. Options: on, off

Outputs

VIDEO (VIDEO)
Generated video tensor output.
VIDEO_URL (STRING)
Public video URL.
FILE_PATH (STRING)
Local saved file path.

Kling Edit Video

Model: Kling O1 Edit Video
Description: Edits existing video using text instructions and optional reference images.

Inputs

prompt (STRING)
Instruction describing how video should be edited.
video_url (STRING)
Input video to be edited.
image (IMAGE)
Optional reference images (up to 4).
image_url (STRING)
URL version of reference images.
aspect_ratio (STRING)
Output video format.
duration (STRING)
Output video duration in seconds.
mode (STRING)
Quality mode. Options: std, pro

Outputs

VIDEO (VIDEO)
Edited video output tensor.
VIDEO_URL (STRING)
Public URL of edited video.
FILE_PATH (STRING)
Local saved file path.

Kling 3 Motion Control

Model: kling-3-motion-control
Description: Transfers motion from a reference video to a character image.

Inputs

video_url (STRING)
Reference video used for motion extraction (3–30s).
image (IMAGE)
Required character reference image used as the motion target.
prompt (STRING)
Optional text prompt to guide style, lighting, or environment of the result.
image_url (STRING)
URL version of the reference image.
character_orientation (STRING)
Controls alignment between character and motion source. Options: video, image.
mode (STRING)
Quality mode selection affecting output fidelity. Options: std, pro.
keep_original_sound (STRING)
Determines whether original audio is preserved. Options: yes, no.

Outputs

video (VIDEO)
Generated motion-transfer video.
video_url (STRING)
Publicly accessible URL of the generated video.
file_path (STRING)
Local filesystem path where the video is saved.

LLM Node

Model: configurable (e.g. moonshotai/Kimi-K2.5)
Description: Calls large language models with optional multimodal inputs (text, image, video).

Inputs

Required

model (STRING)
Identifier of the LLM model used for generation.
prompt (STRING)
User input text or instruction sent to the model.

Optional

temperature (FLOAT, default: 1.0)
Controls randomness of output generation (0 = deterministic, 1 = creative).
max_tokens (INT, default: 5120)
Maximum number of tokens the model is allowed to generate.
system_prompt (STRING, default: “You are a helpful AI assistant. Provide direct, concise answers without showing your thinking process.”)
Defines model behavior and response style.
image_url (STRING)
Optional image input for vision-capable models.
video_url (STRING)
Optional video input for multimodal understanding.

Outputs

content (STRING)
Generated response text from the language model.

Ltx2 Fast Image To Video

Model: ltx-2-fast-image-to-video
Description: Converts a static image into a motion video with optional audio generation.

Inputs

Required

prompt (STRING)
Text description defining motion, style, and animation behavior.
duration (INT, default: 6)
Length of the generated video in seconds.
resolution (STRING, default: 1920x1080)
Output video resolution. Options: 1920x1080, 2560x1440, 3840x2160.
image (IMAGE)
Input image used as the base frame for animation.

Optional

image_url (STRING)
URL version of the input image.
fps (INT, default: 25)
Frame rate of generated video. Options: 25, 50.
generate_audio (BOOLEAN, default: True)
Enables or disables AI-generated audio.
camera_motion (STRING)
Defines camera movement style applied to video generation.

Outputs

video (VIDEO)
Generated animated video tensor.
video_url (STRING)
Public URL of the generated video.
file_path (STRING)
Local path where the video is stored.

Ltx2 Pro ImageToVideo

Model: ltx-2-pro-image-to-video
Description: High-fidelity image-to-video generation with improved visual quality and stability.

Inputs

Same as GMILtx2FastImageToVideoNode
(All inputs share identical meaning, but with higher-quality generation backend.)

Outputs

video (VIDEO)
High-quality generated video.
video_url (STRING)
Public URL for the generated video.
file_path (STRING)
Local saved video path.

Ltx2 Fast TextToVideo

Model: ltx-2-fast-text-to-video
Description: Generates video directly from a text prompt.

Inputs

Required

prompt (STRING)
Text description of the desired video.

Optional

duration (INT, default: 6)
Length of video in seconds.
resolution (STRING, default: 1920x1080)
Output resolution of the video.
fps (INT, default: 25)
Frame rate of generated video.
generate_audio (BOOLEAN, default: True)
Enables AI-generated audio track.
camera_motion (STRING)
Defines camera movement behavior in generated video.

Outputs

video (VIDEO)
Generated video output.
video_url (STRING)
Public URL of video.
file_path (STRING)
Local storage path.

Ltx2 Pro TextToVideo

Model: ltx-2-pro-text-to-video
Description: Premium-quality text-to-video generation with enhanced realism and detail.

Inputs

Same as GMILtx2FastTextToVideoNode
(Identical parameters with improved model quality.)

Outputs

video (VIDEO)
video_url (STRING)
file_path (STRING)

Ltx2 Pro Retake

Model: ltx-2-pro-retake
Description: Edits a specific segment of an existing video (audio, video, or both).

Inputs

Required

video_url (STRING)
Source video to be edited.
start_time (FLOAT, default: 0)
Start time of edit segment in seconds.
duration (INT, default: 5)
Length of segment to modify.

Optional

prompt (STRING)
Instruction describing how the segment should be changed.
mode (STRING, default: replace_audio_and_video)
Edit operation mode. Options: replace_audio_and_video, replace_audio, replace_video.

Outputs

video (VIDEO)
Edited video output.
video_url (STRING)
Public URL of edited video.
file_path (STRING)
Local saved path.

Ltx2 Pro Audio-to-Video

Model: ltx-2-pro-audio-to-video
Description: Generates video driven by audio input, optionally guided by image or prompt.

Inputs

Required

audio / audio_url (AUDIO or STRING)
Audio input file (2–20 seconds).

Optional

prompt (STRING)
Text guidance for scene generation.
image / image_url (IMAGE or STRING)
Optional first-frame visual reference.
resolution (STRING, default: 1920x1080)
Output resolution.
guidance_scale (FLOAT, default: 5)
Strength of prompt adherence.

Outputs

video (VIDEO)
Generated video.
video_url (STRING)
Public video URL.
file_path (STRING)
Local saved file path.

Luma Image-to-Video

Model: Luma-Ray2
Description: Generates video from text prompts with optional image conditioning, frame control, and configurable output settings.

Inputs

Required

prompt (STRING)
Text prompt describing video content.

Optional

model (STRING, default: Luma-Ray2)
Model variant used for generation.
negative_prompt (STRING, default: "")
Specifies elements to exclude from output.
duration (STRING, default: 5)
Video length. Options: 5, 9.
aspect_ratio (STRING, default: 16:9)
Output format ratio.
resolution (STRING, default: 1080p)
Output resolution.
loop (BOOLEAN, default: False)
Enables seamless looping.
frame0_image_url (STRING)
First-frame conditioning image.
frame1_image_url (STRING)
Last-frame conditioning image.
seed (INT, default: 0)
Random seed (0 = random).

Outputs

video (VIDEO)
Generated video tensor.
video_url (STRING)
Public URL of generated video.
file_path (STRING)
Local saved video path.

Minimax Hailuo Video

Model: Minimax-Hailuo
Category: Model Library/General Video Models
Description: Generates video using Minimax-Hailuo model (text-to-video or image-to-video).

Inputs

prompt_text (STRING, multiline)
Text prompt describing video content.
model (COMBO, default: Minimax-Hailuo-2.3)
Model variant selection.
duration (COMBO, default: 6)
Video duration in seconds.
image (IMAGE, optional)
Image input for conditioning animation.
first_frame_image (STRING, optional)
URL-based image input.
seed (INT, default: 0)
Random seed for reproducibility.
resolution (COMBO, default: 768P)
Output resolution.
prompt_optimizer (BOOLEAN, default: True)
Enhances prompt understanding and expansion.
fast_pretreatment (BOOLEAN, default: False)
Enables faster preprocessing pipeline.

Outputs

VIDEO (VIDEO)
Generated video output.
VIDEO_URL (STRING)
Public URL of generated video.
FILE_PATH (STRING)
Local saved file path.

Minimax Text To Video

Model: Minimax-Hailuo
Category: Model Library/General Video Models
Description: Generates video from text-only prompts using Minimax-Hailuo.

Inputs

prompt_text (STRING)
Text prompt describing desired video.
model (COMBO, default: Minimax-Hailuo-2.3)
Model variant selection.
duration (COMBO, default: 6)
Video duration.
seed (INT, default: 0)
Random seed.
resolution (COMBO, default: 768P)
Output resolution.
prompt_optimizer (BOOLEAN, default: True)
Improves prompt interpretation.
fast_pretreatment (BOOLEAN, default: False)
Enables faster preprocessing.

Outputs

VIDEO (VIDEO)
VIDEO_URL (STRING)
FILE_PATH (STRING)

Minimax Image To Video

Model: Minimax-Hailuo
Category: Model Library/General Video Models
Description: Generates video from image + text prompt using Minimax-Hailuo.

Inputs

prompt_text (STRING)
Text describing motion and scene transformation.
model (COMBO, default: Minimax-Hailuo-2.3)
Model variant selection.
duration (COMBO, default: 6)
Video duration.
image (IMAGE)
Input image used for animation.
first_frame_image (STRING)
URL version of input image.
seed (INT, default: 0)
Random seed.
resolution (COMBO, default: 768P)
Output resolution.
prompt_optimizer (BOOLEAN, default: True)
Enhances prompt processing.
fast_pretreatment (BOOLEAN, default: False)
Faster preprocessing mode.

Outputs

VIDEO (VIDEO)
VIDEO_URL (STRING)
FILE_PATH (STRING)

Pixverse v5_5 t2v

Model: pixverse-v5.5-t2v
Description: Generates a video from a text prompt using Pixverse v5.5.

Inputs

Required

prompt (STRING)
Text description of the video content to generate.

Optional

aspect_ratio (STRING, default: 16:9)
Sets output video shape ratio.
duration (STRING, default: 5)
Length of the video in seconds (5, 8, 10).
quality (STRING, default: 540p)
Output resolution of the video.
negative_prompt (STRING)
Describes elements to avoid in generation.
generate_audio_switch (BOOLEAN, default: False)
Enables AI-generated audio.
generate_multi_clip_switch (BOOLEAN, default: False)
Enables multi-shot / cinematic transitions.
thinking_type (STRING, default: auto)
Controls prompt optimization behavior.
seed (INT, default: 0)
Random seed for reproducibility (0 = random).

Outputs

VIDEO (VIDEO)
Generated video as a binary/video object used in ComfyUI pipelines.
VIDEO_URL (STRING)
Public URL where the generated video is hosted.
FILE_PATH (STRING)
Local filesystem path where the video is saved.

Pixverse v5_5 i2v

Model: pixverse-v5.5-i2v
Description: Generates a video from a single image and optional prompt.

Inputs

Required (one image source required)

image (IMAGE)
Input image from ComfyUI. Used as the primary reference frame.

image_url (STRING)
URL to the reference image (used if IMAGE input is not provided).
prompt (STRING)
Text description guiding motion, style, and animation.

Optional

aspect_ratio (STRING, default: 16:9)
Output video aspect ratio.
duration (STRING, default: 5)
Video length in seconds.
quality (STRING, default: 540p)
Output resolution.
negative_prompt (STRING)
Elements to exclude from generation.
generate_audio_switch (BOOLEAN, default: False)
Enables audio generation.
generate_multi_clip_switch (BOOLEAN, default: False)
Enables cinematic transitions.
thinking_type (STRING, default: auto)
Controls prompt reasoning optimization.
seed (INT, default: 0)
Random seed for reproducibility.

Outputs

VIDEO (VIDEO)
Generated animated video.
VIDEO_URL (STRING)
Public hosted URL of the video.
FILE_PATH (STRING)
Local file path of saved output.

Pixverse v5_5 Transition

Model: pixverse-v5.5-transition
Description: Creates a video transition between two images.

Inputs

Required (both frames required)

first_frame_image (IMAGE) or first_frame_image_url (STRING)
Starting frame image for transition.
last_frame_image (IMAGE) or last_frame_image_url (STRING)
Ending frame image for transition.
prompt (STRING)
Text describing how the transition should behave.

Optional

duration (STRING, default: 5)
Video duration in seconds.
quality (STRING, default: 540p)
Output resolution.
negative_prompt (STRING)
Elements to avoid in transition.
generate_audio_switch (BOOLEAN, default: False)
Enables audio generation.
seed (INT, default: 0)
Random seed for reproducibility.

Outputs

VIDEO (VIDEO)
Transition video output.
VIDEO_URL (STRING)
Public URL of generated video.
FILE_PATH (STRING)
Local saved file path.

Pixversev 5_6 t2v

Model: pixverse-v5.6-t2v
Description: Generates a video from a text prompt using Pixverse v5.6.

Inputs

Required

prompt (STRING)
Text prompt describing full video content (max 2048 characters).

Optional

aspect_ratio (STRING, default: 16:9)
Output video aspect ratio.
duration (STRING, default: 5)
Video length in seconds (note: 10s not supported at 1080p).
quality (STRING, default: 540p)
Output resolution.
negative_prompt (STRING)
Content to exclude from generation.
generate_audio_switch (BOOLEAN, default: False)
Enables audio generation.
style (STRING, default: none)
Visual style preset (none, anime, 3d_animation, clay, comic, cyberpunk).
thinking_type (STRING, default: auto)
Controls prompt reasoning optimization.
seed (INT, default: 0)
Random seed.

Outputs

VIDEO (VIDEO)
Generated video file object.
VIDEO_URL (STRING)
Hosted video URL.
FILE_PATH (STRING)
Local saved video path.

Pixversev 5_6 i2v

Model: pixverse-v5.6-i2v
Description: Generates a video from a single image using Pixverse v5.6.

Inputs

Required (one image source required)

image (IMAGE)
ComfyUI image input used as the main reference frame.

image_url (STRING)
URL of the reference image.
prompt (STRING)
Text prompt guiding animation and scene behavior.

Optional

aspect_ratio (STRING, default: 16:9)
Output aspect ratio.
duration (STRING, default: 5)
Video duration in seconds.
quality (STRING, default: 540p)
Output resolution.
negative_prompt (STRING)
Elements to exclude.
generate_audio_switch (BOOLEAN, default: False)
Enables audio generation.
style (STRING, default: none)
Visual style preset.
thinking_type (STRING, default: auto)
Controls prompt reasoning behavior.
seed (INT, default: 0)
Random seed.

Outputs

VIDEO (VIDEO)
Generated animated video.
VIDEO_URL (STRING)
Public URL to video.
FILE_PATH (STRING)
Local saved file path.

Pixverse v5_6 Transition

Model: pixverse-v5.6-transition
Description: Generates a transition video between two images using Pixverse v5.6.

Inputs

Required (both frames required)

first_frame_image (IMAGE) or first_frame_image_url (STRING)
Starting frame image.
last_frame_image (IMAGE) or last_frame_image_url (STRING)
Ending frame image.
prompt (STRING)
Text describing transition behavior.

Optional

duration (STRING, default: 5)
Video duration.
quality (STRING, default: 540p)
Output resolution.
negative_prompt (STRING)
Elements to avoid.
generate_audio_switch (BOOLEAN, default: False)
Enables audio generation.
seed (INT, default: 0)
Random seed.

Outputs

VIDEO (VIDEO)
Generated transition video.
VIDEO_URL (STRING)
Public video URL.
FILE_PATH (STRING)
Local file path.

Reve Create

Model: reve-create-20250915
Description:
Generates an image from a text prompt using the Reve text-to-image model.

Inputs

prompt (required)
- Type: STRING
- Description: Text prompt describing the image to generate.
aspect_ratio (optional)
- Type: ENUM
- Options: 16:9, 9:16, 3:2, 2:3, 4:3, 3:4, 1:1
- Default: 3:2
- Description: Controls the aspect ratio of the generated image.

Outputs

image
- Type: IMAGE
- Description: Generated image tensor output for ComfyUI workflows.
image_url
- Type: STRING
- Description: Public URL of the generated image.
file_name
- Type: STRING
- Description: Local saved filename of the generated image.

Reve Edit

Model: reve-edit-20250915
Description:
Edits an image using a reference image and a text prompt.

Inputs

prompt (required)
- Type: STRING
- Description: Text prompt describing how the image should be edited.
image (optional)
- Type: IMAGE
- Description: Reference image from ComfyUI used as input.
image_url (optional)
- Type: STRING
- Description: URL of a reference image (used if IMAGE input is not provided).
aspect_ratio (optional)
- Type: ENUM
- Options: "", 16:9, 9:16, 3:2, 2:3, 4:3, 3:4, 1:1
- Description: Controls output image aspect ratio.

Outputs

image
- Type: IMAGE
- Description: Edited image tensor output.
image_url
- Type: STRING
- Description: Public URL of the edited image.
file_name
- Type: STRING
- Description: Local saved filename.

Reve Edit Fast

Model: reve-edit-fast-20251030
Description:
Fast version of image editing using a reference image and text prompt.

Inputs

prompt (required)
- Type: STRING
- Description: Text prompt guiding the image edit.
image (optional)
- Type: IMAGE
- Description: Reference image from ComfyUI.
image_url (optional)
- Type: STRING
- Description: URL of reference image if IMAGE input is not used.
aspect_ratio (optional)
- Type: ENUM
- Options: "", 16:9, 9:16, 3:2, 2:3, 4:3, 3:4, 1:1
- Description: Controls output image aspect ratio.

Outputs

image
- Type: IMAGE
- Description: Fast-generated edited image tensor.
image_url
- Type: STRING
- Description: Public URL of the generated image.
file_name
- Type: STRING
- Description: Local saved filename.

Reve Remix

Model: reve-remix-20250915
Description:
Generates an image from 1–6 reference images combined with a text prompt.

Inputs

prompt (required)
- Type: STRING
- Description: Text prompt guiding remix generation.
image (optional)
- Type: IMAGE
- Description: 1–6 reference images from ComfyUI.
image_url (optional)
- Type: STRING
- Description: 1–6 reference image URLs (comma-separated).
aspect_ratio (optional)
- Type: ENUM
- Options: "", 16:9, 9:16, 3:2, 2:3, 4:3, 3:4, 1:1
- Description: Controls output image aspect ratio.

Outputs

image
- Type: IMAGE
- Description: Generated remixed image tensor.
image_url
- Type: STRING
- Description: Public URL of generated image(s).
file_name
- Type: STRING
- Description: Local saved filename.

Reve Remix Fast

Model: reve-remix-fast-20251030
Description:
Fast version of multi-image remix generation using 1–6 reference images.

Inputs

prompt (required)
- Type: STRING
- Description: Text prompt guiding remix generation.
image (optional)
- Type: IMAGE
- Description: 1–6 reference images from ComfyUI.
image_url (optional)
- Type: STRING
- Description: 1–6 reference image URLs.
aspect_ratio (optional)
- Type: ENUM
- Options: "", 16:9, 9:16, 3:2, 2:3, 4:3, 3:4, 1:1
- Description: Controls output image aspect ratio.

Outputs

image
- Type: IMAGE
- Description: Fast-generated remixed image tensor.
image_url
- Type: STRING
- Description: Public URL of generated image(s).
file_name
- Type: STRING
- Description: Local saved filename.

SkyReels Text-to-Video

Model: skyreels-v4-text-to-video
Description: Generates cinematic video from a text prompt using SkyReels V4.

Inputs

prompt (required)
Type: STRING
Description: Text prompt describing the video content.
duration (optional)
Type: INT
Default: 5
Range: 3–15
Description: Duration of the generated video in seconds.
aspect_ratio (optional)
Type: ENUM
Options: 16:9, 4:3, 1:1, 9:16, 3:4
Default: 16:9
Description: Controls the aspect ratio of the output video.
sound (optional)
Type: BOOLEAN
Default: False
Description: Enables audio generation for the video.
mode (optional)
Type: ENUM
Options: std, fast, pro
Default: std
Description: Controls generation quality vs speed.

Outputs

VIDEO (Type: VIDEO) — Generated cinematic video.
VIDEO_URL (Type: STRING) — Public URL of the generated video.
FILE_PATH (Type: STRING) — Local saved file path of the video.

SkyReels Image-to-Video

Model: skyreels-v4-image-to-video
Description: Animates a single image into a cinematic video using SkyReels V4.

Inputs

prompt (required)
Type: STRING
Description: Text prompt guiding the animation.
image (optional)
Type: IMAGE
Description: Input image from ComfyUI (exactly one image required if used).
image_url (optional)
Type: STRING
Description: URL of input image (used if IMAGE input is not provided).
duration (optional)
Type: INT
Default: 5
Range: 3–15
Description: Duration of generated video in seconds.
sound (optional)
Type: BOOLEAN
Default: False
Description: Enables audio generation.
mode (optional)
Type: ENUM
Options: std, fast, pro
Default: std
Description: Controls generation quality vs speed.

Outputs

VIDEO (Type: VIDEO) — Generated animated video.
VIDEO_URL (Type: STRING) — Public URL of the generated video.
FILE_PATH (Type: STRING) — Local saved file path of the video.

Sora 2

Description: Generates video from text prompts using the Sora 2 model with optional reference image support.

Inputs

prompt (required)
Type: STRING
Description: Text prompt for video generation.
input_reference_image (optional)
Type: IMAGE
Description: Optional single reference image input.
input_reference (optional)
Type: STRING
Description: URL version of reference image (0–1 images supported).
model (optional)
Type: STRING
Default: sora-2
Description: Model identifier for generation.
seconds (optional)
Type: ENUM
Default: 4
Options: 4, 8, 12
Description: Video duration.
size (optional)
Type: ENUM
Default: 1280x720
Options: 1280x720, 720x1280
Description: Output resolution.

Outputs

VIDEO (Type: VIDEO) — Generated video output.
VIDEO_URL (Type: STRING) — Public video URL.
FILE_PATH (Type: STRING) — Local saved file path.

Sora 2 Pro

Description: Generates high-quality video using Sora 2 Pro with reference image support.

Inputs

prompt (required)
Type: STRING
Description: Text prompt for video generation.
input_reference_image (optional)
Type: IMAGE
Description: Optional single reference image input.
input_reference (optional)
Type: STRING
Description: URL version of reference image.
model (optional)
Type: STRING
Default: sora-2-pro
Description: Model identifier.
seconds (optional)
Type: ENUM
Default: 4
Options: 4, 8, 12
Description: Video duration.
size (optional)
Type: ENUM
Default: 1792x1024
Options: 1792x1024, 1024x1792, 1280x720, 720x1280
Description: Output resolution.

Outputs

VIDEO (Type: VIDEO) — Generated video output.
VIDEO_URL (Type: STRING) — Public video URL.
FILE_PATH (Type: STRING) — Local saved file path.

Veo3 Video Generation

Description: Generates videos using Google’s Veo 3 models through the GMI gateway, supporting text-to-video and image-conditioned generation.

Inputs

prompt (required) — Type: STRING — Text description of the video.
aspect_ratio (optional) — Type: ENUM — Default: 16:9 — Options: 16:9, 9:16
negative_prompt (optional) — Type: STRING — What should be avoided.
duration_seconds (optional) — Type: INT — Default: 8 — Max 8 seconds.
person_generation (optional) — Type: ENUM — Default: ALLOW — Options: ALLOW, BLOCK
seed (optional) — Type: INT — Default: 0
image (optional) — Type: IMAGE/STRING — Reference image input.
lastFrame (optional) — Type: IMAGE/STRING — Ending frame image.
reference_image (optional) — Type: IMAGE/STRING — Reference image for Veo 3.1.
model (optional) — Type: ENUM — Default: Veo3 — Multiple Veo model options listed.

Outputs

VIDEO (Type: VIDEO) — Generated video output.
VIDEO_URL (Type: STRING) — Public URL.
FILE_PATH (Type: STRING) — Local saved file path.

Vidu Q2 Pro I2V

Model: vidu-q2-pro-i2v
Description: Generates a video from a single reference image using VIDU Q2 Pro I2V model.

Inputs

prompt (required) — STRING — Text prompt (max 2000 chars).
image (optional) — IMAGE — One input image required if used.
image_url (optional) — STRING — URL of input image.
duration (optional) — INT — Default: 5 — Range: 1–10
seed (optional) — INT — Random seed.

Outputs

VIDEO — Generated video output.
VIDEO_URL — Public video URL.
FILE_PATH — Local saved file path.

Vidu Q2 Pro R2V

Model: vidu-q2-pro-r2v
Description: Generates video from multiple reference images and/or videos using VIDU Q2 Pro R2V model.

Inputs

prompt (required) — STRING — Text prompt (max 2000 chars).
images (optional) — IMAGE (batch) — Up to 7 images.
image_urls (optional) — STRING — Comma-separated URLs.
video_urls (optional) — STRING — 1 video (8s) or 2 videos (5s each).
duration (optional) — INT — Default: 5 — Range: 1–8
seed (optional) — INT — Random seed.

Outputs

VIDEO — Generated video output.
VIDEO_URL — Public video URL.
FILE_PATH — Local saved file path.

Vidu Q2 T2V

Model: vidu-q2-t2v
Description: Generates a video from text using VIDU Q2 T2V model.

Inputs

prompt (required) — STRING — Text prompt (max 2000 chars).
duration (optional) — INT — Default: 5 — Range: 1–10
seed (optional) — INT — Random seed.

Outputs

VIDEO — Generated video output.
VIDEO_URL — Public video URL.
FILE_PATH — Local saved file path.

Vidu Q3 Pro I2V

Model: vidu-q3-pro-i2v
Description: Generates video from a single reference image using VIDU Q3 Pro I2V. Supports optional audio.

Inputs

prompt (required) — STRING — Text prompt (max 2000 chars).
image / image_url (required) — One input image required.
duration (optional) — INT — Default: 5 — Range: 1–16
audio (optional) — BOOLEAN — Default: False
seed (optional) — INT — Random seed.

Outputs

VIDEO — Generated video output.
VIDEO_URL — Public video URL.
FILE_PATH — Local saved file path.

Vidu Q3 Pro T2V

Model: vidu-q3-pro-t2v
Description: Generates a video from text using VIDU Q3 Pro T2V model. Supports optional audio.

Inputs

prompt (required) — STRING — Text prompt (max 2000 chars).
duration (optional) — INT — Default: 5 — Range: 1–16
audio (optional) — BOOLEAN — Default: False
seed (optional) — INT — Random seed.

Outputs

VIDEO — Generated video output.
VIDEO_URL — Public video URL.
FILE_PATH — Local saved file path.

Wan Animate Video

Model: Wan2.2-Animate-14B
Description: Generate a video using a reference image and a template video

Inputs

refer_path (required)
Type: STRING
Reference image URL for video generation.
video_path (required)
Type: STRING
Template video URL for video generation.
resolution (optional)
Type: ENUM
Options: 480p, 720p
Default: 480p
Resolution of the output video.

Outputs

VIDEO (VIDEO) — Generated video output
VIDEO_URL (STRING) — Public URL of generated video
FILE_PATH (STRING) — Local saved file path

Wan 2.5 Image-to-Video

Model: wan2.5-i2v-preview
Description: Generate video from image using WAN 2.5 model

Inputs

image (optional)
Type: IMAGE
Input image (ComfyUI tensor). Takes precedence over img_url.
img_url (optional)
Type: STRING
Image URL used if IMAGE is not provided.
prompt (optional)
Type: STRING
Text prompt for video generation.
negative_prompt (optional)
Type: STRING
Negative prompt for video generation.
resolution (optional)
Type: ENUM
Options: 480P, 720P, 1080P
Default: 480P
duration (optional)
Type: ENUM
Options: 5, 10
Default: 5
prompt_extend (optional)
Type: BOOLEAN
Default: True
watermark (optional)
Type: BOOLEAN
Default: False
audio (optional)
Type: BOOLEAN
Default: False
audio_url (optional)
Type: STRING
seed (optional)
Type: INT
Range: 0–2147483647

Outputs

VIDEO (VIDEO) — Generated video output
VIDEO_URL (STRING) — Public URL
FILE_PATH (STRING) — Local saved file path

Wan 2.6 Text-to-Video

Model: wan2.6-t2v
Description: Generate video from text using WAN 2.6 model

Inputs

prompt (required)
Type: STRING
Text prompt for video generation.
negative_prompt (optional)
Type: STRING
audio_url (optional)
Type: STRING
resolution (optional)
Type: ENUM
Options: 720P, 1080P
Default: 1080P
duration (optional)
Type: ENUM
Options: 5, 10, 15
Default: 5
prompt_extend (optional)
Type: BOOLEAN
Default: True
watermark (optional)
Type: BOOLEAN
Default: False
audio (optional)
Type: BOOLEAN
Default: False
seed (optional)
Type: INT
Range: 0–2147483647

Outputs

VIDEO (VIDEO) — Generated video output
VIDEO_URL (STRING) — Public URL
FILE_PATH (STRING) — Local saved file path

Wan 2.6 Image-to-Video

Model: wan2.6-i2v
Description: Generate video from image using WAN 2.6 model

Inputs

image (optional)
Type: IMAGE
Reference image (takes precedence over img_url).
img_url (optional)
Type: STRING
prompt (optional)
Type: STRING
negative_prompt (optional)
Type: STRING
audio_url (optional)
Type: STRING
resolution (optional)
Type: ENUM
Options: 720P, 1080P
Default: 720P
duration (optional)
Type: ENUM
Options: 5, 10, 15
Default: 5
prompt_extend (optional)
Type: BOOLEAN
Default: True
watermark (optional)
Type: BOOLEAN
Default: False
audio (optional)
Type: BOOLEAN
Default: False
seed (optional)
Type: INT
Range: 0–2147483647

Outputs

VIDEO (VIDEO) — Generated video output
VIDEO_URL (STRING) — Public URL
FILE_PATH (STRING) — Local saved file path

Wan 2.6 Reference-to-Video

Model: wan2.6-r2v
Description: Generate video using reference video URLs (multi-character supported)

Inputs

video_urls (required)
Type: STRING
Comma-separated reference video URLs (1–3).
prompt (optional)
Type: STRING (max 1500 chars)
negative_prompt (optional)
Type: STRING (max 500 chars)
size (optional)
Type: ENUM
Options: multiple resolutions
Default: 1920*1080
duration (optional)
Type: ENUM
Options: 5, 10
Default: 5
shot_type (optional)
Type: ENUM
Options: single, multi
Default: single
watermark (optional)
Type: BOOLEAN
Default: False
seed (optional)
Type: INT
Range: 0–2147483647

Outputs

VIDEO (VIDEO) — Generated video output
VIDEO_URL (STRING) — Public URL
FILE_PATH (STRING) — Local saved file path

WAN 2.7 Text-to-Video

Description

Generates a video from a text prompt using the WAN 2.7 T2V model, supporting flexible duration (2–15 seconds), aspect ratio selection, optional audio input, prompt enhancement, and watermark control.

Inputs

prompt: Required text description of the video content (max 1500 characters).
negative_prompt: Optional text describing what to avoid (max 500 characters).
audio_url: Optional external audio file (WAV/MP3, 3–30s, ≤15MB).
resolution: Output resolution tier, either 720P or 1080P (default: 1080P).
ratio: Aspect ratio of output video (16:9, 9:16, 1:1, 4:3, 3:4).
duration: Video length in seconds, between 2 and 15 (default: 5).
prompt_extend: Enables automatic prompt rewriting/enhancement.
watermark: Adds “AI Generated” watermark if enabled.
seed: Random seed for reproducibility.

Outputs

VIDEO: Generated video object.
VIDEO_URL: Hosted URL for the generated video.
FILE_PATH: Local saved file path.

Behavior Notes

This node validates prompt input, submits a WAN 2.7 generation request, and polls until completion. It supports optional audio conditioning and aspect-ratio-aware generation. The final video is downloaded, saved locally, and returned with a preview UI.

WAN 2.7 Image-to-Video

Description

Generates a video from an input image using WAN 2.7 I2V. Supports optional first/last frame conditioning, optional driving audio, flexible duration control, and prompt-based motion guidance.

Inputs

first_frame_image / first_frame_image_url: Optional starting frame (image tensor or URL).
last_frame_image / last_frame_image_url: Optional ending frame (image tensor or URL).
first_clip: Optional reference video for motion guidance.
driving_audio: Optional audio input to guide motion dynamics.
prompt: Optional text prompt (max 1500 characters).
negative_prompt: Optional constraints (max 500 characters).
resolution: Output resolution (720P or 1080P).
duration: Video length in seconds (2–15).
prompt_extend: Enables prompt enhancement.
watermark: Adds watermark overlay.
seed: Random seed for reproducibility.

Outputs

VIDEO: Generated video.
VIDEO_URL: Hosted result URL.
FILE_PATH: Local saved path.

Behavior Notes

This node resolves image inputs from either tensors or URLs, builds a WAN 2.7 I2V request, and generates motion between frames. It supports multimodal conditioning including audio-driven motion and optional temporal guidance via clips.

WAN 2.7 Reference-to-Video

Description

Generates video using multiple reference images and/or reference videos with WAN 2.7. Supports first-frame conditioning, multi-source visual guidance, and flexible motion synthesis across up to 5 total reference assets.

Inputs

first_frame_image / first_frame_url: Optional starting frame.
reference_images: Optional batch of image tensors.
reference_image_urls: Optional comma-separated image URLs.
reference_video_urls: Optional comma-separated video references (max 5 total assets combined).
prompt: Required or optional text prompt (max 1500 characters).
negative_prompt: Optional constraints (max 500 characters).
resolution: Output resolution (720P or 1080P).
ratio: Aspect ratio control (16:9, 9:16, 1:1, 4:3, 3:4).
duration: Video length (2–15 seconds).
prompt_extend: Enables prompt rewriting.
watermark: Adds watermark overlay.
seed: Random seed for reproducibility.

Outputs

VIDEO: Generated video.
VIDEO_URL: Hosted result.
FILE_PATH: Local file path.

Behavior Notes

This node merges multiple visual references (images + videos) into a unified generation context. It validates total reference count (max 5) and ensures at least one valid reference asset exists before generating.

Happy Horse 1.0 Text-to-Video — GMIHHT2VNode

Description

Generates video from a text prompt using the Happy Horse 1.0 model with a focus on high visual fidelity, simple configuration, and short-form generation (3–15 seconds).

Inputs

prompt: Required text description of the video content.
resolution: Output resolution (720P or 1080P).
duration: Video length in seconds (3–15).
watermark: Adds “AI Generated” watermark in bottom-right corner.

Outputs

VIDEO: Generated video.
VIDEO_URL: Hosted video URL.
FILE_PATH: Local saved file path.

Behavior Notes

This is a lightweight T2V node optimized for straightforward generation. It enforces minimum duration constraints (≥3 seconds) and applies default watermarking unless disabled.

Happy Horse 1.0 Image-to-Video — GMIHHI2VNode

Description

Generates video from an image using the Happy Horse 1.0 model. The input image defines the initial frame, and motion is generated based on the prompt.

Inputs

prompt: Required text describing motion and style.
first_frame_image / first_frame_image_url: Required starting image input.
resolution: Output resolution (720P or 1080P).
duration: Video length (3–15 seconds).
watermark: Adds watermark overlay.

Outputs

VIDEO: Generated video.
VIDEO_URL: Hosted URL.
FILE_PATH: Local file path.

Behavior Notes

This node requires a valid first frame (image tensor or URL). It uses the image as a motion anchor and generates consistent temporal transformation guided by the prompt.

GMI Studio User Manual

Documentation Index

​Kling Text-to-Video

​Inputs

​Required

​Hidden

​Outputs

​Kling Image2Video

​Inputs

​Required

​Optional

​Hidden

​Outputs

​Kling 2_6 Motion Control

​Inputs

​Outputs

​Kling Reference To Video

​Inputs

​Outputs

​Kling V3 Image To Video

​Inputs

​Outputs

​Kling V3 Text To Video

​Inputs

​Outputs

​Kling Edit Video

​Inputs

​Outputs

​Kling 3 Motion Control

​Inputs

​Outputs

​LLM Node

​Inputs

​Required

​Optional

​Outputs

​Ltx2 Fast Image To Video

​Inputs

​Required

​Optional

​Outputs

​Ltx2 Pro ImageToVideo

​Inputs

​Outputs

​Ltx2 Fast TextToVideo

​Inputs

​Required

​Optional

​Outputs

​Ltx2 Pro TextToVideo

​Inputs

​Outputs

​Ltx2 Pro Retake

​Inputs

​Required

​Optional

​Outputs

​Ltx2 Pro Audio-to-Video

​Inputs

​Required

​Optional

​Outputs

​Luma Image-to-Video

​Inputs

​Required

​Optional

​Outputs

​Minimax Hailuo Video

​Inputs

​Outputs

​Minimax Text To Video

​Inputs

​Outputs

​Minimax Image To Video

​Inputs

​Outputs

​Pixverse v5_5 t2v

​Inputs

​Required

​Optional

Kling Text-to-Video

Inputs

Required

Hidden

Outputs

Kling Image2Video

Inputs

Required

Optional

Hidden

Outputs

Kling 2_6 Motion Control

Inputs

Outputs

Kling Reference To Video

Inputs

Outputs

Kling V3 Image To Video

Inputs

Outputs

Kling V3 Text To Video

Inputs

Outputs

Kling Edit Video

Inputs

Outputs

Kling 3 Motion Control

Inputs

Outputs

LLM Node

Inputs

Required

Optional

Outputs

Ltx2 Fast Image To Video

Inputs

Required

Optional

Outputs

Ltx2 Pro ImageToVideo

Inputs

Outputs

Ltx2 Fast TextToVideo

Inputs

Required

Optional

Outputs

Ltx2 Pro TextToVideo

Inputs

Outputs

Ltx2 Pro Retake

Inputs

Required

Optional

Outputs

Ltx2 Pro Audio-to-Video

Inputs

Required

Optional

Outputs

Luma Image-to-Video

Inputs

Required

Optional

Outputs

Minimax Hailuo Video

Inputs

Outputs

Minimax Text To Video

Inputs

Outputs

Minimax Image To Video

Inputs

Outputs

Pixverse v5_5 t2v

Inputs

Required

Optional

Outputs

Pixverse v5_5 i2v