Nuwa-xl: Diffusion over diffusion for extremely long video generation

Shengming Yin, Chenfei Wu, Huan Yang, Jianfeng Wang, Xiaodong Wang, Minheng Ni, Zhengyuan Yang, Linjie Li, Shuguang Liu, Fan Yang, et al · 2023 · arXiv 2303.12346

5 Pith papers cite this work. Polarity classification is still indexing.

5 Pith papers citing it

read on arXiv browse 5 citing papers

representative citing papers

Reward Forcing: Efficient Streaming Video Generation with Rewarded Distribution Matching Distillation

cs.CV · 2025-12-04 · conditional · novelty 6.0

Reward Forcing combines EMA-Sink tokens and Rewarded Distribution Matching Distillation to deliver state-of-the-art streaming video generation at 23.1 FPS without copying initial frames.

DragNUWA: Fine-grained Control in Video Generation by Integrating Text, Image, and Trajectory

cs.CV · 2023-08-16 · unverdicted · novelty 6.0

DragNUWA integrates text, image, and trajectory controls into a diffusion video model using a Trajectory Sampler, Multiscale Fusion, and Adaptive Training to enable fine-grained open-domain video generation.

Matrix-game 2.0: An open-source real-time and streaming interactive world model

cs.CV · 2025-08-18 · unverdicted · novelty 5.0

Matrix-Game 2.0 introduces a scalable data pipeline, action-injection module, and few-step distillation to enable real-time streaming video generation at 25 FPS from game-engine interactions, with open-sourced weights and code.

Character-Centered Dialogue Generation from Scene-Level Prompts

cs.CV · 2025-05-22 · unverdicted · novelty 4.0

A training-free framework generates expressive, character-grounded dialogue and speech from scene prompts using vision-language encoders, LLMs, and a recursive narrative memory bank for cross-scene consistency.

Scene-Action Prompt Fusion for Coherent Text-to-Video Storytelling

cs.CV · 2025-03-08 · unverdicted · novelty 3.0

A prompt fusion approach combines bidirectional time-weighted latent blending, dynamics-informed prompt weighting via CLIP, and semantic action representations to produce temporally consistent long videos from text without retraining.

citing papers explorer

Showing 5 of 5 citing papers.

Reward Forcing: Efficient Streaming Video Generation with Rewarded Distribution Matching Distillation cs.CV · 2025-12-04 · conditional · none · ref 85
Reward Forcing combines EMA-Sink tokens and Rewarded Distribution Matching Distillation to deliver state-of-the-art streaming video generation at 23.1 FPS without copying initial frames.
DragNUWA: Fine-grained Control in Video Generation by Integrating Text, Image, and Trajectory cs.CV · 2023-08-16 · unverdicted · none · ref 31
DragNUWA integrates text, image, and trajectory controls into a diffusion video model using a Trajectory Sampler, Multiscale Fusion, and Adaptive Training to enable fine-grained open-domain video generation.
Matrix-game 2.0: An open-source real-time and streaming interactive world model cs.CV · 2025-08-18 · unverdicted · none · ref 51
Matrix-Game 2.0 introduces a scalable data pipeline, action-injection module, and few-step distillation to enable real-time streaming video generation at 25 FPS from game-engine interactions, with open-sourced weights and code.
Character-Centered Dialogue Generation from Scene-Level Prompts cs.CV · 2025-05-22 · unverdicted · none · ref 65
A training-free framework generates expressive, character-grounded dialogue and speech from scene prompts using vision-language encoders, LLMs, and a recursive narrative memory bank for cross-scene consistency.
Scene-Action Prompt Fusion for Coherent Text-to-Video Storytelling cs.CV · 2025-03-08 · unverdicted · none · ref 41
A prompt fusion approach combines bidirectional time-weighted latent blending, dynamics-informed prompt weighting via CLIP, and semantic action representations to produce temporally consistent long videos from text without retraining.

Nuwa-xl: Diffusion over diffusion for extremely long video generation

fields

years

verdicts

representative citing papers

citing papers explorer