Goku: Flow based video generative foundation models

Shoufa Chen, Chongjian Ge, Yuqi Zhang, Yida Zhang, Fengda Zhu, Hao Yang, Hongxiang Hao, Hui Wu, Zhichao Lai, Yifei Hu, Ting-Che Lin, Shilong Zhang, Fu Li, Chuan Li, Xing Wang, Yang · 2025 · arXiv 2502.04896

4 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

VACE: All-in-One Video Creation and Editing

cs.CV · 2025-03-10 · unverdicted · novelty 7.0

VACE unifies reference-to-video generation, video-to-video editing, and masked video-to-video editing in one Diffusion Transformer framework using a Video Condition Unit for inputs and a Context Adapter for task injection.

VHOI: Controllable Video Generation of Human-Object Interactions from Sparse Trajectories via Motion Densification

cs.CV · 2025-12-10 · unverdicted · novelty 6.0

VHOI densifies sparse trajectories into color-encoded HOI mask sequences and conditions a fine-tuned video diffusion model on them to produce controllable human-object interaction videos, including full navigation sequences.

Demystifying Transition Matching: When and Why It Can Beat Flow Matching

cs.LG · 2025-10-20 · unverdicted · novelty 6.0

TM outperforms FM for well-separated modes with non-negligible variance by preserving covariance via stochastic latent updates, with the gap closing as variance approaches zero.

Evolution of Video Generative Foundations

cs.CV · 2026-04-07 · unverdicted · novelty 2.0

This survey traces video generation technology from GANs to diffusion models and then to autoregressive and multimodal approaches while analyzing principles, strengths, and future trends.

citing papers explorer

Showing 4 of 4 citing papers.

VACE: All-in-One Video Creation and Editing · cs.CV · 2025-03-10 · unverdicted · none · ref 9
VHOI: Controllable Video Generation of Human-Object Interactions from Sparse Trajectories via Motion Densification · cs.CV · 2025-12-10 · unverdicted · none · ref 8
Demystifying Transition Matching: When and Why It Can Beat Flow Matching · cs.LG · 2025-10-20 · unverdicted · none · ref 1
Evolution of Video Generative Foundations · cs.CV · 2026-04-07 · unverdicted · none · ref 84
