One-Forcing: Towards Stable One-Step Autoregressive Video Generation
Pith reviewed 2026-05-25 04:36 UTC · model grok-4.3
The pith
One-Forcing augments the DMD objective with an auxiliary GAN loss to enable stable one-step autoregressive video generation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
One-Forcing augments the DMD objective with an auxiliary GAN loss. This addresses the blurriness of prior DMD-based one-step methods and the weak dynamics of trajectory-style consistency distillation. The result is high-quality and efficient one-step video generation that reaches a total VBench score of 83.76 while using significantly less training compute for the framewise autoregressive case.
What carries the argument
One-Forcing, the augmentation of the DMD objective with an auxiliary GAN loss that stabilizes one-step sampling in autoregressive video generators.
If this is right
- Achieves a VBench total score of 83.76 as state-of-the-art for one-step causal video generation methods.
- Remains competitive with strong many-step video generation approaches.
- One-step framewise autoregressive generation becomes stable with one-third the training cost of the chunkwise model.
- Prior one-step methods failed to achieve stable framewise generation successfully.
Where Pith is reading between the lines
- The reduced training cost for framewise models could support deployment in settings where chunkwise processing adds unnecessary overhead.
- Success with the GAN correction in the one-step regime may indicate that similar auxiliary losses could stabilize other distillation methods.
Load-bearing premise
The auxiliary GAN loss reliably corrects blurriness and weak dynamics without causing instability or new artifacts, and the reported training-cost savings apply beyond the specific experimental setup.
What would settle it
An experiment that removes the auxiliary GAN loss and measures whether the VBench score falls below competitive levels or whether framewise one-step training becomes unstable.
Figures
read the original abstract
Recent advances have substantially improved real-time interactive video generation in the autoregressive regime. However, most existing few-step autoregressive video generation methods, often distilled from a corresponding many-step teacher, default to a 4-step sampling configuration, which still incurs considerable latency during deployment and suffers from severe quality degradation when the number of sampling steps is further reduced, particularly in the one-step setting. Trajectory-style consistency distillation methods often produce videos with weak dynamics, while DMD-based approaches, such as Self-Forcing, tend to yield blurry frames. To address this challenge, we propose One-Forcing, a simple yet effective approach which augments the DMD objective with an auxiliary GAN loss for high-quality and efficient one-step video generation. Experiments on VBench show that One-Forcing achieves a total score of 83.76, establishing state-of-the-art performance among one-step causal video generation methods and remaining competitive with strong many-step approaches. We further demonstrate that one-step framewise autoregressive generation can be achieved stably with merely one-third of the training cost of the chunkwise model, a setting that prior methods have failed to achieve successfully.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes One-Forcing, a method that augments the DMD objective with an auxiliary GAN loss to enable stable one-step causal autoregressive video generation. It reports achieving a total score of 83.76 on VBench, claiming state-of-the-art performance among one-step methods while remaining competitive with many-step approaches, and demonstrates stable framewise autoregressive generation at one-third the training cost of chunkwise models.
Significance. If the empirical claims hold after proper validation, the work would advance efficient real-time video generation by addressing blurriness and weak dynamics in prior one-step methods while substantially reducing both inference latency and training cost. The combination of DMD with auxiliary GAN loss is a straightforward extension that could generalize if the stability claims are substantiated.
major comments (1)
- [Abstract] Abstract: The central claim of a VBench total score of 83.76 establishing SOTA among one-step causal methods is presented without any description of experimental setup, baselines, statistical significance, ablations, or implementation details. This absence makes the performance claim impossible to assess and is load-bearing for the paper's primary contribution.
Simulated Author's Rebuttal
We thank the referee for their feedback. The major comment highlights an important point about the abstract's self-containment. We address it directly below and will revise accordingly.
read point-by-point responses
-
Referee: [Abstract] Abstract: The central claim of a VBench total score of 83.76 establishing SOTA among one-step causal methods is presented without any description of experimental setup, baselines, statistical significance, ablations, or implementation details. This absence makes the performance claim impossible to assess and is load-bearing for the paper's primary contribution.
Authors: We agree that the abstract, as a standalone summary, should provide enough context to allow readers to assess the central claim without immediately consulting the full text. The manuscript's Sections 3 and 4 already detail the experimental protocol (VBench evaluation on 1K videos, causal framewise autoregressive setting), baselines (Self-Forcing, other DMD variants, chunkwise models, and multi-step methods), ablations on the GAN loss, and implementation (training cost comparison at one-third of chunkwise models). However, to directly address the concern, we will revise the abstract to concisely reference the evaluation setting, primary baselines, and the one-step causal regime. This change strengthens accessibility while preserving the abstract's brevity. revision: yes
Circularity Check
No significant circularity
full rationale
The paper presents One-Forcing as an empirical method that augments the DMD objective with an auxiliary GAN loss term. Its central claims consist of benchmark scores on VBench (total 83.76) and a reported training-cost reduction, both obtained through direct experimentation rather than any mathematical derivation or prediction step. No equations, fitted parameters, or self-citations are shown to reduce the reported results to their inputs by construction. The argument structure is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
augments the DMD objective with an auxiliary GAN loss... shared fake-score transformer backbone
-
IndisputableMonolith/Foundation/AlexanderDuality.leanalexander_duality_circle_linking unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Wan video trajectories concentrate 92.5% of their curvature mass at t≥0.9
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Video generation models as world simulators
Tim Brooks, Bill Peebles, Connor Holmes, Will DePue, Yufei Guo, Li Jing, David Schnurr, Joe Taylor, Troy Luhman, Eric Luhman, Clarence Ng, Ricky Wang, and Aditya Ramesh. Video generation models as world simulators. 2024. URL https://openai.com/index/ video-generation-models-as-world-simulators/
2024
-
[2]
Veo: a text-to-video generation system
Google DeepMind. Veo: a text-to-video generation system. Technical report, Google DeepMind, 2025. URL https://storage.googleapis.com/deepmind-media/veo/ Veo-3-Tech-Report.pdf
2025
-
[3]
Wan: Open and Advanced Large-Scale Video Generative Models
Team Wan, Ang Wang, Baole Ai, Bin Wen, Chaojie Mao, Chen-Wei Xie, Di Chen, Feiwu Yu, Haiming Zhao, Jianxiao Yang, Jianyuan Zeng, Jiayu Wang, Jingfeng Zhang, Jingren Zhou, Jinkai Wang, Jixuan Chen, Kai Zhu, Kang Zhao, Keyu Yan, Lianghua Huang, Mengyang Feng, Ningyi Zhang, Pandeng Li, Pingyu Wu, Ruihang Chu, Ruili Feng, Shiwei Zhang, Siyang Sun, Tao Fang, T...
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[4]
HunyuanVideo: A systematic framework for large video generative models,
Weijie Kong, Qi Tian, Zijian Zhang, Rox Min, Zuozhuo Dai, Jin Zhou, Jiangfeng Xiong, Xin Li, Bo Wu, Jianwei Zhang, Kathrina Wu, Qin Lin, Junkun Yuan, Yanxin Long, Aladdin Wang, Andong Wang, Changlin Li, Duojun Huang, Fang Yang, Hao Tan, Hongmei Wang, Jacob Song, Jiawang Bai, Jianbing Wu, Jinbao Xue, Joey Wang, Kai Wang, Mengyang Liu, Pengyu Li, Shuai Li, ...
-
[5]
URLhttps://arxiv.org/abs/2412.03603
work page internal anchor Pith review Pith/arXiv arXiv
-
[6]
Seedance 2.0: Advancing video generation for world complexity,
Team Seedance, De Chen, Liyang Chen, Xin Chen, Ying Chen, Zhuo Chen, Zhuowei Chen, Feng Cheng, Tianheng Cheng, Yufeng Cheng, Mojie Chi, Xuyan Chi, Jian Cong, Qinpeng Cui, Fei Ding, Qide Dong, et al. Seedance 2.0: Advancing video generation for world complexity,
-
[7]
URLhttps://arxiv.org/abs/2604.14148
work page internal anchor Pith review Pith/arXiv arXiv
-
[8]
Freeman, Frédo Durand, Eli Shechtman, and Xun Huang
Tianwei Yin, Qiang Zhang, Richard Zhang, William T. Freeman, Frédo Durand, Eli Shechtman, and Xun Huang. From slow bidirectional to fast autoregressive video diffusion models. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 22963–22974, June 2025. doi: 10.1109/CVPR52734.2025.02138. 10 URL https://openacces...
-
[9]
Self forcing: Bridging the train-test gap in autoregressive video diffusion
Xun Huang, Zhengqi Li, Guande He, Mingyuan Zhou, and Eli Shechtman. Self forcing: Bridging the train-test gap in autoregressive video diffusion. In D. Belgrave, C. Zhang, H. Lin, R. Pascanu, P. Koniusz, M. Ghassemi, and N. Chen, editors,Advances in Neu- ral Information Processing Systems, volume 38, pages 167283–167308. Curran Associates, Inc., 2025. URL ...
2025
-
[10]
Hongzhou Zhu, Min Zhao, Guande He, Hang Su, Chongxuan Li, and Jun Zhu. Causal Forcing: Autoregressive diffusion distillation done right for high-quality real-time interactive video generation, 2026. URLhttps://arxiv.org/abs/2602.02214
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[11]
Recurrent world models facilitate policy evolution
David Ha and Jürgen Schmidhuber. Recurrent world models facilitate policy evolution. In Advances in Neural Information Processing Systems, volume 31, pages 2450–2462. Cur- ran Associates, Inc., 2018. URL https://proceedings.neurips.cc/paper/2018/hash/ 2de5d16682c3c35007e4e92982f1a2ba-Abstract.html
2018
-
[12]
Mastering diverse control tasks through world models.Nature, 640(8059):647–653, 2025
Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, and Timothy Lillicrap. Mastering diverse control tasks through world models.Nature, 640(8059):647–653, 2025. doi: 10.1038/ s41586-025-08744-2. URLhttps://doi.org/10.1038/s41586-025-08744-2
-
[13]
Genie 2: A large-scale foundation world model
Jack Parker-Holder, Philip Ball, Jake Bruce, Vibhavari Dasagi, Kristian Holsheimer, Chris- tos Kaplanis, Alexandre Moufarek, Guy Scully, Jeremy Shar, Jimmy Shi, Stephen Spencer, Jessica Yung, Michael Dennis, Sultan Kenjeyev, Shangbang Long, Vlad Mnih, Harris Chan, Maxime Gazeau, Bonnie Li, Fabio Pardo, Luyu Wang, Lei Zhang, Frederic Besse, Tim Harley, Ann...
2024
-
[14]
Astra: General interactive world model with autoregressive denoising
Yixuan Zhu, Feng Jiaqi, Wenzhao Zheng, Yuan Gao, Xin Tao, Pengfei Wan, Jiwen Lu, and Jie Zhou. Astra: General interactive world model with autoregressive denoising. InThe Fourteenth International Conference on Learning Representations, 2026. URL https://openreview. net/forum?id=8UZpmrxoLG
2026
-
[15]
Chan, Nicolas Heess, Lucy Gonzalez, Simon Osindero, Sherjil Ozair, Scott Reed, Jingwei Zhang, Konrad Zolna, Jeff Clune, Nando De Freitas, Satinder Singh, and Tim Rocktäschel
Jake Bruce, Michael D Dennis, Ashley Edwards, Jack Parker-Holder, Yuge Shi, Edward Hughes, Matthew Lai, Aditi Mavalankar, Richie Steigerwald, Chris Apps, Yusuf Aytar, Sarah Maria Elis- abeth Bechtle, Feryal Behbahani, Stephanie C.Y . Chan, Nicolas Heess, Lucy Gonzalez, Simon Osindero, Sherjil Ozair, Scott Reed, Jingwei Zhang, Konrad Zolna, Jeff Clune, Nan...
2024
-
[16]
Diffusion models are real- time game engines
Dani Valevski, Yaniv Leviathan, Moab Arar, and Shlomi Fruchter. Diffusion models are real- time game engines. InThe Thirteenth International Conference on Learning Representations,
-
[17]
URLhttps://openreview.net/forum?id=P8pqeEkn1H
-
[18]
Tero Karras, Miika Aittala, Jaakko Lehtinen, Janne Hellsten, Timo Aila, and Samuli Laine. Analyzing and improving the training dynamics of diffusion models. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 24174– 24184, June 2024. doi: 10.1109/CVPR52733.2024.02282. URL https://openaccess. thecvf.com/content...
-
[19]
Tianwei Yin, Michaël Gharbi, Richard Zhang, Eli Shechtman, Frédo Durand, William T. Freeman, and Taesung Park. One-step diffusion with distribution matching distillation. 11 InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recog- nition (CVPR), pages 6613–6623, June 2024. doi: 10.1109/CVPR52733.2024.00632. URL https://openaccess.the...
-
[20]
Jonathan Ho, Tim Salimans, Alexey Gritsenko, William Chan, Mohammad Norouzi, and David J. Fleet. Video diffusion models. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, editors,Advances in Neural Infor- mation Processing Systems, volume 35, pages 8633–8646. Curran Associates, Inc.,
-
[21]
URL https://proceedings.neurips.cc/paper_files/paper/2022/file/ 39235c56aef13fb05a6adc95eb9d8d66-Paper-Conference.pdf
2022
-
[22]
Imagen Video: High Definition Video Generation with Diffusion Models
Jonathan Ho, William Chan, Chitwan Saharia, Jay Whang, Ruiqi Gao, Alexey Gritsenko, Diederik P. Kingma, Ben Poole, Mohammad Norouzi, David J. Fleet, and Tim Salimans. Imagen Video: High definition video generation with diffusion models, 2022. URL https: //arxiv.org/abs/2210.02303
work page internal anchor Pith review Pith/arXiv arXiv 2022
-
[23]
CogVideoX: Text-to-video diffusion models with an expert transformer
Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Yuxuan Zhang, Weihan Wang, Yean Cheng, Bin Xu, Xiaotao Gu, Yuxiao Dong, and Jie Tang. CogVideoX: Text-to-video diffusion models with an expert transformer. InThe Thirteenth International Conference on Learning Represen...
2025
-
[24]
Min Zhao, Hongzhou Zhu, Kaiwen Zheng, Zihan Zhou, Bokai Yan, Xinyuan Li, Xiao Yang, Chongxuan Li, and Jun Zhu. Causal Forcing++: Scalable few-step autoregressive diffusion distillation for real-time interactive video generation, 2026. URL https://arxiv.org/abs/ 2605.15141
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[25]
Sand.ai, Hansi Teng, Hongyu Jia, Lei Sun, Lingzhi Li, Maolin Li, Mingqiu Tang, Shuai Han, Tianning Zhang, W. Q. Zhang, Weifeng Luo, Xiaoyang Kang, Yuchen Sun, Yue Cao, Yunpeng Huang, Yutong Lin, Yuxin Fang, Zewei Tao, Zheng Zhang, Zhongshu Wang, Zixun Liu, Dai Shi, Guoli Su, Hanwen Sun, Hong Pan, Jie Wang, Jiexin Sheng, Min Cui, Min Hu, Ming Yan, Shucheng...
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[26]
LongLive: Real-time interactive long video generation
Shuai Yang, Wei Huang, Ruihang Chu, Yicheng Xiao, Yuyang Zhao, Xianbang Wang, Muyang Li, Enze Xie, Ying-Cong Chen, Yao Lu, Song Han, and Yukang Chen. LongLive: Real-time interactive long video generation. InThe Fourteenth International Conference on Learning Representations, 2026. URLhttps://openreview.net/forum?id=nCAODkpsPJ
2026
-
[27]
Rolling forcing: Autoregressive long video diffusion in real time
Kunhao Liu, Wenbo Hu, Jiale Xu, Ying Shan, and Shijian Lu. Rolling forcing: Autoregressive long video diffusion in real time. InThe Fourteenth International Conference on Learning Representations, 2026. URLhttps://openreview.net/forum?id=IAyzXjbfwo
2026
-
[28]
Hidir Yesiltepe, Tuna Han Salih Meral, Adil Kaan Akan, Kaan Oktay, and Pinar Yanardag. Infinity-RoPE: Action-controllable infinite video generation emerges from autoregressive self- rollout, 2025. URLhttps://arxiv.org/abs/2511.20649. CVPR 2026
-
[29]
Self-forcing++: Towards minute-scale high-quality video generation
Justin Cui, Jie Wu, Ming Li, Tao Yang, Xiaojie Li, Rui Wang, Andrew Bai, Yuanhao Ban, and Cho-Jui Hsieh. Self-forcing++: Towards minute-scale high-quality video generation. In The Fourteenth International Conference on Learning Representations, 2026. URL https: //openreview.net/forum?id=DzvPiqh23f
2026
-
[30]
Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, Maximilian Nickel, and Matthew Le. Flow matching for generative modeling. InThe Eleventh International Conference on Learning Representations, 2023. URLhttps://openreview.net/forum?id=PqvMRDCJT9t
2023
-
[31]
Scaling rectified flow transformers for high-resolution image synthesis
Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, and Robin Rombach. Scaling rectified flow transformers for high-resolution image synthesis. InProceedings of the 41st International Conference on Machine Learning, volu...
2024
-
[32]
Consistency models
Yang Song, Prafulla Dhariwal, Mark Chen, and Ilya Sutskever. Consistency models. In Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, and Jonathan Scarlett, editors,Proceedings of the 40th International Conference on Machine Learning, volume 202 ofProceedings of Machine Learning Research, pages 32211–32252. PMLR, 23–29 Jul 2...
2023
-
[33]
Improved techniques for training consistency models
Yang Song and Prafulla Dhariwal. Improved techniques for training consistency models. InThe Twelfth International Conference on Learning Representations, 2024. URL https: //openreview.net/forum?id=WNzy9bRDvG
2024
-
[34]
Simplifying, stabilizing and scaling continuous-time consistency models
Cheng Lu and Yang Song. Simplifying, stabilizing and scaling continuous-time consistency models. InThe Thirteenth International Conference on Learning Representations, 2025. URL https://openreview.net/forum?id=LyJi5ugyJx
2025
-
[35]
Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference
Simian Luo, Yiqin Tan, Longbo Huang, Jian Li, and Hang Zhao. Latent consistency models: Synthesizing high-resolution images with few-step inference, 2023. URL https://arxiv. org/abs/2310.04378
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[36]
One step diffusion via shortcut models
Kevin Frans, Danijar Hafner, Sergey Levine, and Pieter Abbeel. One step diffusion via shortcut models. InThe Thirteenth International Conference on Learning Representations, 2025. URL https://openreview.net/forum?id=OlzB6LnXcS
2025
-
[37]
Large scale diffusion distillation via score-regularized continuous-time consistency
Kaiwen Zheng, Yuji Wang, Qianli Ma, Huayu Chen, Jintao Zhang, Yogesh Balaji, Jianfei Chen, Ming-Yu Liu, Jun Zhu, and Qinsheng Zhang. Large scale diffusion distillation via score-regularized continuous-time consistency. InThe Fourteenth International Conference on Learning Representations, 2026. URL https://openreview.net/forum?id=2uNlM353RI
2026
-
[38]
Wong, Yu Qiao, and Ziwei Liu
Zhengyao Lv, Chenyang Si, Tianlin Pan, Zhaoxi Chen, Kwan-Yee K. Wong, Yu Qiao, and Ziwei Liu. Dual-expert consistency model for efficient and high-quality video gen- eration. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 14983–14993, October 2025. URL https://openaccess.thecvf.com/ content/ICCV2025/html/Lv_Dual-Ex...
2025
-
[39]
Tianwei Yin, Michaël Gharbi, Taesung Park, Richard Zhang, Eli Shechtman, Frédo Du- rand, and William T. Freeman. Improved distribution matching distillation for fast image synthesis. In A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, and C. Zhang, editors,Advances in Neural Information Processing Sys- tems, volume 37, pages 47455–4748...
2024
-
[40]
Transition matching distillation for fast video generation, 2026
Weili Nie, Julius Berner, Nanye Ma, Chao Liu, Saining Xie, and Arash Vahdat. Transition matching distillation for fast video generation, 2026. URL https://arxiv.org/abs/2601. 09881
2026
-
[41]
Salt: Self-Consistent Distribution Matching with Cache-Aware Training for Fast Video Generation
Xingtong Ge, Yi Zhang, Yushi Huang, Dailan He, Xiahong Wang, Bingqi Ma, Guanglu Song, Yu Liu, and Jun Zhang. Salt: Self-consistent distribution matching with cache-aware training for fast video generation, 2026. URLhttps://arxiv.org/abs/2604.03118
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[42]
Yunhong Lu, Yanhong Zeng, Haobo Li, Hao Ouyang, Qiuyu Wang, Ka Leong Cheng, Jiapeng Zhu, Hengyuan Cao, Zhipeng Zhang, Xing Zhu, Yujun Shen, and Min Zhang. Reward forcing: Efficient streaming video generation with rewarded distribution matching distillation, 2025. URLhttps://arxiv.org/abs/2512.04678
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[43]
Streaming autoregressive video generation via diagonal distillation
Jinxiu Liu, Xuanming Liu, Kangfu Mei, Yandong Wen, Ming-Hsuan Yang, and Weiyang Liu. Streaming autoregressive video generation via diagonal distillation. InThe Fourteenth International Conference on Learning Representations, 2026. URL https://openreview. net/forum?id=X7YW6STzeL
2026
-
[44]
Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio
Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Zoubin Ghahramani, Max Welling, Corinna Cortes, Neil D. Lawrence, and Kilian Q. Weinberger, editors,Advances 13 in Neural Information Processing Systems, volume 27, pages 2672–2680. Curran Asso...
2014
-
[45]
Generating videos with scene dynamics
Carl V ondrick, Hamed Pirsiavash, and Antonio Torralba. Generating videos with scene dynamics. InAdvances in Neural Information Processing Systems, volume 29, pages 613–621. Curran Associates, Inc., 2016. URL https://proceedings.neurips.cc/paper_files/paper/ 2016/file/04025959b191f8f9de3f924f0940515f-Paper.pdf
-
[46]
MoCoGAN: Decomposing motion and content for video generation
Sergey Tulyakov, Ming-Yu Liu, Xiaodong Yang, and Jan Kautz. MoCoGAN: Decomposing motion and content for video generation. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1526–1535, June 2018. doi: 10.1109/CVPR.2018. 00165. URL https://openaccess.thecvf.com/content_cvpr_2018/html/Tulyakov_ MoCoGAN_Decomposing_M...
-
[47]
StyleGAN-V: A continuous video generator with the price, image quality and perks of StyleGAN2
Ivan Skorokhodov, Sergey Tulyakov, and Mohamed Elhoseiny. StyleGAN-V: A continuous video generator with the price, image quality and perks of StyleGAN2. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 3626–3636, June 2022. URL https://openaccess.thecvf.com/content/ CVPR2022/html/Skorokhodov_StyleGAN-V_A_Co...
2022
-
[48]
Adversarial diffusion distillation
Axel Sauer, Dominik Lorenz, Andreas Blattmann, and Robin Rombach. Adversarial diffusion distillation. InComputer Vision – ECCV 2024, volume 15144 ofLecture Notes in Computer Science, pages 87–103. Springer, 2024. doi: 10.1007/978-3-031-73016-0_6. URL https: //doi.org/10.1007/978-3-031-73016-0_6
-
[49]
Diffusion adversarial post-training for one-step video generation
Shanchuan Lin, Xin Xia, Yuxi Ren, Ceyuan Yang, Xuefeng Xiao, and Lu Jiang. Diffusion adversarial post-training for one-step video generation. In Aarti Singh, Maryam Fazel, Daniel Hsu, Simon Lacoste-Julien, Felix Berkenkamp, Tegan Maharaj, Kiri Wagstaff, and Jerry Zhu, editors,Proceedings of the 42nd International Conference on Machine Learning, volume 267...
2025
-
[50]
Autoregressive adversarial post-training for real-time interactive video generation
Shanchuan Lin, Ceyuan Yang, Hao He, Jianwen Jiang, Yuxi Ren, Xin Xia, Yang Zhao, Xuefeng Xiao, and Lu Jiang. Autoregressive adversarial post-training for real-time interactive video generation. In D. Belgrave, C. Zhang, H. Lin, R. Pascanu, P. Koniusz, M. Ghassemi, and N. Chen, editors,Advances in Neural Information Processing Systems, volume 38, pages 410...
2025
-
[51]
Towards one-step causal video generation via adversarial self-distillation
Yongqi Yang, Huayang Huang, Xu Peng, Xiaobin Hu, Donghao Luo, Jiangning Zhang, Chengjie Wang, and Yu Wu. Towards one-step causal video generation via adversarial self-distillation. InThe Fourteenth International Conference on Learning Representations, 2026. URLhttps: //openreview.net/forum?id=P3O0fNmnWa
2026
-
[52]
Jiaxiang Cheng, Bing Ma, Xuhua Ren, Hongyi Henry Jin, Kai Yu, Peng Zhang, Wenyue Li, Yuan Zhou, Tianxiang Zheng, and Qinglin Lu. Phased one-step adversarial equilibrium for video diffusion models.Proceedings of the AAAI Conference on Artificial Intelligence, 40(5): 3237–3245, March 2026. doi: 10.1609/aaai.v40i5.37318. URL https://ojs.aaai.org/ index.php/A...
-
[53]
Flow straight and fast: Learning to generate and transfer data with rectified flow
Xingchao Liu, Chengyue Gong, and Qiang Liu. Flow straight and fast: Learning to generate and transfer data with rectified flow. InThe Eleventh International Conference on Learning Representations, 2023. URLhttps://openreview.net/forum?id=XVjTT1nw5z
2023
-
[54]
Ziqi Huang, Yinan He, Jiashuo Yu, Fan Zhang, Chenyang Si, Yuming Jiang, Yuanhan Zhang, Tianxing Wu, Qingyang Jin, Nattapol Chanpaisit, Yaohui Wang, Xinyuan Chen, Limin Wang, Dahua Lin, Yu Qiao, and Ziwei Liu. VBench: Comprehensive benchmark suite for video gener- ative models. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recogni...
-
[55]
SkyReels-V2: Infinite-length Film Generative Model
Guibin Chen, Dixuan Lin, Jiangping Yang, Chunze Lin, Junchen Zhu, Mingyuan Fan, Hao Zhang, Sheng Chen, Zheng Chen, Chengcheng Ma, Weiming Xiong, Wei Wang, Nuo Pang, Kang Kang, Zhiheng Xu, Yuzhe Jin, Yupeng Liang, Yubing Song, Peng Zhao, Boyuan Xu, Di Qiu, Debang Li, Zhengcong Fei, Yang Li, and Yahui Zhou. SkyReels-V2: Infinite-length film generative model...
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[56]
Autoregressive video generation without vector quantization
Haoge Deng, Ting Pan, Haiwen Diao, Zhengxiong Luo, Yufeng Cui, Huchuan Lu, Shiguang Shan, Yonggang Qi, and Xinlong Wang. Autoregressive video generation without vector quantization. InThe Thirteenth International Conference on Learning Representations, 2025. URLhttps://openreview.net/forum?id=JE9tCwe3lp
2025
-
[57]
LTX-Video: Realtime Video Latent Diffusion
Yoav HaCohen, Nisan Chiprut, Benny Brazowski, Daniel Shalem, Dudu Moshe, Eitan Richard- son, Eran Levin, Guy Shiran, Nir Zabari, Ori Gordon, Poriya Panet, Sapir Weissbuch, Victor Kulikov, Yaki Bitterman, Zeev Melumian, and Ofir Bibi. LTX-Video: Realtime video latent diffusion, 2024. URLhttps://arxiv.org/abs/2501.00103
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[58]
Pyramidal flow matching for efficient video generative modeling
Yang Jin, Zhicheng Sun, Ningyuan Li, Kun Xu, Kun Xu, Hao Jiang, Nan Zhuang, Quzhe Huang, Yang Song, Yadong Mu, and Zhouchen Lin. Pyramidal flow matching for efficient video generative modeling. InThe Thirteenth International Conference on Learning Representations,
-
[59]
URLhttps://openreview.net/forum?id=66NzcRQuOq
-
[60]
Philip J. Ball, Jakob Bauer, Frank Belletti, Bethanie Brownfield, Ariel Ephrat, Shlomi Fruchter, Agrim Gupta, Kristian Holsheimer, Aleksander Holynski, Jiri Hron, Christos Kaplanis, Mar- jorie Limont, Matt McGill, Yanko Oliveira, Jack Parker-Holder, Frank Perbet, Guy Scully, Jeremy Shar, Stephen Spencer, Omer Tov, Ruben Villegas, Emma Wang, Jessica Yung, ...
2025
-
[61]
WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling
Wenqiang Sun, Haiyu Zhang, Haoyuan Wang, Junta Wu, Zehan Wang, Zhenwei Wang, Yunhong Wang, Jun Zhang, Tengfei Wang, and Chunchao Guo. WorldPlay: Towards long-term geometric consistency for real-time interactive world modeling, 2025. URL https://arxiv.org/abs/ 2512.14614
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[62]
Matrix-game: Interactive world foundation model, 2025
Yifan Zhang, Chunli Peng, Boyang Wang, Puyi Wang, Qingcheng Zhu, Fei Kang, Biao Jiang, Zedong Gao, Eric Li, Yang Liu, and Yahui Zhou. Matrix-game: Interactive world foundation model, 2025. URLhttps://arxiv.org/abs/2506.18701. A Details of Implementations Our implementation is based on the Causal Forcing codebase [8] and the Wan2.1 model family [3]. The re...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.