pith. machine review for the scientific record.

arxiv: 2605.13724 · v1 · submitted 2026-05-13 · 💻 cs.CV · cs.AI

Recognition: unknown

AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation

Authors on Pith: no claims yet

Pith reviewed 2026-05-14 19:59 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords video diffusion · flow map distillation · few-step generation · any-step sampling · consistency distillation · ODE trajectory · on-policy distillation

The pith

AnyFlow distills video diffusion models to match few-step consistency performance while preserving scaling as sampling steps increase.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that consistency distillation weakens the test-time scaling of probability-flow ODE sampling by replacing the original trajectory with a fixed consistency path. AnyFlow instead learns flow-map transitions from z_t to z_r over arbitrary time intervals and uses backward simulation to decompose full Euler rollouts into efficient shortcut steps. This on-policy distillation reduces both discretization error in few-step sampling and exposure bias in causal generation. Experiments on bidirectional and causal video architectures from 1.3B to 14B parameters show the resulting model matches or exceeds consistency baselines at low step counts yet continues to improve when more steps are allocated at inference time.

Core claim

AnyFlow optimizes the full ODE sampling trajectory by shifting the distillation target from endpoint consistency mapping (z_t to z_0) to flow-map transition learning (z_t to z_r) over arbitrary intervals, with Flow Map Backward Simulation decomposing Euler rollouts into shortcut transitions that enable efficient on-policy training.
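The shift in distillation target can be made concrete on a toy straight-line flow. A minimal sketch, assuming a rectified-flow trajectory with constant velocity; the function names are illustrative, not the paper's code:

```python
import numpy as np

# Toy rectified-flow trajectory: z_t = (1 - t) * x + t * eps.  Along this
# straight path the velocity v = eps - x is constant, so the exact flow
# map over any interval is z_r = z_t + (r - t) * v.

def flow_map(z_t, t, r, v):
    """Flow-map transition z_t -> z_r over an arbitrary interval."""
    return z_t + (r - t) * v

def consistency_map(z_t, t, v):
    """Endpoint consistency target z_t -> z_0: the special case r = 0."""
    return flow_map(z_t, t, 0.0, v)

x, eps = 2.0, -1.0                 # data sample and noise
v = eps - x                        # constant velocity of the straight path
z = lambda t: (1 - t) * x + t * eps

# Flow maps compose over sub-intervals; consistency only reaches z_0.
z_half = flow_map(z(1.0), 1.0, 0.5, v)   # shortcut 1.0 -> 0.5
z_zero = flow_map(z_half, 0.5, 0.0, v)   # then 0.5 -> 0.0
assert np.isclose(z_half, z(0.5))
assert np.isclose(z_zero, x)
assert np.isclose(consistency_map(z(1.0), 1.0, v), x)
```

The composition property is what the endpoint-only target gives up: a consistency model can jump to z_0 but has no trained notion of intermediate transitions, which is why its trajectory stops matching the ODE as more steps are added.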

What carries the argument

Flow-map transition learning over arbitrary time intervals combined with Flow Map Backward Simulation for decomposing full rollouts into on-policy shortcuts.

If this is right

  • Performance matches or surpasses consistency-distilled models in the few-step regime across bidirectional and causal video generators.
  • Quality continues to improve when additional sampling steps are allocated at test time.
  • The approach applies to models ranging from 1.3B to 14B parameters without architecture-specific retraining.
  • On-policy backward simulation reduces both discretization error and exposure bias during distillation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same flow-map distillation could be applied to image or audio diffusion to restore scaling behavior lost in consistency training.
  • Adaptive step allocation during inference becomes feasible because the model remains well-behaved at every step count.
  • Flow maps may serve as a drop-in replacement for consistency targets in other generative settings where trajectory fidelity matters.

Load-bearing premise

Learning flow-map transitions over arbitrary intervals via backward simulation will not introduce new discretization or exposure biases that undermine the original ODE scaling behavior.

What would settle it

A plot of video generation quality versus number of sampling steps (from 1 to 50) in which AnyFlow's performance stops improving or drops after a small step count, matching the degradation pattern of consistency-distilled models.

read the original abstract

Few-step video generation has been significantly advanced by consistency distillation. However, the performance of consistency-distilled models often degrades as more sampling steps are allocated at test time, limiting their effectiveness for any-step video diffusion. This limitation arises because consistency distillation replaces the original probability-flow ODE trajectory with a consistency-sampling trajectory, weakening the desirable test-time scaling behavior of ODE sampling. To address this limitation, we introduce AnyFlow, the first any-step video diffusion distillation framework based on flow maps. Instead of distilling a model for only a few fixed sampling steps, AnyFlow optimizes the full ODE sampling trajectory. To this end, we shift the distillation target from endpoint consistency mapping $(z_{t}\rightarrow z_{0})$ to flow-map transition learning $(z_{t}\rightarrow z_{r})$ over arbitrary time intervals. We further propose Flow Map Backward Simulation, which decomposes a full Euler rollout into shortcut flow-map transitions, enabling efficient on-policy distillation that reduces test-time errors (i.e., discretization error in few-step sampling and exposure bias in causal generation). Extensive experiments across both bidirectional and causal architectures, at scales ranging from 1.3B to 14B parameters, demonstrate that AnyFlow achieves performance matches or surpasses consistency-based counterparts in the few-step regime, while scaling with sampling step budgets.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that consistency distillation for video diffusion models degrades performance when more sampling steps are used at test time because it replaces the original probability-flow ODE trajectory. AnyFlow addresses this by distilling flow-map transitions (z_t → z_r) over arbitrary intervals rather than fixed endpoint consistency mappings, using a proposed Flow Map Backward Simulation that decomposes full Euler rollouts into shortcut transitions for efficient on-policy training. This is said to reduce discretization error in few-step sampling and exposure bias in causal generation, enabling any-step performance that matches or exceeds consistency baselines while continuing to improve with larger step budgets. Experiments span bidirectional and causal architectures at scales from 1.3B to 14B parameters.

Significance. If the central claims hold, the work would be significant for video generation by providing the first any-step distillation framework that preserves ODE scaling behavior. The shift to flow-map targets and on-policy backward simulation could generalize beyond video to other diffusion domains, and the reported scaling across large model sizes offers practical value for flexible inference budgets.

major comments (2)
  1. [Abstract; Section 3 (Flow Map Backward Simulation)] The abstract and method description state that Flow Map Backward Simulation 'reduces test-time errors' via decomposition of Euler rollouts, but no error analysis, bounds on accumulated approximation error, or convergence argument is given showing that the shortcut transitions preserve the original probability-flow ODE trajectory without introducing new discretization or exposure biases that grow with interval length. This directly underpins the any-step scaling claim.
  2. [Section 4 (Experiments)] Experiments claim AnyFlow matches or surpasses consistency counterparts in the few-step regime while scaling with step budgets, yet no ablation isolates the effect of the backward-simulation approximation on ODE fidelity (e.g., comparing full-rollout vs. decomposed trajectories). Without this, it is unclear whether observed gains stem from the flow-map target or from unaccounted biases.
minor comments (2)
  1. [Section 3] Notation for time intervals (t, r) and the precise definition of the flow-map target could be clarified with a small diagram or explicit equation early in the method section to aid readability.
  2. [Section 4] Tables comparing model scales should include standard deviations or multiple seeds for the reported metrics to strengthen the scaling claims.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment below, clarifying our approach and outlining planned revisions to strengthen the presentation of the method and experiments.

read point-by-point responses
  1. Referee: [Abstract; Section 3 (Flow Map Backward Simulation)] The abstract and method description state that Flow Map Backward Simulation 'reduces test-time errors' via decomposition of Euler rollouts, but no error analysis, bounds on accumulated approximation error, or convergence argument is given showing that the shortcut transitions preserve the original probability-flow ODE trajectory without introducing new discretization or exposure biases that grow with interval length. This directly underpins the any-step scaling claim.

    Authors: We appreciate the referee highlighting the absence of formal error analysis. The manuscript presents Flow Map Backward Simulation as a practical decomposition for efficient on-policy training, relying on empirical results to demonstrate reduced test-time errors and improved scaling. In the revised version, we will expand Section 3 with a qualitative discussion of the approximation, including how the on-policy backward simulation mitigates exposure bias by aligning training and inference trajectories, supported by additional visualizations of per-step trajectory deviations. We will also include a brief analysis of error accumulation for varying interval lengths. However, deriving rigorous bounds or a full convergence proof for arbitrary intervals is a non-trivial theoretical extension that lies beyond the current scope; we will explicitly note this as a limitation while emphasizing the empirical safeguards and scaling behavior observed across model sizes. revision: partial

  2. Referee: [Section 4 (Experiments)] Experiments claim AnyFlow matches or surpasses consistency counterparts in the few-step regime while scaling with step budgets, yet no ablation isolates the effect of the backward-simulation approximation on ODE fidelity (e.g., comparing full-rollout vs. decomposed trajectories). Without this, it is unclear whether observed gains stem from the flow-map target or from unaccounted biases.

    Authors: We agree that an ablation isolating the backward-simulation approximation is necessary to strengthen the experimental claims. In the revised manuscript, we will add this ablation to Section 4, comparing AnyFlow variants trained with full Euler rollouts against the decomposed shortcut transitions. The comparison will report ODE fidelity metrics such as average trajectory deviation from the reference probability-flow path, as well as generation quality (FVD, CLIP score) across different step budgets. This will help attribute performance differences more clearly to the flow-map targets versus the on-policy decomposition. revision: yes

Circularity Check

0 steps flagged

No significant circularity in AnyFlow derivation chain

full rationale

The paper defines a new distillation target as flow-map transitions (z_t to z_r) over arbitrary intervals and introduces Flow Map Backward Simulation as a decomposition of Euler rollouts into shortcuts. Neither step reduces by construction to fitted inputs, self-referential definitions, or load-bearing self-citations; the central claim of any-step scaling is supported by independent experimental validation across model scales and architectures rather than being forced by the method's own equations.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim rests on the ability to decompose the full ODE trajectory into learnable flow-map transitions without loss of scaling properties, and on the effectiveness of the proposed backward simulation for on-policy training.

axioms (1)
  • domain assumption The probability-flow ODE trajectory can be decomposed into shortcut flow-map transitions for distillation
    Invoked as the basis for shifting from endpoint consistency to arbitrary-interval flow maps
invented entities (1)
  • Flow Map Backward Simulation no independent evidence
    purpose: Decompose full Euler rollout into shortcut transitions for efficient on-policy distillation
    New procedure introduced to enable the any-step training

pith-pipeline@v0.9.0 · 5552 in / 1259 out tokens · 50911 ms · 2026-05-14T19:59:06.628346+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

49 extracted references · 49 canonical work pages · 16 internal anchors

  1. [1]

    Wan: Open and Advanced Large-Scale Video Generative Models

    Team Wan, Ang Wang, Baole Ai, Bin Wen, Chaojie Mao, Chen-Wei Xie, Di Chen, Feiwu Yu, Haiming Zhao, Jianx- iao Yang, et al. Wan: Open and advanced large-scale video generative models.arXiv preprint arXiv:2503.20314, 2025

  2. [2]

    Cosmos World Foundation Model Platform for Physical AI

    Niket Agarwal, Arslan Ali, Maciej Bala, Yogesh Balaji, Erik Barker, Tiffany Cai, Prithvijit Chattopadhyay, Yongxin Chen, Yin Cui, Yifan Ding, et al. Cosmos world foundation model platform for physical ai.arXiv preprint arXiv:2501.03575, 2025

  3. [3]

    HunyuanVideo: A Systematic Framework For Large Video Generative Models

    Weijie Kong, Qi Tian, Zijian Zhang, Rox Min, Zuozhuo Dai, Jin Zhou, Jiangfeng Xiong, Xin Li, Bo Wu, Jianwei Zhang, et al. Hunyuanvideo: A systematic framework for large video generative models.arXiv preprint arXiv:2412.03603, 2024

  4. [4]

    CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer

    Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, et al. Cogvideox: Text-to-video diffusion models with an expert transformer.arXiv preprint arXiv:2408.06072, 2024

  5. [5]

    LTX-Video: Realtime Video Latent Diffusion

    Yoav HaCohen, Nisan Chiprut, Benny Brazowski, Daniel Shalem, Dudu Moshe, Eitan Richardson, Eran Levin, Guy Shiran, Nir Zabari, Ori Gordon, Poriya Panet, Sapir Weissbuch, Victor Kulikov, Yaki Bitterman, Zeev Melumian, and Ofir Bibi. LTX-Video: Realtime video latent diffusion.arXiv preprint arXiv:2501.00103, 2024

  6. [6]

    Consistency models

    Yang Song, Prafulla Dhariwal, Mark Chen, and Ilya Sutskever. Consistency models. 2023

  7. [7]

    Improved techniques for training consistency models

    Yang Song and Prafulla Dhariwal. Improved techniques for training consistency models. InICLR, 2024

  8. [8]

    Simplifying, Stabilizing and Scaling Continuous-Time Consistency Models

    Cheng Lu and Yang Song. Simplifying, stabilizing and scaling continuous-time consistency models.arXiv preprint arXiv:2410.11081, 2024

  9. [9]

    Large Scale Diffusion Distillation via Score-Regularized Continuous-Time Consistency

    Kaiwen Zheng, Yuji Wang, Qianli Ma, Huayu Chen, Jintao Zhang, Yogesh Balaji, Jianfei Chen, Ming-Yu Liu, Jun Zhu, and Qinsheng Zhang. Large scale diffusion distillation via score-regularized continuous-time consistency. arXiv preprint arXiv:2510.08431, 2025

  10. [10]

    Self forcing: Bridging the train-test gap in autoregressive video diffusion

    Xun Huang, Zhengqi Li, Guande He, Mingyuan Zhou, and Eli Shechtman. Self forcing: Bridging the train-test gap in autoregressive video diffusion. InNeurIPS, 2025

  11. [11]

    Align your flow: Scaling continuous-time flow map distillation.arXiv preprint arXiv:2506.14603, 2025

    Amirmojtaba Sabour, Sanja Fidler, and Karsten Kreis. Align your flow: Scaling continuous-time flow map distillation.arXiv preprint arXiv:2506.14603, 2025

  12. [12]

    Flow map matching.arXiv preprint arXiv:2406.07507,

    Nicholas M Boffi, Michael S Albergo, and Eric Vanden-Eijnden. Flow map matching with stochastic interpolants: A mathematical framework for consistency models.arXiv preprint arXiv:2406.07507, 2024

  13. [13]

    Mean Flows for One-step Generative Modeling

    Zhengyang Geng, Mingyang Deng, Xingjian Bai, J Zico Kolter, and Kaiming He. Mean flows for one-step generative modeling.arXiv preprint arXiv:2505.13447, 2025

  14. [14]

    Consistency models made easy

    Zhengyang Geng, Ashwini Pokle, William Luo, Justin Lin, and J Zico Kolter. Consistency models made easy. arXiv preprint arXiv:2406.14548, 2024

  15. [15]

    Phased consistency models.Advances in neural information processing systems, 37:83951–84009, 2024

    Fu-Yun Wang, Zhaoyang Huang, Alexander Bergman, Dazhong Shen, Peng Gao, Michael Lingelbach, Keqiang Sun, Weikang Bian, Guanglu Song, Yu Liu, et al. Phased consistency models.Advances in neural information processing systems, 37:83951–84009, 2024

  16. [16]

    Hyper-SD: Trajectory segmented consistency model for efficient image synthesis.NeurIPS, 2024

    Yuxi Ren, Xin Xia, Yanzuo Lu, Jiacheng Zhang, Jie Wu, Pan Xie, Xing Wang, and Xuefeng Xiao. Hyper-SD: Trajectory segmented consistency model for efficient image synthesis.NeurIPS, 2024. 17 AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation

  17. [17]

    Truncated consistency models.arXiv preprint arXiv:2410.14895, 2024

    Sangyun Lee, Yilun Xu, Tomas Geffner, Giulia Fanti, Karsten Kreis, Arash Vahdat, and Weili Nie. Truncated consistency models.arXiv preprint arXiv:2410.14895, 2024

  18. [18]

    Consistency trajectory models: Learning probability flow ODE trajectory of diffusion

    Dongjun Kim, Chieh-Hsin Lai, Wei-Hsiang Liao, Naoki Murata, Yuhta Takida, Toshimitsu Uesaka, Yutong He, Yuki Mitsufuji, and Stefano Ermon. Consistency trajectory models: Learning probability flow ODE trajectory of diffusion. InICLR, 2024

  19. [19]

    Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow

    Xingchao Liu, Chengyue Gong, and Qiang Liu. Flow straight and fast: Learning to generate and transfer data with rectified flow.arXiv preprint arXiv:2209.03003, 2022

  20. [20]

    Flow Matching for Generative Modeling

    Yaron Lipman, Ricky TQ Chen, Heli Ben-Hamu, Maximilian Nickel, and Matt Le. Flow matching for generative modeling.arXiv preprint arXiv:2210.02747, 2022

  21. [21]

    Transition models: Rethinking the generative learning objective.arXiv preprint arXiv:2509.04394, 2025

    Zidong Wang, Yiyuan Zhang, Xiaoyu Yue, Xiangyu Yue, Yangguang Li, Wanli Ouyang, and Lei Bai. Transition models: Rethinking the generative learning objective.arXiv preprint arXiv:2509.04394, 2025

  22. [22]

    Soflow: Solution flow models for one-step generative modeling.arXiv preprint arXiv:2512.15657, 2025

    Tianze Luo, Haotian Yuan, and Zhuang Liu. Soflow: Solution flow models for one-step generative modeling.arXiv preprint arXiv:2512.15657, 2025

  23. [23]

    SplitMeanFlow: Interval splitting consistency in few-step generative modeling, 2025

    Yi Guo, Wei Wang, Zhihang Yuan, Rong Cao, Kuan Chen, Zhengyang Chen, Yuanyuan Huo, Yang Zhang, Yuping Wang, Shouda Liu, et al. Splitmeanflow: Interval splitting consistency in few-step generative modeling.arXiv preprint arXiv:2507.16884, 2025

  24. [24]

    Improved distribution matching distillation for fast image synthesis.NeurIPS, 2024

    Tianwei Yin, Michaël Gharbi, Taesung Park, Richard Zhang, Eli Shechtman, Fredo Durand, and Bill Freeman. Improved distribution matching distillation for fast image synthesis.NeurIPS, 2024

  25. [25]

    One-step diffusion with distribution matching distillation

    Tianwei Yin, Michaël Gharbi, Richard Zhang, Eli Shechtman, Fredo Durand, William T Freeman, and Taesung Park. One-step diffusion with distribution matching distillation. InCVPR, 2024

  26. [26]

    From slow bidirectional to fast autoregressive video diffusion models

    Tianwei Yin, Qiang Zhang, Richard Zhang, William T Freeman, Fredo Durand, Eli Shechtman, and Xun Huang. From slow bidirectional to fast autoregressive video diffusion models. InCVPR, 2025

  27. [27]

    Transition matching distillation for fast video generation.arXiv preprint arXiv:2601.09881, 2026

    Weili Nie, Julius Berner, Nanye Ma, Chao Liu, Saining Xie, and Arash Vahdat. Transition matching distillation for fast video generation.arXiv preprint arXiv:2601.09881, 2026

  28. [28]

    LongLive: Real-time Interactive Long Video Generation

    Shuai Yang, Wei Huang, Ruihang Chu, Yicheng Xiao, Yuyang Zhao, Xianbang Wang, Muyang Li, Enze Xie, Ying- cong Chen, Yao Lu, et al. Longlive: Real-time interactive long video generation.arXiv preprint arXiv:2509.22622, 2025

  29. [29]

    Pyramidal flow matching for efficient video generative modeling

    Yang Jin, Zhicheng Sun, Ningyuan Li, Kun Xu, Hao Jiang, Nan Zhuang, Quzhe Huang, Yang Song, Yadong MU, and Zhouchen Lin. Pyramidal flow matching for efficient video generative modeling. InICLR, 2025

  30. [30]

    Frame context packing and drift prevention in next-frame-prediction video diffusion models

    Lvmin Zhang, Shengqu Cai, Muyang Li, Gordon Wetzstein, and Maneesh Agrawala. Frame context packing and drift prevention in next-frame-prediction video diffusion models. InNeurIPS, 2025

  31. [31]

    History-guided video diffusion

    Kiwhan Song, Boyuan Chen, Max Simchowitz, Yilun Du, Russ Tedrake, and Vincent Sitzmann. History-guided video diffusion. InICML, 2024

  32. [32]

    arXiv preprint arXiv:2512.15702 (2025)

    Yuwei Guo, Ceyuan Yang, Hao He, Yang Zhao, Meng Wei, Zhenheng Yang, Weilin Huang, and Dahua Lin. End-to-end training for autoregressive video diffusion via self-resampling.arXiv preprint arXiv:2512.15702, 2025

  33. [33]

    On-policy distillation.Thinking Machines Lab: Connectionism, 2025

    Kevin Lu and Thinking Machines Lab. On-policy distillation.Thinking Machines Lab: Connectionism, 2025. https://thinkingmachines.ai/blog/on-policy-distillation

  34. [34]

    Self-Distilled Reasoner: On-Policy Self-Distillation for Large Language Models

    Siyan Zhao, Zhihui Xie, Mengchen Liu, Jing Huang, Guan Pang, Feiyu Chen, and Aditya Grover. Self-distilled reasoner: On-policy self-distillation for large language models.arXiv preprint arXiv:2601.18734, 2026

  35. [35]

    On-Policy Context Distillation for Language Models

    Tianzhu Ye, Li Dong, Xun Wu, Shaohan Huang, and Furu Wei. On-policy context distillation for language models. arXiv preprint arXiv:2602.12275, 2026

  36. [36]

    Diffusion adversarial post-training for one-step video generation

    Shanchuan Lin, Xin Xia, Yuxi Ren, Ceyuan Yang, Xuefeng Xiao, and Lu Jiang. Diffusion adversarial post-training for one-step video generation. InICML, 2025

  37. [37]

    Autoregressive adversarial post-training for real-time interactive video generation

    Shanchuan Lin, Ceyuan Yang, Hao He, Jianwen Jiang, Yuxi Ren, Xin Xia, Yang Zhao, Xuefeng Xiao, and Lu Jiang. Autoregressive adversarial post-training for real-time interactive video generation. InNeurIPS, 2025

  38. [38]

    Long-context autoregressive video modeling with next-frame prediction

    Yuchao Gu, Weijia Mao, and Mike Zheng Shou. Long-context autoregressive video modeling with next-frame prediction.arXiv preprint arXiv:2503.19325, 2025

  39. [39]

    MAGI-1: Autoregressive Video Generation at Scale

    Hansi Teng, Hongyu Jia, Lei Sun, Lingzhi Li, Maolin Li, Mingqiu Tang, Shuai Han, Tianning Zhang, WQ Zhang, Weifeng Luo, et al. MAGI-1: Autoregressive video generation at scale.arXiv preprint arXiv:2505.13211, 2025

  40. [40]

    SkyReels-V2: Infinite-length Film Generative Model

    Guibin Chen, Dixuan Lin, Jiangping Yang, Chunze Lin, Junchen Zhu, Mingyuan Fan, Hao Zhang, Sheng Chen, Zheng Chen, Chengcheng Ma, et al. Skyreels-v2: Infinite-length film generative model.arXiv preprint arXiv:2504.13074, 2025. 18 AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation

  41. [41]

    Lightx2v: Light video generation inference framework.https://github.com/ModelTC/ lightx2v, 2025

    LightX2V Contributors. Lightx2v: Light video generation inference framework.https://github.com/ModelTC/ lightx2v, 2025

  42. [42]

    Causalwan2.2-i2v-a14b-preview-diffusers

    FastVideo Team. Causalwan2.2-i2v-a14b-preview-diffusers. https://huggingface.co/FastVideo/CausalWan2. 2-I2V-A14B-Preview-Diffusers, 2025

  43. [43]

    Krea realtime 14b: Real-time video generation, 2025

    Erwann Millon. Krea realtime 14b: Real-time video generation, 2025

  44. [44]

    Step-video-ti2v technical report: A state-of-the-art text-driven image-to-video generation model, 2025

    Haoyang Huang, Guoqing Ma, Nan Duan, Xing Chen, Changyi Wan, Ranchen Ming, Tianyu Wang, Bo Wang, Zhiying Lu, Aojie Li, Xianfang Zeng, Xinhao Zhang, Gang Yu, Yuhe Yin, Qiling Wu, Wen Sun, Kang An, Xin Han, Deshan Sun, Wei Ji, Bizhu Huang, Brian Li, Chenfei Wu, Guanzhe Huang, Huixin Xiong, Jiaxin He, Jianchang Wu, Jianlong Yuan, Jie Wu, Jiashuai Liu, Junjin...

  45. [45]

    Diffusers: State-of-the-art diffusion models.https://github.com/huggingface/diffusers, 2022

    Patrick von Platen, Suraj Patil, Anton Lozhkov, Pedro Cuenca, Nathan Lambert, Kashif Rasul, Mishig Davaadorj, Dhruv Nair, Sayak Paul, William Berman, Yiyi Xu, Steven Liu, and Thomas Wolf. Diffusers: State-of-the-art diffusion models.https://github.com/huggingface/diffusers, 2022

  46. [46]

    Decoupled Weight Decay Regularization

    Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization.arXiv preprint arXiv:1711.05101, 2017

  47. [47]

    Lora: Low-rank adaptation of large language models.Iclr, 1(2):3, 2022

    Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Liang Wang, Weizhu Chen, et al. Lora: Low-rank adaptation of large language models.Iclr, 1(2):3, 2022

  48. [48]

    VBench: Comprehensive benchmark suite for video generative models

    Ziqi Huang, Yinan He, Jiashuo Yu, Fan Zhang, Chenyang Si, Yuming Jiang, Yuanhan Zhang, Tianxing Wu, Qingyang Jin, Nattapol Chanpaisit, Yaohui Wang, Xinyuan Chen, Limin Wang, Dahua Lin, Yu Qiao, and Ziwei Liu. VBench: Comprehensive benchmark suite for video generative models. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

  49. [49]

    Vbench++: Comprehensive and versatile benchmark suite for video generative models

    Ziqi Huang, Fan Zhang, Xiaojie Xu, Yinan He, Jiashuo Yu, Ziyue Dong, Qianli Ma, Nattapol Chanpaisit, Chenyang Si, Yuming Jiang, et al. Vbench++: Comprehensive and versatile benchmark suite for video generative models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025. 19