pith. machine review for the scientific record.

arxiv: 2605.13724 · v1 · submitted 2026-05-13 · 💻 cs.CV · cs.AI

Recognition: unknown

AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation

Authors on Pith: no claims yet

Pith reviewed 2026-05-14 19:59 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords video diffusion · flow map distillation · few-step generation · any-step sampling · consistency distillation · ODE trajectory · on-policy distillation

The pith

AnyFlow distills video diffusion models to match few-step consistency performance while preserving scaling as sampling steps increase.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that consistency distillation weakens the test-time scaling of probability-flow ODE sampling by replacing the original trajectory with a fixed consistency path. AnyFlow instead learns flow-map transitions from z_t to z_r over arbitrary time intervals and uses backward simulation to decompose full Euler rollouts into efficient shortcut steps. This on-policy distillation reduces both discretization error in few-step sampling and exposure bias in causal generation. Experiments on bidirectional and causal video architectures from 1.3B to 14B parameters show the resulting model matches or exceeds consistency baselines at low step counts yet continues to improve when more steps are allocated at inference time.

Core claim

AnyFlow optimizes the full ODE sampling trajectory by shifting the distillation target from endpoint consistency mapping (z_t to z_0) to flow-map transition learning (z_t to z_r) over arbitrary intervals, with Flow Map Backward Simulation decomposing Euler rollouts into shortcut transitions that enable efficient on-policy training.
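The shift in distillation target can be made concrete on a toy straight-line flow. A minimal sketch, assuming a rectified-flow trajectory with constant velocity; the function names are illustrative, not the paper's code:

```python
import numpy as np

# Toy rectified-flow trajectory: z_t = (1 - t) * x + t * eps.  Along this
# straight path the velocity v = eps - x is constant, so the exact flow
# map over any interval is z_r = z_t + (r - t) * v.

def flow_map(z_t, t, r, v):
    """Flow-map transition z_t -> z_r over an arbitrary interval."""
    return z_t + (r - t) * v

def consistency_map(z_t, t, v):
    """Endpoint consistency target z_t -> z_0: the special case r = 0."""
    return flow_map(z_t, t, 0.0, v)

x, eps = 2.0, -1.0                 # data sample and noise
v = eps - x                        # constant velocity of the straight path
z = lambda t: (1 - t) * x + t * eps

# Flow maps compose over sub-intervals; consistency only reaches z_0.
z_half = flow_map(z(1.0), 1.0, 0.5, v)   # shortcut 1.0 -> 0.5
z_zero = flow_map(z_half, 0.5, 0.0, v)   # then 0.5 -> 0.0
assert np.isclose(z_half, z(0.5))
assert np.isclose(z_zero, x)
assert np.isclose(consistency_map(z(1.0), 1.0, v), x)
```

The composition property is what the endpoint-only target gives up: a consistency model can jump to z_0 but has no trained notion of intermediate transitions, which is why its trajectory stops matching the ODE as more steps are added.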

What carries the argument

Flow-map transition learning over arbitrary time intervals combined with Flow Map Backward Simulation for decomposing full rollouts into on-policy shortcuts.

If this is right

  • Performance matches or surpasses consistency-distilled models in the few-step regime across bidirectional and causal video generators.
  • Quality continues to improve when additional sampling steps are allocated at test time.
  • The approach applies to models ranging from 1.3B to 14B parameters without architecture-specific retraining.
  • On-policy backward simulation reduces both discretization error and exposure bias during distillation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same flow-map distillation could be applied to image or audio diffusion to restore scaling behavior lost in consistency training.
  • Adaptive step allocation during inference becomes feasible because the model remains well-behaved at every step count.
  • Flow maps may serve as a drop-in replacement for consistency targets in other generative settings where trajectory fidelity matters.

Load-bearing premise

Learning flow-map transitions over arbitrary intervals via backward simulation will not introduce new discretization or exposure biases that undermine the original ODE scaling behavior.

What would settle it

A plot of video generation quality versus number of sampling steps (from 1 to 50) in which AnyFlow's performance stops improving or drops after a small step count, matching the degradation pattern of consistency-distilled models.

read the original abstract

Few-step video generation has been significantly advanced by consistency distillation. However, the performance of consistency-distilled models often degrades as more sampling steps are allocated at test time, limiting their effectiveness for any-step video diffusion. This limitation arises because consistency distillation replaces the original probability-flow ODE trajectory with a consistency-sampling trajectory, weakening the desirable test-time scaling behavior of ODE sampling. To address this limitation, we introduce AnyFlow, the first any-step video diffusion distillation framework based on flow maps. Instead of distilling a model for only a few fixed sampling steps, AnyFlow optimizes the full ODE sampling trajectory. To this end, we shift the distillation target from endpoint consistency mapping $(z_{t}\rightarrow z_{0})$ to flow-map transition learning $(z_{t}\rightarrow z_{r})$ over arbitrary time intervals. We further propose Flow Map Backward Simulation, which decomposes a full Euler rollout into shortcut flow-map transitions, enabling efficient on-policy distillation that reduces test-time errors (i.e., discretization error in few-step sampling and exposure bias in causal generation). Extensive experiments across both bidirectional and causal architectures, at scales ranging from 1.3B to 14B parameters, demonstrate that AnyFlow achieves performance matches or surpasses consistency-based counterparts in the few-step regime, while scaling with sampling step budgets.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that consistency distillation for video diffusion models degrades performance when more sampling steps are used at test time because it replaces the original probability-flow ODE trajectory. AnyFlow addresses this by distilling flow-map transitions (z_t → z_r) over arbitrary intervals rather than fixed endpoint consistency mappings, using a proposed Flow Map Backward Simulation that decomposes full Euler rollouts into shortcut transitions for efficient on-policy training. This is said to reduce discretization error in few-step sampling and exposure bias in causal generation, enabling any-step performance that matches or exceeds consistency baselines while continuing to improve with larger step budgets. Experiments span bidirectional and causal architectures at scales from 1.3B to 14B parameters.

Significance. If the central claims hold, the work would be significant for video generation by providing the first any-step distillation framework that preserves ODE scaling behavior. The shift to flow-map targets and on-policy backward simulation could generalize beyond video to other diffusion domains, and the reported scaling across large model sizes offers practical value for flexible inference budgets.

major comments (2)
  1. [Abstract; Section 3 (Flow Map Backward Simulation)] The abstract and method description state that Flow Map Backward Simulation 'reduces test-time errors' via decomposition of Euler rollouts, but no error analysis, bounds on accumulated approximation error, or convergence argument is given showing that the shortcut transitions preserve the original probability-flow ODE trajectory without introducing new discretization or exposure biases that grow with interval length. This directly underpins the any-step scaling claim.
  2. [Section 4 (Experiments)] Experiments claim AnyFlow matches or surpasses consistency counterparts in the few-step regime while scaling with step budgets, yet no ablation isolates the effect of the backward-simulation approximation on ODE fidelity (e.g., comparing full-rollout vs. decomposed trajectories). Without this, it is unclear whether observed gains stem from the flow-map target or from unaccounted biases.
minor comments (2)
  1. [Section 3] Notation for time intervals (t, r) and the precise definition of the flow-map target could be clarified with a small diagram or explicit equation early in the method section to aid readability.
  2. [Section 4] Tables comparing model scales should include standard deviations or multiple seeds for the reported metrics to strengthen the scaling claims.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment below, clarifying our approach and outlining planned revisions to strengthen the presentation of the method and experiments.

read point-by-point responses
  1. Referee: [Abstract; Section 3 (Flow Map Backward Simulation)] The abstract and method description state that Flow Map Backward Simulation 'reduces test-time errors' via decomposition of Euler rollouts, but no error analysis, bounds on accumulated approximation error, or convergence argument is given showing that the shortcut transitions preserve the original probability-flow ODE trajectory without introducing new discretization or exposure biases that grow with interval length. This directly underpins the any-step scaling claim.

    Authors: We appreciate the referee highlighting the absence of formal error analysis. The manuscript presents Flow Map Backward Simulation as a practical decomposition for efficient on-policy training, relying on empirical results to demonstrate reduced test-time errors and improved scaling. In the revised version, we will expand Section 3 with a qualitative discussion of the approximation, including how the on-policy backward simulation mitigates exposure bias by aligning training and inference trajectories, supported by additional visualizations of per-step trajectory deviations. We will also include a brief analysis of error accumulation for varying interval lengths. However, deriving rigorous bounds or a full convergence proof for arbitrary intervals is a non-trivial theoretical extension that lies beyond the current scope; we will explicitly note this as a limitation while emphasizing the empirical safeguards and scaling behavior observed across model sizes. revision: partial

  2. Referee: [Section 4 (Experiments)] Experiments claim AnyFlow matches or surpasses consistency counterparts in the few-step regime while scaling with step budgets, yet no ablation isolates the effect of the backward-simulation approximation on ODE fidelity (e.g., comparing full-rollout vs. decomposed trajectories). Without this, it is unclear whether observed gains stem from the flow-map target or from unaccounted biases.

    Authors: We agree that an ablation isolating the backward-simulation approximation is necessary to strengthen the experimental claims. In the revised manuscript, we will add this ablation to Section 4, comparing AnyFlow variants trained with full Euler rollouts against the decomposed shortcut transitions. The comparison will report ODE fidelity metrics such as average trajectory deviation from the reference probability-flow path, as well as generation quality (FVD, CLIP score) across different step budgets. This will help attribute performance differences more clearly to the flow-map targets versus the on-policy decomposition. revision: yes

Circularity Check

0 steps flagged

No significant circularity in AnyFlow derivation chain

full rationale

The paper defines a new distillation target as flow-map transitions (z_t to z_r) over arbitrary intervals and introduces Flow Map Backward Simulation as a decomposition of Euler rollouts into shortcuts. Neither step reduces by construction to fitted inputs, self-referential definitions, or load-bearing self-citations; the central claim of any-step scaling is supported by independent experimental validation across model scales and architectures rather than being forced by the method's own equations.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim rests on the ability to decompose the full ODE trajectory into learnable flow-map transitions without loss of scaling properties, and on the effectiveness of the proposed backward simulation for on-policy training.

axioms (1)
  • domain assumption The probability-flow ODE trajectory can be decomposed into shortcut flow-map transitions for distillation
    Invoked as the basis for shifting from endpoint consistency to arbitrary-interval flow maps
invented entities (1)
  • Flow Map Backward Simulation no independent evidence
    purpose: Decompose full Euler rollout into shortcut transitions for efficient on-policy distillation
    New procedure introduced to enable the any-step training

pith-pipeline@v0.9.0 · 5552 in / 1259 out tokens · 50911 ms · 2026-05-14T19:59:06.628346+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

49 extracted references · 49 canonical work pages · 16 internal anchors

  1. [1]

    Wan: Open and Advanced Large-Scale Video Generative Models

    Team Wan, Ang Wang, Baole Ai, Bin Wen, Chaojie Mao, Chen-Wei Xie, Di Chen, Feiwu Yu, Haiming Zhao, Jianx- iao Yang, et al. Wan: Open and advanced large-scale video generative models.arXiv preprint arXiv:2503.20314, 2025

  2. [2]

    Cosmos World Foundation Model Platform for Physical AI

    Niket Agarwal, Arslan Ali, Maciej Bala, Yogesh Balaji, Erik Barker, Tiffany Cai, Prithvijit Chattopadhyay, Yongxin Chen, Yin Cui, Yifan Ding, et al. Cosmos world foundation model platform for physical ai.arXiv preprint arXiv:2501.03575, 2025

  3. [3]

    HunyuanVideo: A Systematic Framework For Large Video Generative Models

    Weijie Kong, Qi Tian, Zijian Zhang, Rox Min, Zuozhuo Dai, Jin Zhou, Jiangfeng Xiong, Xin Li, Bo Wu, Jianwei Zhang, et al. Hunyuanvideo: A systematic framework for large video generative models.arXiv preprint arXiv:2412.03603, 2024

  4. [4]

    CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer

    Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, et al. Cogvideox: Text-to-video diffusion models with an expert transformer.arXiv preprint arXiv:2408.06072, 2024

  5. [5]

    LTX-Video: Realtime Video Latent Diffusion

    Yoav HaCohen, Nisan Chiprut, Benny Brazowski, Daniel Shalem, Dudu Moshe, Eitan Richardson, Eran Levin, Guy Shiran, Nir Zabari, Ori Gordon, Poriya Panet, Sapir Weissbuch, Victor Kulikov, Yaki Bitterman, Zeev Melumian, and Ofir Bibi. LTX-Video: Realtime video latent diffusion.arXiv preprint arXiv:2501.00103, 2024

  6. [6]

    Consistency models

    Yang Song, Prafulla Dhariwal, Mark Chen, and Ilya Sutskever. Consistency models. 2023

  7. [7]

    Improved techniques for training consistency models

    Yang Song and Prafulla Dhariwal. Improved techniques for training consistency models. InICLR, 2024

  8. [8]

    Simplifying, Stabilizing and Scaling Continuous-Time Consistency Models

    Cheng Lu and Yang Song. Simplifying, stabilizing and scaling continuous-time consistency models.arXiv preprint arXiv:2410.11081, 2024

  9. [9]

    Large Scale Diffusion Distillation via Score-Regularized Continuous-Time Consistency

    Kaiwen Zheng, Yuji Wang, Qianli Ma, Huayu Chen, Jintao Zhang, Yogesh Balaji, Jianfei Chen, Ming-Yu Liu, Jun Zhu, and Qinsheng Zhang. Large scale diffusion distillation via score-regularized continuous-time consistency. arXiv preprint arXiv:2510.08431, 2025

  10. [10]

    Self forcing: Bridging the train-test gap in autoregressive video diffusion

    Xun Huang, Zhengqi Li, Guande He, Mingyuan Zhou, and Eli Shechtman. Self forcing: Bridging the train-test gap in autoregressive video diffusion. InNeurIPS, 2025

  11. [11]

    Align your flow: Scaling continuous-time flow map distillation.arXiv preprint arXiv:2506.14603, 2025

    Amirmojtaba Sabour, Sanja Fidler, and Karsten Kreis. Align your flow: Scaling continuous-time flow map distillation.arXiv preprint arXiv:2506.14603, 2025

  12. [12]

    Flow map matching.arXiv preprint arXiv:2406.07507,

    Nicholas M Boffi, Michael S Albergo, and Eric Vanden-Eijnden. Flow map matching with stochastic interpolants: A mathematical framework for consistency models.arXiv preprint arXiv:2406.07507, 2024

  13. [13]

    Mean Flows for One-step Generative Modeling

    Zhengyang Geng, Mingyang Deng, Xingjian Bai, J Zico Kolter, and Kaiming He. Mean flows for one-step generative modeling.arXiv preprint arXiv:2505.13447, 2025

  14. [14]

    Consistency models made easy

    Zhengyang Geng, Ashwini Pokle, William Luo, Justin Lin, and J Zico Kolter. Consistency models made easy. arXiv preprint arXiv:2406.14548, 2024

  15. [15]

    Phased consistency models.Advances in neural information processing systems, 37:83951–84009, 2024

    Fu-Yun Wang, Zhaoyang Huang, Alexander Bergman, Dazhong Shen, Peng Gao, Michael Lingelbach, Keqiang Sun, Weikang Bian, Guanglu Song, Yu Liu, et al. Phased consistency models.Advances in neural information processing systems, 37:83951–84009, 2024

  16. [16]

    Hyper-SD: Trajectory segmented consistency model for efficient image synthesis.NeurIPS, 2024

    Yuxi Ren, Xin Xia, Yanzuo Lu, Jiacheng Zhang, Jie Wu, Pan Xie, Xing Wang, and Xuefeng Xiao. Hyper-SD: Trajectory segmented consistency model for efficient image synthesis.NeurIPS, 2024. 17 AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation

  17. [17]

    Truncated consistency models.arXiv preprint arXiv:2410.14895, 2024

    Sangyun Lee, Yilun Xu, Tomas Geffner, Giulia Fanti, Karsten Kreis, Arash Vahdat, and Weili Nie. Truncated consistency models.arXiv preprint arXiv:2410.14895, 2024

  18. [18]

    Consistency trajectory models: Learning probability flow ODE trajectory of diffusion

    Dongjun Kim, Chieh-Hsin Lai, Wei-Hsiang Liao, Naoki Murata, Yuhta Takida, Toshimitsu Uesaka, Yutong He, Yuki Mitsufuji, and Stefano Ermon. Consistency trajectory models: Learning probability flow ODE trajectory of diffusion. InICLR, 2024

  19. [19]

    Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow

    Xingchao Liu, Chengyue Gong, and Qiang Liu. Flow straight and fast: Learning to generate and transfer data with rectified flow.arXiv preprint arXiv:2209.03003, 2022

  20. [20]

    Flow Matching for Generative Modeling

    Yaron Lipman, Ricky TQ Chen, Heli Ben-Hamu, Maximilian Nickel, and Matt Le. Flow matching for generative modeling.arXiv preprint arXiv:2210.02747, 2022

  21. [21]

    Transition models: Rethinking the generative learning objective.arXiv preprint arXiv:2509.04394, 2025

    Zidong Wang, Yiyuan Zhang, Xiaoyu Yue, Xiangyu Yue, Yangguang Li, Wanli Ouyang, and Lei Bai. Transition models: Rethinking the generative learning objective.arXiv preprint arXiv:2509.04394, 2025

  22. [22]

    Soflow: Solution flow models for one-step generative modeling.arXiv preprint arXiv:2512.15657, 2025

    Tianze Luo, Haotian Yuan, and Zhuang Liu. Soflow: Solution flow models for one-step generative modeling.arXiv preprint arXiv:2512.15657, 2025

  23. [23]

    SplitMeanFlow: Interval splitting consistency in few-step generative modeling, 2025

    Yi Guo, Wei Wang, Zhihang Yuan, Rong Cao, Kuan Chen, Zhengyang Chen, Yuanyuan Huo, Yang Zhang, Yuping Wang, Shouda Liu, et al. Splitmeanflow: Interval splitting consistency in few-step generative modeling.arXiv preprint arXiv:2507.16884, 2025

  24. [24]

    Improved distribution matching distillation for fast image synthesis.NeurIPS, 2024

    Tianwei Yin, Michaël Gharbi, Taesung Park, Richard Zhang, Eli Shechtman, Fredo Durand, and Bill Freeman. Improved distribution matching distillation for fast image synthesis.NeurIPS, 2024

  25. [25]

    One-step diffusion with distribution matching distillation

    Tianwei Yin, Michaël Gharbi, Richard Zhang, Eli Shechtman, Fredo Durand, William T Freeman, and Taesung Park. One-step diffusion with distribution matching distillation. InCVPR, 2024

  26. [26]

    From slow bidirectional to fast autoregressive video diffusion models

    Tianwei Yin, Qiang Zhang, Richard Zhang, William T Freeman, Fredo Durand, Eli Shechtman, and Xun Huang. From slow bidirectional to fast autoregressive video diffusion models. InCVPR, 2025

  27. [27]

    Transition matching distillation for fast video generation.arXiv preprint arXiv:2601.09881, 2026

    Weili Nie, Julius Berner, Nanye Ma, Chao Liu, Saining Xie, and Arash Vahdat. Transition matching distillation for fast video generation.arXiv preprint arXiv:2601.09881, 2026

  28. [28]

    LongLive: Real-time Interactive Long Video Generation

    Shuai Yang, Wei Huang, Ruihang Chu, Yicheng Xiao, Yuyang Zhao, Xianbang Wang, Muyang Li, Enze Xie, Ying- cong Chen, Yao Lu, et al. Longlive: Real-time interactive long video generation.arXiv preprint arXiv:2509.22622, 2025

  29. [29]

    Pyramidal flow matching for efficient video generative modeling

    Yang Jin, Zhicheng Sun, Ningyuan Li, Kun Xu, Hao Jiang, Nan Zhuang, Quzhe Huang, Yang Song, Yadong MU, and Zhouchen Lin. Pyramidal flow matching for efficient video generative modeling. InICLR, 2025

  30. [30]

    Frame context packing and drift prevention in next-frame-prediction video diffusion models

    Lvmin Zhang, Shengqu Cai, Muyang Li, Gordon Wetzstein, and Maneesh Agrawala. Frame context packing and drift prevention in next-frame-prediction video diffusion models. InNeurIPS, 2025

  31. [31]

    History-guided video diffusion

    Kiwhan Song, Boyuan Chen, Max Simchowitz, Yilun Du, Russ Tedrake, and Vincent Sitzmann. History-guided video diffusion. InICML, 2024

  32. [32]

    arXiv preprint arXiv:2512.15702 (2025)

    Yuwei Guo, Ceyuan Yang, Hao He, Yang Zhao, Meng Wei, Zhenheng Yang, Weilin Huang, and Dahua Lin. End-to-end training for autoregressive video diffusion via self-resampling.arXiv preprint arXiv:2512.15702, 2025

  33. [33]

    On-policy distillation.Thinking Machines Lab: Connectionism, 2025

    Kevin Lu and Thinking Machines Lab. On-policy distillation.Thinking Machines Lab: Connectionism, 2025. https://thinkingmachines.ai/blog/on-policy-distillation

  34. [34]

    Self-Distilled Reasoner: On-Policy Self-Distillation for Large Language Models

    Siyan Zhao, Zhihui Xie, Mengchen Liu, Jing Huang, Guan Pang, Feiyu Chen, and Aditya Grover. Self-distilled reasoner: On-policy self-distillation for large language models.arXiv preprint arXiv:2601.18734, 2026

  35. [35]

    On-Policy Context Distillation for Language Models

    Tianzhu Ye, Li Dong, Xun Wu, Shaohan Huang, and Furu Wei. On-policy context distillation for language models. arXiv preprint arXiv:2602.12275, 2026

  36. [36]

    Diffusion adversarial post-training for one-step video generation

    Shanchuan Lin, Xin Xia, Yuxi Ren, Ceyuan Yang, Xuefeng Xiao, and Lu Jiang. Diffusion adversarial post-training for one-step video generation. InICML, 2025

  37. [37]

    Autoregressive adversarial post-training for real-time interactive video generation

    Shanchuan Lin, Ceyuan Yang, Hao He, Jianwen Jiang, Yuxi Ren, Xin Xia, Yang Zhao, Xuefeng Xiao, and Lu Jiang. Autoregressive adversarial post-training for real-time interactive video generation. InNeurIPS, 2025

  38. [38]

    Long-context autoregressive video modeling with next-frame prediction

    Yuchao Gu, Weijia Mao, and Mike Zheng Shou. Long-context autoregressive video modeling with next-frame prediction.arXiv preprint arXiv:2503.19325, 2025

  39. [39]

    MAGI-1: Autoregressive Video Generation at Scale

    Hansi Teng, Hongyu Jia, Lei Sun, Lingzhi Li, Maolin Li, Mingqiu Tang, Shuai Han, Tianning Zhang, WQ Zhang, Weifeng Luo, et al. MAGI-1: Autoregressive video generation at scale.arXiv preprint arXiv:2505.13211, 2025

  40. [40]

    SkyReels-V2: Infinite-length Film Generative Model

    Guibin Chen, Dixuan Lin, Jiangping Yang, Chunze Lin, Junchen Zhu, Mingyuan Fan, Hao Zhang, Sheng Chen, Zheng Chen, Chengcheng Ma, et al. Skyreels-v2: Infinite-length film generative model.arXiv preprint arXiv:2504.13074, 2025. 18 AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation

  41. [41]

    Lightx2v: Light video generation inference framework.https://github.com/ModelTC/ lightx2v, 2025

    LightX2V Contributors. Lightx2v: Light video generation inference framework.https://github.com/ModelTC/ lightx2v, 2025

  42. [42]

    Causalwan2.2-i2v-a14b-preview-diffusers

    FastVideo Team. Causalwan2.2-i2v-a14b-preview-diffusers. https://huggingface.co/FastVideo/CausalWan2. 2-I2V-A14B-Preview-Diffusers, 2025

  43. [43]

    Krea realtime 14b: Real-time video generation, 2025

    Erwann Millon. Krea realtime 14b: Real-time video generation, 2025

  44. [44]

    Step-video-ti2v technical report: A state-of-the-art text-driven image-to-video generation model, 2025

    Haoyang Huang, Guoqing Ma, Nan Duan, Xing Chen, Changyi Wan, Ranchen Ming, Tianyu Wang, Bo Wang, Zhiying Lu, Aojie Li, Xianfang Zeng, Xinhao Zhang, Gang Yu, Yuhe Yin, Qiling Wu, Wen Sun, Kang An, Xin Han, Deshan Sun, Wei Ji, Bizhu Huang, Brian Li, Chenfei Wu, Guanzhe Huang, Huixin Xiong, Jiaxin He, Jianchang Wu, Jianlong Yuan, Jie Wu, Jiashuai Liu, Junjin...

  45. [45]

    Diffusers: State-of-the-art diffusion models.https://github.com/huggingface/diffusers, 2022

    Patrick von Platen, Suraj Patil, Anton Lozhkov, Pedro Cuenca, Nathan Lambert, Kashif Rasul, Mishig Davaadorj, Dhruv Nair, Sayak Paul, William Berman, Yiyi Xu, Steven Liu, and Thomas Wolf. Diffusers: State-of-the-art diffusion models.https://github.com/huggingface/diffusers, 2022

  46. [46]

    Decoupled Weight Decay Regularization

    Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization.arXiv preprint arXiv:1711.05101, 2017

  47. [47]

    Lora: Low-rank adaptation of large language models.Iclr, 1(2):3, 2022

    Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Liang Wang, Weizhu Chen, et al. Lora: Low-rank adaptation of large language models.Iclr, 1(2):3, 2022

  48. [48]

    VBench: Comprehensive benchmark suite for video generative models

    Ziqi Huang, Yinan He, Jiashuo Yu, Fan Zhang, Chenyang Si, Yuming Jiang, Yuanhan Zhang, Tianxing Wu, Qingyang Jin, Nattapol Chanpaisit, Yaohui Wang, Xinyuan Chen, Limin Wang, Dahua Lin, Yu Qiao, and Ziwei Liu. VBench: Comprehensive benchmark suite for video generative models. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

  49. [49]

    Vbench++: Comprehensive and versatile benchmark suite for video generative models

    Ziqi Huang, Fan Zhang, Xiaojie Xu, Yinan He, Jiashuo Yu, Ziyue Dong, Qianli Ma, Nattapol Chanpaisit, Chenyang Si, Yuming Jiang, et al. Vbench++: Comprehensive and versatile benchmark suite for video generative models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025. 19