CoDMD: Copula-aware Distribution Matching Distillation for Fast Video Generation

Changyuan Wang; Jiajun Zha; Jiaya Jia; Jingyi Zhang; Kang Zhao; Kun Cheng; Shiyao Li; Wenbo Li; Wenhu Zhang; Yuechen Zhang

arxiv: 2606.21982 · v1 · pith:25D2JBSBnew · submitted 2026-06-20 · 💻 cs.CV

Pith reviewed 2026-06-26 12:50 UTC · model grok-4.3

classification 💻 cs.CV

keywords few-step distillation · video diffusion · distribution matching · copula regularization · relational matching · fast video generation · DMD · CoDMD

The pith

Matching pairwise relation matrices from existing scores prevents DMD collapse in few-step video distillation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Standard distribution matching distillation matches only within each sample using coordinate-wise gradients, leaving relations across batch elements and temporal frames unregulated and causing layout instability, oversaturation, and broken motion in the few-step regime. CoDMD adds a lightweight supplementary objective that reuses score estimates already computed by the teacher and student to build and match pairwise relation matrices, thereby enforcing the underlying copula without new networks, data, or trajectories. On the Wan-2.1-T2V series at 1.3B and 14B scales, this allows 50-step teachers to be distilled into 4-step students that reach higher VBench scores than prior trajectory-based and distribution-based baselines while delivering roughly 25 times faster inference.

Core claim

The paper claims that DMD's reverse-KL objective combined with its intra-sample nature causes the model to collapse into local optima under low NFE budgets because the copula across samples and frames remains unconstrained; the proposed CoDMD regularizer corrects this by constructing pairwise relation matrices from frozen-teacher and online-student scores and matching their distributions through an additional term that requires no extra parameters or sampling.
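The mode-seeking tendency of a reverse-KL objective mentioned here can be seen in a toy discrete example. The sketch below is not from the paper: the three-bin target and the two candidate "students" are made-up numbers chosen only to show that reverse KL prefers committing to one mode while forward KL prefers covering both.

```python
import math

def kl(p, q):
    """KL(p || q) for discrete distributions given as equal-length lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy numbers (assumptions for illustration, not taken from the paper).
target = [0.495, 0.01, 0.495]    # two modes separated by a low-density valley
one_mode = [0.98, 0.01, 0.01]    # student that commits to a single mode
spread = [1 / 3, 1 / 3, 1 / 3]   # student that covers every bin evenly

# Reverse KL puts the student first: KL(student || target).
reverse = {"one_mode": kl(one_mode, target), "spread": kl(spread, target)}
# Forward KL puts the target first: KL(target || student).
forward = {"one_mode": kl(target, one_mode), "spread": kl(target, spread)}

print("reverse KL:", reverse)  # smaller for the single-mode student
print("forward KL:", forward)  # smaller for the spread-out student
```

Under reverse KL the single-mode student scores better because it is penalized only where it places mass, which is the behavior the core claim associates with collapse when nothing else constrains the student.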

What carries the argument

The copula-aware relational regularizer that reuses existing score estimates to construct and match pairwise relation matrices across samples and temporal frames.
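A minimal sketch of what "construct and match pairwise relation matrices" could look like, under stated assumptions. The page does not give the paper's relation function or matching objective, so the cosine similarity and the mean-squared gap below are stand-ins, and the names `cosine_relation_matrix` and `relation_loss` are hypothetical. Each input vector stands for one flattened score estimate, per batch element or per frame.

```python
import math

def cosine_relation_matrix(vectors):
    """Pairwise cosine similarity between flattened vectors (one per sample or frame)."""
    # A zero vector gets norm 1.0 so the division below stays defined.
    norms = [math.sqrt(sum(x * x for x in v)) or 1.0 for v in vectors]
    n = len(vectors)
    return [
        [sum(a * b for a, b in zip(vectors[i], vectors[j])) / (norms[i] * norms[j])
         for j in range(n)]
        for i in range(n)
    ]

def relation_loss(real_scores, fake_scores):
    """Mean squared gap between the two relation matrices.

    Assumption: a simple stand-in for the paper's distributional matching term.
    """
    r = cosine_relation_matrix(real_scores)
    f = cosine_relation_matrix(fake_scores)
    n = len(r)
    return sum((r[i][j] - f[i][j]) ** 2 for i in range(n) for j in range(n)) / (n * n)

# Toy score vectors for three elements (made-up values).
real = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
collapsed = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]  # every element points the same way

print(relation_loss(real, real))       # identical structure, no penalty
print(relation_loss(real, collapsed))  # collapsed structure is penalized
```

Because the inputs are quantities the distillation loop already computes, a term of this shape needs no additional network or sampling pass, which is the property the summary emphasizes.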

If this is right

50-step teachers can be distilled to 4-step students with an approximate 25 times inference speedup on 1.3B and 14B video models.
The resulting 4-step models attain VBench scores of 84.46 (1.3B) and 84.87 (14B), exceeding both trajectory-based and standard distribution-based distillation baselines.
Layout stability, color saturation, and motion dynamics improve because the joint distribution across frames and batch elements is now explicitly regularized.
The added regularizer introduces no extra networks, datasets, or sampling trajectories during training.
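The page does not say how the roughly 25 times figure is counted. The step ratio alone is 50 / 4 = 12.5, so the sketch below shows one accounting that would be consistent with 25: it assumes the teacher spends two network evaluations per step (for example, classifier-free guidance) and the student spends one. Both per-step costs are assumptions, not statements from the paper.

```python
def network_evaluations(steps, evals_per_step):
    """Total network function evaluations (NFE) for one generated sample."""
    return steps * evals_per_step

# Step counts come from the summary; evals_per_step values are assumptions.
teacher_nfe = network_evaluations(steps=50, evals_per_step=2)  # assumed guided teacher
student_nfe = network_evaluations(steps=4, evals_per_step=1)   # assumed unguided student

step_ratio = 50 / 4                  # ratio of sampling steps only
speedup = teacher_nfe / student_nfe  # ratio of network evaluations

print(step_ratio, speedup)
```

If the per-step costs were equal, the same function would give 12.5 rather than 25, so the headline number depends on how evaluations are counted.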

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same relation-matrix construction could be tested on image or audio diffusion models to check whether copula regulation improves few-step quality outside video.
If the copula term proves decisive, distillation pipelines might shift from purely marginal matching toward explicit joint-distribution objectives as a default design choice.
Future work could examine whether the benefit scales to even fewer steps, such as 2-step or 1-step regimes, where relational collapse would be expected to be more severe.

Load-bearing premise

The performance drop in few-step DMD occurs specifically because relational constraints across elements are absent, and explicitly matching pairwise relation matrices will restore quality without creating new failure modes.

What would settle it

An ablation that disables the pairwise relation-matrix matching term while keeping all other components identical, then measures whether VBench scores and artifact rates revert to those of standard DMD on the same 4-step video models.

Figures

Figures reproduced from arXiv: 2606.21982 by Changyuan Wang, Jiajun Zha, Jiaya Jia, Jingyi Zhang, Kang Zhao, Kun Cheng, Shiyao Li, Wenbo Li, Wenhu Zhang, Yuechen Zhang.

**Figure 2.** Left: Training pipeline of CoDMD. Beyond standard $\mathcal{L}_{\mathrm{DMD}}$, we reuse the real and fake scores to build relational matrices at batch and frame granularities, yielding two supplementary copula losses $\mathcal{L}^{\mathrm{batch}}_{\mathrm{rel}}$ and $\mathcal{L}^{\mathrm{frame}}_{\mathrm{rel}}$. Right: DMD aligns fake elements to $p_{\mathrm{real}}$ but scrambles their pairwise geometry (crossed edges). CoDMD additionally preserves the real structure among these elements. driven by the dynam…

**Figure 3.** Left: Our CoDMD produces more faithful motion, stronger prompt alignment, and more realistic color than previous methods (i.e., DMD and rCM). Owing to the copula-aware relational objective, our output also matches the teacher's visual structure more closely, e.g., (1) screw turns inward; (2) dragon exhales fire. Right: VBench metrics across training iterations. CoDMD sustains high scores throughout training…

**Figure 4.** User study results. We report the win rate of …

**Figure 5.** Anonymized interface used in the blind pairwise user study. The interface displays the …

Original abstract

Few-step distillation for video diffusion models has attracted significant attention, driven by the urgent demand for efficient deployment in real-world scenarios. However, Distribution Matching Distillation (DMD), a leading paradigm, tends to degrade under limited NFE budgets, manifesting in video generation as layout instability, oversaturation, and broken motion dynamics. We trace this failure to a structural limitation: standard DMD is an intra-sample distribution-matching objective with coordinate-wise gradients, and thus imposes no explicit constraint on the relational geometry across batch elements or temporal frames, leaving the underlying copula largely unregulated. Combined with the mode-seeking tendency of its reverse-KL objective, this absence of relational guidance makes DMD prone to collapsing into local optima in the few-step regime. Motivated by this insight, we propose Copula-aware DMD (CoDMD), a lightweight relational regularizer that reuses score estimates already produced by the frozen teacher and the online fake model to construct pairwise relation matrices across samples and frames. These are matched through a supplementary distributional objective that requires no additional networks, datasets, or sampling trajectories. On the Wan-2.1-T2V model series at 1.3B & 14B scales, CoDMD distills 50-step teachers into 4-step students, achieving an approximate 25$\times$ speed-up while attaining VBench scores of 84.46 & 84.87, outperforming prior trajectory-based (rCM 82.81 & 84.05) and distribution-based (DMD 83.38 & 83.81) methods.
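To make the size of the reported gains concrete, the short script below computes CoDMD's margin over each baseline from the VBench numbers quoted in the abstract. It assumes the first number in each pair belongs to the 1.3B model and the second to the 14B model, following the order in which the scales are listed; the dictionary layout and the helper name `margin_over` are illustrative.

```python
# VBench scores as quoted in the abstract; the scale assignment follows the listed order.
vbench = {
    "CoDMD": {"1.3B": 84.46, "14B": 84.87},
    "DMD": {"1.3B": 83.38, "14B": 83.81},
    "rCM": {"1.3B": 82.81, "14B": 84.05},
}

def margin_over(baseline, scale):
    """CoDMD's VBench score minus the baseline's score at the given model scale."""
    return round(vbench["CoDMD"][scale] - vbench[baseline][scale], 2)

for baseline in ("DMD", "rCM"):
    for scale in ("1.3B", "14B"):
        print(f"CoDMD vs {baseline} at {scale}: +{margin_over(baseline, scale):.2f}")
```

The margins come out between roughly 0.8 and 1.7 VBench points. The abstract reports no variance across runs, so these differences cannot be judged for statistical significance from the quoted numbers alone.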

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

CoDMD adds a copula regularizer to DMD that matches pairwise relations from existing scores to stabilize few-step video distillation, with reported VBench gains on Wan-2.1 models that still need full experimental backing.

The main thing here is that the authors extend DMD by adding a copula-aware regularizer that builds pairwise relation matrices from teacher and student score estimates and matches them across samples and frames. This targets the relational geometry that standard DMD leaves unconstrained, which they link to layout instability and motion issues in low-NFE video generation.

They do a clean job of keeping the addition cheap: it reuses quantities already computed during distillation and adds only a supplementary distributional term, with no new networks or trajectories. The tests on both the 1.3B and 14B Wan-2.1-T2V models, distilling 50-step teachers to 4-step students, show higher VBench scores than the rCM and DMD baselines while delivering the claimed 25× speedup. That is a practical data point for anyone working on video diffusion acceleration.

The soft spot is the level of experimental detail. The abstract states the numbers and the mechanistic story, but without seeing ablations that isolate the copula term, training protocols, or checks for side effects in other settings, it is hard to judge how much of the gain comes from the new regularizer versus other implementation choices. The premise about DMD's mode-seeking behavior is plausible, yet direct evidence tying the relational fix to the observed improvements would strengthen it.

This is for people already working on distillation methods for video or image diffusion who want a lightweight way to add relational constraints. A reader focused on efficient deployment would get value from the concrete numbers and the reuse of existing scores.

The paper shows coherent thinking on a real limitation and offers a targeted, low-overhead fix. It deserves peer review so the details can be checked and the results can be stress-tested by others.

Referee Report

2 major / 0 minor

Summary. The paper claims that standard DMD for video diffusion degrades in the few-step regime due to its intra-sample, coordinate-wise nature that leaves relational (copula) structure across batch elements and temporal frames unregulated, combined with reverse-KL mode-seeking. It introduces CoDMD, a lightweight supplementary distributional objective that reuses already-computed teacher and fake scores to build and match pairwise relation matrices, requiring no extra networks or trajectories. On the Wan-2.1-T2V series (1.3B and 14B), 50-step teachers are distilled to 4-step students yielding VBench scores of 84.46 and 84.87, outperforming rCM (82.81/84.05) and DMD (83.38/83.81) while delivering ~25× speedup.

Significance. If the reported gains are reproducible and causally linked to the copula regularizer, the method would be a low-overhead, parameter-free improvement to DMD that directly targets relational geometry in video generation; this is practically significant for real-time deployment of large video models.

major comments (2)

[Abstract] The central performance claims (VBench scores, outperformance over DMD/rCM, 25× speedup) are presented without any description of experimental protocol, training details, dataset, number of runs, or statistical significance; this is load-bearing because the superiority assertion cannot be evaluated from the given information.
[Abstract] The mechanistic premise that DMD degradation stems specifically from absent relational constraints (and that matching pairwise matrices from existing scores will correct it without new failure modes) is asserted but not supported by ablations or controlled experiments isolating the copula term; this is load-bearing for the motivation and novelty of CoDMD.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the abstract and the motivation for CoDMD. We address each major comment below and will revise the manuscript accordingly.

Referee: [Abstract] The central performance claims (VBench scores, outperformance over DMD/rCM, 25× speedup) are presented without any description of experimental protocol, training details, dataset, number of runs, or statistical significance; this is load-bearing because the superiority assertion cannot be evaluated from the given information.

Authors: We agree that the abstract would benefit from additional context on the experimental setup to allow readers to better evaluate the claims. In the revised manuscript we will insert a concise clause noting the models (Wan-2.1-T2V 1.3B/14B), the distillation dataset, the evaluation benchmark (VBench), and that results are reported from the primary experimental runs with full protocol and training details provided in Section 4. Given typical abstract length limits we will keep the addition brief while ensuring the superiority statement is better grounded.

revision: yes

Referee: [Abstract] The mechanistic premise that DMD degradation stems specifically from absent relational constraints (and that matching pairwise matrices from existing scores will correct it without new failure modes) is asserted but not supported by ablations or controlled experiments isolating the copula term; this is load-bearing for the motivation and novelty of CoDMD.

Authors: The manuscript motivates the copula regularizer via analysis of DMD's coordinate-wise objective and reverse-KL behavior (Section 3) and demonstrates empirical gains over DMD on the same teacher-student pairs. However, we did not include a controlled ablation that isolates the copula-matching term while holding all other loss components fixed. We acknowledge this weakens the causal claim. In the revision we will add an ablation study that removes only the copula regularizer (keeping the DMD objective and training protocol identical) and report the resulting VBench degradation to directly support the premise.

revision: yes

Circularity Check

0 steps flagged

No significant circularity

The paper traces DMD degradation to absent relational constraints and introduces CoDMD as a supplementary distributional objective that reuses already-computed teacher/fake scores to match pairwise relation matrices. This construction adds an independent regularizer without reducing the claimed VBench gains or speed-up to a fitted parameter renamed as prediction, a self-definitional equation, or a load-bearing self-citation chain. No uniqueness theorems, ansatzes smuggled via prior author work, or renaming of known results are invoked; the derivation chain from premise to method remains externally falsifiable via reported metrics on Wan-2.1 and comparisons to rCM/DMD baselines.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; the method is described as reusing existing scores without additional networks.

pith-pipeline@v0.9.1-grok · 5846 in / 1168 out tokens · 28503 ms · 2026-06-26T12:50:19.023227+00:00 · methodology


[1] [1]

Springer, 2006

Christopher M Bishop and Nasser M Nasrabadi.Pattern recognition and machine learning, volume 4. Springer, 2006

2006

[2] [2]

Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets

Andreas Blattmann, Tim Dockhorn, Sumith Kulal, Daniel Mendelevitch, Maciej Kilian, Dominik Lorenz, Yam Levi, Zion English, Vikram V oleti, Adam Letts, et al. Stable video diffusion: Scaling latent video diffusion models to large datasets.arXiv preprint arXiv:2311.15127, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[3] [3]

Scaling rectified flow transformers for high-resolution image synthesis

Patrick Esser et al. Scaling rectified flow transformers for high-resolution image synthesis. InICML, 2024

2024

[4] [4]

Denoising diffusion probabilistic models.Advances in neural information processing systems, 33:6840–6851, 2020

Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models.Advances in neural information processing systems, 33:6840–6851, 2020

2020

[5] [5]

Imagen Video: High Definition Video Generation with Diffusion Models

Jonathan Ho, William Chan, Chitwan Saharia, Jay Whang, Ruiqi Gao, Alexey Gritsenko, Diederik P Kingma, Ben Poole, Mohammad Norouzi, David J Fleet, et al. Imagen video: High definition video generation with diffusion models.arXiv preprint arXiv:2210.02303, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022

[6] [6]

Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion

Xun Huang, Zhengqi Li, Guande He, Mingyuan Zhou, and Eli Shechtman. Self forcing: Bridging the train-test gap in autoregressive video diffusion, 2025. URLhttps://arxiv.org/abs/2506.08009

work page internal anchor Pith review Pith/arXiv arXiv 2025

[7] [7]

VBench: Comprehensive benchmark suite for video generative models

Ziqi Huang, Yinan He, Jiashuo Yu, Fan Zhang, Chenyang Si, Yuming Jiang, Yuanhan Zhang, Tianxing Wu, Qingyang Jin, Nattapol Chanpaisit, Yaohui Wang, Xinyuan Chen, Limin Wang, Dahua Lin, Yu Qiao, and Ziwei Liu. VBench: Comprehensive benchmark suite for video generative models. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2024

[8] [8]

Distribution matching distillation meets reinforcement learning

Dengyang Jiang, Dongyang Liu, Zanyi Wang, Qilong Wu, Liuzhuozheng Li, Hengzhuang Li, Xin Jin, David Liu, Changsheng Lu, Zhen Li, et al. Distribution matching distillation meets reinforcement learning. arXiv preprint arXiv:2511.13649, 2025

work page arXiv 2025

[9] [9]

Elucidating the design space of diffusion-based generative models.Advances in neural information processing systems, 35:26565–26577, 2022

Tero Karras, Miika Aittala, Timo Aila, and Samuli Laine. Elucidating the design space of diffusion-based generative models.Advances in neural information processing systems, 35:26565–26577, 2022

2022

[10] [10]

HunyuanVideo: A Systematic Framework For Large Video Generative Models

Weijie Kong, Qi Tian, Zijian Zhang, Rox Min, Zuozhuo Dai, Jin Zhou, Jiangfeng Xiong, Xin Li, Bo Wu, Jianwei Zhang, et al. Hunyuanvideo: A systematic framework for large video generative models.arXiv preprint arXiv:2412.03603, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[11] [11]

Flow matching for generative modeling

Yaron Lipman et al. Flow matching for generative modeling. InICLR, 2023

2023

[12] [12]

Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow

Xingchao Liu, Chengyue Gong, and Qiang Liu. Flow straight and fast: Learning to generate and transfer data with rectified flow.arXiv preprint arXiv:2209.03003, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022

[13] Xingchao Liu, Xiwen Zhang, Jianzhu Ma, Jian Peng, et al. InstaFlow: One step is enough for high-quality diffusion-based text-to-image generation. In The Twelfth International Conference on Learning Representations, 2023.

[14] Cheng Lu and Yang Song. Simplifying, stabilizing and scaling continuous-time consistency models. arXiv preprint arXiv:2410.11081, 2024.

[15] Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, and Jun Zhu. DPM-Solver: A fast ODE solver for diffusion probabilistic model sampling in around 10 steps. Advances in Neural Information Processing Systems, 35:5775–5787, 2022.

[16] Yanzuo Lu, Yuxi Ren, Xin Xia, Shanchuan Lin, Xing Wang, Xuefeng Xiao, Andy J. Ma, Xiaohua Xie, and Jian-Huang Lai. Adversarial distribution matching for diffusion distillation towards efficient image and video synthesis, 2025. URL https://arxiv.org/abs/2507.18569.

[17] Zhengyao Lv, Chenyang Si, Tianlin Pan, Zhaoxi Chen, Kwan-Yee K Wong, Yu Qiao, and Ziwei Liu. Dual-expert consistency model for efficient and high-quality video generation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 14983–14993, 2025.

[18] Roger B Nelsen. An introduction to copulas. Springer, 2006.

[19] Wonpyo Park, Dongju Kim, Yan Lu, and Minsu Cho. Relational knowledge distillation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3967–3976, 2019.

[20] Baoyun Peng, Xiao Jin, Jiaheng Liu, Dongsheng Li, Yichao Wu, Yu Liu, Shunfeng Zhou, and Zhaoning Zhang. Correlation congruence for knowledge distillation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5007–5016, 2019.

[21] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10684–10695, 2022.

[22] Tim Salimans and Jonathan Ho. Progressive distillation for fast sampling of diffusion models. arXiv preprint arXiv:2202.00512, 2022.

[23] Team Seedance, De Chen, Liyang Chen, Xin Chen, Ying Chen, Zhuo Chen, Zhuowei Chen, Feng Cheng, Tianheng Cheng, Yufeng Cheng, et al. Seedance 2.0: Advancing video generation for world complexity. arXiv preprint arXiv:2604.14148, 2026.

[24] M Sklar. Fonctions de répartition à n dimensions et leurs marges. In Annales de l’ISUP, volume 8, pages 229–231, 1959.

[25] Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. In International Conference on Learning Representations, 2021.

[26] Yang Song and Prafulla Dhariwal. Improved techniques for training consistency models. arXiv preprint arXiv:2310.14189, 2023.

[27] Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. In ICLR, 2021.

[28] Yang Song, Prafulla Dhariwal, Mark Chen, and Ilya Sutskever. Consistency models. 2023.

[29] Flood Sung, Yongxin Yang, Li Zhang, Tao Xiang, Philip HS Torr, and Timothy M Hospedales. Learning to compare: Relation network for few-shot learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1199–1208, 2018.

[30] Yonglong Tian, Dilip Krishnan, and Phillip Isola. Contrastive representation distillation. arXiv preprint arXiv:1910.10699, 2019.

[31] Frederick Tung and Greg Mori. Similarity-preserving knowledge distillation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 1365–1374, 2019.

[32] Team Wan, Ang Wang, Baole Ai, Bin Wen, Chaojie Mao, Chen-Wei Xie, Di Chen, Feiwu Yu, Haiming Zhao, Jianxiao Yang, et al. Wan: Open and advanced large-scale video generative models. arXiv preprint arXiv:2503.20314, 2025.

[33] Zhengyi Wang, Cheng Lu, Yikai Wang, Fan Bao, Chongxuan Li, Hang Su, and Jun Zhu. ProlificDreamer: High-fidelity and diverse text-to-3D generation with variational score distillation. arXiv preprint arXiv:2305.16213, 2023.

[34] Tianhe Wu, Ruibin Li, Lei Zhang, and Kede Ma. Diversity-preserved distribution matching distillation for fast visual synthesis. arXiv preprint arXiv:2602.03139, 2026.

[35] Yilun Xu, Weili Nie, and Arash Vahdat. One-step diffusion models with f-divergence distribution matching. arXiv preprint arXiv:2502.15681, 2025.

[36] Shuai Yang, Wei Huang, Ruihang Chu, Yicheng Xiao, Yuyang Zhao, Xianbang Wang, Muyang Li, Enze Xie, Yingcong Chen, Yao Lu, et al. LongLive: Real-time interactive long video generation. arXiv preprint arXiv:2509.22622, 2025.

[37] Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, et al. CogVideoX: Text-to-video diffusion models with an expert transformer. arXiv preprint arXiv:2408.06072, 2024.

[38] Tianwei Yin, Michaël Gharbi, Taesung Park, Richard Zhang, Eli Shechtman, Fredo Durand, and Bill Freeman. Improved distribution matching distillation for fast image synthesis. Advances in Neural Information Processing Systems, 37:47455–47487, 2024.

[39] Tianwei Yin, Michaël Gharbi, Richard Zhang, Eli Shechtman, Fredo Durand, William T Freeman, and Taesung Park. One-step diffusion with distribution matching distillation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6613–6623, 2024.

[40] Tianwei Yin, Qiang Zhang, Richard Zhang, William T Freeman, Fredo Durand, Eli Shechtman, and Xun Huang. From slow bidirectional to fast autoregressive video diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 22963–22974, 2025.

[41] Yuyang You, Yongzhi Li, Jiahui Li, Yadong Mu, Quan Chen, and Peng Jiang. Adaptive video distillation: Mitigating oversaturation and temporal collapse in few-step generation. arXiv preprint arXiv:2603.21864, 2026.

[42] Qinsheng Zhang and Yongxin Chen. Fast sampling of diffusion models with exponential integrator. arXiv preprint arXiv:2204.13902, 2022.

[43] Dian Zheng, Ziqi Huang, Hongbo Liu, Kai Zou, Yinan He, Fan Zhang, Yuanhan Zhang, Jingwen He, Wei-Shi Zheng, Yu Qiao, and Ziwei Liu. VBench-2.0: Advancing video generation benchmark suite for intrinsic faithfulness. arXiv preprint arXiv:2503.21755, 2025.

[44] Kaiwen Zheng, Yuji Wang, Qianli Ma, Huayu Chen, Jintao Zhang, Yogesh Balaji, Jianfei Chen, Ming-Yu Liu, Jun Zhu, and Qinsheng Zhang. Large scale diffusion distillation via score-regularized continuous-time consistency. arXiv preprint arXiv:2510.08431, 2025.

Appendix for Copula-aware Distribution Matching Distillation

§A More Visual Comparisons
§B Training ...