pith.

arxiv: 2506.05398 · v2 · submitted 2025-06-03 · 💻 cs.GR

2ndMatch: Finetuning Pruned Diffusion Models via Second-Order Jacobian Matching

Pith reviewed 2026-05-19 12:03 UTC · model grok-4.3

classification 💻 cs.GR
keywords: diffusion models · model pruning · finetuning · Jacobian matching · knowledge transfer · generative models · image synthesis · model compression

The pith

Pruned diffusion models recover much of their output quality by finetuning on a second-order Jacobian matching loss that copies the teacher's sensitivity to small input changes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a finetuning method for pruned diffusion models that supplements the usual denoising objective with a new loss term. This term matches the second-order Jacobian (J^T J) of the pruned model to that of the dense original, using random projections to keep the computation scalable. The approach draws on a dynamical-systems idea, Finite-Time Lyapunov Exponents, which describe how a system responds to small perturbations over time. On image datasets ranging from CIFAR-10 to ImageNet, the method narrows the quality gap to the full model and works for both U-Net and transformer backbones. The framework stays architecture-agnostic and requires no changes to the core diffusion training loop.
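As a sketch of the objective described above, one plausible form adds a weighted Jacobian-matching term to the standard ε-prediction denoising loss. The weight λ, the squared Frobenius distance, and the Gaussian probes v are assumptions for illustration, not the paper's stated formulation; s denotes the pruned student, d the dense teacher, and x_t the noised input at step t.

```latex
\mathcal{L}(\theta_s)
  = \mathbb{E}_{x_0,t,\epsilon}\!\left[\lVert \epsilon_{\theta_s}(x_t,t)-\epsilon\rVert_2^2\right]
  + \lambda\,\mathbb{E}_{x_t,t}\!\left[\lVert J_s^{\top}J_s - J_d^{\top}J_d\rVert_F^2\right],
\qquad
J_s=\frac{\partial \epsilon_{\theta_s}(x_t,t)}{\partial x_t},\quad
J_d=\frac{\partial \epsilon_{\theta_d}(x_t,t)}{\partial x_t}

\lVert M\rVert_F^2=\operatorname{tr}(M^{\top}M)
  =\mathbb{E}_{v\sim\mathcal{N}(0,I)}\!\left[\lVert Mv\rVert_2^2\right],
\qquad M=J_s^{\top}J_s-J_d^{\top}J_d
```

The second identity is what makes random projections useful under these assumptions: averaging ‖Mv‖² over a few probes gives an unbiased estimate without forming either n×n matrix, and each J^T J v costs one Jacobian-vector product followed by one vector-Jacobian product per model.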

Core claim

Finetuning pruned diffusion models with a 2nd-order Jacobian (J^T J) matching loss, computed scalably via random projections and motivated by Finite-Time Lyapunov Exponents, teaches the pruned model to reproduce the dense teacher's sensitivity to small perturbations, thereby restoring generative performance that standard denoising finetuning leaves behind.
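For reference, the standard finite-time Lyapunov exponent from the Lagrangian coherent structures literature is built on the same J^T J object. Here φ is the flow map over a time window T starting at t_0, Dφ its Jacobian, and Δ the resulting Cauchy-Green tensor. How the paper maps this onto the denoiser or the sampling trajectory is not spelled out in this summary, so identifying Dφ with the network's input Jacobian is an assumption.

```latex
\Delta_{t_0}^{T}(x) = \big(D\phi_{t_0}^{t_0+T}(x)\big)^{\top} D\phi_{t_0}^{t_0+T}(x),
\qquad
\sigma_{t_0}^{T}(x) = \frac{1}{|T|}\ln\sqrt{\lambda_{\max}\big(\Delta_{t_0}^{T}(x)\big)},
\qquad
\lVert D\phi\,\delta x\rVert_2^2 = \delta x^{\top}\,\Delta\,\delta x
```

The last identity is why matching J^T J amounts to matching how strongly every small perturbation δx is stretched, not just the largest direction that the exponent σ summarizes.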

What carries the argument

The second-order Jacobian matching loss that transfers perturbation sensitivity from the dense teacher to the pruned student through random projections.
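A minimal pure-Python sketch of random-projection J^T J matching, under stated assumptions: a single toy layer f(x) = tanh(Wx) stands in for the denoiser so the Jacobian products can be written by hand (in a real model they would come from automatic differentiation, a JVP followed by a VJP). The function names are hypothetical, and the loss form (Frobenius distance between the two J^T J matrices, estimated with Gaussian probes) is an assumption, not the paper's code.

```python
import math
import random


def matvec(W, x):
    """Return W @ x for a list-of-rows matrix W."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


def matvec_t(W, u):
    """Return W^T @ u."""
    cols = len(W[0])
    return [sum(W[i][j] * u[i] for i in range(len(W))) for j in range(cols)]


def jtj_times(W, x, v):
    """J^T J v for the toy map f(x) = tanh(W x), whose Jacobian is diag(1 - tanh(Wx)^2) W."""
    gate = [1.0 - math.tanh(z) ** 2 for z in matvec(W, x)]
    jv = [g * y for g, y in zip(gate, matvec(W, v))]       # forward product J v
    return matvec_t(W, [g * y for g, y in zip(gate, jv)])  # backward product J^T (J v)


def jtj_matching_loss(W_student, W_teacher, x, num_probes=16, seed=0):
    """Random-projection estimate of ||J_s^T J_s - J_t^T J_t||_F^2 at input x.

    With Gaussian probes v ~ N(0, I), E ||(A - B) v||^2 = ||A - B||_F^2,
    so the average over probes is an unbiased estimate.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_probes):
        v = [rng.gauss(0.0, 1.0) for _ in range(len(x))]
        s = jtj_times(W_student, x, v)
        t = jtj_times(W_teacher, x, v)
        total += sum((a - b) ** 2 for a, b in zip(s, t))
    return total / num_probes


if __name__ == "__main__":
    teacher = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.4], [-0.6, 0.2, 0.7]]
    student = [row[:] for row in teacher]
    student[1] = [0.0, 0.0, 0.0]  # crude stand-in for pruning one output unit
    x = [0.2, -0.1, 0.4]
    print(jtj_matching_loss(teacher, teacher, x))
    print(jtj_matching_loss(student, teacher, x))
```

Identical weights give a loss of exactly zero, while zeroing one row (a crude stand-in for structured pruning) makes it positive. Only matrix-vector products are formed, which is the property that lets the same estimator scale to image-sized inputs.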

If this is right

  • Pruned models achieve higher visual quality and lower FID scores than those finetuned only with the original denoising objective.
  • The same Jacobian matching procedure works unchanged on both convolutional U-Net and transformer-based diffusion architectures.
  • The performance gap between pruned and dense models shrinks across multiple datasets including CIFAR-10, CelebA, LSUN, ImageNet, and MSCOCO.
  • Inference cost drops while retaining most of the original generative capability without altering the sampling procedure.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same sensitivity-matching idea could be tested on other compression methods such as quantization or distillation to see whether it generalizes beyond pruning.
  • Direct computation of the full Jacobian without projections might yield further gains if the random-projection approximation is the current accuracy bottleneck.
  • Preserving local dynamical stability after compression may be a broader requirement for maintaining performance in other iterative generative models.

Load-bearing premise

That matching the second-order Jacobian through random projections transfers the sensitivity information that pruning removes, and that this transfer is what restores generative quality.

What would settle it

Running the same pruning and finetuning pipeline on a new dataset and finding that the Jacobian matching term produces no measurable improvement in FID or sample quality over standard denoising finetuning alone.

Figures

Figures reproduced from arXiv: 2506.05398 by Caleb Zheng, Eli Shlizerman.

Figure 1: Visualization of the effect of different training objectives on score-based generative …
Figure 2: CelebA images sampled from dense models (left), Diff-Pruning (mid) and IGSM models …
Figure 3: LSUN-Church images sampled from dense models (left), Diff-Pruning (mid) and IGSM …
Figure 4: LSUN-Bedroom images sampled from dense models (left), Diff-Pruning (mid) and IGSM …
Original abstract

Diffusion models achieve remarkable performance across diverse generative tasks in computer vision, but their high computational cost remains a major barrier to deployment. Model pruning offers a promising way to reduce inference cost and enable lightweight models. However, pruning leads to quality drop due to reduced capacity. A key limitation of existing pruning approaches is that pruned models are finetuned using the same objective as the dense model (denoising score matching). Since the dense model is accessible during finetuning, it warrants a more effective approach for knowledge transfer from the dense to the pruned model. Motivated by this, we propose 2ndMatch (2ndM), a general-purpose finetuning framework that introduces a 2nd-order Jacobian (J^T J) Matching loss inspired by Finite-Time Lyapunov Exponents. 2ndM teaches the pruned model to mimic the sensitivity of the dense teacher, i.e., how to respond to small perturbations over time, through scalable random projections. The framework is architecture-agnostic and applies to both U-Net- and Transformer-based diffusion models. Experiments on CIFAR-10, CelebA, LSUN, ImageNet, and MSCOCO demonstrate that 2ndM reduces the performance gap between pruned and dense models, substantially improving output quality.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces 2ndMatch (2ndM), a finetuning framework for pruned diffusion models that augments the standard denoising objective with a second-order Jacobian (J^T J) matching loss. This loss is motivated by Finite-Time Lyapunov Exponents and implemented via scalable random projections to encourage the pruned student to mimic the dense teacher's local sensitivity to perturbations over time. The approach is architecture-agnostic and is evaluated on U-Net and Transformer-based diffusion models across CIFAR-10, CelebA, LSUN, ImageNet, and MSCOCO, with the central claim that it reduces the performance gap relative to the dense model.

Significance. If the core hypothesis holds, the work offers a principled alternative to standard knowledge distillation for pruned diffusion models by transferring dynamical sensitivity information rather than only score-matching behavior. The random-projection estimator for tractability and the extension to both convolutional and attention-based backbones are practical strengths. However, the significance is tempered by the absence of direct evidence linking post-pruning changes in the J^T J spectrum to FID degradation.

major comments (3)
  1. [Introduction and Method] The motivation for J^T J matching (inspired by FTLE) is presented without direct empirical support that the second-order Jacobian spectrum changes substantially after pruning in a manner correlated with generative performance drop. No pre/post-pruning comparison of FTLE or J^T J eigenvalues is reported, leaving open whether this quantity is the primary missing signal that standard denoising finetuning fails to recover.
  2. [Method] The random-projection estimator for J^T J is introduced for scalability, yet the paper does not quantify the approximation error of this estimator or isolate its impact on the transferred sensitivity signal through controlled ablations (e.g., varying projection dimension or comparing to exact Jacobian matching on small models).
  3. [Experiments] While the abstract asserts that experiments on multiple datasets demonstrate a reduced performance gap, the reported results lack sufficient detail on baselines, error bars, statistical significance, and implementation hyperparameters to allow independent verification of the claimed improvements.
minor comments (2)
  1. [Method] The notation for the Jacobian operator and the precise definition of the matching loss (including how random projections are applied to both forward and backward passes) would benefit from an explicit equation block early in the method section.
  2. [Experiments] Figure captions and axis labels in the qualitative results could more clearly indicate which rows correspond to 2ndM versus baseline pruning methods.
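Major comment 2 asks for the approximation error of the random-projection estimator as the projection count varies. A toy version of that ablation can be prototyped in a few lines. This sketch assumes the estimator averages outer products of vector-Jacobian products J^T v with Gaussian v (unbiased because E[vv^T] = I); the paper may use a different variant. A small fixed matrix stands in for a real denoiser Jacobian, and the function names are hypothetical.

```python
import math
import random


def exact_jtj(J):
    """Exact J^T J for a list-of-rows Jacobian J of shape (m, n)."""
    n = len(J[0])
    return [[sum(row[a] * row[b] for row in J) for b in range(n)] for a in range(n)]


def projected_jtj(J, num_probes, seed=0):
    """Estimate J^T J as (1/k) * sum over k probes of (J^T v)(J^T v)^T, v ~ N(0, I_m).

    Unbiased because E[v v^T] = I; each probe needs one vector-Jacobian product.
    """
    rng = random.Random(seed)
    m, n = len(J), len(J[0])
    est = [[0.0] * n for _ in range(n)]
    for _ in range(num_probes):
        v = [rng.gauss(0.0, 1.0) for _ in range(m)]
        g = [sum(J[i][j] * v[i] for i in range(m)) for j in range(n)]  # J^T v
        for a in range(n):
            for b in range(n):
                est[a][b] += g[a] * g[b] / num_probes
    return est


def relative_frobenius_error(est, ref):
    """||est - ref||_F / ||ref||_F."""
    num = sum((e - r) ** 2 for e_row, r_row in zip(est, ref) for e, r in zip(e_row, r_row))
    den = sum(r ** 2 for r_row in ref for r in r_row)
    return math.sqrt(num / den)


if __name__ == "__main__":
    J = [[1.0, 0.5, 0.0], [0.2, -1.0, 0.3], [0.0, 0.4, 2.0]]
    ref = exact_jtj(J)
    for k in (1, 10, 100, 1000):
        print(k, relative_frobenius_error(projected_jtj(J, k), ref))
```

For this estimator the relative error shrinks roughly as 1/√k, so a sweep like the one in the main guard, repeated on the smallest real model where the exact Jacobian is affordable, is one way to report the trade-off the referee asks about.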

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. We address each major comment below and will incorporate revisions to improve the manuscript's rigor and clarity.

Point-by-point responses
  1. Referee: [Introduction and Method] The motivation for J^T J matching (inspired by FTLE) is presented without direct empirical support that the second-order Jacobian spectrum changes substantially after pruning in a manner correlated with generative performance drop. No pre/post-pruning comparison of FTLE or J^T J eigenvalues is reported, leaving open whether this quantity is the primary missing signal that standard denoising finetuning fails to recover.

    Authors: We agree that direct empirical evidence would strengthen the motivation. In the revised manuscript, we will add pre- and post-pruning comparisons of the J^T J spectrum (approximated via random projections) and FTLE estimates on a representative small-scale setting (e.g., CIFAR-10 U-Net). These will be presented as a new figure or table correlating spectral changes with FID degradation, demonstrating that pruning alters local sensitivity in ways not fully recovered by standard denoising finetuning alone. revision: yes

  2. Referee: [Method] The random-projection estimator for J^T J is introduced for scalability, yet the paper does not quantify the approximation error of this estimator or isolate its impact on the transferred sensitivity signal through controlled ablations (e.g., varying projection dimension or comparing to exact Jacobian matching on small models).

    Authors: We acknowledge the need for quantitative validation of the estimator. The revised version will include a dedicated ablation subsection that (1) reports the approximation error (e.g., relative Frobenius norm to exact J^T J on toy or small networks where exact computation is feasible) and (2) varies the projection dimension while measuring downstream FID impact. We will also compare the random-projection version against exact Jacobian matching on the smallest evaluated model to isolate the effect of the estimator on the transferred signal. revision: yes

  3. Referee: [Experiments] While the abstract asserts that experiments on multiple datasets demonstrate a reduced performance gap, the reported results lack sufficient detail on baselines, error bars, statistical significance, and implementation hyperparameters to allow independent verification of the claimed improvements.

    Authors: We apologize for the lack of detail in the current draft. In the revision we will expand the Experiments section and add an appendix containing: complete baseline descriptions with citations, error bars and standard deviations from at least three independent runs, statistical significance tests (e.g., paired t-tests or Wilcoxon tests) for the reported improvements, and a comprehensive table of all hyperparameters, optimizer settings, random seeds, and implementation choices to enable full reproducibility. revision: yes
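The rebuttal promises error bars from at least three runs and paired significance tests. A minimal standard-library sketch of those statistics, assuming one metric value (e.g., FID) per seed for both the baseline and 2ndM, paired by seed; the function names are hypothetical. No p-value is computed here; with SciPy available, `scipy.stats.ttest_rel` and `scipy.stats.wilcoxon` return p-values for the paired t-test and the Wilcoxon signed-rank test.

```python
import math
import statistics


def summarize(runs):
    """Mean and sample standard deviation over independent runs (e.g., seeds)."""
    return statistics.mean(runs), statistics.stdev(runs)


def paired_t(baseline, method):
    """Paired t statistic for per-seed metric pairs; returns (t, degrees_of_freedom).

    Differences are baseline - method, so for a lower-is-better metric such as
    FID a positive t means the method scores lower than the baseline on average.
    """
    if len(baseline) != len(method) or len(baseline) < 2:
        raise ValueError("need at least two paired runs of equal length")
    diffs = [b - m for b, m in zip(baseline, method)]
    sd = statistics.stdev(diffs)
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs))), len(diffs) - 1
```

Pairing by seed matters: it removes run-to-run variance shared by both methods, which an unpaired comparison of two mean ± std values would leave in.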

Circularity Check

0 steps flagged

No significant circularity detected in the 2ndMatch derivation.

full rationale

The paper defines a new finetuning objective, a 2nd-order Jacobian (J^T J) matching loss approximated via random projections, that is introduced independently as an addition to the standard denoising score matching loss. This construction does not reduce by definition to any fitted parameter from the pruning stage, nor does it rely on a self-citation chain or uniqueness theorem imported from prior author work to force the choice. The motivation via Finite-Time Lyapunov Exponents is presented as inspiration rather than a load-bearing derivation step that collapses the method onto its inputs. The overall framework remains self-contained against external benchmarks, with performance claims resting on empirical results across datasets rather than tautological equivalence.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The approach rests on the domain assumption that sensitivity matching via second-order Jacobians captures the knowledge needed to recover generative quality after pruning; no free parameters or new entities are explicitly introduced in the abstract.

axioms (1)
  • domain assumption: Second-order Jacobian matching transfers the relevant sensitivity information from dense to pruned diffusion models.
    Invoked as the core of the 2ndM loss, motivated by Finite-Time Lyapunov Exponents.



