2ndMatch: Finetuning Pruned Diffusion Models via Second-Order Jacobian Matching

Caleb Zheng; Eli Shlizerman

arxiv: 2506.05398 · v2 · submitted 2025-06-03 · 💻 cs.GR

Pith reviewed 2026-05-19 12:03 UTC · model grok-4.3

classification 💻 cs.GR

keywords diffusion models · model pruning · finetuning · Jacobian matching · knowledge transfer · generative models · image synthesis · model compression

The pith

Pruned diffusion models recover much of their output quality by finetuning on a second-order Jacobian matching loss that copies the teacher's sensitivity to small input changes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a finetuning method for pruned diffusion models that supplements the usual denoising objective with a new loss term. This term matches the second-order Jacobian of the pruned model to the dense original using random projections. The approach draws from dynamical systems ideas about how models respond to perturbations over time. On image datasets ranging from CIFAR-10 to ImageNet, the method narrows the quality gap to the full model and works for both U-Net and transformer backbones. The framework stays architecture-agnostic and requires no changes to the core diffusion training loop.

Core claim

Finetuning pruned diffusion models with a 2nd-order Jacobian (J^T J) matching loss, computed scalably via random projections and motivated by Finite-Time Lyapunov Exponents, teaches the pruned model to reproduce the dense teacher's sensitivity to small perturbations, thereby restoring generative performance that standard denoising finetuning leaves behind.

What carries the argument

The second-order Jacobian matching loss that transfers perturbation sensitivity from the dense teacher to the pruned student through random projections.
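
As a rough illustration of what such a loss could look like, the sketch below treats the teacher and student Jacobians as small explicit matrices and compares ||J^T v||² over random unit probes v. The helper names (`vjp`, `unit_gaussian`, `jacobian_matching_loss`), the squared penalty on the gap, and the toy matrices are assumptions made for this example, not details taken from the paper. In a real diffusion model the product J^T v would come from an automatic-differentiation vector-Jacobian product rather than from a stored matrix.

```python
import math
import random


def vjp(J, v):
    """Return J^T v for a Jacobian J stored as a list of rows."""
    cols = len(J[0])
    return [sum(J[i][j] * v[i] for i in range(len(J))) for j in range(cols)]


def unit_gaussian(dim, rng):
    """Draw a Gaussian probe and normalize it to unit length."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def jacobian_matching_loss(J_student, J_teacher, num_projections=64, seed=0):
    """Monte Carlo estimate of E_v[(||J_s^T v||^2 - ||J_t^T v||^2)^2].

    The squared gap is an assumption of this sketch, not the paper's exact loss.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_projections):
        v = unit_gaussian(len(J_teacher), rng)
        student_energy = sum(x * x for x in vjp(J_student, v))
        teacher_energy = sum(x * x for x in vjp(J_teacher, v))
        total += (student_energy - teacher_energy) ** 2
    return total / num_projections


# Hypothetical toy Jacobians: the "pruned" student has one entry zeroed.
J_teacher = [[2.0, 0.5], [0.0, 1.0]]
J_pruned = [[2.0, 0.0], [0.0, 1.0]]

loss_same = jacobian_matching_loss(J_teacher, J_teacher)
loss_pruned = jacobian_matching_loss(J_pruned, J_teacher)
```

Here `loss_same` is exactly zero because both arguments are the same matrix, while `loss_pruned` is positive because zeroing one entry changes how the toy student stretches almost every probe direction. A finetuning objective in this spirit would presumably add such a term, with some weight, to the usual denoising loss.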

If this is right

Pruned models achieve higher visual quality and lower FID scores than those finetuned only with the original denoising objective.
The same Jacobian matching procedure works unchanged on both convolutional U-Net and transformer-based diffusion architectures.
The performance gap between pruned and dense models shrinks across multiple datasets including CIFAR-10, CelebA, LSUN, ImageNet, and MSCOCO.
Inference cost drops while the pruned model retains most of the original generative capability, with no change to the sampling procedure.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same sensitivity-matching idea could be tested on other compression methods such as quantization or distillation to see whether it generalizes beyond pruning.
Direct computation of the full Jacobian without projections might yield further gains if the random-projection approximation is the current accuracy bottleneck.
Preserving local dynamical stability after compression may be a broader requirement for maintaining performance in other iterative generative models.

Load-bearing premise

The premise is that matching the second-order Jacobian through random projections transfers exactly the sensitivity information that pruning removes, and that this transfer is what restores generative quality.

What would settle it

Running the same pruning and finetuning pipeline on a new dataset and finding that the Jacobian matching term produces no measurable improvement in FID or sample quality over standard denoising finetuning alone.

Figures

Figures reproduced from arXiv: 2506.05398 by Caleb Zheng, Eli Shlizerman.

**Figure 2.** CelebA images sampled from dense models (left), Diff-Pruning (mid) and IGSM models [PITH_FULL_IMAGE:figures/full_fig_p017_2.png]

**Figure 3.** LSUN-Church images sampled from dense models (left), Diff-Pruning (mid) and IGSM [PITH_FULL_IMAGE:figures/full_fig_p018_3.png]

**Figure 4.** LSUN-Bedroom images sampled from dense models (left), Diff-Pruning (mid) and IGSM [PITH_FULL_IMAGE:figures/full_fig_p019_4.png]

Original abstract

Diffusion models achieve remarkable performance across diverse generative tasks in computer vision, but their high computational cost remains a major barrier to deployment. Model pruning offers a promising way to reduce inference cost and enable lightweight models. However, pruning leads to quality drop due to reduced capacity. A key limitation of existing pruning approaches is that pruned models are finetuned using the same objective as the dense model (denoising score matching). Since the dense model is accessible during finetuning, it warrants a more effective approach for knowledge transfer from the dense to the pruned model. Motivated by this, we propose 2ndMatch (2ndM), a general-purpose finetuning framework that introduces a 2nd-order Jacobian (J^T J) Matching loss inspired by Finite-Time Lyapunov Exponents. 2ndM teaches the pruned model to mimic the sensitivity of the dense teacher, i.e., how to respond to small perturbations over time, through scalable random projections. The framework is architecture-agnostic and applies to both U-Net- and Transformer-based diffusion models. Experiments on CIFAR-10, CelebA, LSUN, ImageNet, and MSCOCO demonstrate that 2ndM reduces the performance gap between pruned and dense models, substantially improving output quality.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

2ndMatch proposes Jacobian matching for pruned diffusion finetuning but the evidence tying that specific signal to the quality drop is still thin.

Here's the quick take on this arXiv paper: it proposes using second-order Jacobian matching as a finetuning loss to help pruned diffusion models recover quality by mimicking the teacher's sensitivity to perturbations, but the link between that quantity and actual generative performance after pruning is not strongly established.

The new part is the 2ndM framework, which adds this J^T J matching term, approximated via random projections for scalability, on top of the usual denoising objective. They apply it to both U-Net and transformer-based models and test on a range of datasets from CIFAR-10 to ImageNet and MSCOCO. If the gains are real, this could be a practical way to get better pruned models without changing the architecture much.

What works is the motivation: since the dense model is available during finetuning, it makes sense to use it rather than fall back on standard finetuning. The random projection trick keeps the method efficient, which is important for these large models.

The main concern is that the paper motivates the choice with Finite-Time Lyapunov Exponents but does not measure whether the Jacobian spectrum actually changes much after pruning, or whether matching it explains the quality recovery. Without ablations showing that this term is responsible for the improvement rather than just more training or other factors, and without reported numbers or error bars, it's hard to gauge the effect size. The stress test note about unquantified approximation error is fair based on what's described.

This paper is for people working on model compression for diffusion models in computer vision. Someone already doing pruning experiments might get an idea to try here, but it would need the full experimental details to be convincing. I think it deserves peer review because the idea is distinct and addresses a clear deployment issue, even if the current evidence is preliminary.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces 2ndMatch (2ndM), a finetuning framework for pruned diffusion models that augments the standard denoising objective with a second-order Jacobian (J^T J) matching loss. This loss is motivated by Finite-Time Lyapunov Exponents and implemented via scalable random projections to encourage the pruned student to mimic the dense teacher's local sensitivity to perturbations over time. The approach is architecture-agnostic and is evaluated on U-Net and Transformer-based diffusion models across CIFAR-10, CelebA, LSUN, ImageNet, and MSCOCO, with the central claim that it reduces the performance gap relative to the dense model.

Significance. If the core hypothesis holds, the work offers a principled alternative to standard knowledge distillation for pruned diffusion models by transferring dynamical sensitivity information rather than only score-matching behavior. The random-projection estimator for tractability and the extension to both convolutional and attention-based backbones are practical strengths. However, the significance is tempered by the absence of direct evidence linking post-pruning changes in the J^T J spectrum to FID degradation.
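
For readers who want to see why random projections can carry spectral information, the identities below are standard linear algebra and hold for any Jacobian J. The probe v and its Gaussian or unit-norm distribution are choices made for this note, not a statement of the paper's exact estimator.

```latex
\begin{align}
\|J^{\top} v\|^2 &= v^{\top} J J^{\top} v, \\
\mathbb{E}_{v \sim \mathcal{N}(0, I)}\,\|J^{\top} v\|^2
  &= \operatorname{tr}(J J^{\top}) = \operatorname{tr}(J^{\top} J) = \sum_i \sigma_i(J)^2, \\
\max_{\|v\| = 1} \|J^{\top} v\|^2
  &= \lambda_{\max}(J J^{\top}) = \lambda_{\max}(J^{\top} J) = \sigma_{\max}(J)^2 .
\end{align}
```

Matching only the expectation in the second line would constrain the sum of squared singular values. Matching the projected norm probe by probe constrains the whole quadratic form of J J^T, which shares its nonzero eigenvalues with J^T J.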

major comments (3)

[Introduction and Method] The motivation for J^T J matching (inspired by FTLE) is presented without direct empirical support that the second-order Jacobian spectrum changes substantially after pruning in a manner correlated with generative performance drop. No pre/post-pruning comparison of FTLE or J^T J eigenvalues is reported, leaving open whether this quantity is the primary missing signal that standard denoising finetuning fails to recover.
[Method] The random-projection estimator for J^T J is introduced for scalability, yet the paper does not quantify the approximation error of this estimator or isolate its impact on the transferred sensitivity signal through controlled ablations (e.g., varying projection dimension or comparing to exact Jacobian matching on small models).
[Experiments] While the abstract asserts that experiments on multiple datasets demonstrate a reduced performance gap, the reported results lack sufficient detail on baselines, error bars, statistical significance, and implementation hyperparameters to allow independent verification of the claimed improvements.

minor comments (2)

[Method] The notation for the Jacobian operator and the precise definition of the matching loss (including how random projections are applied to both forward and backward passes) would benefit from an explicit equation block early in the method section.
[Experiments] Figure captions and axis labels in the qualitative results could more clearly indicate which rows correspond to 2ndM versus baseline pruning methods.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. We address each major comment below and will incorporate revisions to improve the manuscript's rigor and clarity.

Referee: [Introduction and Method] The motivation for J^T J matching (inspired by FTLE) is presented without direct empirical support that the second-order Jacobian spectrum changes substantially after pruning in a manner correlated with generative performance drop. No pre/post-pruning comparison of FTLE or J^T J eigenvalues is reported, leaving open whether this quantity is the primary missing signal that standard denoising finetuning fails to recover.

Authors: We agree that direct empirical evidence would strengthen the motivation. In the revised manuscript, we will add pre- and post-pruning comparisons of the J^T J spectrum (approximated via random projections) and FTLE estimates on a representative small-scale setting (e.g., CIFAR-10 U-Net). These will be presented as a new figure or table correlating spectral changes with FID degradation, demonstrating that pruning alters local sensitivity in ways not fully recovered by standard denoising finetuning alone.

revision: yes

Referee: [Method] The random-projection estimator for J^T J is introduced for scalability, yet the paper does not quantify the approximation error of this estimator or isolate its impact on the transferred sensitivity signal through controlled ablations (e.g., varying projection dimension or comparing to exact Jacobian matching on small models).

Authors: We acknowledge the need for quantitative validation of the estimator. The revised version will include a dedicated ablation subsection that (1) reports the approximation error (e.g., relative Frobenius norm to exact J^T J on toy or small networks where exact computation is feasible) and (2) varies the projection dimension while measuring downstream FID impact. We will also compare the random-projection version against exact Jacobian matching on the smallest evaluated model to isolate the effect of the estimator on the transferred signal.

revision: yes

Referee: [Experiments] While the abstract asserts that experiments on multiple datasets demonstrate a reduced performance gap, the reported results lack sufficient detail on baselines, error bars, statistical significance, and implementation hyperparameters to allow independent verification of the claimed improvements.

Authors: We apologize for the lack of detail in the current draft. In the revision we will expand the Experiments section and add an appendix containing: complete baseline descriptions with citations, error bars and standard deviations from at least three independent runs, statistical significance tests (e.g., paired t-tests or Wilcoxon tests) for the reported improvements, and a comprehensive table of all hyperparameters, optimizer settings, random seeds, and implementation choices to enable full reproducibility.

revision: yes
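
The kind of estimator check discussed in the second exchange (relative Frobenius error against exact J^T J, as a function of the number of projections) can be prototyped on a toy matrix. The sketch below is only an illustration under stated assumptions: the 4x4 random Jacobian, the probe counts, and the function names `exact_gram`, `projected_gram`, and `relative_frobenius_error` are hypothetical, and it uses the fact that for Gaussian probes v with identity covariance the average of (J^T v)(J^T v)^T is an unbiased estimate of J^T J.

```python
import math
import random


def vjp(J, v):
    """Return J^T v for a Jacobian J stored as a list of rows."""
    cols = len(J[0])
    return [sum(J[i][j] * v[i] for i in range(len(J))) for j in range(cols)]


def exact_gram(J):
    """Exact J^T J."""
    n = len(J[0])
    return [[sum(row[i] * row[j] for row in J) for j in range(n)] for i in range(n)]


def projected_gram(J, k, rng):
    """Estimate J^T J by averaging (J^T v)(J^T v)^T over k Gaussian probes."""
    n = len(J[0])
    est = [[0.0] * n for _ in range(n)]
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in J]
        u = vjp(J, v)
        for i in range(n):
            for j in range(n):
                est[i][j] += u[i] * u[j] / k
    return est


def relative_frobenius_error(A, B):
    """||A - B||_F / ||B||_F for equally sized nested lists."""
    num = math.sqrt(sum((a - b) ** 2 for ra, rb in zip(A, B) for a, b in zip(ra, rb)))
    den = math.sqrt(sum(b * b for rb in B for b in rb))
    return num / den


rng = random.Random(0)
# Hypothetical toy Jacobian; a real check would use a small network's Jacobian.
J = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(4)]
G = exact_gram(J)
errors = {k: relative_frobenius_error(projected_gram(J, k, rng), G) for k in (4, 64, 4096)}
```

The error should shrink roughly like one over the square root of the number of probes, so `errors[4096]` is expected to be far smaller than `errors[4]`. How many probes a full-scale model needs is exactly the open question the referee raises.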

Circularity Check

0 steps flagged

No significant circularity detected in the 2ndMatch derivation.

full rationale

The paper defines a new finetuning objective (2nd-order Jacobian matching loss J^T J approximated via random projections) that is introduced independently as an addition to the standard denoising score matching loss. This construction does not reduce by definition to any fitted parameter from the pruning stage, nor does it rely on a self-citation chain or uniqueness theorem imported from prior author work to force the choice. The motivation via Finite-Time Lyapunov Exponents is presented as inspiration rather than a load-bearing derivation step that collapses the method onto its inputs. The overall framework remains self-contained against external benchmarks, with performance claims resting on empirical results across datasets rather than tautological equivalence.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach rests on the domain assumption that sensitivity matching via second-order Jacobians captures the knowledge needed to recover generative quality after pruning; no free parameters or new entities are explicitly introduced in the abstract.

axioms (1)

domain assumption: Second-order Jacobian matching transfers the relevant sensitivity information from dense to pruned diffusion models.
Invoked as the core of the 2ndM loss, motivated by Finite-Time Lyapunov Exponents.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

IGSM supervises how the score function responds to small input perturbations over time... draws inspiration from... Finite-Time Lyapunov Exponents (FTLE)... LJac = E[ (||J^T v̂||² - ||J_D^T v̂||²) ]
IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean costAlphaLog_high_calibrated_iff unclear

FTLE... σ = (1/t1) ln √λ_max where C = J^T J is the Cauchy-Green tensor
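
The FTLE expression quoted above can be evaluated by hand for a small case. The sketch below assumes a constant 2x2 Jacobian and a hypothetical horizon `t1 = 2.0`; the function names are illustrative, and the closed-form eigenvalue applies only to symmetric 2x2 matrices.

```python
import math


def cauchy_green(J):
    """C = J^T J for a 2x2 Jacobian given as nested lists."""
    return [[sum(J[k][i] * J[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def largest_eigenvalue_sym2(C):
    """Largest eigenvalue of a symmetric 2x2 matrix, in closed form."""
    a, b, d = C[0][0], C[0][1], C[1][1]
    mean = 0.5 * (a + d)
    radius = math.sqrt((0.5 * (a - d)) ** 2 + b * b)
    return mean + radius


def ftle(J, t1):
    """sigma = (1 / t1) * ln(sqrt(lambda_max(J^T J)))."""
    lam_max = largest_eigenvalue_sym2(cauchy_green(J))
    return math.log(math.sqrt(lam_max)) / t1


# Hypothetical Jacobian: stretches one axis by 3 and shrinks the other by 2.
J = [[3.0, 0.0], [0.0, 0.5]]
sigma = ftle(J, t1=2.0)
```

For this diagonal example C = diag(9, 0.25), so λ_max = 9 and σ = ln(3)/2 ≈ 0.549. A positive value indicates that nearby inputs separate along the most stretched direction over the horizon.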

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

65 extracted references · 65 canonical work pages · 5 internal anchors

[1]

Approximate caching for efficiently serving {Text-to-Image} diffusion models

Shubham Agarwal, Subrata Mitra, Sarthak Chakraborty, Srikrishna Karanam, Koyel Mukherjee, and Shiv Kumar Saini. Approximate caching for efficiently serving {Text-to-Image} diffusion models. In 21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24), pages 1173–1189, 2024

work page 2024
[2]

A minimization principle for the description of modes associated with finite-time instabilities

H Babaee and TP Sapsis. A minimization principle for the description of modes associated with finite-time instabilities. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 472(2186):20150779, 2016

work page 2016
[3]

Reduced- order description of transient instabilities and computation of finite-time lyapunov exponents

Hessam Babaee, Mohamad Farazmand, George Haller, and Themistoklis P Sapsis. Reduced- order description of transient instabilities and computation of finite-time lyapunov exponents. Chaos: An Interdisciplinary Journal of Nonlinear Science, 27(6), 2017

work page 2017
[4]

All are worth words: A vit backbone for diffusion models

Fan Bao, Shen Nie, Kaiwen Xue, Yue Cao, Chongxuan Li, Hang Su, and Jun Zhu. All are worth words: A vit backbone for diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 22669–22679, 2023

work page 2023
[5]

Fast computation of finite-time lyapunov exponent fields for unsteady flows

Steven L Brunton and Clarence W Rowley. Fast computation of finite-time lyapunov exponent fields for unsteady flows. Chaos: An Interdisciplinary Journal of Nonlinear Science , 20(1), 2010

work page 2010
[6]

Ld-pruner: Ef- ficient pruning of latent diffusion models using task-agnostic insights

Thibault Castells, Hyoung-Kyu Song, Bo-Kyeong Kim, and Shinkook Choi. Ld-pruner: Ef- ficient pruning of latent diffusion models using task-agnostic insights. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 821–830, 2024

work page 2024
[7]

Sobolev training for neural networks

Wojciech M Czarnecki, Simon Osindero, Max Jaderberg, Grzegorz Swirszcz, and Razvan Pascanu. Sobolev training for neural networks. Advances in neural information processing systems, 30, 2017

work page 2017
[8]

Genie: Higher-order denoising diffusion solvers

Tim Dockhorn, Arash Vahdat, and Karsten Kreis. Genie: Higher-order denoising diffusion solvers. Advances in Neural Information Processing Systems, 35:30150–30166, 2022

work page 2022
[9]

Lyapunov spectra of chaotic recurrent neural networks

Rainer Engelken, Fred Wolf, and Larry F Abbott. Lyapunov spectra of chaotic recurrent neural networks. arXiv preprint arXiv:2006.02427, 2020

work page arXiv 2006
[10]

Depgraph: Towards any structural pruning

Gongfan Fang, Xinyin Ma, Mingli Song, Michael Bi Mi, and Xinchao Wang. Depgraph: Towards any structural pruning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 16091–16101, 2023

work page 2023
[11]

Structural pruning for diffusion models

Gongfan Fang, Xinyin Ma, and Xinchao Wang. Structural pruning for diffusion models. arXiv preprint arXiv:2305.10924, 2023

work page arXiv 2023
[12]

Detection of lagrangian coherent structures in 3d turbulence

MA Green, Clarence Rowley, and George Haller. Detection of lagrangian coherent structures in 3d turbulence. J. Fluid Mech, 18547(18550):20, 2006

work page 2006
[13]

Lagrangian coherent structures

George Haller. Lagrangian coherent structures. Annual review of fluid mechanics, 47:137–162, 2015

work page 2015
[14]

Lagrangian coherent structures and mixing in two- dimensional turbulence

George Haller and Guocheng Yuan. Lagrangian coherent structures and mixing in two- dimensional turbulence. Physica D: Nonlinear Phenomena, 147(3-4):352–370, 2000

work page 2000
[15]

Ordinary differential equations

Philip Hartman. Ordinary differential equations. SIAM, 2002

work page 2002
[16]

Gans trained by a two time-scale update rule converge to a local nash equilibrium

Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. Gans trained by a two time-scale update rule converge to a local nash equilibrium. Advances in neural information processing systems, 30, 2017

work page 2017
[17]

Denoising diffusion probabilistic models

Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in neural information processing systems, 33:6840–6851, 2020

work page 2020
[18]

Estimation of non-normalized statistical models by score matching

Aapo Hyvärinen and Peter Dayan. Estimation of non-normalized statistical models by score matching. Journal of Machine Learning Research, 6(4), 2005

work page 2005
[19]

Elucidating the design space of diffusion-based generative models

Tero Karras, Miika Aittala, Timo Aila, and Samuli Laine. Elucidating the design space of diffusion-based generative models. Advances in neural information processing systems, 35: 26565–26577, 2022

work page 2022
[20]

On architectural compression of text-to-image diffusion models

Bo-Kyeong Kim, Hyoung-Kyu Song, Thibault Castells, and Shinkook Choi. On architectural compression of text-to-image diffusion models. arXiv preprint arXiv:2305.15798, 2023. 20

work page arXiv 2023
[21]

Finite time lyapunov exponent analysis of model predictive control and reinforcement learning

Kartik Krishna, Steven L Brunton, and Zhuoyuan Song. Finite time lyapunov exponent analysis of model predictive control and reinforcement learning. arXiv preprint arXiv:2304.03326, 2023

work page arXiv 2023
[22]

Learning multiple layers of features from tiny images

Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. 2009

work page 2009
[23]

Srdiff: Single image super-resolution with diffusion probabilistic models

Haoying Li, Yifan Yang, Meng Chang, Shiqi Chen, Huajun Feng, Zhihai Xu, Qi Li, and Yueting Chen. Srdiff: Single image super-resolution with diffusion probabilistic models. Neurocomputing, 479:47–59, 2022

work page 2022
[24]

Q-diffusion: Quantizing diffusion models

Xiuyu Li, Yijiang Liu, Long Lian, Huanrui Yang, Zhen Dong, Daniel Kang, Shanghang Zhang, and Kurt Keutzer. Q-diffusion: Quantizing diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 17535–17545, 2023

work page 2023
[25]

Doug Lipinski and Kamran Mohseni. A ridge tracking algorithm and error estimate for efficient computation of lagrangian coherent structures.Chaos: An Interdisciplinary Journal of Nonlinear Science, 20(1):017504, 2010

work page 2010
[26]

Oms-dpm: Optimizing the model schedule for diffusion probabilistic models

Enshu Liu, Xuefei Ning, Zinan Lin, Huazhong Yang, and Yu Wang. Oms-dpm: Optimizing the model schedule for diffusion probabilistic models. In International Conference on Machine Learning, pages 21915–21936. PMLR, 2023

work page 2023
[27]

Deep learning face attributes in the wild

Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. Deep learning face attributes in the wild. In Proceedings of the IEEE international conference on computer vision , pages 3730–3738, 2015

work page 2015
[28]

Dpm-solver: A fast ode solver for diffusion probabilistic model sampling in around 10 steps

Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, and Jun Zhu. Dpm-solver: A fast ode solver for diffusion probabilistic model sampling in around 10 steps. Advances in Neural Information Processing Systems, 35:5775–5787, 2022

work page 2022
[29]

Knowledge Distillation in Iterative Generative Models for Improved Sampling Speed

Eric Luhman and Troy Luhman. Knowledge distillation in iterative generative models for improved sampling speed. arXiv preprint arXiv:2101.02388, 2021

work page internal anchor Pith review Pith/arXiv arXiv 2021
[30]

Deepcache: Accelerating diffusion models for free

Xinyin Ma, Gongfan Fang, and Xinchao Wang. Deepcache: Accelerating diffusion models for free. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 15762–15772, 2024

work page 2024
[31]

Dynamical entropy production in spiking neuron networks in the balanced state

Michael Monteforte and Fred Wolf. Dynamical entropy production in spiking neuron networks in the balanced state. Physical review letters, 105(26):268104, 2010

work page 2010
[32]

Improved denoising diffusion probabilistic models

Alexander Quinn Nichol and Prafulla Dhariwal. Improved denoising diffusion probabilistic models. In International Conference on Machine Learning, pages 8162–8171. PMLR, 2021

work page 2021
[33]

Horizontal dispersion of floatable particles in the vicinity of velocity singularities such as convergences

Akira Okubo. Horizontal dispersion of floatable particles in the vicinity of velocity singularities such as convergences. In Deep sea research and oceanographic abstracts, volume 17, pages 445–454. Elsevier, 1970

work page 1970
[34]

Oseledets theorem

Valery Oseledets. Oseledets theorem. Scholarpedia, 3(1):1846, 2008

work page 2008
[35]

T-stitch: Accelerating sampling in pre-trained diffusion models with trajectory stitching

Zizheng Pan, Bohan Zhuang, De-An Huang, Weili Nie, Zhiding Yu, Chaowei Xiao, Jianfei Cai, and Anima Anandkumar. T-stitch: Accelerating sampling in pre-trained diffusion models with trajectory stitching. arXiv preprint arXiv:2402.14167, 2024

work page arXiv 2024
[36]

Zero-shot image-to-image translation

Gaurav Parmar, Krishna Kumar Singh, Richard Zhang, Yijun Li, Jingwan Lu, and Jun-Yan Zhu. Zero-shot image-to-image translation. In ACM SIGGRAPH 2023 Conference Proceedings, pages 1–11, 2023

work page 2023
[37]

Scalable diffusion models with transformers

William Peebles and Saining Xie. Scalable diffusion models with transformers. In Proceedings of the IEEE/CVF international conference on computer vision, pages 4195–4205, 2023

work page 2023
[38]

High- resolution image synthesis with latent diffusion models

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High- resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10684–10695, 2022

work page 2022
[39]

Ergodic theory of differentiable dynamical systems

David Ruelle. Ergodic theory of differentiable dynamical systems. Publications Mathématiques de l’Institut des Hautes Études Scientifiques, 50(1):27–58, 1979

work page 1979
[40]

Photorealistic text-to-image diffusion models with deep language understanding

Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily L Denton, Kamyar Ghasemipour, Raphael Gontijo Lopes, Burcu Karagol Ayan, Tim Salimans, et al. Photorealistic text-to-image diffusion models with deep language understanding. Advances in Neural Information Processing Systems, 35:36479–36494, 2022. 21

work page 2022
[41]

Progressive Distillation for Fast Sampling of Diffusion Models

Tim Salimans and Jonathan Ho. Progressive distillation for fast sampling of diffusion models. arXiv preprint arXiv:2202.00512, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022
[42]

Lagrangian coherent structures

Shawn C Shadden. Lagrangian coherent structures. Transport and mixing in laminar flows: from microfluidics to oceanic currents, pages 59–89, 2011

work page 2011
[43]

Definition and properties of la- grangian coherent structures from finite-time lyapunov exponents in two-dimensional aperiodic flows

Shawn C Shadden, Francois Lekien, and Jerrold E Marsden. Definition and properties of la- grangian coherent structures from finite-time lyapunov exponents in two-dimensional aperiodic flows. Physica D: Nonlinear Phenomena, 212(3-4):271–304, 2005

work page 2005
[44]

Post-training quantization on diffusion models

Yuzhang Shang, Zhihang Yuan, Bin Xie, Bingzhe Wu, and Yan Yan. Post-training quantization on diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1972–1981, 2023

work page 1972
[45]

Reverse engineering recurrent neural networks with jacobian switching linear dynamical systems

Jimmy Smith, Scott Linderman, and David Sussillo. Reverse engineering recurrent neural networks with jacobian switching linear dynamical systems. Advances in Neural Information Processing Systems, 34:16700–16713, 2021

work page 2021
[46]

Denoising Diffusion Implicit Models

Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. arXiv preprint arXiv:2010.02502, 2020

work page internal anchor Pith review Pith/arXiv arXiv 2010
[47]

Generative modeling by estimating gradients of the data distribution

Yang Song and Stefano Ermon. Generative modeling by estimating gradients of the data distribution. Advances in neural information processing systems, 32, 2019

work page 2019
[48]

Sliced score matching: A scalable approach to density and score estimation

Yang Song, Sahaj Garg, Jiaxin Shi, and Stefano Ermon. Sliced score matching: A scalable approach to density and score estimation. In Uncertainty in Artificial Intelligence , pages 574–584. PMLR, 2020

work page 2020
[49]

Score-Based Generative Modeling through Stochastic Differential Equations

Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:2011.13456, 2020

work page internal anchor Pith review Pith/arXiv arXiv 2011
[50]

Knowledge transfer with jacobian matching

Suraj Srinivas and François Fleuret. Knowledge transfer with jacobian matching. In Interna- tional Conference on Machine Learning, pages 4723–4731. PMLR, 2018

work page 2018
[51]

A connection between score matching and denoising autoencoders

Pascal Vincent. A connection between score matching and denoising autoencoders. Neural computation, 23(7):1661–1674, 2011

work page 2011
[52]

Lyapunov-guided representation of recurrent neural network performance

Ryan V ogt, Yang Zheng, and Eli Shlizerman. Lyapunov-guided representation of recurrent neural network performance. Neural Computing and Applications, pages 1–16, 2024

work page 2024
[53]

Sparsedm: Toward sparse efficient diffusion models

Kafeng Wang, Jianfei Chen, He Li, Zhenpeng Mi, and Jun Zhu. Sparsedm: Toward sparse efficient diffusion models. arXiv preprint arXiv:2404.10445, 2024

work page arXiv 2024
[54]

Learning to efficiently sample from diffusion probabilistic models

Daniel Watson, Jonathan Ho, Mohammad Norouzi, and William Chan. Learning to efficiently sample from diffusion probabilistic models. arXiv preprint arXiv:2106.03802, 2021

work page arXiv 2021
[55]

Learning fast samplers for diffusion models by differentiating through sample quality.arXiv preprint arXiv:2202.05830, 2022

Daniel Watson, William Chan, Jonathan Ho, and Mohammad Norouzi. Learning fast samplers for diffusion models by differentiating through sample quality.arXiv preprint arXiv:2202.05830, 2022

work page arXiv 2022
[56]

The dynamics of enstrophy transfer in two-dimensional hydrodynamics

John Weiss. The dynamics of enstrophy transfer in two-dimensional hydrodynamics. Physica D: Nonlinear Phenomena, 48(2-3):273–294, 1991

[57]

Cache me if you can: Accelerating diffusion models through block caching

Felix Wimbauer, Bichen Wu, Edgar Schoenfeld, Xiaoliang Dai, Ji Hou, Zijian He, Artsiom Sanakoyeu, Peizhao Zhang, Sam Tsai, Jonas Kohler, et al. Cache me if you can: Accelerating diffusion models through block caching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6211–6220, 2024

[58]

Diffusion probabilistic modeling for video generation

Ruihan Yang, Prakhar Srivastava, and Stephan Mandt. Diffusion probabilistic modeling for video generation. Entropy, 25(10):1469, 2023

[59]

LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop

Fisher Yu, Ari Seff, Yinda Zhang, Shuran Song, Thomas Funkhouser, and Jianxiong Xiao. Lsun: Construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint arXiv:1506.03365, 2015

[60]

Laptop-diff: Layer pruning and normalized distillation for compressing diffusion models

Dingkun Zhang, Sijia Li, Chen Chen, Qingsong Xie, and Haonan Lu. Laptop-diff: Layer pruning and normalized distillation for compressing diffusion models. arXiv preprint arXiv:2404.11098, 2024

[61]

Accelerating diffusion models with one-to-many knowledge distillation

Linfeng Zhang and Kaisheng Ma. Accelerating diffusion models with one-to-many knowledge distillation. arXiv preprint arXiv:2410.04191, 2024

[62]

Fast sampling of diffusion models with exponential integrator

Qinsheng Zhang and Yongxin Chen. Fast sampling of diffusion models with exponential integrator. arXiv preprint arXiv:2204.13902, 2022

[63]

Effortless efficiency: Low-cost pruning of diffusion models

Yang Zhang, Er Jin, Yanfei Dong, Ashkan Khakzar, Philip Torr, Johannes Stegmaier, and Kenji Kawaguchi. Effortless efficiency: Low-cost pruning of diffusion models. arXiv preprint arXiv:2412.02852, 2024

[64]

Hyperpruning: Efficient pruning through lyapunov metric hypersearch

Yang Zheng and Eli Shlizerman. Hyperpruning: Efficient pruning through lyapunov metric hypersearch. 2022

[65]

Dip-go: A diffusion pruner via few-step gradient optimization

Haowei Zhu, Dehua Tang, Ji Liu, Mingjie Lu, Jintu Zheng, Jinzhang Peng, Dong Li, Yu Wang, Fan Jiang, Lu Tian, et al. Dip-go: A diffusion pruner via few-step gradient optimization. Advances in Neural Information Processing Systems, 37:92581–92604, 2024

