2ndMatch: Finetuning Pruned Diffusion Models via Second-Order Jacobian Matching
Pith reviewed 2026-05-19 12:03 UTC · model grok-4.3
The pith
Pruned diffusion models recover much of their output quality by finetuning on a second-order Jacobian matching loss that copies the teacher's sensitivity to small input changes.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Finetuning pruned diffusion models with a 2nd-order Jacobian (J^T J) matching loss, computed scalably via random projections and motivated by Finite-Time Lyapunov Exponents, teaches the pruned model to reproduce the dense teacher's sensitivity to small perturbations, thereby restoring generative performance that standard denoising finetuning leaves behind.
What carries the argument
The second-order Jacobian matching loss that transfers perturbation sensitivity from the dense teacher to the pruned student through random projections.
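A minimal sketch of how such a loss could be estimated, assuming a toy two-layer tanh network in place of a diffusion denoiser. The function names, the squared gap between student and teacher, and the "pruning" by zeroing hidden units are all illustrative assumptions, not the paper's implementation. The point it shows is that ||J^T v||² can be computed from a vector-Jacobian product alone, without ever forming J.

```python
import numpy as np

def vjp_sq_norm(W1, W2, x, v):
    """Return ||J^T v||^2 for the toy map f(x) = W2 @ tanh(W1 @ x), without forming J."""
    h = np.tanh(W1 @ x)
    # J = W2 @ diag(1 - h^2) @ W1, hence J^T v = W1^T @ ((1 - h^2) * (W2^T @ v)).
    jt_v = W1.T @ ((1.0 - h**2) * (W2.T @ v))
    return float(jt_v @ jt_v)

def jacobian_matching_loss(student, teacher, x, num_proj, rng):
    """Monte Carlo estimate of E_v[(||J_S^T v||^2 - ||J_T^T v||^2)^2].

    Squaring the gap is an assumption made for this sketch.
    """
    out_dim = teacher[1].shape[0]
    total = 0.0
    for _ in range(num_proj):
        v = rng.standard_normal(out_dim)
        v /= np.linalg.norm(v)  # unit-norm random probe direction
        gap = vjp_sq_norm(*student, x, v) - vjp_sq_norm(*teacher, x, v)
        total += gap**2
    return total / num_proj

rng = np.random.default_rng(0)
W1 = rng.standard_normal((16, 8))
W2 = rng.standard_normal((8, 16))
teacher = (W1, W2)

# Hypothetical structured pruning: keep only the first 8 of 16 hidden units.
mask = np.arange(16) < 8
student = (W1 * mask[:, None], W2 * mask[None, :])

x = rng.standard_normal(8)
loss_same = jacobian_matching_loss(teacher, teacher, x, 32, np.random.default_rng(1))
loss_pruned = jacobian_matching_loss(student, teacher, x, 32, np.random.default_rng(1))
print(loss_same, loss_pruned)
```

In a finetuning loop, a term of this kind would be added to the denoising loss and differentiated with respect to the student's weights; that step needs an autodiff framework and is left out here.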
If this is right
- Pruned models achieve higher visual quality and lower FID scores than those finetuned only with the original denoising objective.
- The same Jacobian matching procedure works unchanged on both convolutional U-Net and transformer-based diffusion architectures.
- The performance gap between pruned and dense models shrinks across multiple datasets including CIFAR-10, CelebA, LSUN, ImageNet, and MSCOCO.
- Inference cost drops while the pruned model retains most of the original generative capability, with no change to the sampling procedure.
Where Pith is reading between the lines
- The same sensitivity-matching idea could be tested on other compression methods such as quantization or distillation to see whether it generalizes beyond pruning.
- Direct computation of the full Jacobian without projections might yield further gains if the random-projection approximation is the current accuracy bottleneck.
- Preserving local dynamical stability after compression may be a broader requirement for maintaining performance in other iterative generative models.
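Whether the random-projection approximation is a bottleneck can be probed on a model small enough to form the Jacobian explicitly. The sketch below is an assumption-laden toy (a two-layer tanh network, Gaussian probes, hypothetical function names) rather than anything from the paper. It relies on the standard identity that, for v drawn from a standard Gaussian, the expectation of ||J^T v||² equals tr(J J^T), which is also the sum of the eigenvalues of J^T J.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((16, 8))
W2 = rng.standard_normal((8, 16))
x = rng.standard_normal(8)

h = np.tanh(W1 @ x)
# Exact Jacobian of f(x) = W2 @ tanh(W1 @ x); feasible only because the toy is tiny.
J = W2 @ ((1.0 - h**2)[:, None] * W1)
exact = float(np.trace(J.T @ J))  # sum of the eigenvalues of J^T J

def projected_estimate(num_proj, rng):
    """Average ||J^T v||^2 over num_proj standard Gaussian probes v."""
    V = rng.standard_normal((num_proj, J.shape[0]))  # each row is one probe v
    JtV = V @ J                                      # row i equals (J^T v_i)^T
    return float(np.mean(np.sum(JtV**2, axis=1)))

# Relative error of the estimate as the number of probes grows.
for k in (1, 16, 256, 4096):
    est = projected_estimate(k, rng)
    print(k, abs(est - exact) / exact)
```

The error shrinks roughly like one over the square root of the number of probes, so a handful of projections per training step gives a noisy but unbiased signal for this trace-like quantity.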
Load-bearing premise
That matching the second-order Jacobian through random projections transfers the exact sensitivity information that pruning removes, and that this transfer is what restores generative quality.
What would settle it
Running the same pruning and finetuning pipeline on a new dataset and finding that the Jacobian matching term produces no measurable improvement in FID or sample quality over standard denoising finetuning alone.
Original abstract
Diffusion models achieve remarkable performance across diverse generative tasks in computer vision, but their high computational cost remains a major barrier to deployment. Model pruning offers a promising way to reduce inference cost and enable lightweight models. However, pruning leads to a quality drop due to reduced capacity. A key limitation of existing pruning approaches is that pruned models are finetuned using the same objective as the dense model (denoising score matching). Since the dense model is accessible during finetuning, it warrants a more effective approach for knowledge transfer from the dense to the pruned model. Motivated by this, we propose 2ndMatch (2ndM), a general-purpose finetuning framework that introduces a 2nd-order Jacobian ($J^{\top} J$) Matching loss inspired by Finite-Time Lyapunov Exponents. 2ndM teaches the pruned model to mimic the sensitivity of the dense teacher, i.e., how to respond to small perturbations over time, through scalable random projections. The framework is architecture-agnostic and applies to both U-Net- and Transformer-based diffusion models. Experiments on CIFAR-10, CelebA, LSUN, ImageNet, and MSCOCO demonstrate that 2ndM reduces the performance gap between pruned and dense models, substantially improving output quality.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces 2ndMatch (2ndM), a finetuning framework for pruned diffusion models that augments the standard denoising objective with a second-order Jacobian (J^T J) matching loss. This loss is motivated by Finite-Time Lyapunov Exponents and implemented via scalable random projections to encourage the pruned student to mimic the dense teacher's local sensitivity to perturbations over time. The approach is architecture-agnostic and is evaluated on U-Net and Transformer-based diffusion models across CIFAR-10, CelebA, LSUN, ImageNet, and MSCOCO, with the central claim that it reduces the performance gap relative to the dense model.
Significance. If the core hypothesis holds, the work offers a principled alternative to standard knowledge distillation for pruned diffusion models by transferring dynamical sensitivity information rather than only score-matching behavior. The random-projection estimator for tractability and the extension to both convolutional and attention-based backbones are practical strengths. However, the significance is tempered by the absence of direct evidence linking post-pruning changes in the J^T J spectrum to FID degradation.
major comments (3)
- [Introduction and Method] The motivation for J^T J matching (inspired by FTLE) is presented without direct empirical support that the second-order Jacobian spectrum changes substantially after pruning in a manner correlated with generative performance drop. No pre/post-pruning comparison of FTLE or J^T J eigenvalues is reported, leaving open whether this quantity is the primary missing signal that standard denoising finetuning fails to recover.
- [Method] The random-projection estimator for J^T J is introduced for scalability, yet the paper does not quantify the approximation error of this estimator or isolate its impact on the transferred sensitivity signal through controlled ablations (e.g., varying projection dimension or comparing to exact Jacobian matching on small models).
- [Experiments] While the abstract asserts that experiments on multiple datasets demonstrate a reduced performance gap, the reported results lack sufficient detail on baselines, error bars, statistical significance, and implementation hyperparameters to allow independent verification of the claimed improvements.
minor comments (2)
- [Method] The notation for the Jacobian operator and the precise definition of the matching loss (including how random projections are applied to both forward and backward passes) would benefit from an explicit equation block early in the method section.
- [Experiments] Figure captions and axis labels in the qualitative results could more clearly indicate which rows correspond to 2ndM versus baseline pruning methods.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive feedback. We address each major comment below and will incorporate revisions to improve the manuscript's rigor and clarity.
- Referee: [Introduction and Method] The motivation for J^T J matching (inspired by FTLE) is presented without direct empirical support that the second-order Jacobian spectrum changes substantially after pruning in a manner correlated with generative performance drop. No pre/post-pruning comparison of FTLE or J^T J eigenvalues is reported, leaving open whether this quantity is the primary missing signal that standard denoising finetuning fails to recover.
  Authors: We agree that direct empirical evidence would strengthen the motivation. In the revised manuscript, we will add pre- and post-pruning comparisons of the J^T J spectrum (approximated via random projections) and FTLE estimates on a representative small-scale setting (e.g., CIFAR-10 U-Net). These will be presented as a new figure or table correlating spectral changes with FID degradation, demonstrating that pruning alters local sensitivity in ways not fully recovered by standard denoising finetuning alone.
  Revision: yes
- Referee: [Method] The random-projection estimator for J^T J is introduced for scalability, yet the paper does not quantify the approximation error of this estimator or isolate its impact on the transferred sensitivity signal through controlled ablations (e.g., varying projection dimension or comparing to exact Jacobian matching on small models).
  Authors: We acknowledge the need for quantitative validation of the estimator. The revised version will include a dedicated ablation subsection that (1) reports the approximation error (e.g., relative Frobenius norm to exact J^T J on toy or small networks where exact computation is feasible) and (2) varies the projection dimension while measuring downstream FID impact. We will also compare the random-projection version against exact Jacobian matching on the smallest evaluated model to isolate the effect of the estimator on the transferred signal.
  Revision: yes
- Referee: [Experiments] While the abstract asserts that experiments on multiple datasets demonstrate a reduced performance gap, the reported results lack sufficient detail on baselines, error bars, statistical significance, and implementation hyperparameters to allow independent verification of the claimed improvements.
  Authors: We apologize for the lack of detail in the current draft. In the revision we will expand the Experiments section and add an appendix containing: complete baseline descriptions with citations, error bars and standard deviations from at least three independent runs, statistical significance tests (e.g., paired t-tests or Wilcoxon tests) for the reported improvements, and a comprehensive table of all hyperparameters, optimizer settings, random seeds, and implementation choices to enable full reproducibility.
  Revision: yes
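The pre- and post-pruning spectral comparison discussed in the first response can be illustrated on a toy model. The sketch below is not the authors' experiment. It assumes a two-layer tanh network, a hypothetical pruning mask, and a time horizon t1 = 1 so that the FTLE-style quantity reduces to ln √λ_max, where λ_max is the largest eigenvalue of J^T J. The eigenvalue is found by power iteration using only products with J and J^T.

```python
import numpy as np

def jvp(W1, W2, x, u):
    """J @ u for f(x) = W2 @ tanh(W1 @ x)."""
    h = np.tanh(W1 @ x)
    return W2 @ ((1.0 - h**2) * (W1 @ u))

def vjp(W1, W2, x, v):
    """J^T @ v for the same toy map."""
    h = np.tanh(W1 @ x)
    return W1.T @ ((1.0 - h**2) * (W2.T @ v))

def lambda_max(W1, W2, x, iters=1000, seed=0):
    """Largest eigenvalue of J^T J by power iteration (matrix-free)."""
    u = np.random.default_rng(seed).standard_normal(x.shape[0])
    u /= np.linalg.norm(u)
    lam = 0.0
    for _ in range(iters):
        w = vjp(W1, W2, x, jvp(W1, W2, x, u))  # w = J^T J u
        lam = float(u @ w)                     # Rayleigh quotient for unit u
        norm = np.linalg.norm(w)
        if norm == 0.0:
            break
        u = w / norm
    return lam

def ftle(lam, t1=1.0):
    """FTLE-style exponent (1 / t1) * ln(sqrt(lam)); t1 = 1 is an assumption."""
    return 0.5 * np.log(lam) / t1

rng = np.random.default_rng(0)
W1 = rng.standard_normal((16, 8))
W2 = rng.standard_normal((8, 16))
x = rng.standard_normal(8)

# Hypothetical structured pruning: keep only the first 8 of 16 hidden units.
mask = np.arange(16) < 8
lam_dense = lambda_max(W1, W2, x)
lam_pruned = lambda_max(W1 * mask[:, None], W2 * mask[None, :], x)
print("dense :", lam_dense, ftle(lam_dense))
print("pruned:", lam_pruned, ftle(lam_pruned))
```

On a real diffusion model the two products would come from forward-mode and reverse-mode automatic differentiation of the denoiser at a fixed timestep, and the comparison would be repeated over many inputs and timesteps.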
Circularity Check
No significant circularity detected in the 2ndMatch derivation.
The paper defines a new finetuning objective (2nd-order Jacobian matching loss J^T J approximated via random projections) that is introduced independently as an addition to the standard denoising score matching loss. This construction does not reduce by definition to any fitted parameter from the pruning stage, nor does it rely on a self-citation chain or uniqueness theorem imported from prior author work to force the choice. The motivation via Finite-Time Lyapunov Exponents is presented as inspiration rather than a load-bearing derivation step that collapses the method onto its inputs. The overall framework remains self-contained against external benchmarks, with performance claims resting on empirical results across datasets rather than tautological equivalence.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Second-order Jacobian matching transfers the relevant sensitivity information from dense to pruned diffusion models.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: IGSM supervises how the score function responds to small input perturbations over time... draws inspiration from... Finite-Time Lyapunov Exponents (FTLE)... L_Jac = E[ (||J^T v̂||² - ||J_D^T v̂||²) ]
- IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean: costAlphaLog_high_calibrated_iff (unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: FTLE... σ = (1/t1) ln √λ_max where C = J^T J is the Cauchy-Green tensor
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.