Variance Reduction for Expectations with Diffusion Teachers
Pith reviewed 2026-05-25 05:39 UTC · model grok-4.3
The pith
CARV amortizes expensive diffusion teacher computations over multiple noise samples to cut Monte Carlo variance.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
CARV supplies a hierarchical Monte Carlo estimator that amortizes the costly upstream computation (rendering, simulation, encoding) across multiple cheap diffusion-noise resamples, sharpened by timestep importance sampling and stratified-inverse-CDF sampling. The construction preserves the exact expectation required by the downstream pipeline while lowering estimator variance.
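As a rough illustration of the amortization idea, the Python sketch below reuses one costly upstream draw across several cheap noise resamples. Everything in it is an assumption made for illustration: `expensive_upstream`, `integrand`, and the toy distributions are hypothetical stand-ins, not the paper's renderer, teacher, or noise schedule.

```python
import random


def expensive_upstream(rng):
    """Hypothetical stand-in for rendering/simulation/encoding (one costly draw)."""
    return rng.uniform(-1.0, 1.0)


def integrand(x, t, eps):
    """Toy stand-in for the teacher-gradient integrand; not the paper's loss."""
    return (x + t * eps) ** 2


def hierarchical_estimate(num_upstream, num_resamples, rng):
    """Average over num_upstream costly draws, each reused for num_resamples cheap draws."""
    total = 0.0
    for _ in range(num_upstream):
        x = expensive_upstream(rng)  # paid once per outer iteration
        inner = 0.0
        for _ in range(num_resamples):  # cheap resamples share the same x
            t = rng.random()            # toy noise level in [0, 1)
            eps = rng.gauss(0.0, 1.0)   # toy Gaussian noise sample
            inner += integrand(x, t, eps)
        total += inner / num_resamples
    return total / num_upstream


if __name__ == "__main__":
    rng = random.Random(0)
    print(hierarchical_estimate(2000, 8, rng))
```

For this toy integrand the exact expectation is 1/3 + 1/3 = 2/3, so the printed estimate should land near that value. Setting `num_resamples=1` recovers the plain Monte Carlo estimator, which targets the same expectation at a higher upstream cost per sample.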
What carries the argument
The argument rests on CARV, a compute-aware variance-accounting framework. It motivates the hierarchical MC estimator, which combines amortization over noise resamples with timestep importance sampling and stratification.
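The importance-sampling and stratification components can be sketched as follows. This is a minimal, generic version under stated assumptions: timesteps are discrete, the target distribution over timesteps is uniform, and the proposal `weights` are arbitrary positive numbers. The function names are hypothetical and the paper's actual construction may differ.

```python
import bisect
import random


def build_cdf(weights):
    """Normalize proposal weights and return (probs, cdf) over timestep indices."""
    total = float(sum(weights))
    probs = [w / total for w in weights]
    cdf, acc = [], 0.0
    for p in probs:
        acc += p
        cdf.append(acc)
    cdf[-1] = 1.0  # guard against floating-point drift
    return probs, cdf


def stratified_timesteps(probs, cdf, k, rng):
    """Draw k timesteps, one uniform per stratum, mapped through the inverse CDF."""
    n = len(probs)
    draws = []
    for j in range(k):
        u = (j + rng.random()) / k                    # uniform inside stratum j
        t = min(bisect.bisect_right(cdf, u), n - 1)   # inverse-CDF lookup
        weight = (1.0 / n) / probs[t]                 # uniform target / proposal
        draws.append((t, weight))
    return draws


def is_stratified_mean(f, weights, k, rng):
    """Importance-weighted estimate of the uniform average of f over timesteps."""
    probs, cdf = build_cdf(weights)
    draws = stratified_timesteps(probs, cdf, k, rng)
    return sum(w * f(t) for t, w in draws) / k


if __name__ == "__main__":
    rng = random.Random(0)
    proposal = [float(i + 1) for i in range(10)]  # hypothetical proposal weights
    print(is_stratified_mean(lambda t: float(t), proposal, 4, rng))
```

The importance weight divides the target probability by the proposal probability, so the estimate still targets the uniform average over timesteps. Stratification only changes how the uniforms are placed, which is why the two pieces can be tuned separately.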
If this is right
- Text-to-3D distillation and attribution pipelines obtain 2-3x effective compute multipliers.
- Single-step distillation sees gradient variance reduced by roughly an order of magnitude.
- The target objective remains exactly the same; only the estimator changes.
- In some regimes Monte Carlo variance ceases to be the dominant bottleneck once the proposed reductions are applied.
Where Pith is reading between the lines
- The same amortization pattern could be applied to any pipeline that repeatedly evaluates an expensive forward map before adding cheap stochastic perturbations.
- When variance reduction no longer improves final metrics, attention should shift to other sources of error such as optimization dynamics or model capacity.
- The importance-sampling and stratification components can be tuned independently of the amortization layer, allowing incremental adoption.
Load-bearing premise
The expensive upstream computation can be amortized over multiple cheap diffusion-noise resamples while preserving the exact Monte Carlo expectation that the downstream pipeline requires.
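One way to make this premise concrete is the standard two-level Monte Carlo argument below. The notation is assumed here rather than taken from the paper: x is the upstream output, (t, ε) are the timestep and Gaussian noise, f is the integrand, M is the number of upstream draws, and K is the number of noise resamples per upstream draw. The sketch assumes the x_i are i.i.d. and that the noise draws are i.i.d. and independent of x.

```latex
% Assumed notation, not the paper's: x upstream output, (t, \epsilon) noise,
% f integrand, M upstream draws, K noise resamples per upstream draw.
\begin{align*}
\mu &= \mathbb{E}_{x}\,\mathbb{E}_{t,\epsilon}\big[f(x,t,\epsilon)\big], \\
\hat{\mu}_{M,K} &= \frac{1}{M}\sum_{i=1}^{M}\frac{1}{K}\sum_{k=1}^{K} f(x_i,t_{ik},\epsilon_{ik}), \\
\mathbb{E}\big[\hat{\mu}_{M,K}\big]
  &= \frac{1}{M}\sum_{i=1}^{M}\mathbb{E}_{x_i}\Big[\frac{1}{K}\sum_{k=1}^{K}
     \mathbb{E}_{t,\epsilon}\big[f(x_i,t,\epsilon)\,\big|\,x_i\big]\Big] = \mu, \\
\operatorname{Var}\big[\hat{\mu}_{M,K}\big]
  &= \frac{\sigma_{\mathrm{out}}^{2}}{M} + \frac{\sigma_{\mathrm{in}}^{2}}{MK},
\qquad
\sigma_{\mathrm{out}}^{2} = \operatorname{Var}_{x}\big(\mathbb{E}[f \mid x]\big),\quad
\sigma_{\mathrm{in}}^{2} = \mathbb{E}_{x}\big[\operatorname{Var}(f \mid x)\big].
\end{align*}
```

Under these assumptions, reusing x across K resamples leaves the mean unchanged and shrinks only the inner variance term. If the upstream output and the noise draws were coupled in a way the sampler ignores, the third line would no longer follow.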
What would settle it
A side-by-side comparison in which the hierarchical estimator and the standard Monte Carlo estimator produce statistically different mean values on the same downstream objective would show that the amortization step alters the expectation.
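A check of this kind could be run with a two-sample statistic on repeated estimates from each estimator. The sketch below uses a Welch-style z statistic; the function name and the idea of feeding it lists of per-run estimates are assumptions for illustration, not a procedure described in the paper.

```python
import math
import statistics


def welch_z(estimates_a, estimates_b):
    """Difference of sample means divided by its Welch standard error."""
    mean_a = statistics.fmean(estimates_a)
    mean_b = statistics.fmean(estimates_b)
    se = math.sqrt(
        statistics.variance(estimates_a) / len(estimates_a)
        + statistics.variance(estimates_b) / len(estimates_b)
    )
    return (mean_a - mean_b) / se


if __name__ == "__main__":
    # Hypothetical per-run estimates from two estimators of the same objective.
    standard = [0.64, 0.69, 0.66, 0.70, 0.65]
    hierarchical = [0.66, 0.67, 0.68, 0.66, 0.67]
    print(welch_z(standard, hierarchical))
```

A statistic far from zero over many independent runs would point to a difference in means, while a value near zero is consistent with both estimators targeting the same expectation. The threshold for "far" is a judgment call that depends on the number of runs.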
Original abstract
Pretrained diffusion models serve as frozen teachers feeding downstream pipelines such as text-to-3D, single-step distillation, and data attribution. The teacher gradients these pipelines consume are Monte Carlo (MC) expectations over noise levels and Gaussian noise samples; their estimator variance dominates compute cost because each draw requires expensive upstream work (rendering, simulation, encoding). We introduce CARV, a compute-aware variance-accounting framework that motivates a hierarchical MC estimator: amortize the expensive upstream computation over cheap diffusion-noise resamples, sharpened by timestep importance sampling and a stratified-inverse-CDF construction. In our text-to-3D distillation and attribution experiments, CARV delivers 2-3x effective compute multipliers (most from amortized reuse; ~25% additional from IS+stratification) without changing the objective; in single-step distillation, the same techniques cut gradient variance by an order of magnitude but do not improve downstream FID, marking the regime where MC variance is no longer the bottleneck.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes CARV, a compute-aware variance-accounting framework for reducing variance in Monte Carlo expectations over diffusion noise levels and samples when pretrained diffusion models serve as frozen teachers for downstream pipelines (text-to-3D distillation, single-step distillation, data attribution). It introduces a hierarchical MC estimator that amortizes expensive upstream computations (rendering, simulation, encoding) over multiple cheap noise resamples, sharpened by timestep importance sampling and a stratified-inverse-CDF construction. Experiments report 2-3x effective compute multipliers (mostly from amortization, ~25% from IS+stratification) without changing the objective, plus an order-of-magnitude gradient variance reduction in single-step distillation (without FID gains).
Significance. If the estimator remains exactly unbiased, the amortization approach could yield substantial practical efficiency gains in diffusion-teacher pipelines by reusing upstream work across noise samples. The reported empirical multipliers and variance reductions provide concrete, task-specific evidence of utility in text-to-3D and attribution settings, and the observation that variance reduction does not always translate to better FID usefully delineates when MC variance ceases to be the bottleneck.
major comments (2)
- [Abstract] The central claim that the hierarchical MC estimator 'delivers 2-3x effective compute multipliers ... without changing the objective' requires that amortizing upstream computation over noise resamples preserves the exact original expectation. No derivation is supplied showing that the upstream function factors out of the integral over noise in a measure-preserving way (or that upstream and noise are unentangled), which is load-bearing for the unbiasedness assertion.
- [Experiments] The text-to-3D and attribution results report 2-3x multipliers and a ~25% additional gain from IS+stratification without error bars, explicit baseline definitions, or details on how effective compute is measured, undermining assessment of whether the gains are statistically reliable or reproducible.
minor comments (2)
- The single-step distillation experiment notes an order-of-magnitude variance cut but no FID improvement; the manuscript would benefit from a brief discussion of why downstream performance is unaffected (e.g., other bottlenecks).
- Notation for the stratified-inverse-CDF construction and the precise form of the hierarchical estimator could be introduced with explicit equations early in the methods section for clarity.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive report. We address each major comment below and will incorporate clarifications and additions in the revised manuscript.
- Referee: [Abstract] The central claim that the hierarchical MC estimator 'delivers 2-3x effective compute multipliers ... without changing the objective' requires that amortizing upstream computation over noise resamples preserves the exact original expectation. No derivation is supplied showing that the upstream function factors out of the integral over noise in a measure-preserving way (or that upstream and noise are unentangled), which is load-bearing for the unbiasedness assertion.
  Authors: We agree an explicit derivation strengthens the presentation. The upstream computation (rendering, simulation, or encoding) operates on clean inputs or model parameters and is independent of the diffusion timestep and noise sample; it therefore factors out of the outer expectation over the noise measure. In the revision we will add a short formal derivation in Section 3 establishing that the hierarchical estimator remains exactly unbiased under this independence.
  Revision: yes
- Referee: [Experiments] The text-to-3D and attribution results report 2-3x multipliers and a ~25% additional gain from IS+stratification without error bars, explicit baseline definitions, or details on how effective compute is measured, undermining assessment of whether the gains are statistically reliable or reproducible.
  Authors: We will strengthen the experimental reporting. The revision will include error bars from at least five independent runs, explicitly define the baseline as the standard single-sample Monte Carlo estimator without amortization or importance sampling, and specify that effective compute is measured as the ratio of wall-clock time (or equivalent FLOPs) needed to reach a target variance level.
  Revision: yes
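The effective-compute definition in the reply can be illustrated with a small calculation. The sketch below assumes a two-level variance model (an outer term from upstream draws and an inner term from noise resamples) and per-draw costs in arbitrary units. All parameter names and the example numbers are hypothetical and are not measurements from the paper.

```python
def cost_to_target(var_outer, var_inner, cost_upstream, cost_resample, k, target_var):
    """Cost to reach target_var when each upstream draw is reused for k resamples."""
    per_upstream_var = var_outer + var_inner / k   # variance of one outer term
    num_upstream = per_upstream_var / target_var   # outer draws needed
    return num_upstream * (cost_upstream + k * cost_resample)


def effective_compute_multiplier(var_outer, var_inner, cost_upstream,
                                 cost_resample, k, target_var=1e-3):
    """Baseline (k = 1) cost divided by amortized cost at the same target variance."""
    baseline = cost_to_target(var_outer, var_inner, cost_upstream,
                              cost_resample, 1, target_var)
    amortized = cost_to_target(var_outer, var_inner, cost_upstream,
                               cost_resample, k, target_var)
    return baseline / amortized


if __name__ == "__main__":
    # Hypothetical numbers: upstream work is 10x the cost of one noise resample.
    print(effective_compute_multiplier(1.0, 3.0, 10.0, 1.0, k=4))
```

In this model the target variance cancels in the ratio, so the multiplier depends only on the variance split and the cost ratio. The gain is largest when upstream work is expensive and most of the variance comes from the noise draws.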
Circularity Check
No circularity; results are empirical measurements of variance reduction
The paper presents CARV as a hierarchical MC estimator whose gains are measured directly in text-to-3D, attribution, and distillation experiments. The abstract states that the techniques deliver 2-3x multipliers 'without changing the objective' and that variance is cut 'by an order of magnitude' in one regime, but these are reported as observed outcomes rather than quantities derived from fitted parameters or self-referential definitions. No equations, uniqueness theorems, or ansatzes are shown that reduce the claimed multipliers to inputs by construction. The derivation chain is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Monte Carlo expectations over noise levels and Gaussian samples can be estimated with variance reduction while preserving the original objective.