Variance Reduction for Expectations with Diffusion Teachers

James Lucas; Jesse Bettencourt; Jonathan Lorraine; Matan Atzmon; Xindi Wu

arXiv: 2605.21489 · v1 · pith:THBHO7MOnew · submitted 2026-05-20 · 💻 cs.LG · cs.AI · cs.CV · stat.CO · stat.ML

Pith reviewed 2026-05-21 04:48 UTC · model grok-4.3

keywords: variance reduction · monte carlo estimation · diffusion models · text-to-3D distillation · importance sampling · stratified sampling · gradient estimation

The pith

A hierarchical Monte Carlo estimator amortizes costly upstream work over multiple cheap diffusion noise samples to cut gradient variance in teacher-based pipelines.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a way to lower the variance of Monte Carlo estimates when pretrained diffusion models act as fixed teachers for tasks such as text-to-3D generation and data attribution. Expensive operations like rendering or encoding are performed once and then reused across many inexpensive noise draws at different timesteps. Timestep importance sampling, combined with a stratified inverse-CDF construction, further sharpens the estimator. In the distillation and attribution experiments, the combined approach yields a 2-3x effective compute multiplier while leaving the original objective unchanged. In single-step distillation, gradient variance drops by roughly an order of magnitude yet downstream FID does not improve, indicating that Monte Carlo variance is no longer the dominant bottleneck in that regime.

Core claim

CARV is a compute-aware variance-accounting framework that motivates a hierarchical MC estimator: amortize the expensive upstream computation over cheap diffusion-noise resamples, sharpened by timestep importance sampling and a stratified-inverse-CDF construction. In text-to-3D distillation and attribution experiments this yields 2-3x effective compute multipliers, most of the gain coming from amortized reuse and roughly 25 percent additional gain from the sampling refinements, all without altering the objective.

What carries the argument

CARV's hierarchical Monte Carlo estimator, which reuses a single expensive upstream computation across multiple cheap diffusion-noise resamples and is refined by timestep importance sampling and stratified inverse-CDF sampling.
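
A minimal Python sketch of this amortized estimator may make the structure concrete. It is not the paper's implementation: `upstream`, `teacher_term`, the scalar `theta`, and the toy integrand are hypothetical stand-ins, chosen so that the true expectation (0.5 * theta) is known in closed form. `R` and `K` are assumed here to mean the number of upstream evaluations and the number of noise resamples per evaluation.

```python
import random

def upstream(theta, rng):
    """Hypothetical stand-in for the expensive step (render/encode)."""
    view = rng.gauss(0.0, 1.0)            # upstream randomness, e.g. a camera pose
    return theta + 0.5 * view

def teacher_term(x, t, eps):
    """Hypothetical stand-in for the cheap per-noise teacher residual."""
    return (1.0 - t) * x + t * eps

def hierarchical_estimate(theta, R, K, rng):
    """Average over R upstream evaluations, each reused for K (t, eps) resamples."""
    total = 0.0
    for _ in range(R):
        x = upstream(theta, rng)          # expensive: computed once per outer sample
        for _ in range(K):
            t = rng.random()              # timestep in [0, 1)
            eps = rng.gauss(0.0, 1.0)     # fresh noise, independent of x
            total += teacher_term(x, t, eps)
    return total / (R * K)

# Toy check: the mean over many estimates should approach 0.5 * theta.
rng = random.Random(0)
estimates = [hierarchical_estimate(2.0, R=1, K=4, rng=rng) for _ in range(20000)]
mean_estimate = sum(estimates) / len(estimates)
```

Setting `K = 1` recovers the baseline in which every noise draw pays for its own upstream evaluation; larger `K` only changes how often `upstream` is called, not what is being estimated.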

If this is right

Text-to-3D distillation and attribution pipelines obtain 2-3x effective compute multipliers.
Single-step distillation sees gradient variance reduced by roughly an order of magnitude.
The majority of the gain comes from amortizing upstream costs; importance sampling plus stratification supplies an additional 25 percent improvement.
In single-step distillation, downstream FID does not improve despite the variance reduction, showing that Monte Carlo variance is no longer the limiting factor in that regime.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same amortization pattern could be tested in other teacher-student loops that combine diffusion models with expensive forward simulations.
Adding control variates or learned proposal distributions on top of the existing hierarchy might produce further variance reduction without extra upstream cost.
When Monte Carlo variance ceases to dominate, optimization effort should shift toward model capacity or training dynamics rather than sampling refinements.

Load-bearing premise

The expensive upstream computation can be performed once and reused across multiple independent diffusion noise samples while preserving unbiasedness of the overall estimator.
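
One way to see why reuse can preserve unbiasedness is the tower property of conditional expectation. The notation in this sketch is an assumption of this note, not the paper's: $c$ is the upstream randomness (for example a camera pose), $x = h(\theta, c)$ is the expensive upstream output, $\xi_k = (t_k, \epsilon_k)$ are $K$ noise resamples drawn i.i.d. and independently of $c$, and $f$ is the per-sample teacher term.

```latex
\hat g_K = \frac{1}{K}\sum_{k=1}^{K} f\bigl(h(\theta,c),\,\xi_k\bigr),
\qquad
\mathbb{E}[\hat g_K]
  = \mathbb{E}_c\!\left[\frac{1}{K}\sum_{k=1}^{K}
      \mathbb{E}_{\xi_k}\bigl[f(h(\theta,c),\xi_k)\mid c\bigr]\right]
  = \mathbb{E}_{c,\xi}\bigl[f(h(\theta,c),\xi)\bigr].

\operatorname{Var}[\hat g_K]
  = \operatorname{Var}_c\bigl(\mathbb{E}_\xi[f\mid c]\bigr)
  + \frac{1}{K}\,\mathbb{E}_c\bigl[\operatorname{Var}_\xi(f\mid c)\bigr].
```

The second line is the law of total variance for a scalar $f$. Resampling noise shrinks only the second term, so the upstream term sets a floor that additional noise draws cannot remove.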

What would settle it

A controlled run of the text-to-3D distillation pipeline that measures gradient variance and effective compute multiplier both with and without the stratified-inverse-CDF step, checking whether the reported variance reduction and 2-3x multiplier disappear.

Figures

Figures reproduced from arXiv: 2605.21489 by James Lucas, Jesse Bettencourt, Jonathan Lorraine, Matan Atzmon, Xindi Wu.

**Figure 2.** Stratified Sampling Visualization: We show 3 realizations/batches of 8 timestep samples for both IID and stratified sampling. Notably, the stratified method creates bins for each sample and requires each batch to contain one sample from each bin, often resulting in lower-variance estimators.

**Figure 3.** Compute Re-use Visualization: Computational graph comparing baseline (left, K = 1) and our re-noising (right, K > 1). Both take θ (e.g., NeRF weights or generator), render, encode, noise, denoise, combine into a residual, and backpropagate. Renoising helps when (a) (t, ϵ) drives variance and (b) denoising is cheaper than rendering. From …

**Figure 4.** Combining Stratified Sampling with Importance Weighting: We illustrate how to use inverse-transform sampling to map a stratified sample uniformly in [0, 1] (see …)

**Figure 5.** Quantifying variance reduction from IW and stratification (SDS). Top: Variance (tr(Cov(∇θ)) late in training) vs. compute. Colors: uniform baseline and IW+Strat. Points annotated by (R, K). Bottom: Effective compute multiplier vs. uniform baseline. Lines trace (R = 1, K), peaking at (1, 8): ∼2.6× (uniform), ∼3.3× (IW+Strat). Ablations in App. …

**Figure 7.** Performance Gains from Variance Reduction: CLIP score versus optimization iteration, averaged across 30 prompts, 3 seeds, and multiple views (± std. dev.). Equal per-iteration cost (∼300–400 ms/iter, App. Sec. D.1.1), so the iteration axis is wall-clock up to a known constant: baseline vs. ours (stratified+IS+re-noising). Higher CLIP at fixed iteration count from lower per-iteration variance (…)

**Figure 9.** Geometric intuition for efficiency metrics: …

**Figure 10.** Variance reduction with Monte-Carlo seed error bars (single SDS prompt).

**Figure 11.** Quantifying variance reduction from hierarchical cost awareness with importance weighting

**Figure 12.** Qualitative Results from Variance Reduction: …

**Figure 13.** Variance reduction across training, low classifier-free guidance (…)

**Figure 14.** Variance reduction measured via latent-space residual norm.

**Figure 15.** Cosine similarity to ground-truth gradient versus compute budget.

**Figure 16.** Variance reduction in the low guidance regime (…)

**Figure 17.** Performance gains from variance reduction at low guidance (…)

**Figure 18.** Qualitative SDS trajectories at low classifier-free guidance (…)

**Figure 19.** Qualitative SDS trajectories at low classifier-free guidance (…)

**Figure 20.** Pair probability matrices Q˜(i, j) for N = 2 sampling strategies, computed on gradient data from a single SDS prompt at the end of training. Each panel shows the probability of selecting pair (i, j) on a log scale (brighter = higher probability, gray = zero). (a) IID places equal mass on all pairs (1.00×, baseline). (b) Index-based stratification concentrates mass in off-diagonal blocks. (c) Importance we…

**Figure 21.** Sensitivity of variance reduction to render-vs-denoise cost ratio.

**Figure 22.** Weight function closely tracks gradient magnitude across timesteps.

**Figure 23.** Importance Sampling Strategy Comparison: Weight-Based Heuristic versus Oracle.

**Figure 24.** Comparing Per-Render and Global Stratification Strategies.

**Figure 25.** Quantifying variance reduction against compute cost for one-step distillation.

**Figure 26.** FID convergence during DMD training for student-step resampling.

**Figure 27.** Best FID achieved during training for fake-score-step resampling strategies.

**Figure 28.** (Extended) Quantifying Changes in Data Attribution: …

**Figure 29.** Is there an improvement from importance sampling for data attribution?

**Figure 30.** Example Videos for Attribution: We show assorted clips from VIDGEN-1M [78] used for our video data attribution experiments, where the influence is being calculated for Wan2.1-T2V-1.3B [81].
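
The stratified scheme described in the Figure 2 caption can be sketched in a few lines of Python. This is an illustrative toy, not the paper's code: the function names and the test integrand t² are assumptions, and timesteps are treated as continuous values in [0, 1).

```python
import random
import statistics

def iid_timesteps(n, rng):
    """n independent uniform draws in [0, 1)."""
    return [rng.random() for _ in range(n)]

def stratified_timesteps(n, rng):
    """One uniform draw inside each of n equal-width bins [i/n, (i+1)/n)."""
    return [(i + rng.random()) / n for i in range(n)]

def batch_mean(sampler, f, n, rng):
    """Monte Carlo estimate of E[f(t)] from one batch of n timesteps."""
    return sum(f(t) for t in sampler(n, rng)) / n

def estimator_variance(sampler, f, n, reps, seed):
    """Empirical variance of the batch estimate over `reps` batches."""
    rng = random.Random(seed)
    return statistics.pvariance(
        [batch_mean(sampler, f, n, rng) for _ in range(reps)]
    )

def f(t):
    """Toy integrand (an assumption); E[f] = 1/3 under uniform t."""
    return t * t

var_iid = estimator_variance(iid_timesteps, f, n=8, reps=2000, seed=0)
var_strat = estimator_variance(stratified_timesteps, f, n=8, reps=2000, seed=0)
```

For a smooth integrand such as this one, `var_strat` comes out well below `var_iid`. Both estimators stay unbiased, because each stratified draw is uniform within its bin and the bins tile [0, 1).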

Original abstract

Pretrained diffusion models serve as frozen teachers feeding downstream pipelines such as text-to-3D, single-step distillation, and data attribution. The teacher gradients these pipelines consume are Monte Carlo (MC) expectations over noise levels and Gaussian noise samples; their estimator variance dominates compute cost because each draw requires expensive upstream work (rendering, simulation, encoding). We introduce CARV, a compute-aware variance-accounting framework that motivates a hierarchical MC estimator: amortize the expensive upstream computation over cheap diffusion-noise resamples, sharpened by timestep importance sampling and a stratified-inverse-CDF construction. In our text-to-3D distillation and attribution experiments, CARV delivers 2-3x effective compute multipliers (most from amortized reuse; ~25% additional from IS+stratification) without changing the objective; in single-step distillation, the same techniques cut gradient variance by an order of magnitude but do not improve downstream FID, marking the regime where MC variance is no longer the bottleneck.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

CARV's hierarchical amortization reuses upstream work across noise samples for 2-3x effective gains in diffusion pipelines, but unbiasedness under that reuse is the part that needs checking.

The main thing to know is that this paper introduces a hierarchical Monte Carlo estimator that amortizes expensive upstream steps like rendering or encoding over multiple cheap diffusion noise resamples, then adds timestep importance sampling and stratified inverse-CDF sampling on top. In the text-to-3D distillation and attribution experiments this produces 2-3x effective compute multipliers, with most of the lift from the amortization and roughly 25% more from the sampling adjustments, all without altering the target objective. In single-step distillation the variance drops sharply but downstream FID does not improve, which they correctly flag as the point where Monte Carlo variance stops being the bottleneck.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces CARV, a compute-aware variance-accounting framework for Monte Carlo expectations in downstream tasks that use pretrained diffusion models as frozen teachers. It proposes a hierarchical MC estimator that amortizes expensive upstream computations (rendering, simulation, encoding) over multiple cheap diffusion-noise resamples, sharpened by timestep importance sampling and a stratified inverse-CDF construction. Experiments on text-to-3D distillation and data attribution report 2-3x effective compute multipliers (mostly from amortization, ~25% from IS+stratification) without altering the objective; single-step distillation shows order-of-magnitude variance reduction but no FID improvement.

Significance. If the unbiasedness of the amortized hierarchical estimator holds and the reported speed-ups are reproducible with proper statistical controls, the work addresses a practical bottleneck in diffusion-based pipelines and could yield meaningful efficiency gains. The empirical demonstration of compute multipliers in concrete applications (text-to-3D, attribution) is a positive contribution, though the lack of error bars and baseline details limits immediate impact.

major comments (2)

[Method (hierarchical MC estimator construction)] The central claim that the hierarchical estimator remains unbiased when amortizing upstream computation over multiple noise resamples requires an explicit derivation. The conditioning, independence assumptions between the expensive upstream function and the diffusion noise, and the precise measure-theoretic construction that guarantees E[CARV estimator] equals the original MC expectation are not sufficiently detailed; without this, the assertion that the objective is unchanged cannot be verified.
[Experiments section] Table or figure reporting the 2-3x multipliers (and the breakdown into amortized reuse vs. IS+stratification) lacks error bars, description of baseline estimators, and definition of 'effective compute.' These omissions make it impossible to assess whether the claimed gains are statistically reliable or how they were measured.

minor comments (2)

[Method] Notation for the stratified-inverse-CDF sampling and the importance weights should be introduced with a short equation or pseudocode to improve readability.
[Abstract and §1] The abstract and introduction could briefly state the key independence assumption that enables amortization while preserving unbiasedness.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful review and constructive suggestions. We address each of the major comments below and will revise the manuscript to incorporate the requested clarifications and improvements.

Referee: [Method (hierarchical MC estimator construction)] The central claim that the hierarchical estimator remains unbiased when amortizing upstream computation over multiple noise resamples requires an explicit derivation. The conditioning, independence assumptions between the expensive upstream function and the diffusion noise, and the precise measure-theoretic construction that guarantees E[CARV estimator] equals the original MC expectation are not sufficiently detailed; without this, the assertion that the objective is unchanged cannot be verified.

Authors: We agree with the referee that a more explicit derivation is needed to rigorously establish the unbiasedness of the hierarchical estimator. In the revised manuscript, we will include a detailed appendix providing the measure-theoretic construction. This will explicitly state the independence assumptions (that the upstream computation is independent of the diffusion noise samples) and the conditioning on the amortized computations, proving that the expectation of the CARV estimator matches the original Monte Carlo expectation. This ensures the objective remains unchanged.
Revision: yes

Referee: [Experiments section] Table or figure reporting the 2-3x multipliers (and the breakdown into amortized reuse vs. IS+stratification) lacks error bars, description of baseline estimators, and definition of 'effective compute.' These omissions make it impossible to assess whether the claimed gains are statistically reliable or how they were measured.

Authors: We acknowledge the importance of statistical controls and clear definitions in the experimental section. In the revision, we will add error bars to the reported multipliers, obtained from multiple independent runs with different random seeds. We will also provide a precise definition of 'effective compute' as the factor by which the compute budget can be reduced while achieving the same variance level as the baseline. Additionally, we will describe the baseline estimators in detail and include the breakdown of gains attributable to amortized reuse versus the contributions from importance sampling and stratification.
Revision: yes
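
The definition of 'effective compute' given in this response can be made concrete with a small Python sketch. Everything here is an assumption for illustration: the two-level cost model (one upstream step of cost `c_up` plus `K` denoising steps of cost `c_noise`), the variance split into an upstream part and a noise part that shrinks as 1/K, and the numbers plugged in. None of it is taken from the paper.

```python
import math

def work_normalized_variance(K, c_up, c_noise, var_up, var_noise):
    """Cost of one upstream evaluation with K noise resamples, times its variance."""
    cost = c_up + K * c_noise
    variance = var_up + var_noise / K
    return cost * variance

def effective_compute_multiplier(K, c_up, c_noise, var_up, var_noise):
    """Compute saved at equal variance, relative to the K = 1 baseline."""
    base = work_normalized_variance(1, c_up, c_noise, var_up, var_noise)
    ours = work_normalized_variance(K, c_up, c_noise, var_up, var_noise)
    return base / ours

def best_K(c_up, c_noise, var_up, var_noise):
    """Real-valued K that minimises work-normalized variance under this model."""
    return math.sqrt((var_noise * c_up) / (var_up * c_noise))

# Made-up inputs: upstream step 9x the cost of a denoise,
# noise variance 4x the upstream variance.
k_star = best_K(c_up=9.0, c_noise=1.0, var_up=1.0, var_noise=4.0)
gain = effective_compute_multiplier(6, c_up=9.0, c_noise=1.0, var_up=1.0, var_noise=4.0)
```

With these made-up inputs the model's optimum is K = 6 and the multiplier is about 2. The qualitative reading is that amortization pays off most when the upstream step is expensive relative to denoising and when noise, rather than upstream randomness, dominates the variance.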

Circularity Check

0 steps flagged

No significant circularity; estimator construction is self-contained

The paper presents CARV as a hierarchical Monte Carlo estimator that amortizes expensive upstream computations (rendering/encoding) over multiple cheap diffusion noise resamples, augmented by timestep importance sampling and stratified-inverse-CDF sampling. The abstract asserts that this preserves the original objective and yields unbiased estimates, with reported gains (2-3x multipliers) treated as empirical outcomes rather than derived from fitted parameters or self-referential definitions. No equations, self-citations, or ansatzes are visible in the provided text that reduce the central claim to its own inputs by construction. The derivation appears independent and externally falsifiable via the unbiasedness property of the MC construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The abstract relies on standard Monte Carlo unbiasedness assumptions and the existence of a computable importance-sampling distribution over timesteps; no new free parameters or invented entities are introduced in the visible text.

axioms (2)

domain assumption: The Monte Carlo estimator remains unbiased when expensive upstream computations are reused across multiple independent timestep and noise resamples.
Implicit in the claim that amortization does not change the objective.
standard math: A timestep importance distribution and stratified inverse-CDF sampler can be constructed without introducing bias.
Standard importance-sampling and stratification theory applied to the diffusion noise schedule.
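
The second axiom can be illustrated with a short Python sketch of stratified inverse-CDF sampling with importance weights. It is a toy under stated assumptions, not the paper's sampler: timesteps are discrete indices 0..T-1, the target distribution is uniform over them, and the proposal weights and function names are made up for the example.

```python
import bisect
import random
from itertools import accumulate

def make_proposal(weights):
    """Normalise positive weights into a proposal q(t) and its CDF."""
    total = sum(weights)
    q = [w / total for w in weights]
    cdf = list(accumulate(q))
    cdf[-1] = 1.0  # guard against floating-point drift
    return q, cdf

def stratified_inverse_cdf(q, cdf, n, rng):
    """Draw n timesteps from q via stratified uniforms; return (t, weight) pairs."""
    T = len(q)
    out = []
    for i in range(n):
        u = (i + rng.random()) / n                    # one uniform per bin
        t = min(bisect.bisect_right(cdf, u), T - 1)   # first t with cdf[t] > u
        out.append((t, (1.0 / T) / q[t]))             # weight = uniform target / proposal
    return out

def weighted_estimate(f, q, cdf, n, rng):
    """Estimate the uniform average (1/T) * sum_t f(t)."""
    pairs = stratified_inverse_cdf(q, cdf, n, rng)
    return sum(w * f(t) for t, w in pairs) / n

# Toy check with a made-up proposal that favours later timesteps.
q, cdf = make_proposal([t + 1 for t in range(10)])
rng = random.Random(0)
estimates = [weighted_estimate(lambda t: t * t, q, cdf, 4, rng) for _ in range(5000)]
mean_estimate = sum(estimates) / len(estimates)
exact = sum(t * t for t in range(10)) / 10
```

Averaged over many batches, `mean_estimate` approaches `exact`: the stratified uniforms are still uniform on [0, 1) in aggregate, and the weight `(1/T) / q[t]` undoes the tilt of the proposal. The proposal must put positive mass on every timestep for the weights to be defined.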

Reference graph

Works this paper leans on

111 extracted references · 111 canonical work pages · 10 internal anchors

[1] Juhan Bae, Wu Lin, Jonathan Lorraine, and Roger Grosse. Training data attribution via approximate unrolled differentiation. In Advances in Neural Information Processing Systems, volume 37, 2024. URL https://arxiv.org/abs/2405.12186.

[2] Sherwin Bahmani, Ivan Skorokhodov, Victor Rong, Gordon Wetzstein, Leonidas Guibas, Peter Wonka, Sergey Tulyakov, Jeong Joon Park, Andrea Tagliasacchi, and David B. Lindell. 4D-fy: Text-to-4d generation using hybrid score distillation sampling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7996–8006,

[3] Benedikt Bitterli, Chris Wyman, Matt Pharr, Peter Shirley, Aaron Lefohn, and Wojciech Jarosz. Spatiotemporal reservoir resampling for real-time ray tracing with dynamic direct lighting. ACM Transactions on Graphics (Proc. SIGGRAPH), 39(4):148:1–148:16, 2020. doi: 10.1145/3386569.3392481.

[4] Jonathan Brokman, Omer Hofman, Roman Vainshtein, Amit Giloni, Toshiya Shimizu, Inderjeet Singh, Oren Rachmil, Alon Zolfi, Asaf Shabtai, Yuki Unno, et al. Montrage: Monitoring training for attribution of generative diffusion models. In European Conference on Computer Vision, pages 1–17. Springer, 2024.

[5] Tim Brooks, Bill Peebles, Connor Holmes, Will DePue, Yufei Guo, Li Jing, David Schnurr, Joe Taylor, Troy Luhman, Eric Luhman, Clarence Ng, Ricky Wang, and Aditya Ramesh. Video generation models as world simulators. OpenAI technical report. https://openai.com/research/video-generation-models-as-world-simulators, 2024.

[6] Rui Chen, Yongwei Chen, Ningxin Jiao, and Kui Jia. Fantasia3D: Disentangling geometry and appearance for high-quality text-to-3d content creation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 22246–22256, 2023.

[7] Cheng Chi, Zhenjia Xu, Siyuan Feng, Eric Cousineau, Yilun Du, Benjamin Burchfiel, Russ Tedrake, and Shuran Song. Diffusion policy: Visuomotor policy learning via action diffusion. The International Journal of Robotics Research, 44(10-11):1684–1704, 2025.

[8] Jooyoung Choi, Jungbeom Lee, Chaehun Shin, Sungwon Kim, Hyunwoo Kim, and Sungroh Yoon. Perception prioritized training of diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022. URL https://arxiv.org/abs/2204.00227.

[9] Marco Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. In Advances in Neural Information Processing Systems, volume 26, pages 2292–2300, 2013.

[10] Kangle Deng, Timothy Omernick, Alexander Weiss, Deva Ramanan, Jun-Yan Zhu, Tinghui Zhou, and Maneesh Agrawala. FlashTex: Fast relightable mesh texturing with LightControlNet. In European Conference on Computer Vision, pages 90–107. Springer, 2024.

[11] Zach Evans, Julian D. Parker, CJ Carr, Zack Zukowski, Josiah Taylor, and Jordi Pons. Stable audio open. In ICASSP 2025 – 2025 IEEE International Conference on Acoustics, Speech and Signal Processing, pages 1–5. IEEE, 2025.

[12] Kristian Georgiev, Joshua Vendrow, Hadi Salman, Sung Min Park, and Aleksander Madry. The journey, not the destination: How data guides diffusion models. arXiv preprint arXiv:2312.06205, 2023.

[13] Yuan-Chen Guo, Ying-Tian Liu, Ruizhi Shao, Christian Laforte, Vikram Voleti, Guan Luo, Chia-Hao Chen, Zi-Xin Zou, Chen Wang, Yan-Pei Cao, and Song-Hai Zhang. threestudio: A unified framework for 3D content generation. https://github.com/threestudio-project/threestudio, 2023.

[14] Tiankai Hang, Shuyang Gu, Chen Li, Jianmin Bao, Dong Chen, Han Hu, Xin Geng, and Baining Guo. Efficient diffusion training via Min-SNR weighting strategy. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 7441–7451, 2023.

[15] Jack Hessel, Ari Holtzman, Maxwell Forbes, Ronan Le Bras, and Yejin Choi. CLIPScore: A reference-free evaluation metric for image captioning. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 7514–7528, 2021.

[16] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In Advances in Neural Information Processing Systems, volume 33, pages 6840–6851, 2020.

[17] Emiel Hoogeboom, Jonathan Heek, and Tim Salimans. simple diffusion: End-to-end diffusion for high resolution images. In International Conference on Machine Learning, volume 202, pages 13213–13232. PMLR, 2023.

[18] Safwan Hossain and Jonathan Lorraine. JacNet: Learning functions with structured jacobian. In First Workshop on Invertible Neural Nets and Normalizing Flows (INNF), ICML, 2019.

[19] Michael Janner, Yilun Du, Joshua B. Tenenbaum, and Sergey Levine. Planning with diffusion for flexible behavior synthesis. In International Conference on Machine Learning, volume 162, pages 9902–9915. PMLR, 2022. URL https://arxiv.org/abs/2205.09991.

[20] Tero Karras, Miika Aittala, Timo Aila, and Samuli Laine. Elucidating the design space of diffusion-based generative models. In Advances in Neural Information Processing Systems, volume 35, pages 26565–26577, 2022.

[21] Tero Karras, Miika Aittala, Jaakko Lehtinen, Janne Hellsten, Timo Aila, and Samuli Laine. Analyzing and improving the training dynamics of diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024. URL https://arxiv.org/abs/2312.02696.

[22] Bahjat Kawar, Michael Elad, Stefano Ermon, and Jiaming Song. Denoising diffusion restoration models. In Advances in Neural Information Processing Systems, volume 35, pages 23593–23606, 2022.

[23] Dongjun Kim, Seungjae Shin, Kyungwoo Song, Wanmo Kang, and Il-Chul Moon. Soft truncation: A universal training technique of score-based diffusion model for high precision score estimation. In International Conference on Machine Learning, volume 162, pages 11201–11228. PMLR, 2022. URL https://arxiv.org/abs/2106.05527.

[24] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In International Conference on Learning Representations, 2015.

[25] Diederik P. Kingma, Tim Salimans, Ben Poole, and Jonathan Ho. Variational diffusion models. In Advances in Neural Information Processing Systems, volume 34, 2021. URL https://arxiv.org/abs/2107.00630.

[26] Pang Wei Koh and Percy Liang. Understanding black-box predictions via influence functions. In International Conference on Machine Learning, pages 1885–1894. PMLR, 2017.

[27] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. Imagenet classification with deep convolutional neural networks. In F. Pereira, C.J. Burges, L. Bottou, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems, volume 25, pages 1097–1105. Curran Associates, Inc., 2012. URL https://proceedings.neurips.cc/paper_files/paper/2012...

[28] Yongchan Kwon, Eric Wu, Kevin Wu, and James Zou. DataInf: Efficiently estimating data influence in LoRA-tuned LLMs and diffusion models. In International Conference on Learning Representations, 2024. URL https://arxiv.org/abs/2310.00902.

[29] Chen-Hsuan Lin, Jun Gao, Luming Tang, Towaki Takikawa, Xiaohui Zeng, Xun Huang, Karsten Kreis, Sanja Fidler, Ming-Yu Liu, and Tsung-Yi Lin. Magic3D: High-resolution text-to-3d content creation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 300–309, 2023.

[30] Daqi Lin, Markus Kettunen, Benedikt Bitterli, Jacopo Pantaleoni, Cem Yuksel, and Chris Wyman. Generalized resampled importance sampling: Foundations of ReSTIR. ACM Transactions on Graphics (Proc. SIGGRAPH), 41(4):75:1–75:23, 2022. doi: 10.1145/3528223.3530158.

[31]

Diffusion attribution score: Evalu- ating training data influence in diffusion models

Jinxu Lin, Linwei Tao, Minjing Dong, and Chang Xu. Diffusion attribution score: Evalu- ating training data influence in diffusion models. InInternational Conference on Learning Representations, 2025. URLhttps://arxiv.org/abs/2410.18639. 44

work page arXiv 2025
[32]

Align your Gaussians: Text-to-4d with dynamic 3D Gaussians and composed diffusion models

Huan Ling, Seung Wook Kim, Antonio Torralba, Sanja Fidler, and Karsten Kreis. Align your Gaussians: Text-to-4d with dynamic 3D Gaussians and composed diffusion models. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8576–8588, 2024. 44

work page 2024
[33]

Physics3D: Learning physical properties of 3D gaussians via video diffusion.arXiv preprint arXiv:2406.04338, 2024

Fangfu Liu, Hanyang Wang, Shunyu Yao, Shengjun Zhang, Jie Zhou, and Yueqi Duan. Physics3D: Learning physical properties of 3D gaussians via video diffusion.arXiv preprint arXiv:2406.04338, 2024. 3, 44

work page arXiv 2024
[34]

DARTS: Differentiable architecture search

Hanxiao Liu, Karen Simonyan, and Yiming Yang. DARTS: Differentiable architecture search. InInternational Conference on Learning Representations, 2019. URLhttps: //openreview.net/forum?id=S1eYHoC5FX. 47

work page 2019
[35]

PhD thesis, University of Toronto, 2024

Jonathan Lorraine.Scalable Nested Optimization for Deep Learning. PhD thesis, University of Toronto, 2024. 44, 47

work page 2024
[36]

Task selection for automl system evaluation.arXiv preprint arXiv:2208.12754,

Jonathan Lorraine, Nihesh Anderson, Chansoo Lee, Quentin De Laroussilhe, and Mehadi Hassen. Task selection for automl system evaluation.arXiv preprint arXiv:2208.12754,

work page arXiv
[37]

Lyapunov exponents for diversity in differentiable games

Jonathan Lorraine, Paul Vicol, Jack Parker-Holder, Tal Kachman, Luke Metz, and Jakob Fo- erster. Lyapunov exponents for diversity in differentiable games. InInternational Conference on Autonomous Agents and Multiagent Systems, pages 842–852, 2022. 47

work page 2022
[38]

Jonathan Lorraine, Kevin Xie, Xiaohui Zeng, Chen-Hsuan Lin, Towaki Takikawa, Nicholas Sharp, Tsung-Yi Lin, Ming-Yu Liu, Sanja Fidler, and James Lucas. ATT3D: Amortized text-to-3d object synthesis. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 17946–17956, 2023.

[39]

Jonathan P. Lorraine, David Acuna, Paul Vicol, and David Duvenaud. Complex momentum for optimization in games. In International Conference on Artificial Intelligence and Statistics, volume 151, pages 7742–7765. PMLR, 2022.

[40]

Andreas Lugmayr, Martin Danelljan, Andres Romero, Fisher Yu, Radu Timofte, and Luc Van Gool. RePaint: Inpainting using denoising diffusion probabilistic models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11461–11471, 2022.

[41]

Artem Lukoianov, Haitz Sáez de Ocáriz Borde, Kristjan Greenewald, Vitor Guizilini, Timur Bagautdinov, Vincent Sitzmann, and Justin M. Solomon. Score distillation via reparametrized DDIM. In Advances in Neural Information Processing Systems, volume 37, pages 26011–26044, 2024.

[42]

Weijian Luo, Tianyang Hu, Shifeng Zhang, Jiacheng Sun, Zhenguo Li, and Zhihua Zhang. Diff-Instruct: A universal approach for transferring knowledge from pre-trained diffusion models. In Advances in Neural Information Processing Systems, volume 36, 2023.

[43]

Zhiyuan Ma, Yuxiang Wei, Yabin Zhang, Xiangyu Zhu, Zhen Lei, and Lei Zhang. ScaleDreamer: Scalable text-to-3d synthesis with asynchronous score distillation. In European Conference on Computer Vision, pages 1–19. Springer, 2024.

[44]

Dougal Maclaurin, David Duvenaud, and Ryan Adams. Gradient-based hyperparameter optimization through reversible learning. In International Conference on Machine Learning, volume 37, pages 2113–2122. PMLR, 2015.

[45]

David McAllister, Songwei Ge, Jia-Bin Huang, David W. Jacobs, Alexei A. Efros, Aleksander Holynski, and Angjoo Kanazawa. Rethinking score distillation as a bridge between image distributions. In Advances in Neural Information Processing Systems, volume 37, 2024. URL https://arxiv.org/abs/2406.09417.

[46]

Nikhil Mehta, Jonathan Lorraine, Steve Masson, Ramanathan Arunachalam, Zaid Pervaiz Bhat, James Lucas, and Arun George Zachariah. Improving hyperparameter optimization with checkpointed model weights. In European Conference on Computer Vision Workshop on Efficient Deep Learning for Foundation Models (EFM), pages 75–96, 2024. doi: 10.1007/978-3-031-91979-4_8.

[47]

Chenlin Meng, Yutong He, Yang Song, Jiaming Song, Jiajun Wu, Jun-Yan Zhu, and Stefano Ermon. SDEdit: Guided image synthesis and editing with stochastic differential equations. In International Conference on Learning Representations, 2022. URL https://arxiv.org/abs/2108.01073.

[48]

Bruno Kacper Mlodozeniec, Runa Eschenhagen, Juhan Bae, Alexander Immer, David Krueger, and Richard E Turner. Influence functions for scalable data attribution in diffusion models. In The Thirteenth International Conference on Learning Representations, 2025.

[49]

Chong Mou, Xintao Wang, Liangbin Xie, Yanze Wu, Jian Zhang, Zhongang Qi, and Ying Shan. T2I-Adapter: Learning adapters to dig out more controllable ability for text-to-image diffusion models. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 4296–4304, 2024.

[50]

Thomas Müller, Alex Evans, Christoph Schied, and Alexander Keller. Instant neural graphics primitives with a multiresolution hash encoding. ACM Transactions on Graphics (Proc. SIGGRAPH), 41(4):102:1–102:15, 2022. doi: 10.1145/3528223.3530127.

[51]

Thuan Hoang Nguyen and Anh Tran. SwiftBrush: One-step text-to-image diffusion model with variational score distillation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7807–7816, 2024.

[52]

Alexander Quinn Nichol and Prafulla Dhariwal. Improved denoising diffusion probabilistic models. In International Conference on Machine Learning, volume 139, pages 8162–8171. PMLR, 2021.

[53]

Weili Nie, Julius Berner, Chao Liu, and Arash Vahdat. NVIDIA FastGen: Fast generation from diffusion models. https://github.com/NVlabs/FastGen, 2026.

[54]

Art B. Owen. Monte Carlo theory, methods and examples. https://artowen.su.domains/mc/, 2013.

[55]

Sung Min Park, Kristian Georgiev, Andrew Ilyas, Guillaume Leclerc, and Aleksander Madry. Trak: Attributing model behavior at scale. In International Conference on Machine Learning, pages 27074–27113. PMLR, 2023.

[56]

Fabian Pedregosa. Hyperparameter optimization with approximate gradient. In International Conference on Machine Learning, volume 48, pages 737–746. PMLR, 2016.

[57]

William Peebles and Saining Xie. Scalable diffusion models with transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4195–4205, 2023.

[58]

Ben Poole, Ajay Jain, Jonathan T. Barron, and Ben Mildenhall. DreamFusion: Text-to-3d using 2D diffusion. In International Conference on Learning Representations, 2023. URL https://arxiv.org/abs/2209.14988.

[59]

Garima Pruthi, Frederick Liu, Satyen Kale, and Mukund Sundararajan. Estimating training data influence by tracing gradient descent. In Advances in Neural Information Processing Systems, volume 33, pages 19920–19930, 2020.

[60]

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, pages 8748–8763. PMLR, 2021.

[61]

Jiawei Ren, Liang Pan, Jiaxiang Tang, Chi Zhang, Ang Cao, Gang Zeng, and Ziwei Liu. DreamGaussian4D: Generative 4D gaussian splatting. arXiv preprint arXiv:2312.17142,

[62]

Jack Richter-Powell, Jonathan Lorraine, and Brandon Amos. Input convex gradient networks. In Advances in Neural Information Processing Systems Optimal Transport and Machine Learning Workshop, 2021.

[63]

Jessie Richter-Powell, Antonio Torralba, and Jonathan Lorraine. Score distillation sampling for audio: Source separation, synthesis, and beyond. arXiv preprint arXiv:2505.04621, 2025. Presented at the ICML 2025 AI Heard That! Workshop on Machine Learning for Audio.

[64]

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10684–10695,

[65]

Reuven Y. Rubinstein and Dirk P. Kroese. Simulation and the Monte Carlo Method. John Wiley & Sons, 3rd edition, 2016.

[66]

Amirmojtaba Sabour, Sanja Fidler, and Karsten Kreis. Align your flow: Scaling continuous-time flow map distillation. In Advances in Neural Information Processing Systems, 2025. URL https://arxiv.org/abs/2506.14603.

[67]

Tim Salimans and Jonathan Ho. Progressive distillation for fast sampling of diffusion models. In International Conference on Learning Representations, 2022. URL https://arxiv.org/abs/2202.00512.

[68]

Tim Salimans, Thomas Mensink, Jonathan Heek, and Emiel Hoogeboom. Multistep distillation of diffusion models via moment matching. In Advances in Neural Information Processing Systems, volume 37, 2024. URL https://arxiv.org/abs/2406.04103.

[69]

Axel Sauer, Dominik Lorenz, Andreas Blattmann, and Robin Rombach. Adversarial diffusion distillation. In European Conference on Computer Vision. Springer, 2024. URL https://arxiv.org/abs/2311.17042.

[70]

Yichun Shi, Peng Wang, Jianglong Ye, Long Mai, Kejie Li, and Xiao Yang. MVDream: Multi-view diffusion for 3D generation. In International Conference on Learning Representations, 2024. URL https://arxiv.org/abs/2308.16512.

[71]

Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics. In International Conference on Machine Learning, volume 37, pages 2256–2265. PMLR, 2015.

[72]

Yang Song and Prafulla Dhariwal. Improved techniques for training consistency models. In International Conference on Learning Representations, 2024. URL https://arxiv.org/abs/2310.14189.

[73]

Yang Song and Stefano Ermon. Generative modeling by estimating gradients of the data distribution. In Advances in Neural Information Processing Systems, volume 32, pages 11918–11930, 2019.

[74]

Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations, 2021. URL https://arxiv.org/abs/2011.13456.

[75]

Yang Song, Liyue Shen, Lei Xing, and Stefano Ermon. Solving inverse problems in medical imaging with score-based generative models. In International Conference on Learning Representations, 2022. URL https://arxiv.org/abs/2111.08005.

[76]

Yang Song, Prafulla Dhariwal, Mark Chen, and Ilya Sutskever. Consistency models. In International Conference on Machine Learning, volume 202, pages 32211–32252. PMLR, URL https://arxiv.org/abs/2303.01469.

[78]

Yanke Song, Jonathan Lorraine, Weili Nie, Karsten Kreis, and James Lucas. Multi-student diffusion distillation for better one-step generators. arXiv preprint arXiv:2410.23274, 2024.

[79]

Zhiyu Tan, Xiaomeng Yang, Luozheng Qin, and Hao Li. VidGen-1M: A large-scale dataset for text-to-video generation. arXiv preprint arXiv:2408.02629, 2024.

[80]

Vikas Thamizharasan, Nikitas Chatzis, Iliyan Georgiev, Matthew Fisher, Difan Liu, Nanxuan Zhao, Evangelos Kalogerakis, and Michal Lukáč. Mean-shift distillation for diffusion mode seeking. In International Conference on Machine Learning (ICML), 2025.

[32] [32]

Align your Gaussians: Text-to-4d with dynamic 3D Gaussians and composed diffusion models

Huan Ling, Seung Wook Kim, Antonio Torralba, Sanja Fidler, and Karsten Kreis. Align your Gaussians: Text-to-4d with dynamic 3D Gaussians and composed diffusion models. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8576–8588, 2024. 44

work page 2024

[33] [33]

Physics3D: Learning physical properties of 3D gaussians via video diffusion.arXiv preprint arXiv:2406.04338, 2024

Fangfu Liu, Hanyang Wang, Shunyu Yao, Shengjun Zhang, Jie Zhou, and Yueqi Duan. Physics3D: Learning physical properties of 3D gaussians via video diffusion.arXiv preprint arXiv:2406.04338, 2024. 3, 44

work page arXiv 2024

[34] [34]

DARTS: Differentiable architecture search

Hanxiao Liu, Karen Simonyan, and Yiming Yang. DARTS: Differentiable architecture search. InInternational Conference on Learning Representations, 2019. URLhttps: //openreview.net/forum?id=S1eYHoC5FX. 47

work page 2019

[35] [35]

PhD thesis, University of Toronto, 2024

Jonathan Lorraine.Scalable Nested Optimization for Deep Learning. PhD thesis, University of Toronto, 2024. 44, 47

work page 2024

[36] [36]

Task selection for automl system evaluation.arXiv preprint arXiv:2208.12754,

Jonathan Lorraine, Nihesh Anderson, Chansoo Lee, Quentin De Laroussilhe, and Mehadi Hassen. Task selection for automl system evaluation.arXiv preprint arXiv:2208.12754,

work page arXiv

[37] [37]

Lyapunov exponents for diversity in differentiable games

Jonathan Lorraine, Paul Vicol, Jack Parker-Holder, Tal Kachman, Luke Metz, and Jakob Fo- erster. Lyapunov exponents for diversity in differentiable games. InInternational Conference on Autonomous Agents and Multiagent Systems, pages 842–852, 2022. 47

work page 2022

[38] [38]

ATT3D: Amortized text-to-3d object synthesis

Jonathan Lorraine, Kevin Xie, Xiaohui Zeng, Chen-Hsuan Lin, Towaki Takikawa, Nicholas Sharp, Tsung-Yi Lin, Ming-Yu Liu, Sanja Fidler, and James Lucas. ATT3D: Amortized text-to-3d object synthesis. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 17946–17956, 2023. 44, 45, 46, 47

work page 2023

[39] [39]

Lorraine, David Acuna, Paul Vicol, and David Duvenaud

Jonathan P. Lorraine, David Acuna, Paul Vicol, and David Duvenaud. Complex momentum for optimization in games. InInternational Conference on Artificial Intelligence and Statis- tics, volume 151, pages 7742–7765. PMLR, 2022. 47

work page 2022

[40] [40]

RePaint: Inpainting using denoising diffusion probabilistic models

Andreas Lugmayr, Martin Danelljan, Andres Romero, Fisher Yu, Radu Timofte, and Luc Van Gool. RePaint: Inpainting using denoising diffusion probabilistic models. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11461– 11471, 2022. 44

work page 2022

[41] [41]

Artem Lukoianov, Haitz S ´aez de Oc´ariz Borde, Kristjan Greenewald, Vitor Guizilini, Timur Bagautdinov, Vincent Sitzmann, and Justin M. Solomon. Score distillation via reparametrized DDIM. InAdvances in Neural Information Processing Systems, volume 37, pages 26011– 26044, 2024. 43, 45, 46

work page 2024

[42] [42]

Diff-Instruct: A universal approach for transferring knowledge from pre-trained diffusion models

Weijian Luo, Tianyang Hu, Shifeng Zhang, Jiacheng Sun, Zhenguo Li, and Zhihua Zhang. Diff-Instruct: A universal approach for transferring knowledge from pre-trained diffusion models. InAdvances in Neural Information Processing Systems, volume 36, 2023. 44

work page 2023

[43] [43]

Scale- Dreamer: Scalable text-to-3d synthesis with asynchronous score distillation

Zhiyuan Ma, Yuxiang Wei, Yabin Zhang, Xiangyu Zhu, Zhen Lei, and Lei Zhang. Scale- Dreamer: Scalable text-to-3d synthesis with asynchronous score distillation. InEuropean Conference on Computer Vision, pages 1–19. Springer, 2024. 43 12

work page 2024

[44] [44]

Gradient-based hyperparameter op- timization through reversible learning

Dougal Maclaurin, David Duvenaud, and Ryan Adams. Gradient-based hyperparameter op- timization through reversible learning. InInternational Conference on Machine Learning, volume 37, pages 2113–2122. PMLR, 2015. 44, 47

work page 2015

[45] [45]

Jacobs, Alexei A

David McAllister, Songwei Ge, Jia-Bin Huang, David W. Jacobs, Alexei A. Efros, Aleksander Holynski, and Angjoo Kanazawa. Rethinking score distillation as a bridge between image distributions. InAdvances in Neural Information Processing Systems, volume 37, 2024. URL https://arxiv.org/abs/2406.09417. 43

work page arXiv 2024

[46] [46]

Improving hyperparameter optimization with checkpointed model weights

Nikhil Mehta, Jonathan Lorraine, Steve Masson, Ramanathan Arunachalam, Zaid Pervaiz Bhat, James Lucas, and Arun George Zachariah. Improving hyperparameter optimization with checkpointed model weights. InEuropean Conference on Computer Vision Workshop on Efficient Deep Learning for Foundation Models (EFM), pages 75–96, 2024. doi: 10.1007/ 978-3-031-91979-4 8. 47

work page 2024

[47] [47]

SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations

Chenlin Meng, Yutong He, Yang Song, Jiaming Song, Jiajun Wu, Jun-Yan Zhu, and Stefano Ermon. SDEdit: Guided image synthesis and editing with stochastic differential equations. In International Conference on Learning Representations, 2022. URLhttps://arxiv.org/ abs/2108.01073. 44

work page internal anchor Pith review Pith/arXiv arXiv 2022

[48] [48]

Influence functions for scalable data attribution in diffu- sion models

Bruno Kacper Mlodozeniec, Runa Eschenhagen, Juhan Bae, Alexander Immer, David Krueger, and Richard E Turner. Influence functions for scalable data attribution in diffu- sion models. InThe Thirteenth International Conference on Learning Representations, 2025. 44

work page 2025

[49] [49]

T2I-Adapter: Learning adapters to dig out more controllable ability for text-to-image diffusion models

Chong Mou, Xintao Wang, Liangbin Xie, Yanze Wu, Jian Zhang, Zhongang Qi, and Ying Shan. T2I-Adapter: Learning adapters to dig out more controllable ability for text-to-image diffusion models. InProceedings of the AAAI Conference on Artificial Intelligence, pages 4296–4304, 2024. 44

work page 2024

[50] [50]

Instant neural graphics primitives with a multiresolution hash encoding.ACM Trans

Thomas M ¨uller, Alex Evans, Christoph Schied, and Alexander Keller. Instant neural graph- ics primitives with a multiresolution hash encoding.ACM Transactions on Graphics (Proc. SIGGRAPH), 41(4):102:1–102:15, 2022. doi: 10.1145/3528223.3530127. 26, 45, 47

work page doi:10.1145/3528223.3530127 2022

[51] [51]

SwiftBrush: One-step text-to-image diffusion model with variational score distillation

Thuan Hoang Nguyen and Anh Tran. SwiftBrush: One-step text-to-image diffusion model with variational score distillation. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7807–7816, 2024. 44

work page 2024

[52] [52]

Improved denoising diffusion probabilistic models

Alexander Quinn Nichol and Prafulla Dhariwal. Improved denoising diffusion probabilistic models. InInternational Conference on Machine Learning, volume 139, pages 8162–8171. PMLR, 2021. 2, 18, 43

work page 2021

[53] [53]

NVIDIA FastGen: Fast generation from diffusion models.https://github.com/NVlabs/FastGen, 2026

Weili Nie, Julius Berner, Chao Liu, and Arash Vahdat. NVIDIA FastGen: Fast generation from diffusion models.https://github.com/NVlabs/FastGen, 2026. 8, 17

work page 2026

[54] [54]

Art B. Owen. Monte carlo theory, methods and examples.https://artowen.su.domains/ mc/, 2013. 31, 43

work page 2013

[55] [55]

Trak: Attributing model behavior at scale

Sung Min Park, Kristian Georgiev, Andrew Ilyas, Guillaume Leclerc, and Aleksander Madry. Trak: Attributing model behavior at scale. InInternational Conference on Machine Learning, pages 27074–27113. PMLR, 2023. 4, 23, 41, 44

work page 2023

[56]

Fabian Pedregosa. Hyperparameter optimization with approximate gradient. In International Conference on Machine Learning, volume 48, pages 737–746. PMLR, 2016. 44, 47

[57]

William Peebles and Saining Xie. Scalable diffusion models with transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4195–4205, 2023. 8, 37, 43, 50

[58]

Ben Poole, Ajay Jain, Jonathan T. Barron, and Ben Mildenhall. DreamFusion: Text-to-3D using 2D diffusion. In International Conference on Learning Representations, 2023. URL https://arxiv.org/abs/2209.14988. 3, 7, 43

[59]

Garima Pruthi, Frederick Liu, Satyen Kale, and Mukund Sundararajan. Estimating training data influence by tracing gradient descent. In Advances in Neural Information Processing Systems, volume 33, pages 19920–19930, 2020. 4, 23, 44

[60]

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, pages 8748–8763. PMLR, 2021. 27

[61]

Jiawei Ren, Liang Pan, Jiaxiang Tang, Chi Zhang, Ang Cao, Gang Zeng, and Ziwei Liu. DreamGaussian4D: Generative 4D gaussian splatting. arXiv preprint arXiv:2312.17142.

[62]

Jack Richter-Powell, Jonathan Lorraine, and Brandon Amos. Input convex gradient networks. In Advances in Neural Information Processing Systems Optimal Transport and Machine Learning Workshop, 2021. 47

[63]

Jessie Richter-Powell, Antonio Torralba, and Jonathan Lorraine. Score distillation sampling for audio: Source separation, synthesis, and beyond. arXiv preprint arXiv:2505.04621, 2025. Presented at the ICML 2025 AI Heard That! Workshop on Machine Learning for Audio. 3, 44, 47

[64]

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10684–10695.

[65]

Reuven Y. Rubinstein and Dirk P. Kroese. Simulation and the Monte Carlo Method. John Wiley & Sons, 3rd edition, 2016. 2, 18, 43

[66]

Amirmojtaba Sabour, Sanja Fidler, and Karsten Kreis. Align your flow: Scaling continuous-time flow map distillation. In Advances in Neural Information Processing Systems, 2025. URL https://arxiv.org/abs/2506.14603. 44

[67]

Tim Salimans and Jonathan Ho. Progressive distillation for fast sampling of diffusion models. In International Conference on Learning Representations, 2022. URL https://arxiv.org/abs/2202.00512. 43

[68]

Tim Salimans, Thomas Mensink, Jonathan Heek, and Emiel Hoogeboom. Multistep distillation of diffusion models via moment matching. In Advances in Neural Information Processing Systems, volume 37, 2024. URL https://arxiv.org/abs/2406.04103. 43, 44

[69]

Axel Sauer, Dominik Lorenz, Andreas Blattmann, and Robin Rombach. Adversarial diffusion distillation. In European Conference on Computer Vision. Springer, 2024. URL https://arxiv.org/abs/2311.17042. 44

[70]

Yichun Shi, Peng Wang, Jianglong Ye, Long Mai, Kejie Li, and Xiao Yang. MVDream: Multi-view diffusion for 3D generation. In International Conference on Learning Representations, 2024. URL https://arxiv.org/abs/2308.16512. 43, 47

[71]

Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics. In International Conference on Machine Learning, volume 37, pages 2256–2265. PMLR, 2015. 42

[72]

Yang Song and Prafulla Dhariwal. Improved techniques for training consistency models. In International Conference on Learning Representations, 2024. URL https://arxiv.org/abs/2310.14189. 44

[73]

Yang Song and Stefano Ermon. Generative modeling by estimating gradients of the data distribution. In Advances in Neural Information Processing Systems, volume 32, pages 11918–11930, 2019. 42

[74]

Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations, 2021. URL https://arxiv.org/abs/2011.13456. 42

[75]

Yang Song, Liyue Shen, Lei Xing, and Stefano Ermon. Solving inverse problems in medical imaging with score-based generative models. In International Conference on Learning Representations, 2022. URL https://arxiv.org/abs/2111.08005. 44

[76]

Yang Song, Prafulla Dhariwal, Mark Chen, and Ilya Sutskever. Consistency models. In International Conference on Machine Learning, volume 202, pages 32211–32252. PMLR. URL https://arxiv.org/abs/2303.01469. 44

[78]

Yanke Song, Jonathan Lorraine, Weili Nie, Karsten Kreis, and James Lucas. Multi-student diffusion distillation for better one-step generators. arXiv preprint arXiv:2410.23274, 2024. 44, 46, 47

[79]

Zhiyu Tan, Xiaomeng Yang, Luozheng Qin, and Hao Li. VidGen-1M: A large-scale dataset for text-to-video generation. arXiv preprint arXiv:2408.02629, 2024. 8, 41, 43

[80]

Vikas Thamizharasan, Nikitas Chatzis, Iliyan Georgiev, Matthew Fisher, Difan Liu, Nanxuan Zhao, Evangelos Kalogerakis, and Michal Lukáč. Mean-shift distillation for diffusion mode seeking. In International Conference on Machine Learning (ICML), 2025. 46