
arXiv: 2605.18069 · v1 · pith:OXHPBJSZnew · submitted 2026-05-18 · 📊 stat.ML · cs.LG · math.PR · math.ST · stat.TH

Wasserstein bounds for denoising diffusion probabilistic models via the Föllmer process

Pith reviewed 2026-05-20 00:33 UTC · model grok-4.3

classification 📊 stat.ML · cs.LG · math.PR · math.ST · stat.TH
keywords Wasserstein bounds · denoising diffusion probabilistic models · DDPM · Föllmer process · sampling error bounds · Lipschitz score · log-concave distributions · variance schedules

The pith

Viewing the DDPM sampler as a discretization of the Föllmer process yields 2-Wasserstein sampling error bounds that are optimal in both the dimension and the number of steps, under Lipschitz-type conditions on the score.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes sharp upper bounds on the 2-Wasserstein distance between the output of a denoising diffusion probabilistic model and the target distribution. Under general Lipschitz-type conditions on the score function and for broad variance schedules including the cosine schedule, these bounds scale optimally with both data dimension and the number of discretization steps. The analysis replaces the usual reverse Ornstein-Uhlenbeck viewpoint with a discretization of the Föllmer process, which also yields logarithmic Sobolev inequalities and recovers earlier sharp bounds from the literature. For log-concave targets the same optimality holds even without a quadratic transportation cost inequality on the target itself.
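To make the error metric concrete, the sketch below computes the 2-Wasserstein distance between two Gaussians using the well-known closed form (Givens and Shortt, 1984). It is only an illustration of the quantity being bounded, not part of the paper's analysis; the helper names `psd_sqrt` and `gaussian_w2` are hypothetical.

```python
import numpy as np


def psd_sqrt(mat):
    """Symmetric square root of a symmetric positive semidefinite matrix."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    eigvals = np.clip(eigvals, 0.0, None)  # guard against tiny negative round-off
    return (eigvecs * np.sqrt(eigvals)) @ eigvecs.T


def gaussian_w2(mean_a, cov_a, mean_b, cov_b):
    """2-Wasserstein distance between N(mean_a, cov_a) and N(mean_b, cov_b)."""
    mean_a, mean_b = np.asarray(mean_a, float), np.asarray(mean_b, float)
    cov_a, cov_b = np.asarray(cov_a, float), np.asarray(cov_b, float)
    root_a = psd_sqrt(cov_a)
    cross = psd_sqrt(root_a @ cov_b @ root_a)
    # W2^2 = |mean difference|^2 + tr(cov_a + cov_b - 2 (cov_a^{1/2} cov_b cov_a^{1/2})^{1/2})
    w2_squared = np.sum((mean_a - mean_b) ** 2) + np.trace(cov_a + cov_b - 2.0 * cross)
    return float(np.sqrt(max(w2_squared, 0.0)))
```

For example, the distance between N(0, I_d) and N(0, 4 I_d) is sqrt(d), which shows how a fixed per-coordinate mismatch already produces a sqrt(d) factor in this metric.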

Core claim

Viewing the DDPM sampler as a discretization of the Föllmer process rather than the conventional reverse Ornstein-Uhlenbeck process, the paper derives sharp 2-Wasserstein error bounds that are optimal in both dimension and number of steps under general Lipschitz-type conditions on the score function. These conditions encompass those commonly imposed on learned scores and imply a logarithmic Sobolev inequality together with a quadratic transportation cost inequality for the DDPM. Consequently, optimal Wasserstein bounds (up to a logarithmic factor) follow from existing sharp KL-divergence bounds under geometric-type variance schedules, and the optimal bound remains attainable for general log-

What carries the argument

Discretization of the Föllmer process, which converts the continuous-time sampling dynamics into a discrete DDPM update while preserving Wasserstein contraction properties under Lipschitz score assumptions.
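For orientation, the two objects being related can be written in their textbook forms. These are the standard definitions from the literature (the Föllmer process as in Eldan, Lehec and Shenfeld; the DDPM update as in Ho, Jain and Abbeel); the paper's own time parametrization and notation are not shown on this page and may differ.

```latex
% Föllmer process targeting \mu on \mathbb{R}^d, with f = d\mu/d\gamma_d
% (\gamma_d the standard Gaussian measure) and (P_t) the heat semigroup:
\[
  dX_t = \nabla \log P_{1-t} f(X_t)\,dt + dB_t, \qquad X_0 = 0, \qquad X_1 \sim \mu,
\]
\[
  P_t f(x) = \mathbb{E}\bigl[f(x + \sqrt{t}\,Z)\bigr], \qquad Z \sim N(0, I_d).
\]
% DDPM update with \alpha_t = 1 - \beta_t, score estimate s_\theta, noise level \sigma_t:
\[
  x_{t-1} = \frac{1}{\sqrt{\alpha_t}}\bigl(x_t + \beta_t\, s_\theta(x_t, t)\bigr) + \sigma_t z_t,
  \qquad z_t \sim N(0, I_d).
\]
```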

If this is right

  • Sharp upper bounds on 2-Wasserstein sampling error that are optimal in dimension and number of steps for broad variance schedules including cosine.
  • Recovery of several previously obtained sharp error bounds in the literature.
  • Lipschitz conditions on the score imply a logarithmic Sobolev inequality and quadratic transportation cost inequality for the DDPM.
  • Optimal Wasserstein bound follows from sharp KL-divergence bounds up to a logarithmic factor under geometric schedules.
  • Optimal Wasserstein error remains attainable for general log-concave targets without requiring a quadratic transportation cost inequality on the target.
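The step from a logarithmic Sobolev inequality to a Wasserstein bound rests on a classical implication due to Otto and Villani (2000). It is stated here in one common normalization, in which the standard Gaussian has constant C = 1; the constants the paper actually obtains are not given on this page.

```latex
% Logarithmic Sobolev inequality with constant C for a probability measure \mu:
\[
  \operatorname{Ent}_\mu(f^2) \le 2C \int |\nabla f|^2 \, d\mu
  \quad \text{for all smooth } f .
\]
% It implies the quadratic transportation cost (T_2) inequality:
\[
  W_2^2(\nu, \mu) \le 2C \, \mathrm{KL}(\nu \,\|\, \mu)
  \quad \text{for all } \nu \ll \mu .
\]
```

In particular, a bound KL(ν ‖ μ) ≤ ε converts into W_2(ν, μ) ≤ sqrt(2Cε), which is the mechanism by which a sharp KL bound can be turned into a Wasserstein bound.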

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The Föllmer-process viewpoint may extend to analyze convergence of other score-based generative samplers beyond standard DDPM.
  • Cosine and similar schedules may preserve dimension-independent rates more robustly than linear schedules in high-dimensional sampling.
  • Relaxing Lipschitz assumptions while retaining near-optimal rates could be tested on scores learned from image or text data.

Load-bearing premise

The score function satisfies general Lipschitz-type conditions.

What would settle it

A concrete high-dimensional target distribution with a Lipschitz score whose Wasserstein error grows faster with dimension, or needs substantially more steps, than the derived bound allows would contradict the claimed bound.
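A check of this kind can be prototyped on a toy case where everything is available in closed form. The sketch below assumes an isotropic Gaussian target N(0, target_std^2 I), the exact score, a rescaled linear schedule, and the DDPM update with noise variance beta_t; all of these are illustrative choices, not the paper's setting, and the function names are hypothetical. Because the update is linear, the sampler's output is Gaussian and its variance can be propagated exactly instead of being estimated from samples.

```python
import numpy as np


def linear_betas(n_steps, beta_min=1e-4, beta_max=0.02, ref_steps=1000):
    """Linear schedule rescaled so the total noise is roughly independent of n_steps (an assumption)."""
    scale = ref_steps / n_steps
    return np.minimum(np.linspace(beta_min * scale, beta_max * scale, n_steps), 0.999)


def ddpm_w2_error_gaussian(target_std, dim, betas):
    """W2 distance between the DDPM output law and the target N(0, target_std^2 I_dim).

    Uses the exact score of the noised Gaussian, so only initialization and
    discretization error are measured. Sampler noise variance is beta_t.
    """
    betas = np.asarray(betas, float)
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)
    # Per-coordinate variance of the forward marginal at each step.
    forward_var = alpha_bar * target_std**2 + (1.0 - alpha_bar)
    sampler_var = 1.0  # the sampler starts from N(0, I)
    for t in range(len(betas) - 1, -1, -1):
        # Exact score is -x / forward_var[t], so the update is linear in x.
        gain = (1.0 - betas[t] / forward_var[t]) / np.sqrt(alphas[t])
        sampler_var = gain**2 * sampler_var + betas[t]
    # Both laws are centered isotropic Gaussians, so W2 = sqrt(dim) * |std difference|.
    return float(np.sqrt(dim) * abs(np.sqrt(sampler_var) - target_std))


if __name__ == "__main__":
    for n_steps in (50, 200, 1000):
        for dim in (1, 16, 256):
            err = ddpm_w2_error_gaussian(2.0, dim, linear_betas(n_steps))
            print(f"steps={n_steps:5d} dim={dim:4d} W2 error={err:.3e}")
```

In this toy case the error is proportional to sqrt(dim) by construction, so only the dependence on the number of steps is informative. A genuine test of the paper's bound would need non-Gaussian targets and learned scores.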

Original abstract

This paper studies sampling error bounds for denoising diffusion probabilistic models (DDPMs) in the 2-Wasserstein distance. Our contributions are threefold. (i) Under general Lipschitz-type conditions on the score function and for a broad class of variance schedules, including the cosine schedule, we establish sharp upper bounds that are optimal in both the dimension and the number of steps, and recover several sharp error bounds previously obtained in the literature. (ii) We prove that the same Lipschitz-type conditions, which encompass those commonly imposed on the (learned) score, imply a logarithmic Sobolev inequality and hence a quadratic transportation cost inequality for the DDPM. As a consequence, in settings covered by existing work, an optimal Wasserstein bound, up to a logarithmic factor, follows from the recently obtained sharp error bound in the Kullback-Leibler divergence under geometric-type variance schedules. (iii) We show that for general log-concave target distributions, the optimal Wasserstein error bound remains attainable even without a quadratic transportation cost inequality for the target. Our analysis is based on viewing the DDPM sampler as a discretization of the Föllmer process rather than the conventional reverse Ornstein-Uhlenbeck process.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper establishes sharp 2-Wasserstein error bounds for DDPM sampling by interpreting the sampler as a discretization of the Föllmer process. Under general Lipschitz-type conditions on the score, it derives dimension- and step-optimal upper bounds for a broad class of variance schedules (including cosine), recovers prior sharp results, shows that the same conditions imply LSI and quadratic transportation-cost inequalities, and extends the optimal Wasserstein bound to general log-concave targets even without a T2 inequality for the target.

Significance. If the central derivations hold, the work supplies a unified, process-level framework that yields optimal Wasserstein rates without hidden dimension dependence for standard schedules and recovers existing sharp bounds as special cases. The Föllmer-process perspective and the explicit control of discretization error via dimension-uniform Gronwall estimates are technically valuable contributions to the analysis of diffusion samplers.

major comments (2)
  1. [§3.2, Theorem 3.1] The claimed dimension-free constant in the discretization error bound for the cosine schedule relies on a specific Gronwall-type estimate; the proof sketch should explicitly track the dependence on the Lipschitz constant L and the schedule parameter to confirm uniformity in d.
  2. [§4.1, Eq. (12)] The passage from the Lipschitz score assumption to the LSI constant appears to use a standard Bakry-Émery argument, but the resulting transportation-cost inequality constant should be stated explicitly so that the subsequent Wasserstein bound can be compared directly with the KL-based bound of prior work.
minor comments (2)
  1. [Abstract] The abstract states that the bounds are 'optimal in both the dimension and the number of steps' but does not indicate the precise dependence on the number of steps N; a short parenthetical remark would improve readability.
  2. [§2 and §5] Notation for the variance schedule (β_t versus the cosine form) is introduced in §2 but reused without redefinition in §5; a single consolidated definition would prevent minor confusion.
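For readers unfamiliar with the cosine form mentioned in the last comment, the sketch below follows the schedule proposed by Nichol and Dhariwal (2021). That the paper uses exactly this parametrization is an assumption, and the function name `cosine_betas` and its defaults are illustrative.

```python
import numpy as np


def cosine_betas(n_steps, offset=0.008, max_beta=0.999):
    """Cosine variance schedule in the form proposed by Nichol and Dhariwal (2021)."""
    t = np.arange(n_steps + 1) / n_steps
    f = np.cos((t + offset) / (1.0 + offset) * np.pi / 2.0) ** 2
    alpha_bar = f / f[0]  # cumulative signal level, alpha_bar[0] == 1
    betas = 1.0 - alpha_bar[1:] / alpha_bar[:-1]
    return np.minimum(betas, max_beta)  # clip to avoid a singular final step
```

Defining the schedule through the cumulative product alpha_bar and deriving beta_t from it keeps the two notations tied together, which is one way to provide the single consolidated definition the comment asks for.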

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading, positive evaluation, and constructive suggestions. We address each major comment below and will incorporate the requested clarifications in the revised manuscript.

Point-by-point responses
  1. Referee: [§3.2, Theorem 3.1] the claimed dimension-free constant in the discretization error bound for the cosine schedule relies on a specific Gronwall-type estimate; the proof sketch should explicitly track the dependence on the Lipschitz constant L and the schedule parameter to confirm uniformity in d.

    Authors: We agree that an explicit tracking of the dependence on the Lipschitz constant L and the schedule parameter in the Gronwall estimate will strengthen the presentation and confirm the claimed dimension-free property. In the revised version we will expand the proof sketch of Theorem 3.1 in §3.2 to display these dependencies step by step, showing that the resulting constant remains independent of dimension d for the cosine schedule (and more generally for the class of schedules considered).

    Revision: yes

  2. Referee: [§4.1, Eq. (12)] the passage from the Lipschitz score assumption to the LSI constant appears to use a standard Bakry-Émery argument, but the resulting transportation-cost inequality constant should be stated explicitly so that the subsequent Wasserstein bound can be compared directly with the KL-based bound of prior work.

    Authors: We thank the referee for this observation. The derivation in §4.1 indeed applies the Bakry-Émery criterion to the Lipschitz score assumption to obtain the logarithmic Sobolev inequality, from which the quadratic transportation-cost inequality follows with an explicit constant that depends on the Lipschitz constant and the variance schedule. In the revised manuscript we will state this constant explicitly after Eq. (12), enabling a direct comparison with the constants appearing in the KL-based Wasserstein bounds of prior work.

    Revision: yes
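For reference, the classical Bakry-Émery criterion invoked in this exchange reads as follows in its simplest Euclidean form. How the paper applies it to the DDPM under the Lipschitz score assumption, and which constant results, is not shown on this page.

```latex
\[
  \mu(dx) \propto e^{-V(x)}\,dx, \quad \nabla^2 V \succeq \kappa I_d \ (\kappa > 0)
  \;\Longrightarrow\;
  \operatorname{Ent}_\mu(f^2) \le \frac{2}{\kappa} \int |\nabla f|^2\, d\mu
  \quad \text{for all smooth } f .
\]
```

In this normalization the logarithmic Sobolev constant is 1/κ, and it does not depend on the dimension d.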

Circularity Check

0 steps flagged

No significant circularity detected

Full rationale

The derivation views DDPM sampling as discretization of the Föllmer process under external Lipschitz-type score conditions that imply LSI and quadratic transportation inequalities via standard implications. Discretization error is bounded by Gronwall-type estimates that are uniform in dimension for the given variance schedules (including cosine). Optimality claims recover prior literature results without reducing any prediction or bound to a fitted parameter or self-citation chain; the central Wasserstein bounds follow from the stated assumptions and external stochastic-process tools rather than internal redefinition or renaming.
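A generic discrete Gronwall lemma of the kind typically used to sum per-step discretization errors is recorded below. It is a standard textbook statement, not the paper's specific estimate, and the symbols a_k, λ_k and b_k are placeholders.

```latex
% If a_k, \lambda_k, b_k \ge 0 satisfy the one-step recursion
\[
  a_{k+1} \le (1 + \lambda_k)\, a_k + b_k, \qquad k = 0, \dots, n-1,
\]
% then the accumulated error is controlled by
\[
  a_n \le \exp\Bigl(\sum_{k=0}^{n-1} \lambda_k\Bigr) \Bigl(a_0 + \sum_{k=0}^{n-1} b_k\Bigr).
\]
```

The bound is uniform in dimension whenever the sum of the λ_k can be controlled without reference to d, which is the property the audit attributes to the paper's estimates.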

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claims rest on Lipschitz-type conditions on the score and on properties of the Föllmer process; these are standard domain assumptions rather than new free parameters or invented entities.

axioms (1)
  • Domain assumption: the score function satisfies general Lipschitz-type conditions.
    Invoked to obtain the sharp bounds and to imply the logarithmic Sobolev inequality; stated in contributions (i) and (ii).

pith-pipeline@v0.9.0 · 5748 in / 1243 out tokens · 42341 ms · 2026-05-20T00:33:33.786206+00:00 · methodology


