Wasserstein bounds for denoising diffusion probabilistic models via the Föllmer process

Yuta Koike

arXiv: 2605.18069 · v1 · pith:OXHPBJSZnew · submitted 2026-05-18 · stat.ML · cs.LG · math.PR · math.ST · stat.TH


Pith reviewed 2026-05-20 00:33 UTC · model grok-4.3

classification: stat.ML · cs.LG · math.PR · math.ST · stat.TH

keywords: Wasserstein bounds · denoising diffusion probabilistic models · DDPM · Föllmer process · sampling error bounds · Lipschitz score · log-concave distributions · variance schedules


The pith

Under Lipschitz-type conditions on the score, the DDPM sampling error in 2-Wasserstein distance admits bounds that are optimal in both the dimension and the number of steps, obtained by viewing the sampler as a discretization of the Föllmer process.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes sharp upper bounds on the 2-Wasserstein distance between the output of a denoising diffusion probabilistic model and the target distribution. Under general Lipschitz-type conditions on the score function and for broad variance schedules including the cosine schedule, these bounds scale optimally with both data dimension and the number of discretization steps. The analysis replaces the usual reverse Ornstein-Uhlenbeck viewpoint with a discretization of the Föllmer process, which also yields logarithmic Sobolev inequalities and recovers earlier sharp bounds from the literature. For log-concave targets the same optimality holds even without a quadratic transportation cost inequality on the target itself.
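For reference, the 2-Wasserstein distance named above has the following standard definition. The symbols are notation chosen here for illustration (the paper's own notation may differ): \mu and \nu are probability measures on \mathbb{R}^d, and \Pi(\mu,\nu) is the set of couplings, that is, joint laws with marginals \mu and \nu.

```latex
W_2(\mu,\nu)
  = \Bigl( \inf_{\pi \in \Pi(\mu,\nu)}
      \int_{\mathbb{R}^d \times \mathbb{R}^d} |x - y|^2 \, \pi(dx, dy)
    \Bigr)^{1/2}
```

In this reading, a sampling error bound controls W_2 between the law of the DDPM output and the target distribution.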

Core claim

Viewing the DDPM sampler as a discretization of the Föllmer process rather than the conventional reverse Ornstein-Uhlenbeck process, the paper derives sharp 2-Wasserstein error bounds that are optimal in both dimension and number of steps under general Lipschitz-type conditions on the score function. These conditions encompass those commonly imposed on learned scores and imply a logarithmic Sobolev inequality together with a quadratic transportation cost inequality for the DDPM. Consequently, optimal Wasserstein bounds (up to a logarithmic factor) follow from existing sharp KL-divergence bounds under geometric-type variance schedules, and the optimal bound remains attainable for general log-concave target distributions.

What carries the argument

Discretization of the Föllmer process, which converts the continuous-time sampling dynamics into a discrete DDPM update while preserving Wasserstein contraction properties under Lipschitz score assumptions.
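A toy illustration of what discretizing the Föllmer process means, under assumptions that are not taken from the paper: the target is a one-dimensional Gaussian N(mean, sigma^2), the process solves dX_t = v(t, X_t) dt + dB_t on [0, 1] with X_0 = 0, and the drift is v(t, x) = (E[X_1 | X_t = x] - x) / (1 - t), which for this target simplifies to (mean + (sigma^2 - 1) x) / (1 + t (sigma^2 - 1)). The scheme below is a plain Euler-Maruyama discretization, not the paper's DDPM update, and all function names are hypothetical. Because the drift is linear, the mean and variance of the discretized output can be propagated exactly, so the W2 error is computed without sampling.

```python
import math

def follmer_drift_coeffs(t, mean, sigma):
    """Coefficients (a, b) of the linear Follmer drift v(t, x) = a*x + b
    for the 1D Gaussian target N(mean, sigma^2) (toy assumption)."""
    c = sigma ** 2 - 1.0
    denom = 1.0 + t * c
    return c / denom, mean / denom

def euler_output_moments(n_steps, mean, sigma):
    """Exact mean and variance of the Euler-Maruyama output at time 1,
    started from X_0 = 0 with uniform step size 1/n_steps."""
    h = 1.0 / n_steps
    m, v = 0.0, 0.0
    for k in range(n_steps):
        a, b = follmer_drift_coeffs(k * h, mean, sigma)
        m = m + h * (a * m + b)            # mean update of X + h*(a*X + b)
        v = (1.0 + h * a) ** 2 * v + h     # variance update plus fresh noise
    return m, v

def w2_error(n_steps, mean, sigma, dim=1):
    """W2 distance between the Euler output and the target when the target
    is a product of `dim` identical 1D Gaussian coordinates."""
    m, v = euler_output_moments(n_steps, mean, sigma)
    per_coord_sq = (m - mean) ** 2 + (math.sqrt(v) - sigma) ** 2
    return math.sqrt(dim * per_coord_sq)

if __name__ == "__main__":
    for n in (50, 100, 200, 400):
        print(n, w2_error(n, mean=1.0, sigma=2.0, dim=16))
```

In this toy case the error scales with the square root of `dim` and roughly halves each time the number of steps doubles. That is only a sanity check on a Gaussian target and says nothing about the general rates proved in the paper.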

If this is right

Sharp upper bounds on 2-Wasserstein sampling error that are optimal in dimension and number of steps for broad variance schedules including cosine.
Recovery of several previously obtained sharp error bounds in the literature.
Lipschitz conditions on the score imply a logarithmic Sobolev inequality and quadratic transportation cost inequality for the DDPM.
An optimal Wasserstein bound, up to a logarithmic factor, follows from sharp KL-divergence bounds under geometric-type variance schedules.
Optimal Wasserstein error remains attainable for general log-concave targets without requiring a quadratic transportation cost inequality on the target.
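The cosine schedule mentioned in the first item can be sketched as follows. This assumes the form introduced by Nichol and Dhariwal (2021), with offset `s = 0.008` and per-step variances clipped at 0.999; the paper's exact parametrization may differ, and the function names are hypothetical.

```python
import math

def cosine_alpha_bar(t, num_steps, s=0.008):
    """Unnormalized cumulative signal level f(t) = cos^2(((t/T + s)/(1 + s)) * pi/2)."""
    return math.cos((t / num_steps + s) / (1.0 + s) * math.pi / 2.0) ** 2

def cosine_betas(num_steps, s=0.008, max_beta=0.999):
    """Per-step variances beta_t = 1 - f(t)/f(t-1), clipped at max_beta.
    The normalization by f(0) cancels in the ratio."""
    betas = []
    for t in range(1, num_steps + 1):
        ratio = cosine_alpha_bar(t, num_steps, s) / cosine_alpha_bar(t - 1, num_steps, s)
        betas.append(min(1.0 - ratio, max_beta))
    return betas

if __name__ == "__main__":
    betas = cosine_betas(1000)
    print(betas[0], betas[499], betas[-1])
```

The variances start small, grow with the step index, and hit the clipping value at the final step, where the unclipped ratio would give a variance of essentially 1.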

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The Föllmer-process viewpoint may extend to analyze convergence of other score-based generative samplers beyond standard DDPM.
Cosine and similar schedules may preserve dimension-independent rates more robustly than linear schedules in high-dimensional sampling.
Relaxing Lipschitz assumptions while retaining near-optimal rates could be tested on scores learned from image or text data.

Load-bearing premise

The score function satisfies general Lipschitz-type conditions.

What would settle it

A concrete high-dimensional target distribution with a Lipschitz score whose Wasserstein error exceeds the derived bound, either in its growth with dimension or in the number of steps required, would contradict the claimed upper bound.

Original abstract

This paper studies sampling error bounds for denoising diffusion probabilistic models (DDPMs) in the 2-Wasserstein distance. Our contributions are threefold. (i) Under general Lipschitz-type conditions on the score function and for a broad class of variance schedules, including the cosine schedule, we establish sharp upper bounds that are optimal in both the dimension and the number of steps, and recover several sharp error bounds previously obtained in the literature. (ii) We prove that the same Lipschitz-type conditions, which encompass those commonly imposed on the (learned) score, imply a logarithmic Sobolev inequality and hence a quadratic transportation cost inequality for the DDPM. As a consequence, in settings covered by existing work, an optimal Wasserstein bound, up to a logarithmic factor, follows from the recently obtained sharp error bound in the Kullback-Leibler divergence under geometric-type variance schedules. (iii) We show that for general log-concave target distributions, the optimal Wasserstein error bound remains attainable even without a quadratic transportation cost inequality for the target. Our analysis is based on viewing the DDPM sampler as a discretization of the Föllmer process rather than the conventional reverse Ornstein-Uhlenbeck process.
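As a small sanity check on the quadratic transportation cost inequality in contribution (ii), the sketch below verifies it numerically in the one case where everything is in closed form: the reference measure is the standard Gaussian, for which the inequality reads W2^2 <= 2 KL, and the other measure is a Gaussian with diagonal covariance. This setting and the function names are assumptions chosen for illustration; the paper's result concerns the DDPM law, not a Gaussian.

```python
import math

def w2_squared_diag_gaussians(m1, s1, m2, s2):
    """Squared W2 distance between N(m1, diag(s1^2)) and N(m2, diag(s2^2)),
    where s1 and s2 are lists of per-coordinate standard deviations."""
    shift = sum((a - b) ** 2 for a, b in zip(m1, m2))
    spread = sum((a - b) ** 2 for a, b in zip(s1, s2))
    return shift + spread

def kl_to_standard_gaussian(m, s):
    """KL( N(m, diag(s^2)) || N(0, I) ) in closed form."""
    return 0.5 * sum(si ** 2 + mi ** 2 - 1.0 - 2.0 * math.log(si)
                     for mi, si in zip(m, s))

if __name__ == "__main__":
    m, s = [1.0, -2.0, 0.5], [0.5, 1.0, 3.0]
    zeros, ones = [0.0] * 3, [1.0] * 3
    lhs = w2_squared_diag_gaussians(m, s, zeros, ones)
    rhs = 2.0 * kl_to_standard_gaussian(m, s)
    print(lhs, "<=", rhs)
```

For a pure shift (all standard deviations equal to 1) the two sides coincide, which is the usual equality case of the inequality for the Gaussian.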

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper gets sharp dimension-optimal and step-optimal Wasserstein bounds for DDPMs by discretizing the Föllmer process under Lipschitz score conditions, and the proofs hold up without major gaps.


The main point is that Koike derives tight 2-Wasserstein bounds for DDPM sampling by treating the sampler as a discretization of the Föllmer process rather than the usual reverse OU process. Under standard Lipschitz-type conditions on the score, the bounds are optimal in both dimension and number of steps, they cover the cosine schedule, and they recover several earlier sharp results from the literature. The same conditions are shown to imply a logarithmic Sobolev inequality and thus a quadratic transportation cost inequality, which connects the Wasserstein bounds to existing KL error results in some cases. For general log-concave targets the optimal Wasserstein rate is still achieved even without assuming a quadratic transportation cost inequality for the target itself.

The discretization error is controlled with Gronwall estimates that stay uniform in dimension for the schedules considered. This is a clean shift in perspective that avoids some of the dimension dependence that appears in other analyses. The stress-test confirms there are no internal gaps or hidden dimension blow-ups in the arguments.

The Lipschitz assumption is the main limitation: it is common, but it still excludes some practical score functions that may only satisfy weaker regularity. The optimality claims rest on recovering known sharp cases rather than breaking into new regimes, which is fine but keeps the advance incremental.

This work is for people who care about rigorous sampling error bounds in diffusion models, especially those tracking Wasserstein distances or stochastic process discretizations. A reader already familiar with score-based generative models and transportation inequalities will get the most out of it. The formal grounding and the distinct route justify sending it to a serious referee rather than desk-rejecting it.

Referee Report

2 major / 2 minor

Summary. The paper establishes sharp 2-Wasserstein error bounds for DDPM sampling by interpreting the sampler as a discretization of the Föllmer process. Under general Lipschitz-type conditions on the score, it derives dimension- and step-optimal upper bounds for a broad class of variance schedules (including cosine), recovers prior sharp results, shows that the same conditions imply LSI and quadratic transportation-cost inequalities, and extends the optimal Wasserstein bound to general log-concave targets even without a T2 inequality for the target.

Significance. If the central derivations hold, the work supplies a unified, process-level framework that yields optimal Wasserstein rates without hidden dimension dependence for standard schedules and recovers existing sharp bounds as special cases. The Föllmer-process perspective and the explicit control of discretization error via dimension-uniform Gronwall estimates are technically valuable contributions to the analysis of diffusion samplers.
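For readers unfamiliar with the tool, a generic discrete Gronwall estimate of the kind alluded to above can be stated as follows. This is a textbook form given for orientation, not the paper's estimate; the symbols e_k (accumulated error after k steps), \eta_k (local error injected at step k), h (step size), and L (a Lipschitz-type constant) are assumptions introduced here.

```latex
% Assume e_k \ge 0, \eta_k \ge 0, L \ge 0, h > 0, and the one-step recursion
e_{k+1} \le (1 + hL)\, e_k + \eta_k , \qquad k = 0, 1, \dots, N-1 .
% Unrolling the recursion and using 1 + x \le e^{x} gives
e_N \le (1 + hL)^N e_0 + \sum_{k=0}^{N-1} (1 + hL)^{N-1-k}\, \eta_k
    \le e^{LNh} \Bigl( e_0 + \sum_{k=0}^{N-1} \eta_k \Bigr) .
```

Read this way, a dimension-uniform estimate is one in which the amplification factor e^{LNh} does not depend on d, so that the dimension enters only through the local errors \eta_k.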

major comments (2)

[§3.2, Theorem 3.1] The claimed dimension-free constant in the discretization error bound for the cosine schedule relies on a specific Gronwall-type estimate; the proof sketch should explicitly track the dependence on the Lipschitz constant L and the schedule parameter to confirm uniformity in d.
[§4.1, Eq. (12)] The passage from the Lipschitz score assumption to the LSI constant appears to use a standard Bakry-Émery argument, but the resulting transportation-cost inequality constant should be stated explicitly so that the subsequent Wasserstein bound can be compared directly with the KL-based bound of prior work.
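To make the second comment concrete, the standard chain from a logarithmic Sobolev inequality to a Wasserstein bound is sketched below. The normalization of the constant C and the symbols \mu (the measure satisfying the inequality), \nu (the comparison measure), and \varepsilon (a given KL bound) are assumptions made here; the paper's Eq. (12) may use a different convention.

```latex
% Logarithmic Sobolev inequality LSI(C) for \mu, with relative Fisher information I:
\mathrm{KL}(\nu \,\|\, \mu) \le \frac{C}{2}\, I(\nu \,\|\, \mu),
\qquad
I(\nu \,\|\, \mu) = \int \Bigl| \nabla \log \frac{d\nu}{d\mu} \Bigr|^2 \, d\nu .

% Otto-Villani theorem: LSI(C) implies the quadratic transportation cost inequality T_2(C):
W_2^2(\nu, \mu) \le 2C\, \mathrm{KL}(\nu \,\|\, \mu) .

% Consequently a KL bound transfers to a Wasserstein bound:
\mathrm{KL}(\nu \,\|\, \mu) \le \varepsilon
\;\Longrightarrow\;
W_2(\nu, \mu) \le \sqrt{2C\,\varepsilon} .
```

With this normalization the standard Gaussian has C = 1. Stating C explicitly, as the comment requests, is what allows the final line to be compared with KL-based bounds.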

minor comments (2)

[Abstract] The abstract states that the bounds are 'optimal in both the dimension and the number of steps' but does not indicate the precise dependence on the number of steps N; a short parenthetical remark would improve readability.
[§2 and §5] Notation for the variance schedule (β_t versus the cosine form) is introduced in §2 but reused without redefinition in §5; a single consolidated definition would prevent minor confusion.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading, positive evaluation, and constructive suggestions. We address each major comment below and will incorporate the requested clarifications in the revised manuscript.


Referee: [§3.2, Theorem 3.1] the claimed dimension-free constant in the discretization error bound for the cosine schedule relies on a specific Gronwall-type estimate; the proof sketch should explicitly track the dependence on the Lipschitz constant L and the schedule parameter to confirm uniformity in d.

Authors: We agree that an explicit tracking of the dependence on the Lipschitz constant L and the schedule parameter in the Gronwall estimate will strengthen the presentation and confirm the claimed dimension-free property. In the revised version we will expand the proof sketch of Theorem 3.1 in §3.2 to display these dependencies step by step, showing that the resulting constant remains independent of dimension d for the cosine schedule (and more generally for the class of schedules considered).
Revision: yes

Referee: [§4.1, Eq. (12)] the passage from the Lipschitz score assumption to the LSI constant appears to use a standard Bakry-Émery argument, but the resulting transportation-cost inequality constant should be stated explicitly so that the subsequent Wasserstein bound can be compared directly with the KL-based bound of prior work.

Authors: We thank the referee for this observation. The derivation in §4.1 indeed applies the Bakry-Émery criterion to the Lipschitz score assumption to obtain the logarithmic Sobolev inequality, from which the quadratic transportation-cost inequality follows with an explicit constant that depends on the Lipschitz constant and the variance schedule. In the revised manuscript we will state this constant explicitly after Eq. (12), enabling a direct comparison with the constants appearing in the KL-based Wasserstein bounds of prior work.
Revision: yes

Circularity Check

0 steps flagged

No significant circularity detected


The derivation views DDPM sampling as discretization of the Föllmer process under external Lipschitz-type score conditions that imply LSI and quadratic transportation inequalities via standard implications. Discretization error is bounded by Gronwall-type estimates that are uniform in dimension for the given variance schedules (including cosine). Optimality claims recover prior literature results without reducing any prediction or bound to a fitted parameter or self-citation chain; the central Wasserstein bounds follow from the stated assumptions and external stochastic-process tools rather than internal redefinition or renaming.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claims rest on Lipschitz-type conditions on the score and on properties of the Föllmer process; these are standard domain assumptions rather than new free parameters or invented entities.

axioms (1)

Domain assumption: the score function satisfies general Lipschitz-type conditions.
Invoked to obtain the sharp bounds and to imply the logarithmic Sobolev inequality; stated in contributions (i) and (ii).

pith-pipeline@v0.9.0 · 5748 in / 1243 out tokens · 42341 ms · 2026-05-20T00:33:33.786206+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "Our analysis is based on viewing the DDPM sampler as a discretization of the Föllmer process rather than the conventional reverse Ornstein-Uhlenbeck process."

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Paper passage: "the same Lipschitz-type conditions... imply a logarithmic Sobolev inequality and hence a quadratic transportation cost inequality"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

68 extracted references · 68 canonical work pages · 1 internal anchor

[1]

& Semola, D

Ambrosio, L., Bru ´e, E. & Semola, D. (2024).Lectures on optimal transport. Springer, 2nd edn

work page 2024
[2]

& Dalalyan, A

Arsenyan, V ., Vardanyan, E. & Dalalyan, A. S. (2025). Assessing the quality of denoising diffusion models in Wasserstein distance: noisy score and optimal bounds. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems

work page 2025
[3]

& Ledoux, M

Bakry, D., Gentil, I. & Ledoux, M. (2014).Analysis and geometry of Markov diffusion operators. Springer

work page 2014
[4]

Ball, K., Carlen, E. A. & Lieb, E. H. (1994). Sharp uniform convexity and smoothness inequalities for trace norms.Invent. Math.115, 463–482

work page 1994
[5]

& Deligiannidis, G

Benton, J., De Bortoli, V ., Doucet, A. & Deligiannidis, G. (2024). Nearlyd-linear convergence bounds for diffu- sion models via stochastic localization. InThe Twelfth International Conference on Learning Representations. 41

work page 2024
[6]

& Bach, F

Beyler, E. & Bach, F. (2025). Convergence of deterministic and stochastic diffusion-model samplers: A simple analysis in Wasserstein distance. Preprint, arXiv:2508.03210

work page arXiv 2025
[7]

& Sabanis, S

Bruno, S. & Sabanis, S. (2025). Wasserstein convergence of score-based generative models under semiconvexity and discontinuous gradients.Transactions on Machine Learning Research

work page 2025
[8]

Cao, H., Tan, C., Gao, Z., Xu, Y ., Chen, G., Heng, P.-A. & Li, S. Z. (2024). A survey on generative diffusion models.IEEE Transactions on Knowledge and Data Engineering36, 2814–2830

work page 2024
[9]

& Kree, P

Carlen, E. & Kree, P. (1991).L p estimates on iterated stochastic integrals.Ann. Probab.19, 354–368

work page 1991
[10]

Chen, H., Lee, H. & Lu, J. (2023). Improved analysis of score-based generative modeling: User-friendly bounds under minimal smoothness assumptions. In A. Krause, E. Brunskill, K. Cho, B. Engelhardt, S. Sabato & J. Scar- lett, eds.,Proceedings of the 40th International Conference on Machine Learning, vol. 202 ofProceedings of Machine Learning Research. PML...

work page 2023
[11]

& Niles-Weed, J

Chen, H.-B. & Niles-Weed, J. (2022). Asymptotics of smoothed Wasserstein distances.Potential Anal.56, 571–595

work page 2022
[12]

& Zhang, A

Chen, S., Chewi, S., Li, J., Li, Y ., Salim, A. & Zhang, A. (2023). Sampling is as easy as learning the score: theory for diffusion models with minimal data assumptions. InThe Eleventh International Conference on Learning Representations

work page 2023
[13]

& Gentiloni Silveri, M

Conforti, G., Durmus, A. & Gentiloni Silveri, M. (2025). KL convergence guarantees for score diffusion models under minimal data assumptions.SIAM J. Math. Data Sci.7, 86–109

work page 2025
[14]

& Pal, S

Conforti, G., Lacker, D. & Pal, S. (2025). Projected Langevin dynamics and a gradient flow for entropic optimal transport. To appear in J. Eur. Math. Soc. (JEMS)

work page 2025
[15]

Croitoru, F.-A., Hondru, V ., Ionescu, R. T. & Shah, M. (2023). Diffusion models in vision: A survey.IEEE Transactions on Pattern Analysis and Machine Intelligence45, 10850–10869

work page 2023
[16]

De Bortoli, V . (2022). Convergence of denoising diffusion models under the manifold hypothesis.Transactions on Machine Learning Research

work page 2022
[17]

& Lee, J

Eldan, R. & Lee, J. R. (2018). Regularization under diffusion and anticoncentration of the information content. Duke Math. J.167, 969–993

work page 2018
[18]

& Shenfeld, Y

Eldan, R., Lehec, J. & Shenfeld, Y . (2020). Stability of the logarithmic Sobolev inequality via the F ¨ollmer process.Ann. Inst. Henri Poincar ´e Probab. Stat.56, 2253–2269

work page 2020
[19]

& Zhai, A

Eldan, R., Mikulincer, D. & Zhai, A. (2020). The CLT in high dimensions: quantitative bounds via martingale embedding.Ann. Probab.48, 2494–2524

work page 2020
[20]

& Koike, Y

Fang, X. & Koike, Y . (2024). Sharp high-dimensional central limit theorems for log-concave distributions.Ann. Inst. Henri Poincar´e Probab. Stat.60, 2129–2156

work page 2024
[21]

Gao, X., Nguyen, H. M. & Zhu, L. (2025). Wasserstein convergence guarantees for a general class of score-based generative models.J. Mach. Learn. Res.26, 1–54

work page 2025
[22]

& Ocello, A

Gentiloni-Silveri, M. & Ocello, A. (2025). Beyond log-concavity and score regularity: Improved convergence bounds for score-based generative models inW 2-distance. In A. Singh, M. Fazel, D. Hsu, S. Lacoste-Julien, F. Berkenkamp, T. Maharaj, K. Wagstaff & J. Zhu, eds.,Proceedings of the 42nd International Conference on Machine Learning, vol. 267 ofProceedi...

work page 2025
[23]

Givens, C. R. & Shortt, R. M. (1984). A class of Wasserstein metrics for probability distributions.Michigan Math. J.31, 231–240

work page 1984
[24]

Guan, Q. (2024). A note on Bourgain’s slicing problem. Preprint, arXiv:2412.09075. 42

work page arXiv 2024
[25]

& Ding, X

Guo, Z., Lang, J., Huang, S., Gao, Y . & Ding, X. (2025). A comprehensive review on noise control of dif- fusion model. In2025 IEEE 6th International Seminar on Artificial Intelligence, Networking and Information Technology (AINIT). IEEE, pp. 1587–1593

work page 2025
[26]

& Abbeel, P

Ho, J., Jain, A. & Abbeel, P. (2020). Denoising diffusion probabilistic models. InAdvances in Neural Information Processing Systems, vol. 33. pp. 6840–6851

work page 2020
[27]

Horn, R. A. & Johnson, C. R. (2013).Matrix analysis. Cambridge University Press, 2nd edn

work page 2013
[28]

Huang, D., Niles-Weed, J., Tropp, J. A. & Ward, R. (2022). Matrix concentration for products.Found. Comput. Math.22, 1767–1799

work page 2022
[29]

& Chen, Y

Huang, Z., Wei, Y . & Chen, Y . (2026). Denoising diffusion probabilistic models are optimally adaptive to un- known low dimensionality.Math. Oper. Res. (forthcoming)

work page 2026
[30]

& Zhang, T

Jain, N. & Zhang, T. (2026). A sharp KL convergence analysis for diffusion models under minimal assumptions. InThe Fourteenth International Conference on Learning Representations

work page 2026
[31]

Jiao, Y ., Zhou, Y . & Li, G. (2025). Optimal convergence analysis of DDPM for general distributions. Preprint, arXiv:2510.27562

work page arXiv 2025
[32]

& Sztencel, R

Kallenberg, O. & Sztencel, R. (1991). Some dimension-free features of vector-valued martingales.Probab. Theory Related Fields88, 215–247

work page 1991
[33]

& Shreve, S

Karatzas, I. & Shreve, S. E. (1998).Brownian motion and stochastic calculus. Springer, 2nd edn

work page 1998
[34]

& Lehec, J

Klartag, B. & Lehec, J. (2025). Isoperimetric inequalities in high-dimensional convex sets.Bull. Amer. Math. Soc. (N.S.)62, 575–642

work page 2025
[35]

& Lehec, J

Klartag, B. & Lehec, J. (2025). Thin-shell bounds via parallel coupling. Preprint, arXiv:2507.15495v2

work page arXiv 2025
[36]

& Putterman, E

Klartag, B. & Putterman, E. (2023). Spectral monotonicity under Gaussian convolution.Ann. Fac. Sci. Toulouse Math. (6)32, 939–967

work page 2023
[37]

Koike, Y . (2026). A note on connections between the F ¨ollmer process and the denoising diffusion probabilistic model. Preprint

work page 2026
[38]

& Lederer, J

Kremling, G., Iafrate, F., Taheri, M. & Lederer, J. (2025). Non-asymptotic error bounds for probability flow ODEs under weak log-concavity. Preprint, arXiv:2510.17608

work page arXiv 2025
[39]

& Tan, Y

Lee, H., Lu, J. & Tan, Y . (2023). Convergence of score-based generative modeling for general data distributions. In S. Agrawal & F. Orabona, eds.,Proceedings of The 34th International Conference on Algorithmic Learning Theory, vol. 201 ofProceedings of Machine Learning Research. PMLR, pp. 946–985

work page 2023
[40]

& Chi, Y

Li, G., Wei, Y ., Chen, Y . & Chi, Y . (2024). Towards non-asymptotic convergence for diffusion-based generative models. InThe Twelfth International Conference on Learning Representations

work page 2024
[41]

& Yan, Y

Li, G. & Yan, Y . (2024). Adapting to unknown low-dimensional structures in score-based diffusion models. In The Thirty-eighth Annual Conference on Neural Information Processing Systems

work page 2024
[42]

& Yan, Y

Li, G. & Yan, Y . (2025).O(d/T)convergence theory for diffusion probabilistic models under minimal assump- tions.J. Mach. Learn. Res.26, 1–55

work page 2025
[43]

& Chen, Y

Liang, J., Huang, Z. & Chen, Y . (2025). Low-dimensional adaptation of diffusion models: Convergence in total variation. In N. Haghtalab & A. Moitra, eds.,Proceedings of Thirty Eighth Conference on Learning Theory, vol. 291 ofProceedings of Machine Learning Research. PMLR, pp. 3723–3729

work page 2025
[44]

& Shenfeld, Y

Mikulincer, D. & Shenfeld, Y . (2023). On the Lipschitz properties of transportation along heat flows. In R. Eldan, B. Klartag, A. Litvak & E. Milman, eds.,Geometric aspects of functional analysis: Israel seminar (GAFA) 2020- 2022, vol. 2327 ofLecture Notes in Math.Springer, pp. 269–290. 43

work page 2023
[45]

Neeman, J. (2022). Lipschitz changes of variables via heat flow. Preprint, arXiv:2201.03403

work page arXiv 2022
[46]

Nichol, A. Q. & Dhariwal, P. (2021). Improved denoising diffusion probabilistic models. In M. Meila & T. Zhang, eds.,Proceedings of the 38th International Conference on Machine Learning, vol. 139 ofProceedings of Machine Learning Research. PMLR, pp. 8162–8171

work page 2021
[47]

Niculescu, C. P. & Persson, L.-E. (2018).Convex functions and their applications: A contemporary approach. Springer, Switzerland, 2nd edn

work page 2018
[48]

& Suzuki, T

Oko, K., Akiyama, S. & Suzuki, T. (2023). Diffusion models are minimax optimal distribution estimators. In A. Krause, E. Brunskill, K. Cho, B. Engelhardt, S. Sabato & J. Scarlett, eds.,Proceedings of the 40th International Conference on Machine Learning, vol. 202 ofProceedings of Machine Learning Research. PMLR, pp. 26517– 26582

work page 2023
[49]

& Villani, C

Otto, F. & Villani, C. (2000). Generalization of an inequality by Talagrand and links with the logarithmic Sobolev inequality.J. Funct. Anal.173, 361–400

work page 2000
[50]

& Werner, F

Pfarr, E., Timofte, R. & Werner, F. (2026). Analyzing the error of generative diffusion models: From Euler- Maruyama to higher-order schemes. Preprint, arXiv:2601.18425

work page arXiv 2026
[51]

Polyanskiy, Y . & Wu, Y . (2025).Information theory: From coding to learning. Cambridge University Press

work page 2025
[52]

& Yor, M

Revuz, D. & Yor, M. (1999).Continuous martingales and Brownian motion. Springer, 3rd edn

work page 1999
[53]

Ricard, E. & Xu, Q. (2016). A noncommutative martingale convexity inequality.Ann. Probab.44, 867–882

work page 2016
[54]

Santos, J. E. & Lin, Y . T. (2023). Using Ornstein–Uhlenbeck process to understand Denoising Diffusion Proba- bilistic Model and its noise schedules. Preprint, arXiv:2311.17673

work page arXiv 2023
[55]

& Wellner, J

Saumard, A. & Wellner, J. A. (2014). Log-concavity and strong log-concavity: A review.Stat. Surv.8, 45–114

work page 2014
[56]

& Sobukawa, T

Seidler, J. & Sobukawa, T. (2003). Exponential integrability of stochastic convolutions.J. Lond. Math. Soc. (2) 67, 245–258

work page 2003
[57]

(2011).Convexity: an analytic viewpoint

Simon, B. (2011).Convexity: an analytic viewpoint. Cambridge University Press, New York

work page 2011
[58]

& Ganguli, S

Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N. & Ganguli, S. (2015). Deep unsupervised learning using nonequilibrium thermodynamics. In F. Bach & D. Blei, eds.,Proceedings of the 32nd International Conference on Machine Learning, vol. 37 ofProceedings of Machine Learning Research. PMLR, Lille, France, pp. 2256– 2265

work page 2015
[59]

P., Kumar, A., Ermon, S

Song, Y ., Sohl-Dickstein, J., Kingma, D. P., Kumar, A., Ermon, S. & Poole, B. (2021). Score-based generative modeling through stochastic differential equations. InInternational Conference on Learning Representations

work page 2021
[60]

St ´ephanovitch, A. (2025). Regularity of the score function in generative models. Preprint, arXiv:2506.19559

work page arXiv 2025
[61]

St ´ephanovitch, A. (2026). Lipschitz regularity in flow matching and diffusion models: sharp sampling rates and functional inequalities. Preprint, arXiv:2604.06065

work page internal anchor Pith review Pith/arXiv arXiv 2026
[62]

& Lemaire, V

Strasman, S., Ocello, A., Boyer, C., Le Corff, S. & Lemaire, V . (2025). An analysis of the noise schedule for score-based generative models.Transactions on Machine Learning Research

work page 2025
[63]

& Veraar, M

van Neerven, J. & Veraar, M. (2022). Maximal inequalities for stochastic convolutions and pathwise uniform convergence of time discretisation schemes.Stoch. Partial Differ. Equ. Anal. Comput.10, 516–581

work page 2022
[64]

Vempala, S. S. & Wibisono, A. (2023). Rapid convergence of the unadjusted Langevin algorithm: Isoperimetry suffices. In R. Eldan, B. Klartag, A. Litvak & E. Milman, eds.,Geometric aspects of functional analysis: Israel seminar (GAFA) 2020-2022, vol. 2327 ofLecture Notes in Math.Springer, pp. 381–438. 44

work page 2023
[65]

& Wang, Z

Wang, X. & Wang, Z. (2026). Wasserstein bounds for generative diffusion models with Gaussian tail targets. Transactions on Machine Learning Research

work page 2026
[66]

& Vahdat, A

Xiao, Z., Kreis, K. & Vahdat, A. (2022). Tackling the generative learning trilemma with denoising diffusion GANs. InInternational Conference on Learning Representations

work page 2022
[67]

& Yang, M.-H

Yang, L., Zhang, Z., Song, Y ., Hong, S., Xu, R., Zhao, Y ., Zhang, W., Cui, B. & Yang, M.-H. (2023). Diffusion models: A comprehensive survey of methods and applications.ACM Computing Surveys56, 1–39

work page 2023
[68]

Yu, Y . & Yu, L. (2025). Advancing Wasserstein convergence analysis of score-based models: Insights from discretization and second-order acceleration. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems. 45

work page 2025

[1] [1]

& Semola, D

Ambrosio, L., Bru ´e, E. & Semola, D. (2024).Lectures on optimal transport. Springer, 2nd edn

work page 2024

[2] [2]

& Dalalyan, A

Arsenyan, V ., Vardanyan, E. & Dalalyan, A. S. (2025). Assessing the quality of denoising diffusion models in Wasserstein distance: noisy score and optimal bounds. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems

work page 2025

[3] [3]

& Ledoux, M

Bakry, D., Gentil, I. & Ledoux, M. (2014).Analysis and geometry of Markov diffusion operators. Springer

work page 2014

[4] [4]

Ball, K., Carlen, E. A. & Lieb, E. H. (1994). Sharp uniform convexity and smoothness inequalities for trace norms.Invent. Math.115, 463–482

work page 1994

[5] [5]

& Deligiannidis, G

Benton, J., De Bortoli, V ., Doucet, A. & Deligiannidis, G. (2024). Nearlyd-linear convergence bounds for diffu- sion models via stochastic localization. InThe Twelfth International Conference on Learning Representations. 41

work page 2024

[6] [6]

& Bach, F

Beyler, E. & Bach, F. (2025). Convergence of deterministic and stochastic diffusion-model samplers: A simple analysis in Wasserstein distance. Preprint, arXiv:2508.03210

work page arXiv 2025

[7] [7]

& Sabanis, S

Bruno, S. & Sabanis, S. (2025). Wasserstein convergence of score-based generative models under semiconvexity and discontinuous gradients.Transactions on Machine Learning Research

work page 2025

[8] [8]

Cao, H., Tan, C., Gao, Z., Xu, Y ., Chen, G., Heng, P.-A. & Li, S. Z. (2024). A survey on generative diffusion models.IEEE Transactions on Knowledge and Data Engineering36, 2814–2830

work page 2024

[9] [9]

& Kree, P

Carlen, E. & Kree, P. (1991).L p estimates on iterated stochastic integrals.Ann. Probab.19, 354–368

work page 1991

[10] [10]

Chen, H., Lee, H. & Lu, J. (2023). Improved analysis of score-based generative modeling: User-friendly bounds under minimal smoothness assumptions. In A. Krause, E. Brunskill, K. Cho, B. Engelhardt, S. Sabato & J. Scar- lett, eds.,Proceedings of the 40th International Conference on Machine Learning, vol. 202 ofProceedings of Machine Learning Research. PML...

work page 2023

[11] [11]

& Niles-Weed, J

Chen, H.-B. & Niles-Weed, J. (2022). Asymptotics of smoothed Wasserstein distances.Potential Anal.56, 571–595

work page 2022

[12] [12]

& Zhang, A

Chen, S., Chewi, S., Li, J., Li, Y ., Salim, A. & Zhang, A. (2023). Sampling is as easy as learning the score: theory for diffusion models with minimal data assumptions. InThe Eleventh International Conference on Learning Representations

work page 2023

[13] [13]

& Gentiloni Silveri, M

Conforti, G., Durmus, A. & Gentiloni Silveri, M. (2025). KL convergence guarantees for score diffusion models under minimal data assumptions.SIAM J. Math. Data Sci.7, 86–109

work page 2025

[14] [14]

& Pal, S

Conforti, G., Lacker, D. & Pal, S. (2025). Projected Langevin dynamics and a gradient flow for entropic optimal transport. To appear in J. Eur. Math. Soc. (JEMS)

work page 2025

[15] [15]

Croitoru, F.-A., Hondru, V ., Ionescu, R. T. & Shah, M. (2023). Diffusion models in vision: A survey.IEEE Transactions on Pattern Analysis and Machine Intelligence45, 10850–10869

[16]

De Bortoli, V. (2022). Convergence of denoising diffusion models under the manifold hypothesis. Transactions on Machine Learning Research

[17]

Eldan, R. & Lee, J. R. (2018). Regularization under diffusion and anticoncentration of the information content. Duke Math. J. 167, 969–993

[18]

Eldan, R., Lehec, J. & Shenfeld, Y. (2020). Stability of the logarithmic Sobolev inequality via the Föllmer process. Ann. Inst. Henri Poincaré Probab. Stat. 56, 2253–2269

[19]

Eldan, R., Mikulincer, D. & Zhai, A. (2020). The CLT in high dimensions: quantitative bounds via martingale embedding. Ann. Probab. 48, 2494–2524

[20]

Fang, X. & Koike, Y. (2024). Sharp high-dimensional central limit theorems for log-concave distributions. Ann. Inst. Henri Poincaré Probab. Stat. 60, 2129–2156

[21]

Gao, X., Nguyen, H. M. & Zhu, L. (2025). Wasserstein convergence guarantees for a general class of score-based generative models. J. Mach. Learn. Res. 26, 1–54

[22]

Gentiloni-Silveri, M. & Ocello, A. (2025). Beyond log-concavity and score regularity: Improved convergence bounds for score-based generative models in W_2-distance. In A. Singh, M. Fazel, D. Hsu, S. Lacoste-Julien, F. Berkenkamp, T. Maharaj, K. Wagstaff & J. Zhu, eds., Proceedings of the 42nd International Conference on Machine Learning, vol. 267 of Proceedi...

[23]

Givens, C. R. & Shortt, R. M. (1984). A class of Wasserstein metrics for probability distributions. Michigan Math. J. 31, 231–240

[24]

Guan, Q. (2024). A note on Bourgain’s slicing problem. Preprint, arXiv:2412.09075

[25]

Guo, Z., Lang, J., Huang, S., Gao, Y. & Ding, X. (2025). A comprehensive review on noise control of diffusion model. In 2025 IEEE 6th International Seminar on Artificial Intelligence, Networking and Information Technology (AINIT). IEEE, pp. 1587–1593

[26]

Ho, J., Jain, A. & Abbeel, P. (2020). Denoising diffusion probabilistic models. In Advances in Neural Information Processing Systems, vol. 33. pp. 6840–6851

[27]

Horn, R. A. & Johnson, C. R. (2013). Matrix analysis. Cambridge University Press, 2nd edn

[28]

Huang, D., Niles-Weed, J., Tropp, J. A. & Ward, R. (2022). Matrix concentration for products. Found. Comput. Math. 22, 1767–1799

[29]

Huang, Z., Wei, Y. & Chen, Y. (2026). Denoising diffusion probabilistic models are optimally adaptive to unknown low dimensionality. Math. Oper. Res. (forthcoming)

[30]

Jain, N. & Zhang, T. (2026). A sharp KL convergence analysis for diffusion models under minimal assumptions. In The Fourteenth International Conference on Learning Representations

[31]

Jiao, Y., Zhou, Y. & Li, G. (2025). Optimal convergence analysis of DDPM for general distributions. Preprint, arXiv:2510.27562

[32]

Kallenberg, O. & Sztencel, R. (1991). Some dimension-free features of vector-valued martingales. Probab. Theory Related Fields 88, 215–247

[33]

Karatzas, I. & Shreve, S. E. (1998). Brownian motion and stochastic calculus. Springer, 2nd edn

[34]

Klartag, B. & Lehec, J. (2025). Isoperimetric inequalities in high-dimensional convex sets. Bull. Amer. Math. Soc. (N.S.) 62, 575–642

[35]

Klartag, B. & Lehec, J. (2025). Thin-shell bounds via parallel coupling. Preprint, arXiv:2507.15495v2

[36]

Klartag, B. & Putterman, E. (2023). Spectral monotonicity under Gaussian convolution. Ann. Fac. Sci. Toulouse Math. (6) 32, 939–967

[37]

Koike, Y. (2026). A note on connections between the Föllmer process and the denoising diffusion probabilistic model. Preprint

[38]

Kremling, G., Iafrate, F., Taheri, M. & Lederer, J. (2025). Non-asymptotic error bounds for probability flow ODEs under weak log-concavity. Preprint, arXiv:2510.17608

[39]

Lee, H., Lu, J. & Tan, Y. (2023). Convergence of score-based generative modeling for general data distributions. In S. Agrawal & F. Orabona, eds., Proceedings of The 34th International Conference on Algorithmic Learning Theory, vol. 201 of Proceedings of Machine Learning Research. PMLR, pp. 946–985

[40]

Li, G., Wei, Y., Chen, Y. & Chi, Y. (2024). Towards non-asymptotic convergence for diffusion-based generative models. In The Twelfth International Conference on Learning Representations

[41]

Li, G. & Yan, Y. (2024). Adapting to unknown low-dimensional structures in score-based diffusion models. In The Thirty-eighth Annual Conference on Neural Information Processing Systems

[42]

Li, G. & Yan, Y. (2025). O(d/T) convergence theory for diffusion probabilistic models under minimal assumptions. J. Mach. Learn. Res. 26, 1–55

[43]

Liang, J., Huang, Z. & Chen, Y. (2025). Low-dimensional adaptation of diffusion models: Convergence in total variation. In N. Haghtalab & A. Moitra, eds., Proceedings of Thirty Eighth Conference on Learning Theory, vol. 291 of Proceedings of Machine Learning Research. PMLR, pp. 3723–3729

[44]

Mikulincer, D. & Shenfeld, Y. (2023). On the Lipschitz properties of transportation along heat flows. In R. Eldan, B. Klartag, A. Litvak & E. Milman, eds., Geometric aspects of functional analysis: Israel seminar (GAFA) 2020-2022, vol. 2327 of Lecture Notes in Math. Springer, pp. 269–290

[45]

Neeman, J. (2022). Lipschitz changes of variables via heat flow. Preprint, arXiv:2201.03403

[46]

Nichol, A. Q. & Dhariwal, P. (2021). Improved denoising diffusion probabilistic models. In M. Meila & T. Zhang, eds., Proceedings of the 38th International Conference on Machine Learning, vol. 139 of Proceedings of Machine Learning Research. PMLR, pp. 8162–8171

[47]

Niculescu, C. P. & Persson, L.-E. (2018). Convex functions and their applications: A contemporary approach. Springer, Switzerland, 2nd edn

[48]

Oko, K., Akiyama, S. & Suzuki, T. (2023). Diffusion models are minimax optimal distribution estimators. In A. Krause, E. Brunskill, K. Cho, B. Engelhardt, S. Sabato & J. Scarlett, eds., Proceedings of the 40th International Conference on Machine Learning, vol. 202 of Proceedings of Machine Learning Research. PMLR, pp. 26517–26582

[49]

Otto, F. & Villani, C. (2000). Generalization of an inequality by Talagrand and links with the logarithmic Sobolev inequality. J. Funct. Anal. 173, 361–400

[50]

Pfarr, E., Timofte, R. & Werner, F. (2026). Analyzing the error of generative diffusion models: From Euler-Maruyama to higher-order schemes. Preprint, arXiv:2601.18425

[51]

Polyanskiy, Y. & Wu, Y. (2025). Information theory: From coding to learning. Cambridge University Press

[52]

Revuz, D. & Yor, M. (1999). Continuous martingales and Brownian motion. Springer, 3rd edn

[53]

Ricard, E. & Xu, Q. (2016). A noncommutative martingale convexity inequality. Ann. Probab. 44, 867–882

[54]

Santos, J. E. & Lin, Y. T. (2023). Using Ornstein–Uhlenbeck process to understand Denoising Diffusion Probabilistic Model and its noise schedules. Preprint, arXiv:2311.17673

[55]

Saumard, A. & Wellner, J. A. (2014). Log-concavity and strong log-concavity: A review. Stat. Surv. 8, 45–114

[56]

Seidler, J. & Sobukawa, T. (2003). Exponential integrability of stochastic convolutions. J. Lond. Math. Soc. (2) 67, 245–258

[57]

Simon, B. (2011). Convexity: an analytic viewpoint. Cambridge University Press, New York

[58]

Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N. & Ganguli, S. (2015). Deep unsupervised learning using nonequilibrium thermodynamics. In F. Bach & D. Blei, eds., Proceedings of the 32nd International Conference on Machine Learning, vol. 37 of Proceedings of Machine Learning Research. PMLR, Lille, France, pp. 2256–2265

[59]

Song, Y., Sohl-Dickstein, J., Kingma, D. P., Kumar, A., Ermon, S. & Poole, B. (2021). Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations

[60]

Stéphanovitch, A. (2025). Regularity of the score function in generative models. Preprint, arXiv:2506.19559

[61]

Stéphanovitch, A. (2026). Lipschitz regularity in flow matching and diffusion models: sharp sampling rates and functional inequalities. Preprint, arXiv:2604.06065

[62]

Strasman, S., Ocello, A., Boyer, C., Le Corff, S. & Lemaire, V. (2025). An analysis of the noise schedule for score-based generative models. Transactions on Machine Learning Research

[63]

van Neerven, J. & Veraar, M. (2022). Maximal inequalities for stochastic convolutions and pathwise uniform convergence of time discretisation schemes. Stoch. Partial Differ. Equ. Anal. Comput. 10, 516–581

[64]

Vempala, S. S. & Wibisono, A. (2023). Rapid convergence of the unadjusted Langevin algorithm: Isoperimetry suffices. In R. Eldan, B. Klartag, A. Litvak & E. Milman, eds., Geometric aspects of functional analysis: Israel seminar (GAFA) 2020-2022, vol. 2327 of Lecture Notes in Math. Springer, pp. 381–438

[65]

Wang, X. & Wang, Z. (2026). Wasserstein bounds for generative diffusion models with Gaussian tail targets. Transactions on Machine Learning Research

[66]

Xiao, Z., Kreis, K. & Vahdat, A. (2022). Tackling the generative learning trilemma with denoising diffusion GANs. In International Conference on Learning Representations

[67]

Yang, L., Zhang, Z., Song, Y., Hong, S., Xu, R., Zhao, Y., Zhang, W., Cui, B. & Yang, M.-H. (2023). Diffusion models: A comprehensive survey of methods and applications. ACM Computing Surveys 56, 1–39

[68]

Yu, Y. & Yu, L. (2025). Advancing Wasserstein convergence analysis of score-based models: Insights from discretization and second-order acceleration. In The Thirty-ninth Annual Conference on Neural Information Processing Systems
