arxiv: 2605.15822 · v1 · pith:7CFCLFS6new · submitted 2026-05-15 · 💻 cs.LG · stat.ML

Intrinsic Wasserstein Rates for Score-Based Generative Models on Smooth Manifolds

Pith reviewed 2026-05-20 20:45 UTC · model grok-4.3

classification 💻 cs.LG stat.ML
keywords score-based generative models · Wasserstein distance · intrinsic dimension · manifold learning · convergence rates · ReLU networks · score approximation · diffusion models

The pith

Variance-preserving score-based generative models achieve intrinsic Wasserstein-1 rates that scale with manifold dimension d rather than ambient dimension D.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that score-based generative models trained in a high-dimensional ambient space can still attain the convergence rate one would expect from the intrinsic dimension of the data. For compact smooth $d$-dimensional manifolds inside the unit cube $[0,1]^D$ with $\beta$-Hölder densities bounded away from zero, the estimator reaches a Wasserstein-1 error of order $n^{-(\beta+1)/(d+2\beta)}$ in the sample size $n$, multiplied by a factor $D^{\mathcal{O}_\beta(d)}$ that is polynomial in $D$ with a degree depending on $\beta$ and $d$. The result is obtained by splitting the score approximation task into a large-noise regime handled on tangent cells and a small-noise regime handled by de-Gaussianized projections, each implemented with explicit ReLU networks built from finite intrinsic anchors and Gauss-Newton steps. This matters because many real datasets are believed to concentrate on such structures, so the analysis removes from the sample exponent the usual curse-of-dimensionality penalty that appears when the ambient space is treated as fully $D$-dimensional.

Core claim

For compact $d$-dimensional smooth manifolds $\mathcal{M} \subset [0,1]^D$ with $d > 2$ and $\beta$-Hölder densities strictly positive on $\mathcal{M}$, a variance-preserving SGM estimator attains the intrinsic Wasserstein-1 sample exponent $\tilde{\mathcal{O}}(D^{\mathcal{O}_\beta(d)} n^{-(\beta+1)/(d+2\beta)})$, up to logarithmic factors and explicit geometry and density factors. The non-asymptotic bound isolates the finite-order geometry envelope, Hölder radius, density lower bound, ambient dependence, and finite-order correction terms. The analysis separates score approximation into a large-noise tangent-cell regime and a small-noise projection-centered, de-Gaussianized Laplace regime.
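To make the exponent concrete, the following is a minimal sketch that evaluates (beta + 1) / (d + 2 beta) for a few intrinsic dimensions. The function names `intrinsic_exponent` and `samples_for_target` are illustrative and do not come from the paper, and the sample-size estimate deliberately ignores the D-dependent prefactor, logarithmic factors, and all geometry and density constants.

```python
def intrinsic_exponent(beta: float, d: int) -> float:
    """Return (beta + 1) / (d + 2 * beta), the n-exponent in the stated rate."""
    if beta <= 0 or d <= 2:
        raise ValueError("the stated result assumes beta > 0 and d > 2")
    return (beta + 1.0) / (d + 2.0 * beta)


def samples_for_target(eps: float, beta: float, d: int) -> float:
    """Solve n^(-exponent) = eps for n, with every prefactor set to 1 (an assumption)."""
    return eps ** (-1.0 / intrinsic_exponent(beta, d))


if __name__ == "__main__":
    beta = 1.0
    for d in (3, 4, 8, 16):
        print(d, intrinsic_exponent(beta, d), samples_for_target(0.1, beta, d))
```

The exponent depends only on beta and d, so under this reading the ambient dimension D enters the rate only through the polynomial prefactor.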

What carries the argument

ReLU implementation of nearest-projection coordinates via finite intrinsic anchors and Gauss-Newton iterations, which separates score approximation into large-noise tangent-cell and small-noise projection-centered regimes.
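The anchor-plus-Gauss-Newton idea can be illustrated numerically on a toy example. The sketch below is not the paper's ReLU construction: it uses plain NumPy, a hypothetical chart `phi` for a circle inside the unit cube in three dimensions (so d = 1, which is below the paper's d > 2 setting), and an evenly spaced anchor grid chosen only for illustration. It starts from the anchor whose image is closest to the query point and then refines the intrinsic coordinate with Gauss-Newton steps.

```python
import numpy as np


def phi(theta):
    """Toy chart (assumption): circle of radius 0.4 centered in [0, 1]^3."""
    t = theta[0]
    return np.array([0.5 + 0.4 * np.cos(t), 0.5 + 0.4 * np.sin(t), 0.5])


def jac(theta):
    """Jacobian of phi with shape (D, d) = (3, 1)."""
    t = theta[0]
    return np.array([[-0.4 * np.sin(t)], [0.4 * np.cos(t)], [0.0]])


def project(x, anchors, n_steps=10):
    """Approximate the nearest point to x on the toy manifold."""
    # Initialize at the intrinsic anchor whose image is closest to x.
    dists = [np.linalg.norm(phi(a) - x) for a in anchors]
    theta = np.array(anchors[int(np.argmin(dists))], dtype=float)
    for _ in range(n_steps):
        r = phi(theta) - x  # residual in ambient coordinates
        J = jac(theta)
        # Gauss-Newton step: solve (J^T J) delta = -J^T r.
        delta = np.linalg.solve(J.T @ J, -J.T @ r)
        theta = theta + delta
    return theta, phi(theta)


# Sixteen evenly spaced intrinsic anchors (illustrative choice).
anchors = np.linspace(0.0, 2.0 * np.pi, 16, endpoint=False).reshape(-1, 1)
x = np.array([0.9, 0.75, 0.55])  # a query point off the manifold
theta_hat, p_hat = project(x, anchors)
```

In this toy setting the anchor grid only needs to be fine enough that the initial guess lies in the basin where the iteration contracts; how the paper quantifies anchor density and iteration count is not reproduced here.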

If this is right

  • Score-network parameters remain polynomially dependent on ambient dimension D whenever geometry and density lower bounds are polynomially controlled.
  • The explicit non-asymptotic bound isolates the contribution of manifold curvature, Hölder radius, and density lower bound from the statistical rate.
  • The same construction yields rates that improve as the intrinsic dimension d decreases while keeping ambient D fixed.
  • The separation into tangent-cell and projection-centered regimes gives a concrete recipe for building the score network without treating manifold projection as a black-box high-dimensional map.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar intrinsic-rate arguments could be tested on other generative models such as flow-based or diffusion models when the same projection-coordinate construction is inserted.
  • The polynomial ambient dependence suggests that the method remains practical even when D is several hundred and d is moderate, provided the manifold is sufficiently smooth.
  • One could examine whether relaxing compactness or allowing densities that touch zero at isolated points still preserves the leading exponent, though that would require new technical arguments.
  • The finite-anchor Gauss-Newton scheme might be adapted to learn the manifold itself from samples rather than assuming it known, offering a route to unsupervised manifold estimation inside the generative model.

Load-bearing premise

The data manifold is a compact smooth $d$-dimensional submanifold of the unit cube whose density is $\beta$-Hölder and bounded away from zero, and the score can be approximated separately in large-noise and small-noise regimes.

What would settle it

Numerical experiments on a known compact manifold with a $\beta$-Hölder density that measure the Wasserstein-1 error of the generated samples and check whether the observed scaling with $n$ matches the exponent $-(\beta+1)/(d+2\beta)$, up to the predicted polynomial prefactor in $D$.
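One way such a check could be scored is by fitting the slope of log error against log n and comparing it with the predicted exponent. The sketch below assumes that Wasserstein-1 errors have already been measured at several sample sizes. The helper name `fitted_exponent` is hypothetical, and the `errors` array is a synthetic placeholder generated from an exact power law, not experimental data.

```python
import numpy as np


def fitted_exponent(ns, errors):
    """Least-squares slope of log(error) versus log(n)."""
    slope, _intercept = np.polyfit(np.log(ns), np.log(errors), deg=1)
    return slope


beta, d = 1.0, 4
predicted = -(beta + 1.0) / (d + 2.0 * beta)

ns = np.array([1e3, 1e4, 1e5, 1e6])
# Synthetic placeholder following an exact power law; replace with measured W1 errors.
errors = 3.0 * ns ** predicted

print("predicted exponent:", predicted)
print("fitted exponent:", fitted_exponent(ns, errors))
```

With real measurements the fitted slope would also absorb logarithmic factors and finite-sample effects, so agreement with the predicted exponent should be expected only approximately and only for sufficiently large n.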

Original abstract

Score-based generative models are trained in high-dimensional ambient spaces, yet many data distributions are supported on low-dimensional nonlinear structures. We prove that, for compact $d$-dimensional smooth manifolds $\mathcal{M} \subset [0,1]^D$ with $d > 2$ and $\beta$-H\"older densities strictly positive on $\mathcal{M}$, a variance-preserving SGM estimator attains the intrinsic Wasserstein--1 sample exponent $\tilde{\mathcal{O}}(D^{\mathcal{O}_\beta(d)}n^{-(\beta+1)/(d+2\beta)})$, up to logarithmic factors and explicit geometry and density factors. The full nonasymptotic bound explicitly isolates the finite-order geometry envelope, H\"older radius, density lower bound, ambient dependence, and finite-order correction terms. The analysis separates score approximation into a large-noise tangent-cell regime and a small-noise projection-centered, de-Gaussianized Laplace regime. The key technical ingredient is a ReLU implementation of nearest-projection coordinates via finite intrinsic anchors and Gauss--Newton iterations, rather than approximating the manifold projection as a black-box high-dimensional smooth map. Consequently, for families with polynomially controlled geometry and density lower bounds, the constructed score-network parameters have polynomial ambient dependence.
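For readers unfamiliar with the variance-preserving setup, here is a minimal sketch of the forward noising step and the corresponding denoising target. It assumes the constant-rate Ornstein-Uhlenbeck form dX = -0.5 X dt + dW, which is one common variance-preserving choice; the paper's actual noise schedule is not stated on this page, and the function names are illustrative.

```python
import numpy as np


def vp_forward(x0, t, rng):
    """Sample X_t given X_0 = x0 under dX = -0.5 X dt + dW (assumed schedule)."""
    mean_scale = np.exp(-0.5 * t)
    sigma = np.sqrt(1.0 - np.exp(-t))
    # mean_scale**2 + sigma**2 == 1, which is the variance-preserving property.
    xt = mean_scale * x0 + sigma * rng.standard_normal(x0.shape)
    return xt, sigma


def conditional_score(xt, x0, t):
    """Score of the Gaussian kernel p_t(xt | x0), the denoising score matching target."""
    var = 1.0 - np.exp(-t)
    return -(xt - np.exp(-0.5 * t) * x0) / var


rng = np.random.default_rng(0)
x0 = np.full((4, 3), 0.5)  # four points in [0, 1]^3, purely illustrative
xt, sigma = vp_forward(x0, 0.3, rng)
target = conditional_score(xt, x0, 0.3)
```

The noise level sigma shrinks to zero as t goes to zero, so the conditional score grows like 1 / sigma near the data, which is consistent with the analysis treating the small-noise regime separately.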

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proves a nonasymptotic bound showing that variance-preserving score-based generative models achieve the intrinsic Wasserstein-1 rate $\tilde{\mathcal{O}}(D^{\mathcal{O}_\beta(d)} n^{-(\beta+1)/(d+2\beta)})$ (up to logs and explicit geometry/density factors) for $\beta$-Hölder densities strictly positive on compact $d$-dimensional smooth submanifolds of $[0,1]^D$ with $d > 2$. The analysis separates score approximation into a large-noise tangent-cell regime and a small-noise projection-centered de-Gaussianized Laplace regime, and constructs the score network via a ReLU implementation of nearest-projection coordinates using finite intrinsic anchors and Gauss-Newton iterations rather than treating the manifold projection as a black-box map.

Significance. If the central derivation holds, the result is significant because it supplies the first explicit nonasymptotic intrinsic rates for SGMs on manifolds, isolating the finite-order geometry envelope, Hölder radius, density lower bound, and ambient dependence while achieving only polynomial ambient dependence under polynomially controlled geometry. The constructive ReLU-based projection via anchors and Gauss-Newton iterations is a technical strength that avoids black-box high-dimensional approximation.

major comments (2)
  1. [score approximation analysis (regime separation)] The central claim rests on the regime split for score approximation (large-noise tangent-cell versus small-noise projection-centered). The cutoff sigma_* must be chosen so that the integrated error from both regimes plus any interface mismatch remains within the target rate after diffusion integration; without explicit uniform bounds on transition-scale errors (which depend on curvature and anchor density), extra factors may appear in the D exponent or degrade the n exponent.
  2. [projection map construction via anchors and Gauss-Newton] The error bounds for the finite intrinsic anchors plus Gauss-Newton iterations in the projection map are stated separately inside each regime. These must be shown to be uniform across the transition scale; otherwise the accumulated score error can pick up uncontrolled curvature-dependent terms that affect the claimed intrinsic rate.
minor comments (2)
  1. [notation and statement of main theorem] The precise definition and dependence of the O_beta(d) exponent on beta and d should be stated explicitly in the main text rather than only in the abstract.
  2. [introduction] A short comparison table or paragraph contrasting the derived rate with prior ambient-dimension rates would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments on the regime separation and projection construction. We have revised the manuscript to explicitly address the uniformity of bounds across the transition scale while preserving the claimed intrinsic rate.

  1. Referee: [score approximation analysis (regime separation)] The central claim rests on the regime split for score approximation (large-noise tangent-cell versus small-noise projection-centered). The cutoff sigma_* must be chosen so that the integrated error from both regimes plus any interface mismatch remains within the target rate after diffusion integration; without explicit uniform bounds on transition-scale errors (which depend on curvature and anchor density), extra factors may appear in the D exponent or degrade the n exponent.

    Authors: We thank the referee for this observation. The cutoff sigma_* is chosen as n^{-1/(d+2beta)} times a geometry-dependent factor to balance the large-noise and small-noise regimes. In the revised manuscript we add an explicit uniform bound (new Lemma 5.3) on the transition-scale score error that depends only on the Hölder radius, curvature bound, and anchor density; the interface mismatch integrates to at most a logarithmic factor under the diffusion process. Consequently the overall non-asymptotic bound retains the stated n exponent and polynomial D dependence with no additional factors.

    Revision: yes

  2. Referee: [projection map construction via anchors and Gauss-Newton] The error bounds for the finite intrinsic anchors plus Gauss-Newton iterations in the projection map are stated separately inside each regime. These must be shown to be uniform across the transition scale; otherwise the accumulated score error can pick up uncontrolled curvature-dependent terms that affect the claimed intrinsic rate.

    Authors: We agree that uniformity is required. The projection error bounds depend only on the intrinsic geometry (compactness, smoothness order, and anchor separation) and are independent of the noise level sigma. In the revision we insert a short uniformity argument (Section 4.4) showing that for sigma in [sigma_*/2, 2 sigma_*] the Gauss-Newton iteration error remains controlled by the same curvature envelope used inside each regime; no additional curvature-dependent terms arise. This keeps the accumulated score error inside the target rate.

    Revision: yes

Circularity Check

0 steps flagged

Derivation of intrinsic Wasserstein rate is self-contained theoretical analysis

The paper derives a non-asymptotic convergence bound for variance-preserving SGMs on compact smooth manifolds by separating score approximation error into large-noise tangent-cell and small-noise projection-centered regimes, then integrating against the diffusion process. The key construction uses ReLU networks to implement nearest-projection coordinates via finite intrinsic anchors and Gauss-Newton iterations. No step reduces a claimed prediction or first-principles result to a fitted parameter, self-defined quantity, or load-bearing self-citation by construction. The bound explicitly isolates geometry, Hölder radius, density lower bound, and ambient factors, making the derivation independent of its target exponent.

Axiom & Free-Parameter Ledger

0 free parameters · 3 axioms · 0 invented entities

The proof rests on standard assumptions of manifold smoothness and Hölder continuity of the density; no free parameters are fitted to data and no new entities are postulated.

axioms (3)
  • Domain assumption: M is a compact d-dimensional smooth manifold embedded in [0,1]^D with d > 2.
    Stated in the abstract as the setting for the rate result.
  • Domain assumption: the density is beta-Hölder and strictly positive on M.
    Required for the intrinsic rate to hold.
  • Domain assumption: score approximation can be separated into a large-noise tangent-cell regime and a small-noise projection-centered de-Gaussianized Laplace regime.
    Described as the key technical separation in the analysis.

pith-pipeline@v0.9.0 · 5757 in / 1384 out tokens · 37291 ms · 2026-05-20T20:45:25.290697+00:00 · methodology
