Denoising diffusion probabilistic models are optimally adaptive to unknown low dimensionality
3 Pith papers cite this work.
Fields: cs.LG (3)
Years: 2026 (3)
Verdicts: UNVERDICTED (3)
Citing papers
- Transformers Learn the Optimal DDPM Denoiser for Multi-Token GMMs
  Transformers converge globally to the optimal DDPM denoiser for multi-token GMMs via self-attention mean denoising, with explicit requirements on the number of tokens and iterations.
- Intrinsic Wasserstein Rates for Score-Based Generative Models on Smooth Manifolds
  Score-based generative models attain intrinsic Wasserstein-1 sample rates of order n^(-(β+1)/(d+2β)) on d-dimensional smooth manifolds with β-Hölder densities.
- Diffusion Model for Manifold Data: Score Decomposition, Curvature, and Statistical Complexity
  Diffusion models on manifold-supported data admit score decompositions whose statistical rates are controlled by intrinsic dimension and curvature.
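The Wasserstein-1 rate quoted for "Intrinsic Wasserstein Rates for Score-Based Generative Models on Smooth Manifolds" can be written out in notation as a reading aid. This is only a sketch of the summary's wording, not that paper's theorem statement: the labels $\hat{\mu}_n$ (the generated distribution learned from $n$ samples) and $\mu$ (the target distribution on the manifold) are our assumptions, $n$ is assumed to be the training sample size, and the summary gives no constants or logarithmic factors.

```latex
W_1\big(\hat{\mu}_n, \mu\big) \lesssim n^{-\frac{\beta+1}{d+2\beta}},
\qquad \text{e.g. } d = 2,\ \beta = 1:\quad n^{-\frac{1+1}{2+2}} = n^{-1/2}.
```

The exponent depends only on the intrinsic dimension $d$ and the Hölder smoothness $\beta$; no ambient dimension appears in it, which is what "intrinsic" refers to in the summary.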