Transformers converge globally to the optimal DDPM denoiser for multi-token GMMs via self-attention mean denoising, with explicit token and iteration requirements.
arXiv preprint arXiv:2303.13336
4 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.LG (4)
years: 2026 (4)
verdicts: UNVERDICTED (4)
roles: method (1)
polarities: use method (1)
representative citing papers
Defines Conditional Distribution Matching (CDM) as finding inputs whose induced conditional distributions match a target distribution and proposes the MLGD-F inference-time algorithm using pretrained diffusion models to solve it without retraining.
Diffusion models show grokking on modular addition by composing periodic operand representations in simple data regimes or by separating arithmetic computation from visual denoising across timesteps in varied regimes.
A structured diffusion bridge method achieves near fully-paired modality translation quality using alignment constraints even in unpaired or semi-paired regimes.
citing papers explorer
- Transformers Learn the Optimal DDPM Denoiser for Multi-Token GMMs
  Transformers converge globally to the optimal DDPM denoiser for multi-token GMMs via self-attention mean denoising, with explicit token and iteration requirements.
- Inverse Design for Conditional Distribution Matching
  Defines Conditional Distribution Matching (CDM) as finding inputs whose induced conditional distributions match a target distribution and proposes the MLGD-F inference-time algorithm using pretrained diffusion models to solve it without retraining.
- Grokking of Diffusion Models: Case Study on Modular Addition
  Diffusion models show grokking on modular addition by composing periodic operand representations in simple data regimes or by separating arithmetic computation from visual denoising across timesteps in varied regimes.
- Structured Diffusion Bridges: Inductive Bias for Denoising Diffusion Bridges
  A structured diffusion bridge method achieves near fully-paired modality translation quality using alignment constraints even in unpaired or semi-paired regimes.