On the edge of memorization in diffusion models
4 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.LG (4)
years: 2026 (4)
verdicts: UNVERDICTED (4)
roles: background (1)
polarities: unclear (1)
citing papers explorer
- Grokking of Diffusion Models: Case Study on Modular Addition
  Diffusion models show grokking on modular addition by composing periodic operand representations in simple data regimes or by separating arithmetic computation from visual denoising across timesteps in varied regimes.
- Diffusion Processes on Implicit Manifolds
  Defines diffusion processes on implicit data manifolds via proximity-graph approximations to the infinitesimal generator and carré-du-champ operator, proves convergence in law to the continuous manifold process, and provides an Euler-Maruyama integrator validated on synthetic and MNIST manifolds.
- Diffusion Models Memorize in Training -- and Generalize in Inference
  Diffusion models overfit denoising loss at intermediate noise but generalize in inference as model error smooths the flow field and sampling paths avoid memorized noisy training data.
- A dynamical systems view of training generative models and the memorization phenomenon
  A dynamical systems analysis of constant-step SGD explains memorization in generative models by combining two-time-scale dynamics with a collapse model.