An Analytical Theory of Spectral Bias in the Learning Dynamics of Diffusion Models

· 2025 · cs.LG · arXiv 2503.03206

4 Pith papers cite this work. Polarity classification is still indexing.

abstract

We develop an analytical framework for understanding how the generated distribution evolves during diffusion model training. Leveraging a Gaussian-equivalence principle, we solve the full-batch gradient-flow dynamics of linear and convolutional denoisers and integrate the resulting probability-flow ODE, yielding analytic expressions for the generated distribution. The theory exposes a universal inverse-variance spectral law: the time for an eigen- or Fourier mode to match its target variance scales as $\tau\propto\lambda^{-1}$, so high-variance (coarse) structure is mastered orders of magnitude sooner than low-variance (fine) detail. Extending the analysis to deep linear networks and circulant full-width convolutions shows that weight sharing merely multiplies learning rates -- accelerating but not eliminating the bias -- whereas local convolution introduces a qualitatively different bias. Experiments on Gaussian and natural-image datasets confirm that the spectral law persists in deep MLP-based U-Nets. Convolutional U-Nets, however, display rapid near-simultaneous emergence of many modes, implicating local convolution in reshaping learning dynamics. These results underscore how data covariance governs the order and speed with which diffusion models learn, and they call for deeper investigation of the unique inductive biases introduced by local convolution.
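The inverse-variance scaling can be illustrated with a toy calculation. The sketch below is not the paper's derivation: it assumes a linear denoiser trained at a single noise level `sigma` on zero-mean Gaussian data, regressing the clean sample, with weights initialised at zero. Under those assumptions the weight along an eigenmode of variance `lam` follows $w(t) = \frac{\lambda}{\lambda+\sigma^2}\left(1 - e^{-2\eta(\lambda+\sigma^2)t}\right)$, so the time to reach a fixed fraction of its optimum is proportional to $1/(\lambda+\sigma^2)$, which is close to $1/\lambda$ when $\sigma^2 \ll \lambda$. The function names, the learning rate `lr`, and the chosen eigenvalues are hypothetical.

```python
import numpy as np

def mode_weight(lam, sigma, t, lr=1.0):
    """Closed-form gradient-flow weight for one eigenmode, starting from w = 0.

    Assumes the loss E||W(x0 + sigma*eps) - x0||^2 with x0 ~ N(0, diag(lam)),
    whose per-mode gradient is 2 * ((lam + sigma^2) * w - lam).
    """
    rate = 2.0 * lr * (lam + sigma**2)
    return lam / (lam + sigma**2) * (1.0 - np.exp(-rate * t))

def time_to_fraction(lam, sigma, frac=0.9, lr=1.0):
    """Time at which the mode reaches `frac` of its optimal weight."""
    rate = 2.0 * lr * (lam + sigma**2)
    return -np.log(1.0 - frac) / rate

# Hypothetical spectrum spanning three orders of magnitude, with small noise.
lams = np.array([10.0, 1.0, 0.1, 0.01])
sigma = 1e-3
taus = time_to_fraction(lams, sigma)

for lam, tau in zip(lams, taus):
    # lam * tau staying roughly constant is the tau ~ 1/lam scaling.
    print(f"lambda={lam:g}  tau_90={tau:.4g}  lambda*tau={lam * tau:.4g}")
```

In this toy setting the product `lam * tau` is nearly the same for every mode, which is the single-noise-level analogue of the $\tau\propto\lambda^{-1}$ law stated in the abstract. The full theory additionally integrates over noise levels and through the probability-flow ODE, which this sketch does not attempt.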

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

The Interplay of Data Structure and Imbalance in the Learning Dynamics of Diffusion Models

stat.ML · 2026-05-07 · unverdicted · novelty 7.0

Higher-variance classes are learned first in diffusion models; strong class imbalance reverses the order and imposes distinct delayed learning times on minority classes.

Provably Learning Diffusion Models under the Manifold Hypothesis: Collapse and Refine

cs.LG · 2026-05-16 · unverdicted · novelty 6.0

SiLD is a score-matching framework that learns both manifold projection and intrinsic density from a single objective, with proven sample complexity depending only on intrinsic dimension.

The two clocks and the innovation window: When and how generative models learn rules

cs.LG · 2026-05-11 · unverdicted · novelty 6.0

Generative models learn rules before memorizing data, creating an innovation window whose width depends on dataset size and rule complexity, observed in both diffusion and autoregressive architectures.

What Matters for Diffusion-Friendly Latent Manifold? Prior-Aligned Autoencoders for Latent Diffusion

cs.CV · 2026-05-08 · unverdicted · novelty 6.0

Prior-Aligned AutoEncoders shape latent manifolds with spatial coherence, local continuity, and global semantics to improve latent diffusion, achieving SOTA gFID 1.03 on ImageNet 256x256 with up to 13x faster convergence.

citing papers explorer

Showing 4 of 4 citing papers.

The Interplay of Data Structure and Imbalance in the Learning Dynamics of Diffusion Models

stat.ML · 2026-05-07 · unverdicted · none · ref 57 · internal anchor

Higher-variance classes are learned first in diffusion models; strong class imbalance reverses the order and imposes distinct delayed learning times on minority classes.

Provably Learning Diffusion Models under the Manifold Hypothesis: Collapse and Refine

cs.LG · 2026-05-16 · unverdicted · none · ref 59 · internal anchor

SiLD is a score-matching framework that learns both manifold projection and intrinsic density from a single objective, with proven sample complexity depending only on intrinsic dimension.

The two clocks and the innovation window: When and how generative models learn rules

cs.LG · 2026-05-11 · unverdicted · none · ref 117 · internal anchor

Generative models learn rules before memorizing data, creating an innovation window whose width depends on dataset size and rule complexity, observed in both diffusion and autoregressive architectures.

What Matters for Diffusion-Friendly Latent Manifold? Prior-Aligned Autoencoders for Latent Diffusion

cs.CV · 2026-05-08 · unverdicted · none · ref 84 · internal anchor

Prior-Aligned AutoEncoders shape latent manifolds with spatial coherence, local continuity, and global semantics to improve latent diffusion, achieving SOTA gFID 1.03 on ImageNet 256x256 with up to 13x faster convergence.
