pith. sign in

arxiv: 2603.09936 · v2 · pith:272X56QOnew · submitted 2026-03-10 · 💻 cs.LG

Generative Drifting is Secretly Score Matching: a Spectral and Variational Perspective

classification 💻 cs.LG
keywords driftdriftingoperatorkernelconvergencedistributionsdivergenceemph
0
0 comments X
read the original abstract

Generative Modeling via Drifting~\citep{deng2026drifting} has recently achieved state-of-the-art one-step image generation through a kernel-based drift operator, yet its success is largely empirical and its theoretical foundations remain poorly understood. We observe that \emph{under a Gaussian kernel, the drift operator is exactly a score difference on smoothed distributions}. This answers three questions left open in the original work: (1) whether a vanishing drift guarantees equality of distributions ($V_{p,q}=0\Rightarrow p=q$), (2) how to choose between kernels, and (3) why the stop-gradient operator is indispensable for stable training. Our observations position drifting within the score-matching family. By linearizing the McKean-Vlasov dynamics and probing them in Fourier space, we reveal frequency-dependent convergence timescales comparable to \emph{Landau damping} in plasma kinetic theory: the Gaussian kernel suffers an exponential high-frequency bottleneck, potentially explaining the empirical preference for the Laplacian kernel. This suggests a fix: an exponential bandwidth annealing schedule $\sigma(t)=\sigma_0 e^{-rt}$ that reduces convergence time from $\exp(O(K_{\max}^2))$ to $O(\log K_{\max})$. Finally, by formalizing drifting as a Wasserstein gradient flow of the smoothed KL divergence, we prove that the stop-gradient operator is not a heuristic but is derived from the frozen-field discretization mandated by the Jordan-Kinderlehrer-Otto (JKO) scheme, and removing it severs training from any gradient-flow guarantee. This variational perspective further provides a general template for constructing novel drift operators, which we demonstrate with a Sinkhorn divergence drift. We validate our analysis on toy datasets and scale it up to ImageNet.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 10 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. DriftXpress: Faster Drifting Models via Projected RKHS Fields

    cs.LG 2026-05 unverdicted novelty 7.0

    DriftXpress approximates drifting kernels via projected RKHS fields to lower training cost of one-step generative models while matching original FID scores.

  2. One-Step Generative Modeling via Wasserstein Gradient Flows

    cs.LG 2026-05 conditional novelty 7.0

    W-Flow achieves state-of-the-art one-step ImageNet 256x256 generation at 1.29 FID by training a static neural network to follow a Wasserstein gradient flow that minimizes Sinkhorn divergence, delivering roughly 100x f...

  3. Kernel-Gradient Drifting Models

    cs.LG 2026-05 unverdicted novelty 7.0

    Kernel-gradient drifting reformulates drifting models via kernel gradients to yield identifiable one-step generation with smoothed score matching and KL descent on Euclidean, Riemannian, and discrete spaces.

  4. Identifiability and Stability of Generative Drifting with Companion-Elliptic Kernel Families

    stat.ML 2026-04 conditional novelty 7.0

    For companion-elliptic kernels vanishing drifting fields identify target measures exactly, and field convergence yields weak convergence once mass escape to infinity is detected by a single C0 scalar.

  5. Identifiability and Stability of Generative Drifting with Companion-Elliptic Kernel Families

    stat.ML 2026-04 unverdicted novelty 7.0

    Companion-elliptic kernels (exactly the Gaussians and Matérn kernels with ν ≥ 1/2) ensure drifting-field identifiability for equal measures and restore stability via an asymptotic lower bound on the intrinsic overlap scalar.

  6. Finite-Particle Convergence Rates for Conservative and Non-Conservative Drifting Models

    stat.ML 2026-05 unverdicted novelty 6.0

    Establishes finite-particle convergence rates for a conservative KDE-gradient drifting method in one-step generative modeling on R^d along with analysis of a non-conservative Laplace kernel variant, yielding explicit ...

  7. Drifting Field Policy: A One-Step Generative Policy via Wasserstein Gradient Flow

    cs.LG 2026-05 unverdicted novelty 6.0

    DFP is a one-step generative policy using Wasserstein gradient flow on a drifting model backbone, with a top-K behavior cloning surrogate, that reaches SOTA on Robomimic and OGBench manipulation tasks.

  8. On the Wasserstein Gradient Flow Interpretation of Drifting Models

    cs.LG 2026-05 unverdicted novelty 6.0

    The paper interprets GMD algorithms as limiting points of Wasserstein gradient flows on KL divergence with Parzen smoothing and on Sinkhorn divergence, while extending the approach to MMD, sliced Wasserstein, and GAN critics.

  9. Teacher-Feature Drifting: One-Step Diffusion Distillation with Pretrained Diffusion Representations

    cs.CV 2026-05 unverdicted novelty 5.0

    A simplified one-step diffusion distillation uses pretrained teacher features directly for drifting loss plus a mode coverage term, achieving FID 1.58 on ImageNet-64 and 18.4 on SDXL.

  10. On the Wasserstein Gradient Flow Interpretation of Drifting Models

    cs.LG 2026-05 unverdicted novelty 5.0

    GMD algorithms correspond to limiting points of Wasserstein gradient flows on the KL divergence with Parzen smoothing and bear resemblance to Sinkhorn divergence fixed points, with extensions to MMD and other divergences.