Generative Drifting is Secretly Score Matching: a Spectral and Variational Perspective
read the original abstract
Generative Modeling via Drifting~\citep{deng2026drifting} has recently achieved state-of-the-art one-step image generation through a kernel-based drift operator, yet its success is largely empirical and its theoretical foundations remain poorly understood. We observe that \emph{under a Gaussian kernel, the drift operator is exactly a score difference on smoothed distributions}. This answers three questions left open in the original work: (1) whether a vanishing drift guarantees equality of distributions ($V_{p,q}=0\Rightarrow p=q$), (2) how to choose between kernels, and (3) why the stop-gradient operator is indispensable for stable training. Our observations position drifting within the score-matching family. By linearizing the McKean-Vlasov dynamics and probing them in Fourier space, we reveal frequency-dependent convergence timescales comparable to \emph{Landau damping} in plasma kinetic theory: the Gaussian kernel suffers an exponential high-frequency bottleneck, potentially explaining the empirical preference for the Laplacian kernel. This suggests a fix: an exponential bandwidth annealing schedule $\sigma(t)=\sigma_0 e^{-rt}$ that reduces convergence time from $\exp(O(K_{\max}^2))$ to $O(\log K_{\max})$. Finally, by formalizing drifting as a Wasserstein gradient flow of the smoothed KL divergence, we prove that the stop-gradient operator is not a heuristic but is derived from the frozen-field discretization mandated by the Jordan-Kinderlehrer-Otto (JKO) scheme, and removing it severs training from any gradient-flow guarantee. This variational perspective further provides a general template for constructing novel drift operators, which we demonstrate with a Sinkhorn divergence drift. We validate our analysis on toy datasets and scale it up to ImageNet.
This paper has not been read by Pith yet.
Forward citations
Cited by 10 Pith papers
-
DriftXpress: Faster Drifting Models via Projected RKHS Fields
DriftXpress approximates drifting kernels via projected RKHS fields to lower training cost of one-step generative models while matching original FID scores.
-
One-Step Generative Modeling via Wasserstein Gradient Flows
W-Flow achieves state-of-the-art one-step ImageNet 256x256 generation at 1.29 FID by training a static neural network to follow a Wasserstein gradient flow that minimizes Sinkhorn divergence, delivering roughly 100x f...
-
Kernel-Gradient Drifting Models
Kernel-gradient drifting reformulates drifting models via kernel gradients to yield identifiable one-step generation with smoothed score matching and KL descent on Euclidean, Riemannian, and discrete spaces.
-
Identifiability and Stability of Generative Drifting with Companion-Elliptic Kernel Families
For companion-elliptic kernels vanishing drifting fields identify target measures exactly, and field convergence yields weak convergence once mass escape to infinity is detected by a single C0 scalar.
-
Identifiability and Stability of Generative Drifting with Companion-Elliptic Kernel Families
Companion-elliptic kernels (exactly the Gaussians and Matérn kernels with ν ≥ 1/2) ensure drifting-field identifiability for equal measures and restore stability via an asymptotic lower bound on the intrinsic overlap scalar.
-
Finite-Particle Convergence Rates for Conservative and Non-Conservative Drifting Models
Establishes finite-particle convergence rates for a conservative KDE-gradient drifting method in one-step generative modeling on R^d along with analysis of a non-conservative Laplace kernel variant, yielding explicit ...
-
Drifting Field Policy: A One-Step Generative Policy via Wasserstein Gradient Flow
DFP is a one-step generative policy using Wasserstein gradient flow on a drifting model backbone, with a top-K behavior cloning surrogate, that reaches SOTA on Robomimic and OGBench manipulation tasks.
-
On the Wasserstein Gradient Flow Interpretation of Drifting Models
The paper interprets GMD algorithms as limiting points of Wasserstein gradient flows on KL divergence with Parzen smoothing and on Sinkhorn divergence, while extending the approach to MMD, sliced Wasserstein, and GAN critics.
-
Teacher-Feature Drifting: One-Step Diffusion Distillation with Pretrained Diffusion Representations
A simplified one-step diffusion distillation uses pretrained teacher features directly for drifting loss plus a mode coverage term, achieving FID 1.58 on ImageNet-64 and 18.4 on SDXL.
-
On the Wasserstein Gradient Flow Interpretation of Drifting Models
GMD algorithms correspond to limiting points of Wasserstein gradient flows on the KL divergence with Parzen smoothing and bear resemblance to Sinkhorn divergence fixed points, with extensions to MMD and other divergences.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.