A Unified View of Score-Based and Drifting Models

arxiv: 2603.07514 · v3 · pith:KQJA7RN3new · submitted 2026-03-08 · 💻 cs.LG · cs.AI· cs.CV

A Unified View of Score-Based and Drifting Models

Chieh-Hsin Lai , Bac Nguyen , Naoki Murata , Yuhta Takida , Toshimitsu Uesaka , Yuki Mitsufuji , Stefano Ermon , Molei Tao This is my paper

classification 💻 cs.LG cs.AIcs.CV

keywords driftingmodelsscore-baseddatadiffusiondistributionsfieldkernels

0 comments p. Extension

pith:KQJA7RN3 Add to your LaTeX paper

What is a Pith Number?

\usepackage{pith}
\pithnumber{KQJA7RN3}

Prints a linked pith:KQJA7RN3 badge after your title and writes the identifier into PDF metadata. Compiles on arXiv with no extra files. Learn more

read the original abstract

Drifting models train one-step generators by optimizing a kernel-induced mean-shift discrepancy between the data and model distributions, with Laplace kernels used by default in practice. At each point, this discrepancy compares the kernel-weighted displacement toward nearby data samples with the corresponding displacement toward nearby model samples, thereby defining a transport direction for generated samples. In this paper, we show that drifting is more closely connected to score-based generative modeling than it may first appear, establishing a precise link to the score-matching principle underlying diffusion models. For Gaussian kernels, the population mean-shift field exactly equals the difference between the scores (i.e., the gradient-log-densities) of the Gaussian-smoothed data and model distributions. This identity follows from Tweedie's formula, which links the score of a Gaussian-smoothed density to its conditional mean, and implies that Gaussian-kernel drifting is exactly a score-matching objective on smoothed distributions. More generally, we derive an exact decomposition for radial kernels in which mean shift equals a score-based field plus a residual term. For the practical Laplace kernel, we further show theoretically and empirically that this residual is negligible in high dimension, implying that the transport field used in practice is nearly score-based. Our results reveal a structural connection to diffusion models: both methods use score-mismatch transport directions, but drifting realizes the score nonparametrically through kernel-based estimates, whereas diffusion models learn it parametrically with neural networks.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 8 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Generative Modeling with Flux Matching
cs.LG 2026-05 unverdicted novelty 8.0

Flux Matching generalizes score-based generative modeling by using a weaker objective that admits infinitely many non-conservative vector fields with the data as stationary distribution, enabling new design choices be...
One-Step Generative Modeling via Wasserstein Gradient Flows
cs.LG 2026-05 conditional novelty 7.0

W-Flow achieves state-of-the-art one-step ImageNet 256x256 generation at 1.29 FID by training a static neural network to follow a Wasserstein gradient flow that minimizes Sinkhorn divergence, delivering roughly 100x f...
Identifiability and Stability of Generative Drifting with Companion-Elliptic Kernel Families
stat.ML 2026-04 conditional novelty 7.0

For companion-elliptic kernels vanishing drifting fields identify target measures exactly, and field convergence yields weak convergence once mass escape to infinity is detected by a single C0 scalar.
Identifiability and Stability of Generative Drifting with Companion-Elliptic Kernel Families
stat.ML 2026-04 unverdicted novelty 7.0

Companion-elliptic kernels (exactly the Gaussians and Matérn kernels with ν ≥ 1/2) ensure drifting-field identifiability for equal measures and restore stability via an asymptotic lower bound on the intrinsic overlap scalar.
Drifting Field Policy: A One-Step Generative Policy via Wasserstein Gradient Flow
cs.LG 2026-05 unverdicted novelty 6.0

DFP is a one-step generative policy using Wasserstein gradient flow on a drifting model backbone, with a top-K behavior cloning surrogate, that reaches SOTA on Robomimic and OGBench manipulation tasks.
Lookahead Drifting Model
cs.LG 2026-04 unverdicted novelty 6.0

The lookahead drifting model improves upon the drifting model by sequentially computing multiple drifting terms that incorporate higher-order gradient information, leading to better performance on toy examples and CIFAR10.
Teacher-Feature Drifting: One-Step Diffusion Distillation with Pretrained Diffusion Representations
cs.CV 2026-05 unverdicted novelty 5.0

A simplified one-step diffusion distillation uses pretrained teacher features directly for drifting loss plus a mode coverage term, achieving FID 1.58 on ImageNet-64 and 18.4 on SDXL.
On the Wasserstein Gradient Flow Interpretation of Drifting Models
cs.LG 2026-05 unverdicted novelty 5.0

GMD algorithms correspond to limiting points of Wasserstein gradient flows on the KL divergence with Parzen smoothing and bear resemblance to Sinkhorn divergence fixed points, with extensions to MMD and other divergences.