On the Wasserstein Gradient Flow Interpretation of Drifting Models

Alexandre Galashov; Arnaud Doucet; Arthur Gretton; James Thornton; Li Kevin Wenliang; Valentin De Bortoli

arxiv: 2605.05118 · v2 · pith:ICPMUVYHnew · submitted 2026-05-06 · cs.LG · cs.AI · stat.ML

Pith reviewed 2026-05-22 10:08 UTC · model grok-4.3

keywords Wasserstein gradient flows · generative modeling · drifting models · KL divergence · Sinkhorn divergence · optimal transport · Parzen smoothing

The pith

Generative Modeling via Drifting targets the fixed point of a Wasserstein gradient flow on the KL divergence with Parzen smoothing.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reinterprets Generative Modeling via Drifting as a process that directly seeks the limiting point of a Wasserstein gradient flow. One algorithm from Deng et al. aligns with the fixed point of a flow on the KL divergence after Parzen smoothing of the densities. The version actually run instead follows a procedure close to a Sinkhorn-divergence flow but without all of its properties. The same fixed-point view extends the drifting idea to other flows including those based on maximum mean discrepancy, sliced Wasserstein distance, and GAN critic functions.

Core claim

GMD can be thought of as directly targeting a fixed point of a specific Wasserstein gradient flow. One algorithm proposed by Deng et al. corresponds to finding the limiting point of a WGF on the KL divergence, with Parzen smoothing on the densities. The algorithm actually implemented by Deng et al. corresponds to a different procedure, which bears some resemblance to the fixed point of a WGF on the Sinkhorn divergence, but lacks certain desirable properties of the latter. The same idea can be extended to the limiting point of other WGFs, including the Maximum Mean Discrepancy, the sliced Wasserstein distance, and GAN critic functions.

What carries the argument

Wasserstein Gradient Flow on a functional such as the KL or Sinkhorn divergence, whose limiting points are identified with the fixed points reached by the drifting process in GMD.
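
For orientation, the standard objects behind this statement can be written out. The block below is a sketch under stated assumptions, not the paper's derivation: the continuity-equation form of a Wasserstein gradient flow is textbook material, the KL expressions are the ones quoted in the Figure 1 caption on this page, and the symbol \mathcal{F} for a generic functional is introduced here for illustration. The exact Parzen-smoothed functional used in the paper is not reproduced on this page, so it is deliberately left out.

```latex
% Wasserstein gradient flow of a functional F over densities q_t:
% the density is transported by the velocity field v_t.
\partial_t q_t + \nabla \cdot (q_t \, v_t) = 0,
\qquad
v_t = -\nabla \frac{\delta \mathcal{F}}{\delta q}(q_t).

% KL case, F_p(q) = KL(q || p), as in the Figure 1 caption:
\frac{\delta \, \mathrm{KL}(q \,\|\, p)}{\delta q} = \log q - \log p + 1,
\qquad
V^{\mathrm{KL}}_{p,q} = \nabla \log p - \nabla \log q.

% A fixed point of the flow is a density q at which the velocity
% vanishes where q has mass; q = p satisfies this condition.
V^{\mathrm{KL}}_{p,q}(x) = 0 \quad \text{for } q\text{-almost every } x.
```

Read this way, "directly targeting a fixed point" means aiming for a model distribution at which the velocity field is zero, rather than simulating the flow in time.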

Load-bearing premise

The drifting process in GMD can be directly identified with the limiting behavior of the specified Wasserstein gradient flows on KL or Sinkhorn divergences without unaccounted approximation errors or additional regularization terms that would alter the fixed point.

What would settle it

Implement the GMD algorithm and compare the final distribution against the known minimizer of the KL divergence under Parzen smoothing; exact agreement or systematic deviation would confirm or refute the correspondence.
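
A toy version of such a check can be sketched in a few lines. The code below is not the GMD algorithm of Deng et al.; it is a 1D particle discretization of the KL velocity ∇ log p − ∇ log q in which both scores are replaced by Gaussian kernel density (Parzen) estimates, which is one possible reading of "Parzen smoothing" and an assumption here. The function names `kde_score`, `drift`, and `run_flow`, the bandwidth, the step size, and the sample values are all hypothetical choices made for illustration.

```python
import math


def kde_score(x, samples, h):
    """Gradient of the log of a Gaussian KDE built on `samples`, evaluated at x."""
    logits = [-((x - s) ** 2) / (2.0 * h * h) for s in samples]
    m = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - m) for l in logits]
    total = sum(weights)
    # Softmax-weighted mean shift towards the samples, divided by h^2.
    mean_shift = sum(w * (s - x) for w, s in zip(weights, samples)) / total
    return mean_shift / (h * h)


def drift(x, target, particles, h):
    """Plug-in estimate of grad log p - grad log q at x (assumed smoothing)."""
    return kde_score(x, target, h) - kde_score(x, particles, h)


def run_flow(target, particles, h=0.5, step=0.01, n_steps=500):
    """Explicit Euler steps x <- x + step * drift(x) applied to every particle."""
    xs = list(particles)
    for _ in range(n_steps):
        velocities = [drift(x, target, xs, h) for x in xs]
        xs = [x + step * v for x, v in zip(xs, velocities)]
    return xs


# Deterministic toy data: target samples on [2, 4], particles start on [-1, 1].
target = [2.0 + 0.25 * i for i in range(9)]
init = [-1.0 + 0.25 * i for i in range(9)]
final = run_flow(target, init)
print(sum(init) / len(init), sum(final) / len(final))
```

When the particle set coincides with the target samples, the two score estimates are identical and the estimated drift is zero, which is the fixed-point condition in this toy setting. Whether the final particles agree with the minimizer of the paper's smoothed KL functional is exactly what a full experiment would need to measure.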

Figures

Figures reproduced from arXiv: 2605.05118 by Alexandre Galashov, Arnaud Doucet, Arthur Gretton, James Thornton, Li Kevin Wenliang, Valentin De Bortoli.

**Figure 1.** MMD between true and generated samples trained by different drift types.

**Figure 1.** An example of the WGF associated with F_p(q) = KL(q∥p), where p is the target distribution and q the model distribution. The contour shows the first variation δKL(q∥p)/δq = log q − log p + 1, where blue is positive and pink is negative. The arrows show the WGF velocity vectors evaluated at samples from q, namely V^KL_{p,q} = ∇ log p − ∇ log q. In practice, the flow vectors are estimated from samples from unk…

**Figure 2.** True and generated samples for different types of drift and hyperparameters. Empty

**Figure 2.** MMD between true and generated samples trained by different drift types.

**Figure 3.** Results for the 8 Gaussian dataset. Empty panel means the samples have diverged.

**Figure 3.** True and generated samples for different types of drift and hyperparameters. Empty

**Figure 4.** Results for the Circles dataset. Empty panel means the samples have diverged.

**Figure 5.** Results for the Pinwheel dataset. Empty panel means the samples have diverged.

**Figure 6.** Results for the Swiss roll dataset. Empty panel means the samples have diverged.

**Figure 7.** Results for the Swiss roll dataset. Empty panel means the samples have diverged.
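
Several captions above report the MMD between true and generated samples. As a reminder of what that number measures, here is a minimal sketch of the biased (V-statistic) estimate of the squared MMD with a Gaussian kernel on 1D samples. The kernel choice, the bandwidth, the sample values, and the names `gaussian_kernel` and `mmd2_biased` are assumptions for illustration; the page does not state which kernel or estimator the paper uses.

```python
import math


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel between two scalars."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))


def mmd2_biased(xs, ys, bandwidth=1.0):
    """Biased estimate of MMD^2: mean k(x,x') + mean k(y,y') - 2 mean k(x,y)."""

    def mean_k(us, vs):
        total = sum(gaussian_kernel(u, v, bandwidth) for u in us for v in vs)
        return total / (len(us) * len(vs))

    return mean_k(xs, xs) + mean_k(ys, ys) - 2.0 * mean_k(xs, ys)


# Hypothetical toy samples: one set close to the "true" samples, one far away.
true_samples = [0.0, 0.5, 1.0]
close_samples = [0.1, 0.6, 1.1]
far_samples = [3.0, 3.5, 4.0]

print(mmd2_biased(true_samples, close_samples))
print(mmd2_biased(true_samples, far_samples))
```

The estimate is zero when the two sample sets are identical and grows as the generated samples move away from the true ones, so lower is better in plots of this kind.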

Original abstract

Recently, Deng et al. (2026) proposed Generative Modeling via Drifting (GMD), a novel framework for generative tasks. This note presents an analysis of GMD through the lens of Wasserstein Gradient Flows (WGF), i.e., the path of steepest descent for a functional in the space of probability measures, equipped with the geometry of optimal transport. Unlike previous WGF-based contributions, GMD can be thought of as directly targeting a fixed point of a specific WGF flow. We demonstrate three main results: first, that one algorithm proposed by Deng et al. (2026) corresponds to finding the limiting point of a WGF on the KL divergence, with Parzen smoothing on the densities. Second, that the algorithm actually implemented by Deng et al. (2026) corresponds to a different procedure, which bears some resemblance to the fixed point of a WGF on the Sinkhorn divergence, but lacks certain desirable properties of the latter. Third, the same same idea can be extended to the limiting point of other WGFs, including the Maximum Mean Discrepancy (MMD), the sliced Wasserstein distance, and GAN critic functions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This note maps GMD drifting to WGF fixed points on KL and Sinkhorn but shows no derivations to confirm the links hold exactly.

The useful part is the concrete identification of two GMD variants with limiting points of Wasserstein gradient flows. One matches a KL flow under Parzen smoothing on the densities. The implemented version resembles a Sinkhorn fixed point but drops some of its nicer properties. The authors also sketch how the same framing could apply to MMD, sliced Wasserstein, and GAN critics. That interpretive step is new relative to the Deng et al. work it cites and gives a cleaner way to think about what the drifting process is actually doing at equilibrium.

Referee Report

2 major / 2 minor

Summary. The manuscript analyzes Generative Modeling via Drifting (GMD) from Deng et al. (2026) through the lens of Wasserstein gradient flows. It claims three correspondences: one GMD algorithm matches the limiting point of a WGF on the KL divergence with Parzen smoothing on densities; the actually implemented algorithm resembles the fixed point of a WGF on the Sinkhorn divergence but lacks some of its properties; and the approach extends to limiting points of WGFs on MMD, sliced Wasserstein distance, and GAN critic functions.

Significance. If the claimed correspondences are rigorously derived, the work supplies a geometric interpretation that unifies drifting generative models with optimal transport flows. This could clarify convergence behavior and suggest principled extensions, strengthening the theoretical toolkit for generative modeling beyond ad-hoc drifting procedures.

major comments (2)

§3 (KL correspondence): the continuous-time limit from the discrete GMD update rule to the WGF on KL + Parzen is asserted but the explicit calculation of the velocity field or the identification of the fixed point is not shown; without this step the first claim remains formal rather than derived.
§4 (Sinkhorn resemblance): the statement that the implemented procedure 'lacks certain desirable properties' of the Sinkhorn WGF fixed point is not accompanied by a side-by-side comparison of the resulting optimality conditions or the missing regularization terms; this comparison is load-bearing for distinguishing the two procedures.
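
As background for the second comment, the Sinkhorn divergence of Feydy et al. (2019) between two empirical measures can be computed as sketched below. This is not the procedure implemented by Deng et al., and it says nothing about which properties that procedure lacks; it only shows the debiased quantity that a side-by-side comparison would be measured against. The 1D setting, uniform weights, squared-distance cost, regularization value `eps`, iteration count, and the names `entropic_ot` and `sinkhorn_divergence` are illustrative assumptions.

```python
import math


def _logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def entropic_ot(x, y, eps=0.5, n_iter=200):
    """Dual value of entropic OT between uniform measures on x and y.

    Runs log-domain Sinkhorn updates on the potentials f and g and returns
    <a, f> + <b, g>, which approximates OT_eps once the updates have converged.
    """
    n, m = len(x), len(y)
    log_a, log_b = -math.log(n), -math.log(m)
    cost = [[(xi - yj) ** 2 for yj in y] for xi in x]
    f, g = [0.0] * n, [0.0] * m
    for _ in range(n_iter):
        f = [-eps * _logsumexp([log_b + (g[j] - cost[i][j]) / eps for j in range(m)])
             for i in range(n)]
        g = [-eps * _logsumexp([log_a + (f[i] - cost[i][j]) / eps for i in range(n)])
             for j in range(m)]
    return sum(f) / n + sum(g) / m


def sinkhorn_divergence(x, y, eps=0.5, n_iter=200):
    """Debiased quantity OT_eps(x, y) - OT_eps(x, x) / 2 - OT_eps(y, y) / 2."""
    return (entropic_ot(x, y, eps, n_iter)
            - 0.5 * entropic_ot(x, x, eps, n_iter)
            - 0.5 * entropic_ot(y, y, eps, n_iter))


xs = [0.0, 0.5, 1.0]
ys = [2.0, 2.5, 3.0]
print(sinkhorn_divergence(xs, ys))
```

The two self-transport terms are what remove the entropic bias, so the divergence vanishes when both sample sets coincide. A comparison of the kind the referee asks for would state which of these terms, or which normalization, the implemented drifting update does not reproduce.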

minor comments (2)

Abstract: 'the same same idea' is a typographical repetition.
Notation for the Parzen kernel bandwidth and the entropic regularization parameter should be introduced once and used consistently across sections.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful review and positive recommendation for minor revision. We appreciate the detailed feedback on the derivations and comparisons in Sections 3 and 4. Below, we address each major comment and outline the revisions we will make to strengthen the manuscript.

Referee: §3 (KL correspondence): the continuous-time limit from the discrete GMD update rule to the WGF on KL + Parzen is asserted but the explicit calculation of the velocity field or the identification of the fixed point is not shown; without this step the first claim remains formal rather than derived.

Authors: We agree with the referee that the explicit calculation of the continuous-time limit was not provided in sufficient detail. In the revised manuscript, we will add the derivation showing how the discrete GMD update rule converges to the Wasserstein gradient flow on the KL divergence with Parzen smoothing. Specifically, we will compute the velocity field by considering the infinitesimal limit of the update and identify the fixed point as the minimizer of the smoothed KL functional.

revision: yes

Referee: §4 (Sinkhorn resemblance): the statement that the implemented procedure 'lacks certain desirable properties' of the Sinkhorn WGF fixed point is not accompanied by a side-by-side comparison of the resulting optimality conditions or the missing regularization terms; this comparison is load-bearing for distinguishing the two procedures.

Authors: We acknowledge that a side-by-side comparison is necessary to clearly distinguish the procedures. In the revision, we will include a detailed comparison of the optimality conditions derived from the implemented GMD algorithm and those from the Sinkhorn divergence WGF. This will explicitly show the missing regularization terms and explain the resulting differences in properties, such as convergence guarantees or stability.

revision: yes

Circularity Check

0 steps flagged

No significant circularity identified

The paper analyzes GMD via Wasserstein gradient flows by claiming explicit correspondences between proposed algorithms and limiting points of WGFs on KL, Sinkhorn, MMD, and related functionals. These claims rest on standard definitions of WGFs and divergences drawn from external literature rather than on any fitted parameters, self-defined quantities, or load-bearing self-citations internal to the present work. With no derivations, update rules, or continuous-limit calculations supplied that reduce the claimed fixed points to the paper's own inputs by construction, the argument remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper relies on standard background results from optimal transport and gradient flows in Wasserstein space; no free parameters, ad-hoc axioms, or new invented entities are introduced in the abstract.

axioms (1)

[standard math] Wasserstein gradient flows describe paths of steepest descent for functionals over probability measures equipped with optimal transport geometry.
Invoked as the lens for reinterpreting GMD algorithms.

pith-pipeline@v0.9.0 · 5756 in / 1296 out tokens · 46947 ms · 2026-05-22T10:08:07.303927+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "one algorithm proposed by Deng et al. (2026) corresponds to finding the limiting point of a WGF on the KL divergence, with Parzen smoothing on the densities"

IndisputableMonolith/Foundation/BranchSelection.lean · branch_selection · unclear
Paper passage: "the algorithm actually implemented ... resembles the fixed point of a WGF on the Sinkhorn divergence"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Finite-Particle Convergence Rates for Conservative and Non-Conservative Drifting Models
stat.ML 2026-05 unverdicted novelty 6.0

Establishes finite-particle convergence rates for a conservative KDE-gradient drifting method in one-step generative modeling on R^d along with analysis of a non-conservative Laplace kernel variant, yielding explicit ...

Reference graph

Works this paper leans on

31 extracted references · 31 canonical work pages · cited by 1 Pith paper · 3 internal anchors

[1] Ambrosio, L., Gigli, N., and Savaré, G. (2008). Gradient Flows in Metric Spaces and in the Space of Probability Measures. Birkhäuser.
[2] Arbel, M., Korba, A., Salim, A., and Gretton, A. (2019). Maximum mean discrepancy gradient flow. In Advances in Neural Information Processing Systems.
[3] Cao, J., Wei, Z., and Liu, Y. (2026). Gradient flow drifting: Generative modeling via Wasserstein gradient flows of KDE-approximated divergences. arXiv preprint arXiv:2603.10592.
[4] Chen, Z., Mustafi, A., Glaser, P., Korba, A., Gretton, A., and Sriperumbudur, B. K. (2025). (De)-regularized maximum mean discrepancy gradient flow. Journal of Machine Learning Research, 26(235):1–77.
[5] Cortes, C., Mohri, M., and Rostamizadeh, A. (2009). L2 regularization for learning kernels. In Uncertainty in Artificial Intelligence.
[6] Cozzi, G. and Santambrogio, F. (2025). Long-time asymptotics of the sliced-Wasserstein flow. SIAM Journal on Imaging Sciences, 18(1):1–19.
[7] Crucinio, F. R., De Bortoli, V., Doucet, A., and Johansen, A. M. (2024). Solving Fredholm integral equations of the first kind via Wasserstein gradient flows. Stochastic Processes and Their Applications, 173.
[8] Cuturi, M. (2013). Sinkhorn distances: Lightspeed computation of optimal transport. In Advances in Neural Information Processing Systems.
[9] Deng, M., Li, H., Li, T., Du, Y., and He, K. (2026). Generative modeling via drifting. arXiv preprint arXiv:2602.04770.
[10] Dziugaite, G. K., Roy, D. M., and Ghahramani, Z. (2015). Training generative neural networks via maximum mean discrepancy optimization. In Uncertainty in Artificial Intelligence.
[11] Feydy, J., Séjourné, T., Vialard, F.-X., Amari, S.-i., Trouvé, A., and Peyré, G. (2019). Interpolating between optimal transport and MMD using Sinkhorn divergences. In International Conference on Artificial Intelligence and Statistics.
[12] Franz, L., Hoffmann, S., and Martius, G. (2026). Drifting fields are not conservative. arXiv preprint arXiv:2604.06333.
[13] Galashov, A., De Bortoli, V., and Gretton, A. (2025). Deep MMD gradient flow without adversarial training. In International Conference on Learning Representations.
[14] Glaser, P., Arbel, M., and Gretton, A. (2021). KALE flow: A relaxed KL gradient flow for probabilities with disjoint support. In Advances in Neural Information Processing Systems.
[15] Gretton, A., Borgwardt, K. M., Rasch, M. J., Schölkopf, B., and Smola, A. J. (2012). A kernel two-sample test. Journal of Machine Learning Research, 13.
[16] He, P., Khangaonkar, O., Pirsiavash, H., Bai, Y., and Kolouri, S. (2026). Sinkhorn-drifting generative models. arXiv preprint arXiv:2603.12366.
[17] Knight, P. A., Ruiz, D., and Uçar, B. (2014). A symmetry preserving algorithm for matrix scaling. SIAM Journal on Matrix Analysis and Applications, 35(3):931–955. hal-00569250.
[18] Lai, C.-H., Nguyen, B., Murata, N., Takida, Y., Uesaka, T., Mitsufuji, Y., Ermon, S., and Tao, M. (2026). A unified view of drifting and score-based models. arXiv preprint arXiv:2603.07514.
[19] Li, Y., Swersky, K., and Zemel, R. (2015). Generative moment matching networks. In International Conference on Machine Learning.
[20] Li, Z. and Zhu, B. (2026). A long-short flow-map perspective for drifting models. arXiv preprint arXiv:2602.20463.
[21] Liutkus, A., Simsekli, U., Majewski, S., Durmus, A., and Stöter, F.-R. (2019). Sliced-Wasserstein flows: Nonparametric generative modeling via optimal transport and diffusions. In International Conference on Machine Learning.
[22] Mroueh, Y., Sercu, T., and Raj, A. (2019). Sobolev descent. In International Conference on Artificial Intelligence and Statistics.
[23] Nowozin, S., Cseke, B., and Tomioka, R. (2016). f-GAN: training generative neural samplers using variational divergence minimization. In Advances in Neural Information Processing Systems.
[24] Ramdas, A., Trillos, N., and Cuturi, M. (2017). On Wasserstein two-sample testing and related families of nonparametric tests. Entropy, 19(2).
[25] Santambrogio, F. (2017). {Euclidean, metric, and Wasserstein} gradient flows: an overview. Bulletin of Mathematical Sciences, 7(1):87–154.
[26] Sriperumbudur, B., Fukumizu, K., and Lanckriet, G. (2011). Universality, characteristic kernels and RKHS embedding of measures. Journal of Machine Learning Research, 12:2389–2410.
[27] Sriperumbudur, B., Gretton, A., Fukumizu, K., Lanckriet, G., and Schölkopf, B. (2010). Hilbert space embeddings and metrics on probability measures. Journal of Machine Learning Research, 11:1517–1561.
[28] Turan, E. and Ovsjanikov, M. (2026). Generative drifting is secretly score matching: a spectral and variational perspective. arXiv preprint arXiv:2603.09936.
[29] Weber, R. M. (2023). The score-difference flow for implicit generative modeling. Transactions on Machine Learning Research.
[30] Wenliang, L. K. and Kanagawa, H. (2020). Blindness of score-based methods to isolated components and mixing proportions. arXiv preprint arXiv:2008.10087.
[31] Zhou, L., Ermon, S., and Song, J. (2025). Inductive moment matching. In International Conference on Machine Learning.
