Finite-Particle Convergence Rates for Conservative and Non-Conservative Drifting Models

Krishnakumar Balasubramanian

arxiv: 2605.22795 · v1 · pith:R6WJTUJRnew · submitted 2026-05-21 · stat.ML · cs.AI · cs.LG · math.ST · stat.TH

Pith reviewed 2026-05-22 02:59 UTC · model grok-4.3

classification stat.ML · cs.AI · cs.LG · math.ST · stat.TH

keywords drifting models · finite-particle convergence · kernel density estimation · generative modeling · conservative velocity · Stein drift · Fisher discrepancy · one-step generation

The pith

A conservative drifting method built on kernel density estimator gradients achieves a finite-particle root residual-velocity rate of N^{-1/(d+4)} for one-step generative modeling, under an additional h-uniform quadrature regularity condition.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a conservative drifting method for one-step generative modeling by using a velocity that is the difference of kernel-smoothed data and model scores. This choice makes the velocity a gradient field, fixing the non-conservatism that appears in displacement-based drifting. A joint-entropy identity then supplies continuous-time bounds on the empirical Stein drift, the smoothed Fisher discrepancy of the kernel density estimate, and the squared center velocity for any finite number of particles in R^d. The leading finite-particle correction is a reciprocal-KDE self-interaction term that remains controlled under local-occupancy conditions on the particles. Under an additional h-uniform quadrature regularity condition (uniform in the bandwidth h), these bounds deliver a root residual-velocity rate of N^{-1/(d+4)}.

Core claim

For the conservative drifting method the joint-entropy identity bounds the empirical Stein drift, the smoothed Fisher discrepancy of the KDE, and the squared center velocity. The dominant finite-particle correction is a reciprocal-KDE self-interaction term, which is controlled by deterministic and high-probability local-occupancy conditions. Under an additional h-uniform quadrature regularity condition the root residual-velocity rate is N^{-1/(d+4)}; a more general growth condition yields the optimized rate N^{-(2-β)/(2(d+4-β))} for 0 ≤ β < 2. The same style of analysis for the non-conservative Laplace-kernel method produces an analogous rate that necessarily contains an extra residual term.
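As a quick consistency check on the two exponents quoted above, using only the formulas as stated: setting β = 0 in the general exponent gives the same value as the h-uniform rate, and the exponent decreases strictly as β grows toward 2. This is an algebraic observation only; how β relates to the quadrature condition is defined in the paper, not here.

```latex
\left.\frac{2-\beta}{2(d+4-\beta)}\right|_{\beta=0}
  = \frac{2}{2(d+4)} = \frac{1}{d+4},
\qquad
\frac{\partial}{\partial\beta}\,\frac{2-\beta}{2(d+4-\beta)}
  = \frac{-(d+4-\beta) + (2-\beta)}{2(d+4-\beta)^{2}}
  = -\frac{d+2}{2(d+4-\beta)^{2}} < 0 .
```

So for every 0 < β < 2 the optimized rate N^{-(2-β)/(2(d+4-β))} is slower than N^{-1/(d+4)}, and the exponent tends to 0 as β approaches 2.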

What carries the argument

The KDE-gradient velocity: the difference between the kernel-smoothed data score and the kernel-smoothed model score. This velocity is a gradient field, which is what makes the drifting process conservative.
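A minimal NumPy sketch of such a velocity, under two assumptions that the summary does not confirm: the kernel is Gaussian, and "kernel-smoothed score" is read as the gradient of the log of a kernel density estimate. The function names `kde_score` and `kde_gradient_velocity`, the bandwidth value, and the toy data are hypothetical and chosen only for illustration.

```python
import numpy as np


def kde_score(x, particles, h):
    """Gradient of log p_hat at each row of x, where p_hat is a Gaussian KDE
    (assumed kernel) with bandwidth h built on `particles`."""
    diff = particles[None, :, :] - x[:, None, :]       # (m, n, d): y_j - x_i
    logw = -np.sum(diff ** 2, axis=-1) / (2.0 * h ** 2)
    logw -= logw.max(axis=1, keepdims=True)            # stabilise the softmax
    w = np.exp(logw)
    w /= w.sum(axis=1, keepdims=True)                  # normalised kernel weights
    # grad log p_hat(x_i) = sum_j w_ij (y_j - x_i) / h^2 for a Gaussian kernel
    return np.einsum("mn,mnd->md", w, diff) / h ** 2


def kde_gradient_velocity(x, data, model, h):
    """Difference of the KDE score of the data and the KDE score of the model."""
    return kde_score(x, data, h) - kde_score(x, model, h)


rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, size=(200, 2))    # toy "data" sample
model = rng.normal(loc=0.0, size=(200, 2))   # toy "model" particles
v = kde_gradient_velocity(model, data, model, h=0.5)
print(v.shape)  # (200, 2)
```

Under these assumptions the velocity is the gradient of log p_hat_data minus log p_hat_model, so its Jacobian is a difference of Hessians and therefore symmetric, and it vanishes identically when the two particle sets coincide.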

If this is right

The continuous-time residual-velocity bounds convert directly into one-step generation guarantees once the explicit drift size η is chosen.
The non-conservative drifting method with Laplace kernel admits a companion-kernel decomposition that isolates a preconditioned score mismatch plus a scale-mismatch residual, producing a comparable finite-particle rate.
The bounds keep quadrature constants explicit and track their possible dependence on the kernel bandwidth.
Local-occupancy conditions suffice to control the self-interaction correction in both deterministic and high-probability settings.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the quadrature regularity condition can be verified for common kernels, the same proof strategy could be applied to other score-based generative procedures to obtain dimension-dependent rates.
The explicit dependence on dimension in the exponent suggests that the method may remain practical only for moderate d unless the bandwidth or occupancy assumptions are strengthened.
Testing whether the local-occupancy condition holds for typical data sets would give a practical check on whether the predicted rates are attainable in real applications.
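To make the dimension dependence concrete, the sketch below evaluates the exponent formula quoted in the core claim and the factor by which N would have to grow for a bound of the form N^{-a} to halve. It uses only the stated exponent; constants in the actual bounds are ignored, and the function names are hypothetical.

```python
def root_rate_exponent(d: int, beta: float = 0.0) -> float:
    """Exponent a in the root rate N^{-a}, a = (2 - beta) / (2 (d + 4 - beta))."""
    if not 0.0 <= beta < 2.0:
        raise ValueError("beta must satisfy 0 <= beta < 2")
    return (2.0 - beta) / (2.0 * (d + 4.0 - beta))


def particle_factor_to_halve(d: int, beta: float = 0.0) -> float:
    """Factor by which N must grow for N^{-a} to shrink by one half."""
    return 2.0 ** (1.0 / root_rate_exponent(d, beta))


for d in (1, 2, 8, 32):
    a = root_rate_exponent(d)
    factor = particle_factor_to_halve(d)
    print(f"d={d:>2}  exponent={a:.4f}  N must grow by a factor of {factor:.3g}")
```

With β = 0 the factor is 2^{d+4}, which is the sense in which the rate degrades quickly as d increases.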

Load-bearing premise

The h-uniform quadrature regularity condition or the more general growth condition must hold so that the quadrature constants and their bandwidth dependence can be controlled inside the finite-particle bounds.

What would settle it

A numerical experiment with a known target distribution in which the observed residual velocity fails to decrease at the predicted rate when the particle count N is increased would falsify the convergence claims.
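One way such an experiment could be read off, sketched under assumptions: measure the root residual velocity at several particle counts, fit a line in log-log coordinates, and compare the fitted slope with the predicted exponent 1/(d+4). The helper name `empirical_rate_exponent` is hypothetical, and the residuals below are synthetic placeholders generated from the predicted rate, not results from the paper.

```python
import numpy as np


def empirical_rate_exponent(particle_counts, residuals):
    """Least-squares slope of log(residual) against log(N), returned as a
    positive exponent a so that residual is approximately C * N^{-a}."""
    log_n = np.log(np.asarray(particle_counts, dtype=float))
    log_r = np.log(np.asarray(residuals, dtype=float))
    slope, _intercept = np.polyfit(log_n, log_r, deg=1)
    return -slope


d = 2
counts = np.array([100.0, 200.0, 400.0, 800.0, 1600.0])
# Synthetic placeholder: values that follow the predicted rate exactly.
synthetic_residuals = 3.0 * counts ** (-1.0 / (d + 4))
fitted = empirical_rate_exponent(counts, synthetic_residuals)
print(f"fitted exponent {fitted:.4f} vs predicted {1.0 / (d + 4):.4f}")
```

Because the paper's results are upper bounds, a fitted exponent clearly below the prediction would be evidence against the claim, whereas a larger one would be consistent with it.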

Original abstract

We propose and analyze a conservative drifting method for one-step generative modeling. The method replaces the original displacement-based drifting velocity by a kernel density estimator (KDE)-gradient velocity, namely the difference of the kernel-smoothed data score and the kernel-smoothed model score. This velocity is a gradient field, addressing the non-conservatism issue identified for general displacement-based drifting fields. We prove continuous-time finite-particle convergence bounds for the conservative method on $\mathbb{R}^d$: a joint-entropy identity yields bounds for the empirical Stein drift, the smoothed Fisher discrepancy of the KDE, and the squared center velocity. The main finite-particle correction is a reciprocal-KDE self-interaction term, and we give deterministic and high-probability local-occupancy conditions under which this term is controlled. We keep the quadrature constants explicit and track their possible bandwidth dependence: the root residual-velocity rate $N^{-1/(d+4)}$ holds under an additional $h$-uniform quadrature regularity condition, while a more general growth condition yields the optimized root rate $N^{-(2-\beta)/(2(d+4-\beta))}$, where $0\le \beta<2$. We also analyze the non-conservative drifting method with Laplace kernel, corresponding to the original displacement-based velocity proposed in [2]. For this method, a sharp companion kernel decomposes the velocity into a positive scalar preconditioning of a sharp-score mismatch plus a Laplace scale-mismatch residual, producing an analogous finite-particle rate with an unavoidable residual term. Finally, we explain how the continuous-time residual-velocity bounds translate into one-step generation guarantees through the explicit drift size $\eta$.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives finite-particle bounds for a conservative KDE-gradient drifting model with explicit constants, but the sharpest rates rest on an extra quadrature condition without checks on standard targets.

The main contribution is a conservative drifting velocity built as the difference of the KDE-smoothed data and model scores. This turns the velocity into a true gradient field, which directly fixes the non-conservatism problem that displacement-based fields had in the earlier drifting work. The author also gives a companion analysis for the original non-conservative Laplace-kernel version, decomposing the velocity into a preconditioned score mismatch plus a scale residual. Both parts use a joint-entropy identity to bound the empirical Stein drift, smoothed Fisher discrepancy, and center velocity in continuous time. The finite-particle correction is the reciprocal-KDE self-interaction term, controlled by local-occupancy conditions that come in both deterministic and high-probability forms. The paper keeps the quadrature constants explicit and tracks their bandwidth dependence, which is a useful level of detail. Under the extra h-uniform quadrature regularity it recovers the rate N^{-1/(d+4)}; a milder growth condition gives the slightly weaker optimized rate N^{-(2-β)/(2(d+4-β))}. The non-conservative case produces an analogous rate but with an unavoidable residual. The explicit constants and the local-occupancy conditions are the parts that feel most reusable.

The soft spot is the h-uniform quadrature regularity itself. The abstract presents it as an additional assumption needed to control bandwidth dependence in the quadrature terms, yet there is no argument or numerical check showing it holds for Gaussians, mixtures, or other typical targets. Without that verification the headline rate does not apply in the usual settings, and one falls back to the weaker bound. The derivations appear to rest on the joint-entropy identity and kernel decompositions rather than on fitted quantities, so the circularity risk looks low.

This is for readers already working on finite-particle analysis of drifting or score-based generative models. Someone who cares about how particle number and bandwidth interact will find the conditions and constant tracking helpful. The paper is coherent on its own terms and shows clear engagement with the prior drifting literature, so it deserves a serious referee to verify the proofs and see whether the quadrature condition can be relaxed or replaced.

Referee Report

1 major / 1 minor

Summary. The manuscript proposes a conservative drifting method for one-step generative modeling that replaces the displacement-based drifting velocity with a KDE-gradient velocity (difference of kernel-smoothed data and model scores) to ensure the field is conservative. It proves continuous-time finite-particle convergence bounds on R^d: a joint-entropy identity is used to bound the empirical Stein drift, the smoothed Fisher discrepancy of the KDE, and the squared center velocity. The main finite-particle correction is a reciprocal-KDE self-interaction term controlled under deterministic and high-probability local-occupancy conditions. Under an additional h-uniform quadrature regularity condition the root residual-velocity rate N^{-1/(d+4)} is obtained, while a general growth condition yields the optimized rate N^{-(2-β)/(2(d+4-β))}. An analogous analysis is given for the non-conservative Laplace-kernel case, producing rates with an unavoidable residual term. The continuous-time bounds are translated to one-step generation guarantees via the explicit drift size η.

Significance. If the results hold, the work supplies useful theoretical guarantees for finite-particle behavior in conservative drifting models for generative modeling, with explicit rates, bandwidth tracking, and conditions that directly address non-conservatism. The joint-entropy identity and kernel decomposition approach, together with the dual conservative/non-conservative analysis, would be a positive addition to the literature on particle-based sampling and one-step generation.

major comments (1)

[Derivation of finite-particle convergence bounds (following the joint-entropy identity and local-occupancy conditions)] The root residual-velocity rate N^{-1/(d+4)} (stated in the abstract and derived after the joint-entropy identity) is obtained only after invoking the additional h-uniform quadrature regularity condition to control bandwidth dependence of the quadrature constants in the reciprocal-KDE self-interaction term. This condition is presented as an extra assumption whose validity for standard target measures (e.g., Gaussians or mixtures) is neither proved nor checked numerically; without it the rate reverts to the weaker N^{-(2-β)/(2(d+4-β))}. Because the stronger rate is load-bearing for the main finite-particle claim, the manuscript must either verify the condition or clearly qualify the result.

minor comments (1)

[Abstract] The abstract is information-dense; a short bullet list of the two main rates and the key assumptions would improve readability.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their careful reading and constructive feedback on the finite-particle convergence analysis. We address the major comment below and will revise the manuscript to better qualify the stronger convergence rate.

Referee: The root residual-velocity rate N^{-1/(d+4)} (stated in the abstract and derived after the joint-entropy identity) is obtained only after invoking the additional h-uniform quadrature regularity condition to control bandwidth dependence of the quadrature constants in the reciprocal-KDE self-interaction term. This condition is presented as an extra assumption whose validity for standard target measures (e.g., Gaussians or mixtures) is neither proved nor checked numerically; without it the rate reverts to the weaker N^{-(2-β)/(2(d+4-β))}. Because the stronger rate is load-bearing for the main finite-particle claim, the manuscript must either verify the condition or clearly qualify the result.

Authors: We thank the referee for this observation. The manuscript already qualifies the rate N^{-1/(d+4)} as holding under the additional h-uniform quadrature regularity condition (see abstract and Theorem 3.4), while presenting the general rate under the growth condition with parameter β. To address the concern, we will revise the abstract, introduction, and the statement of the main theorem to more prominently emphasize this distinction and add a short paragraph discussing applicability to standard targets. For smooth densities such as Gaussians or finite mixtures, the condition holds with bandwidth h scaling as N^{-1/(d+4)} because the local occupancy and quadrature constants remain bounded (as the KDE approximates the density uniformly on compact sets). We will also include a brief numerical illustration verifying the quadrature constants for a standard Gaussian target. This strengthens the presentation without changing the technical results.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity in the derivation chain

The paper derives its finite-particle convergence bounds from a joint-entropy identity that directly produces controls on the empirical Stein drift, smoothed Fisher discrepancy of the KDE, and squared center velocity. The reciprocal-KDE self-interaction term is bounded via explicit deterministic and high-probability local-occupancy conditions, with quadrature constants tracked explicitly and rates expressed parametrically in N, d, beta, and h under additional regularity assumptions. No load-bearing step reduces a claimed prediction to a fitted quantity or self-citation by construction; the non-conservative analysis similarly decomposes the velocity via a companion kernel without circular reduction. The derivation remains self-contained against its stated assumptions and external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on standard mathematical identities such as the joint-entropy identity and kernel decomposition properties; no data-fitted free parameters or new postulated entities are introduced.

axioms (2)

Joint-entropy identity (standard math)
Used to obtain bounds on empirical Stein drift, smoothed Fisher discrepancy, and squared center velocity.

Kernel density estimator properties and Laplace kernel decomposition (domain assumption)
Enables the velocity decomposition into preconditioned score mismatch plus scale-mismatch residual for the non-conservative case.

pith-pipeline@v0.9.0 · 5839 in / 1639 out tokens · 74676 ms · 2026-05-22T02:59:00.554107+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "a joint-entropy identity yields bounds for the empirical Stein drift, the smoothed Fisher discrepancy of the KDE, and the squared center velocity. The main finite-particle correction is a reciprocal-KDE self-interaction term"

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "root residual-velocity rate N^{-1/(d+4)} under an additional h-uniform quadrature regularity condition"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

13 extracted references · 13 canonical work pages · 8 internal anchors

[1] ISSN 2835-8856. URL https://openreview.net/forum?id=cqDH0e6ak2. Jiarui Cao, Zixuan Wei, and Yuxin Liu. Gradient flow drifting: Generative modeling via Wasserstein gradient flows of KDE-approximated divergences. arXiv preprint arXiv:2603.10592.

[2] Mingyang Deng, He Li, Tianhong Li, Yilun Du, and Kaiming He. Generative modeling via drifting. arXiv preprint arXiv:2602.04770.

[3] Théo Dumont, Théo Lacombe, and François-Xavier Vialard. Learning Monge maps with constrained drifting models. arXiv preprint arXiv:2603.25182.

[4] Maria Esteban-Casadevall, Jorge Carrasco-Pollo, Max Welling, Jan-Willem van de Meent, Erik J. Bekkers, and Floor Eijkelboom. Kernel-gradient drifting models. arXiv preprint arXiv:2605.10727.

[5] Leonard Franz, Sebastian Hoffmann, and Georg Martius. Drifting fields are not conservative. arXiv preprint arXiv:2604.06333.

[6] Arthur Gretton, Li Kevin Wenliang, Alexandre Galashov, James Thornton, Valentin De Bortoli, and Arnaud Doucet. On the Wasserstein Gradient Flow Interpretation of Drifting Models. arXiv preprint arXiv:2605.05118.

[7] Ping He, Om Khangaonkar, Hamed Pirsiavash, Yikun Bai, and Soheil Kolouri. Sinkhorn-Drifting Generative Models. arXiv preprint arXiv:2603.12366, 2026a. Ye He, Krishnakumar Balasubramanian, Sayan Banerjee, and Promit Ghosal. Finite-Particle Rates for Regularized Stein Variational Gradient Descent. arXiv preprint arXiv:2602.05172, 2026b. Jonathan Ho, Ajay Jain...

[8] Chieh-Hsin Lai, Bac Nguyen, Naoki Murata, Yuhta Takida, Toshimitsu Uesaka, Yuki Mitsufuji, Stefano Ermon, and Molei Tao. A unified view of drifting and score-based models. arXiv preprint arXiv:2603.07514.

[9] Hak Geun Lee and Hyonho Chun. Identifiability and Stability of Generative Drifting with Companion-Elliptic Kernel Families. arXiv preprint arXiv:2604.24196.

[10] Zhiqi Li and Bo Zhu. A Long-Short Flow-Map Perspective for Drifting Models. arXiv preprint arXiv:2602.20463.

[11] Erkan Turan and Maks Ovsjanikov. Generative drifting is secretly score matching: A spectral and variational perspective. arXiv preprint arXiv:2603.09936.

[12] ISSN 2835-8856. URL https://openreview.net/forum?id=dpGSNLUCzu. Guoqiang Zhang, Kenta Niwa, and W. Bastiaan Kleijn. Lookahead drifting model. arXiv preprint arXiv:2605.04060.

[13] A. Stop-gradient training and the particle ODE. We briefly explain how the continuous-time particle dynamics arise from the practical stop-gradient training rule. Let $g_\theta : Z \to \mathbb{R}^d$ be the generator and let $x_i^k := g_{\theta_k}(\xi_i)$, $i = 1, \dots, N$, be the generated batch at training step $k$, for fixed latent variables $\xi_1, \dots, \xi_N$. Let $v_{x^k}$ denote the drift field comput...
