A convergence rate for the entropic JKO scheme

Aymeric Baradat; Sofiane Cherf

arxiv: 2604.08283 · v1 · submitted 2026-04-09 · 🧮 math.AP · cs.NA · math.NA

Pith reviewed 2026-05-10 17:40 UTC · model grok-4.3

classification 🧮 math.AP · cs.NA · math.NA

keywords JKO scheme · entropic regularization · Wasserstein gradient flows · convergence rate · convexity · PDEs

The pith

Under convexity assumptions, the entropic JKO scheme converges to the solution of the original PDE at a quantified rate as the regularization parameter and the time step both tend to zero.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper shows that replacing the Wasserstein distance with its entropic regularization in the JKO scheme still yields convergence to the target PDE, with an explicit rate, as both the regularization strength α and the time step τ go to zero. The result follows from a new inequality that bounds how far the entropic iterates are from the classical ones. This matters because the entropic version is much cheaper to compute, so the result justifies its use for simulating gradient flows in the space of probability measures.

Core claim

Under convexity assumptions, the entropic JKO scheme with ε = ατ converges to the solution of the initial PDE with a certain rate as α and τ tend to zero. This is a consequence of a new bound between the classical and entropic JKO schemes.

What carries the argument

The new bound between the classical JKO scheme and its entropic counterpart, which quantifies their difference in terms of α and τ.

Load-bearing premise

Convexity assumptions on the energy functional are needed for the new bound between classical and entropic JKO schemes to hold.

What would settle it

Computing the error between the entropic JKO iterates and the true PDE solution numerically for a convex energy, and tracking it as α and τ decrease, would confirm or refute the claimed rate.
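A minimal sketch of such a check, in a toy case that is not taken from the paper: the quadratic potential energy F(μ) = ∫ λx²/2 dμ in one dimension with a Gaussian initial datum, where every iterate stays Gaussian and can be written in closed form. The function names, the parameter values, and above all the normalization of the entropic cost (Lebesgue reference measure, ε = ατ) are assumptions made for illustration; the paper's conventions may differ by constants. Under these assumptions each entropic step contracts the mean like the classical step and adds variance ατ/(1 + λτ).

```python
import math


def jko_gaussian(m0, s0, lam, tau, alpha, n_steps):
    """Closed-form (entropic) JKO iterates for F(mu) = int lam*x^2/2 dmu.

    Starts from N(m0, s0^2) and returns (mean, std) after n_steps.
    alpha = 0 gives the classical JKO scheme.
    ASSUMPTION (not from the paper): the entropic cost is
    min over couplings of int |x-y|^2/2 dgamma + eps * int gamma log gamma,
    with eps = alpha * tau.
    """
    shrink = 1.0 / (1.0 + lam * tau)  # proximal map of V: x -> x / (1 + lam*tau)
    m, var = m0, s0 ** 2
    for _ in range(n_steps):
        m = shrink * m
        # The classical part contracts the variance; the entropic part adds
        # the variance alpha*tau/(1 + lam*tau) of the Gaussian conditional law.
        var = shrink ** 2 * var + alpha * tau * shrink
    return m, math.sqrt(var)


def exact_flow(m0, s0, lam, t):
    """Solution at time t of d_t rho = div(rho * lam * x) started from N(m0, s0^2)."""
    decay = math.exp(-lam * t)
    return m0 * decay, s0 * decay


def w2_gaussian(m1, s1, m2, s2):
    """W2 distance between the 1D Gaussians N(m1, s1^2) and N(m2, s2^2)."""
    return math.hypot(m1 - m2, s1 - s2)


def error_at_time(alpha, tau, t=1.0, m0=1.0, s0=1.0, lam=1.0):
    """W2 error between the entropic JKO iterate and the exact flow near time t."""
    n_steps = round(t / tau)
    m, s = jko_gaussian(m0, s0, lam, tau, alpha, n_steps)
    return w2_gaussian(m, s, *exact_flow(m0, s0, lam, n_steps * tau))


if __name__ == "__main__":
    for k in range(1, 5):
        alpha = tau = 10.0 ** (-k)
        print(f"alpha = tau = {alpha:.0e}: W2 error = {error_at_time(alpha, tau):.3e}")
```

Plotting `error_at_time` against α and τ on a log-log scale would give an empirical order in this toy case only; it says nothing about the sharpness of the paper's bound for general energies.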

Original abstract

The so-called JKO scheme, named after Jordan, Kinderlehrer and Otto, provides a variational way to construct discrete time approximations of certain partial differential equations (PDEs) appearing as gradient flows in the space of probability measures equipped with the Wasserstein metric. The method consists of an implicit Euler scheme, which can be implemented numerically. Yet, in practice, evaluating the Wasserstein distance can be numerically expensive. To address this problem, a common strategy introduced by Peyré in 2015 and which has been shown to produce faster computations, is to replace the Wasserstein distance with its entropic regularization, also known as the Schrödinger cost. In 2026, the first author, Hraivoronska and Santambrogio, proved that if the regularization parameter $\varepsilon$ is proportional to the time step $\tau$, that is, $\varepsilon = \alpha \tau$ for some $\alpha > 0$, then as $\tau \to 0$, this change results in adding to the limiting PDE the additional linear diffusion term $\frac{\alpha}{2} \Delta \rho$. Our goal in this article is to provide a convergence rate under convexity assumptions between the entropic JKO scheme and the solution of the initial PDE as both $\alpha$ and $\tau$ tend to zero. This will appear as a consequence of a new bound between the classical and entropic JKO schemes.
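For orientation, the two schemes compared in the abstract can be written as follows. This is a sketch in one common normalization, not copied from the paper: the symbols $F$ (driving energy), $\mu_k^{\tau}$ and $\mu_k^{\alpha,\tau}$ (iterates), $T_\varepsilon$ (entropic cost), and the choice of Lebesgue reference measure in the entropy term are assumptions, and the paper's conventions may differ by constants.

```latex
% Classical JKO step (implicit Euler in Wasserstein space)
\mu_{k+1}^{\tau} \in \operatorname*{argmin}_{\mu}
  \Big\{ F(\mu) + \frac{1}{2\tau} W_2^2(\mu_k^{\tau}, \mu) \Big\}

% Entropic JKO step: W_2^2 / 2 replaced by an entropic cost, with \varepsilon = \alpha\tau
\mu_{k+1}^{\alpha,\tau} \in \operatorname*{argmin}_{\mu}
  \Big\{ F(\mu) + \frac{1}{\tau} T_{\alpha\tau}(\mu_k^{\alpha,\tau}, \mu) \Big\},
\qquad
T_{\varepsilon}(\mu,\nu) = \min_{\gamma \in \Pi(\mu,\nu)}
  \Big\{ \int \frac{|x-y|^2}{2} \, d\gamma + \varepsilon \int \gamma \log \gamma \Big\}

% Formal limits as \tau \to 0
\partial_t \rho = \nabla \cdot \Big( \rho \, \nabla \frac{\delta F}{\delta \rho} \Big)
  \quad \text{(classical scheme)}
\partial_t \rho = \nabla \cdot \Big( \rho \, \nabla \frac{\delta F}{\delta \rho} \Big)
  + \frac{\alpha}{2} \Delta \rho
  \quad \text{(entropic scheme, fixed } \alpha \text{)}
```

With $F = 0$ and this normalization, one entropic step convolves with a Gaussian of variance $\alpha\tau$, which is consistent with the extra term $\frac{\alpha}{2}\Delta\rho$ quoted in the abstract.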

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

A new comparison bound between entropic and classical JKO schemes yields an explicit convergence rate to the PDE under convexity.

The main takeaway is a new bound that controls the difference between classical JKO steps and their entropic counterparts. Under convexity of the driving functional, this bound, combined with the existing convergence rate for the unregularized scheme, produces a rate at which the entropic scheme approaches the target PDE as both α and τ tend to zero. The authors establish the comparison first, then compose it with the known result. The paper is honest about the convexity hypothesis and does not claim more than the limit statement allows. The earlier observation that a fixed α adds diffusion is sidestepped cleanly by letting α vanish too.

I see no serious gaps in the logic from the description. The proof structure is standard and the assumptions are flagged up front. A minor point: without the explicit constants or the full estimates, it is unclear how the rate behaves when α and τ approach zero at different speeds, or whether the bound is sharp. That is the sort of thing a referee would check.

The paper is aimed at people who implement or analyze numerical schemes for measure-valued gradient flows, and it supplies a concrete error control that was missing. I would put it through peer review. The result is modest, but the reasoning appears sound and the contribution is well-defined.

Referee Report

1 major / 2 minor

Summary. The manuscript proves a quantitative convergence rate for the entropic JKO scheme to the solution of the underlying PDE, under convexity assumptions on the driving energy. The rate is obtained as a consequence of a new bound comparing the classical JKO iterates to their entropic counterparts; the known convergence rate of the classical scheme is then used to control the distance to the continuous limit as both the regularization parameter α and the time step τ tend to zero.
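The two-step structure described in the summary amounts to a triangle inequality. The display below is a sketch under assumptions: it supposes the error is measured in the Wasserstein distance $W_2$, and the symbols $\mu_n^{\alpha,\tau}$ (entropic iterate), $\mu_n^{0,\tau}$ (classical iterate) and $\rho_{n\tau}$ (PDE solution at time $n\tau$) are placeholders rather than the paper's notation.

```latex
W_2\big(\mu_n^{\alpha,\tau}, \rho_{n\tau}\big)
  \le \underbrace{W_2\big(\mu_n^{\alpha,\tau}, \mu_n^{0,\tau}\big)}_{\text{new comparison bound}}
  + \underbrace{W_2\big(\mu_n^{0,\tau}, \rho_{n\tau}\big)}_{\text{known rate for classical JKO}}
```

Only the first term depends on α, so the overall rate is governed by how the comparison bound scales in α and τ together with the classical rate in τ.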

Significance. If the result holds, the work supplies an explicit error estimate that justifies entropic regularization for numerical approximation of Wasserstein gradient flows when α is taken small. The comparison bound between the two discrete schemes appears to be the main technical novelty and may be of independent interest in optimal transport. The argument structure is standard and leverages existing theory without introducing new ad-hoc parameters.

major comments (1)

[§3] Main theorem: the precise dependence of the convergence rate on α and τ (including any factors depending on the convexity modulus or initial data) must be stated explicitly in the theorem; the current phrasing 'a certain rate' is too vague for a quantitative result.

minor comments (2)

[Abstract] Replace 'a certain rate' with a brief indication of the order (e.g., O(√τ + α)) to give readers an immediate sense of the result.
[Introduction] Notation: ensure the entropic cost and the relation ε = ατ are introduced with the same symbols used in the main statements.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful reading of our manuscript and the positive overall assessment. We address the major comment below and will implement the requested clarification.

Point-by-point responses

Referee: [§3] Main theorem: the precise dependence of the convergence rate on α and τ (including any factors depending on the convexity modulus or initial data) must be stated explicitly in the theorem; the current phrasing 'a certain rate' is too vague for a quantitative result.

Authors: We agree that the main theorem statement should explicitly display the dependence of the error bound on α, τ, the convexity modulus of the driving energy, and suitable norms of the initial datum. The proof already yields an explicit rate (obtained by combining the new comparison estimate between classical and entropic JKO schemes with the known rate for the classical scheme), but the theorem was phrased concisely. In the revised manuscript we will restate the theorem with the precise quantitative bound, including the explicit dependence on all relevant quantities. This is a minor textual clarification that leaves the arguments unchanged.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity; minor self-citation not load-bearing

The paper establishes a new quantitative bound between classical and entropic JKO schemes under convexity assumptions on the energy, then combines this bound with the known convergence rate of the classical JKO scheme to the target PDE. This yields the claimed rate for the entropic scheme to the original PDE as both α and τ tend to zero. The self-citation in the abstract to prior work by the first author et al. (on the fixed-α limit adding a diffusion term) provides context and motivation but is not invoked as a load-bearing step in the derivation of the new bound or the rate; the central argument relies on an independent comparison and standard external results on classical JKO convergence. No self-definitional reductions, fitted predictions, or ansatz smuggling appear in the provided structure.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, invented entities, or non-standard axioms are mentioned; the work relies on standard convexity assumptions from the field of optimal transport and gradient flows.

pith-pipeline@v0.9.0 · 5561 in / 965 out tokens · 69129 ms · 2026-05-10T17:40:11.479961+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: Theorem 1.3 (Convergence estimate) ... $W_2\big(J^{0}_{n,\tau}(\mu_0), J^{\alpha}_{n,\tau}(\mu_0)\big) \le \dots$ under $\lambda$-convexity along generalized geodesics and heat-kernel bound $K$
IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean · alpha_pin_under_high_calibration
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: Hypothesis 1.1 ... $\lambda$-convex along generalized geodesics ... $F(\mu * \sigma_t) \le F(\mu) + Kt/2$

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

27 extracted references · 27 canonical work pages

[1] S. Adams, N. Dirr, M. A. Peletier, and J. Zimmer. From a large-deviations principle to the Wasserstein gradient flow: a new micro-macro passage. Communications in Mathematical Physics, 307:791–815, 2011.
[2] L. Ambrosio and N. Gigli. A User’s Guide to Optimal Transport. In Modelling and Optimisation of Flows on Networks, pages 1–155. 2013.
[3] L. Ambrosio, N. Gigli, and G. Savaré. Gradient flows: in metric spaces and in the space of probability measures. Springer Science & Business Media, 2005.
[4] A. Baradat, A. Hraivoronska, and F. Santambrogio. Using Sinkhorn in the JKO scheme adds linear diffusion, 2025.
[5] H. H. Bauschke and P. L. Combettes. Convex Analysis and Monotone Operator Theory in Hilbert Spaces. CMS Books in Mathematics. Springer, New York, 2nd edition, 2017.
[6] J.-D. Benamou and Y. Brenier. A computational fluid mechanics solution to the Monge-Kantorovich mass transfer problem. Numerische Mathematik, 84(3):375–393, 2000.
[7] J.-D. Benamou, G. Carlier, M. Cuturi, L. Nenna, and G. Peyré. Iterative Bregman projections for regularized transportation problems. SIAM Journal on Scientific Computing, 37(2):A1111–A1138, 2015.
[8] J.-D. Benamou, G. Carlier, and L. Nenna. Generalized incompressible flows, multi-marginal transport and Sinkhorn algorithm. Numerische Mathematik, 142(1):33–54, 2019.
[9] Y. Brenier. Polar factorization and monotone rearrangement of vector-valued functions. Communications on Pure and Applied Mathematics, 44(4):375–417, 1991.
[10] H. Brezis. Functional analysis, Sobolev spaces and partial differential equations. New York, NY: Springer, 2011.
[11] G. Carlier, V. Duval, G. Peyré, and B. Schmitzer. Convergence of Entropic Schemes for Optimal Transport and Gradient Flows. SIAM Journal on Mathematical Analysis, 49(2):1385–1418, 2017.
[12] G. Carlier, K. Eichinger, and A. Kroshnin. Entropic-Wasserstein barycenters: PDE characterization, regularity, and CLT. SIAM J. Math. Anal., 53(5):5880–5914, 2021.
[13] G. Conforti and L. Tamanini. A formula for the time derivative of the entropic cost and applications. J. Funct. Anal., 280(11), 2021.
[14] M. Cuturi. Sinkhorn Distances: Lightspeed Computation of Optimal Transport. In Advances in Neural Information Processing Systems, volume 26, 2013.
[15] M. H. Duong, V. Laschos, and M. Renger. Wasserstein gradient flows from large deviations of many-particle limits. ESAIM Control Optim. Calc. Var., 19(4):1166–1188, 2013.
[16] M. Erbar, J. Maas, and D. R. M. Renger. From large deviations to Wasserstein gradient flows in multiple dimensions. Electron. Commun. Probab., 20, 2015.
[17] I. Gentil, C. Léonard, and L. Ripani. About the analogy between optimal transport and minimal entropy. Annales de la Faculté des sciences de Toulouse : Mathématiques, Ser. 6, 26(3):569–600, 2017.
[18] R. Jordan, D. Kinderlehrer, and F. Otto. The variational formulation of the Fokker–Planck equation. SIAM Journal on Mathematical Analysis, 29(1):1–17, 1998.
[19] O. Kallenberg. Foundations of Modern Probability. Springer, New York, 2nd edition, 2002.
[20] C. Léonard. From the Schrödinger problem to the Monge–Kantorovich problem. Journal of Functional Analysis, 262(4):1879–1920, 2012.
[21] C. Léonard. A survey of the Schrödinger problem and some of its connections with optimal transport. Discrete and Continuous Dynamical Systems, 34(4):1533–1574, 2014.
[22] H. Malamut and M. Sylvestre. Convergence rates of the regularized optimal transport: disentangling suboptimality and entropy. SIAM J. Math. Anal., 57(3):2533–2558, 2025.
[23] R. J. McCann. A convexity principle for interacting gases. Adv. Math., 128(1):153–179, 1997.
[24] F. Otto. Evolution of microstructure in unstable porous media flow: A relaxational approach. Communications on Pure and Applied Mathematics, 52(7):873–915, 1999.
[25] G. Peyré. Entropic Approximation of Wasserstein Gradient Flows. SIAM Journal on Imaging Sciences, 8(4):2323–2351, 2015.
[26] F. Santambrogio. Optimal transport for applied mathematicians. Birkhäuser, NY, 55(58-63):94, 2015.
[27] R. Sinkhorn. Diagonal Equivalence to Matrices with Prescribed Row and Column Sums. The American Mathematical Monthly, 74(4):402–405, 1967.

Université Claude Bernard Lyon 1, CNRS, Centrale Lyon, INSA Lyon, Université Jean Monnet, ICJ UMR5208, 43 bd du 11 Novembre 1918, 69622 Villeurbanne, France
Email address: {baradat,cherf}@math.univ-lyon1.fr
