Wasserstein Mirror Gradient Flow as the limit of the Sinkhorn Algorithm
Pith reviewed 2026-05-24 07:54 UTC · model grok-4.3
The pith
The Sinkhorn algorithm converges in a scaled limit to a Wasserstein mirror gradient flow on the space of probability measures.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The sequence of marginals from the Sinkhorn iterations converges to an absolutely continuous curve on the 2-Wasserstein space as ε → 0 with iterations scaled as 1/ε. This Sinkhorn flow is a Wasserstein mirror gradient flow, where the gradient is that of the relative entropy functional with respect to one marginal and the mirror is half of the squared Wasserstein distance functional from the other marginal. The norm of its velocity field can be interpreted as the metric derivative with respect to the linearized optimal transport distance. An equivalent description is the parabolic Monge-Ampère PDE. Conditions for exponential convergence are derived, and a McKean-Vlasov diffusion is constructed whose marginal distributions follow the Sinkhorn flow.
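For orientation, the Euclidean mirror gradient flow that the paper's new notion is modelled on can be written as below. This is the standard finite-dimensional definition, not a formula taken from the paper; the symbols $F$ (objective), $\phi$ (strictly convex mirror map) and $x_t$ (trajectory) are notation assumed here for illustration.

```latex
\frac{d}{dt}\,\nabla\phi(x_t) = -\nabla F(x_t)
\quad\Longleftrightarrow\quad
\dot{x}_t = -\bigl(\nabla^2\phi(x_t)\bigr)^{-1}\,\nabla F(x_t).
```

In the Sinkhorn flow, according to the claim above, the role of $F$ is played by the relative entropy with respect to one marginal and the role of $\phi$ by half the squared 2-Wasserstein distance from the other marginal. The precise Wasserstein-space formulation is given in the paper and is not reproduced here.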
What carries the argument
The Sinkhorn flow, the limit of scaled Sinkhorn iterations, acting as a Wasserstein mirror gradient flow with relative entropy gradient and Wasserstein distance mirror.
If this is right
- The limiting flow satisfies the parabolic Monge-Ampère PDE.
- Exponential convergence of the flow to equilibrium holds under the derived conditions.
- Marginals of a constructed McKean-Vlasov diffusion coincide with the Sinkhorn flow.
- The speed of the flow equals the metric derivative with respect to the linearized optimal transport distance.
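The last item uses two standard notions that can be stated briefly. The formulas below are textbook definitions written in assumed notation: $d$ is a generic metric, $\sigma$ a fixed reference measure, and $T_\sigma^{\mu}$ the optimal (Brenier) map pushing $\sigma$ forward to $\mu$. Which reference measure the paper uses is not stated in this summary.

```latex
% Metric derivative of a curve (\rho_t) in a metric space (X, d)
|\rho'|(t) = \lim_{h \to 0} \frac{d(\rho_{t+h}, \rho_t)}{|h|}

% Linearized optimal transport (LOT) distance relative to \sigma
\mathrm{LOT}_\sigma(\mu, \nu)
  = \bigl\| T_\sigma^{\mu} - T_\sigma^{\nu} \bigr\|_{L^2(\sigma)}
```

Read this way, the claim is that the $L^2$ norm of the flow's velocity field at time $t$ coincides with $|\rho'|(t)$ computed with $d = \mathrm{LOT}_\sigma$ rather than with the 2-Wasserstein distance itself.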
Where Pith is reading between the lines
- Sinkhorn iterations could serve as a practical numerical scheme to approximate solutions of the parabolic Monge-Ampère equation.
- The mirror flow structure may extend to other regularized optimal transport algorithms beyond the classical Sinkhorn procedure.
- The McKean-Vlasov construction opens a route to particle-based simulations of the flow for sampling or mean-field models.
Load-bearing premise
The number of Sinkhorn iterations must scale precisely as 1/ε in the regularization parameter ε, and the joint densities must satisfy suitable technical conditions.
What would settle it
Numerical computation showing that, for some initial joint densities meeting the paper's assumptions, marginals from Sinkhorn iterations with N = 1/ε fail to approach the solution of the parabolic Monge-Ampère PDE in the 2-Wasserstein metric as ε → 0.
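A numerical check of this kind could start from a sketch like the following. It is not the paper's code: the function names `sinkhorn_marginals` and `_logsumexp`, the quadratic cost, the two discretized Gaussians, the choice to record the X-marginal after each Y-fitting step, and the reading of "scaled as 1/ε" as `ceil(t_max / eps)` iterations are all assumptions made for illustration.

```python
import numpy as np

def _logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

def sinkhorn_marginals(mu, nu, cost, eps, t_max):
    """Run ceil(t_max / eps) Sinkhorn (IPFP) iterations in the log domain.

    Row k of the result is the X-marginal of the coupling right after the
    Y-marginal has been fitted in iteration k + 1, i.e. at rescaled time
    (k + 1) * eps (a convention assumed for this sketch).
    """
    n_iter = int(np.ceil(t_max / eps))  # assumed reading of "scaled as 1/eps"
    log_mu, log_nu = np.log(mu), np.log(nu)
    f = np.zeros_like(mu)  # dual potential on the X side
    g = np.zeros_like(nu)  # dual potential on the Y side
    rows = []
    for _ in range(n_iter):
        # Fit the X-marginal to mu, then the Y-marginal to nu.
        f = eps * (log_mu - _logsumexp((g[None, :] - cost) / eps, axis=1))
        g = eps * (log_nu - _logsumexp((f[:, None] - cost) / eps, axis=0))
        # X-marginal of the current coupling exp((f + g - cost) / eps).
        log_pi = (f[:, None] + g[None, :] - cost) / eps
        rows.append(np.exp(_logsumexp(log_pi, axis=1)))
    return np.array(rows)

# Hypothetical test problem: two discretized Gaussians on a 1D grid.
x = np.linspace(-3.0, 3.0, 60)
mu = np.exp(-0.5 * ((x + 0.5) / 0.5) ** 2)
mu /= mu.sum()
nu = np.exp(-0.5 * ((x - 0.5) / 0.7) ** 2)
nu /= nu.sum()
cost = 0.5 * (x[:, None] - x[None, :]) ** 2

# Same rescaled time horizon, two regularization levels.
flow_coarse = sinkhorn_marginals(mu, nu, cost, eps=0.25, t_max=2.0)
flow_fine = sinkhorn_marginals(mu, nu, cost, eps=0.125, t_max=2.0)
```

With this convention, `flow_coarse[-1]` and `flow_fine[-1]` both correspond to rescaled time 2. Comparing such rows as `eps` shrinks, ideally in a Wasserstein metric and against a solver for the parabolic Monge-Ampère PDE, is one way to probe the claimed limit.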
Original abstract
We prove that the sequence of marginals obtained from the iterations of the Sinkhorn algorithm or the iterative proportional fitting procedure (IPFP) on joint densities converges to an absolutely continuous curve on the $2$-Wasserstein space, as the regularization parameter $\varepsilon$ goes to zero and the number of iterations is scaled as $1/\varepsilon$ (and other technical assumptions). This limit, which we call the Sinkhorn flow, is an example of a Wasserstein mirror gradient flow, a concept we introduce here inspired by the well-known Euclidean mirror gradient flows. In the case of Sinkhorn, the gradient is that of the relative entropy functional with respect to one of the marginals and the mirror is half of the squared Wasserstein distance functional from the other marginal. Interestingly, the norm of the velocity field of this flow can be interpreted as the metric derivative with respect to the linearized optimal transport (LOT) distance. An equivalent description of this flow is provided by the parabolic Monge-Amp\`{e}re PDE whose connection to the Sinkhorn algorithm was noticed by Berman (2020). We derive conditions for exponential convergence for this limiting flow. We also construct a McKean-Vlasov diffusion whose marginal distributions follow the Sinkhorn flow.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proves a convergence result for the Sinkhorn algorithm (or IPFP) applied to joint densities: as the regularization parameter ε tends to zero and the number of iterations is scaled proportionally to 1/ε (under additional technical assumptions on the densities), the sequence of marginals converges to an absolutely continuous curve in the 2-Wasserstein space. This limiting curve, termed the Sinkhorn flow, is identified as a Wasserstein mirror gradient flow, where the gradient is that of the relative entropy with respect to one marginal and the mirror is half the squared Wasserstein distance to the other marginal. The flow is also shown to satisfy a parabolic Monge-Ampère PDE, conditions for exponential convergence are derived, and a McKean-Vlasov diffusion is constructed whose marginals follow the flow.
Significance. If the main convergence theorem holds, the paper makes a significant contribution by bridging discrete iterative algorithms in optimal transport with continuous gradient flow dynamics in Wasserstein space through the new notion of Wasserstein mirror gradient flows. The explicit connection to the parabolic Monge-Ampère equation and the construction of the associated diffusion process provide concrete analytical tools. The result is self-contained with stated assumptions and offers potential for further developments in mean-field limits and numerical analysis of OT methods.
minor comments (3)
- [Abstract] The scaling 'number of iterations is scaled as 1/ε' should be stated more precisely (e.g., whether N_ε ∼ C/ε for a fixed C, or an exact floor function), since this scaling is load-bearing for the convergence statement.
- [References] The citation to Berman (2020) appears in the abstract and introduction but lacks a corresponding entry in the bibliography; this should be added for completeness.
- [§2] The notation distinguishing the regularized joint densities from their marginals is introduced inline; a short dedicated notation subsection would improve readability for readers unfamiliar with IPFP variants.
Simulated Author's Rebuttal
We thank the referee for the positive assessment of our manuscript, the accurate summary of the main results, and the recommendation for minor revision. The report correctly identifies the core contribution as the convergence of Sinkhorn iterations (with appropriate scaling) to a Wasserstein mirror gradient flow, along with the connections to the parabolic Monge-Ampère equation and the associated McKean-Vlasov diffusion. We are pleased that the significance of introducing this new notion of mirror gradient flows in Wasserstein space is recognized.
Circularity Check
No significant circularity; self-contained limit theorem
The paper establishes a convergence result: under the scaling of iterations ~1/ε and technical assumptions on joint densities, the Sinkhorn/IPFP marginals converge in 2-Wasserstein space to an absolutely continuous curve (the Sinkhorn flow). This limit is then identified as a Wasserstein mirror gradient flow (with relative entropy gradient and W2 mirror) whose velocity norm matches the LOT metric derivative, and which satisfies a parabolic Monge-Ampère PDE previously linked to Sinkhorn by Berman (2020). The central derivation is a limit theorem with assumptions stated upfront; the new concept is introduced by direct analogy to Euclidean mirror flows rather than by self-referential definition or fitted parameters. No load-bearing self-citations, ansatz smuggling, or reductions by construction appear in the argument structure.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: technical assumptions on joint densities and iteration scaling as ε → 0
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean; Foundation/BranchSelection.lean; Foundation/AbsoluteFloorClosure.lean
  Theorems: washburn_uniqueness_aczel; reality_from_one_distinction; branch_selection
  Tag: unclear (relation between the paper passage and the cited Recognition theorem)
  Paper passage: "the sequence of marginals ... converges to an absolutely continuous curve on the 2-Wasserstein space ... Sinkhorn flow ... Wasserstein mirror gradient flow ... parabolic Monge-Ampère PDE"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
- The entropic optimal (self-)transport problem: Limit distributions for decreasing regularization with application to score function estimation
  Provides asymptotic distributions for entropic OT plans and potentials under vanishing regularization and links self-transport barycentric projections to score functions.