L² over Wasserstein: Statistical Analysis for Optimal Transport
Pith reviewed 2026-05-21 03:01 UTC · model grok-4.3
The pith
The L² over Wasserstein space equips random probability measures with the Riemannian structure of optimal transport.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper introduces the L² over Wasserstein space and establishes that it inherits the formal Riemannian structure of the Wasserstein space by characterising distances and geodesic geometry. The structure induces random flows whose sample paths are Wasserstein gradient flows, making it the natural extension of the Wasserstein space that allows for random gradient flow dynamics. Ensemble statistical convergence results for the optimal transport machinery are obtained using the empirical measure within the L² over Wasserstein framework. In the setting of Bayesian non-parametrics, Schwartz's consistency theorem is refined to the Wasserstein topology, yielding posterior convergence of the same machinery in the L² over Wasserstein space.
What carries the argument
The L² over Wasserstein space of square-integrable random probability measures, equipped with a metric and geodesic structure that directly inherits the Riemannian geometry of the Wasserstein space via distance and geodesic characterizations.
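In symbols, one way to write this construction is the sketch below. The notation is an assumption rather than a quotation from the paper: $(\Omega, \mathcal{F}, \mathbb{P})$ is a probability space, $\mathcal{P}_2(\mathbb{R}^d)$ is the set of probability measures with finite second moment, $W_2$ is the 2-Wasserstein distance, and $T_\omega$ denotes an optimal transport map from $\mu(\omega)$ to $\nu(\omega)$ whenever one exists and can be chosen measurably in $\omega$. The paper's own definition may differ in its measurability details.

```latex
% Assumed notation; a sketch, not the paper's verbatim definition.
L^2_W(\mathbb{R}^d)
  = \Bigl\{ \mu : \Omega \to \mathcal{P}_2(\mathbb{R}^d) \text{ measurable} \;:\;
      \mathbb{E}\bigl[ W_2^2(\mu, \delta_0) \bigr] < \infty \Bigr\},
\qquad
d_{L^2_W}(\mu, \nu)
  = \Bigl( \mathbb{E}\bigl[ W_2^2\bigl(\mu(\omega), \nu(\omega)\bigr) \bigr] \Bigr)^{1/2}.

% Candidate geodesic: displacement interpolation applied realization by realization.
\mu_t(\omega) = \bigl( (1 - t)\,\mathrm{id} + t\,T_\omega \bigr)_{\#}\, \mu(\omega),
\qquad t \in [0, 1].
```

Under this reading, square-integrability is the condition $\mathbb{E}[W_2^2(\mu, \delta_0)] < \infty$, which is the same as a finite expected second moment.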
If this is right
- Random gradient flow dynamics become definable on spaces of uncertain probability measures.
- Statistical convergence of optimal transport quantities holds in an ensemble sense via the empirical measure.
- Bayesian posterior distributions converge in the L² over Wasserstein space once they converge in the Wasserstein topology.
- Random token sampling paths in transformer models can be embedded as instances of the random gradient flow dynamics.
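A toy illustration of the first consequence, under assumptions that are not taken from the paper: the random measures are one-dimensional Gaussians with random mean and standard deviation, and the flow is the Fokker-Planck equation with potential V(x) = x²/2, which is the Wasserstein gradient flow of relative entropy with respect to N(0, 1). Gaussians stay Gaussian under this flow, so every sample path is available in closed form. The function names are hypothetical. The sketch checks that the expectation-based distance between two random flows contracts at the rate exp(-t) known for each individual path.

```python
import numpy as np

def w2_sq_gauss(m1, s1, m2, s2):
    """Closed-form squared W2 distance between N(m1, s1^2) and N(m2, s2^2) on R."""
    return (m1 - m2) ** 2 + (s1 - s2) ** 2

def ou_flow(m0, s0, t):
    """Mean and std at time t of the Fokker-Planck flow for V(x) = x^2/2 started at N(m0, s0^2)."""
    m_t = m0 * np.exp(-t)
    s_t = np.sqrt(1.0 + (s0 ** 2 - 1.0) * np.exp(-2.0 * t))
    return m_t, s_t

def l2_over_w2(m1, s1, m2, s2):
    """Monte Carlo estimate of sqrt(E[W2^2]) over paired draws (assumed form of the distance)."""
    return float(np.sqrt(np.mean(w2_sq_gauss(m1, s1, m2, s2))))

rng = np.random.default_rng(0)
n_draws = 10_000
# Two random Gaussian measures: each draw is one realization (mean, std).
m_a, s_a = rng.normal(0.0, 1.0, n_draws), rng.uniform(0.5, 2.0, n_draws)
m_b, s_b = rng.normal(1.0, 1.0, n_draws), rng.uniform(0.5, 2.0, n_draws)

d0 = l2_over_w2(m_a, s_a, m_b, s_b)
distances = {}
for t in (0.5, 1.0, 2.0):
    ma_t, sa_t = ou_flow(m_a, s_a, t)
    mb_t, sb_t = ou_flow(m_b, s_b, t)
    distances[t] = l2_over_w2(ma_t, sa_t, mb_t, sb_t)
    print(f"t={t}: distance={distances[t]:.4f}, contraction bound={np.exp(-t) * d0:.4f}")
```

Because the contraction holds for every realization, it passes to the expectation. This is the simplest sense in which path-wise gradient flow properties can lift to the random setting.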
Where Pith is reading between the lines
- The framework may support new sampling procedures that propagate uncertainty directly through the measure space.
- Links could be explored to stochastic differential equations on spaces of measures for more robust generative models.
- Numerical checks on synthetic random measures could verify whether predicted convergence rates match observed behavior.
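A numerical check of the kind suggested in the last point might start from a sketch like the following. It rests on several assumptions that are not from the paper: the random measure is a one-dimensional Gaussian with random parameters, the error is measured with a two-sample proxy (squared W2 between two independent empirical measures of the same realization), and the function names are hypothetical. In one dimension the optimal coupling of two equal-size samples is the sorted pairing, so the distance is exact.

```python
import numpy as np

def w2_sq_empirical_1d(x, y):
    """Exact squared W2 between two equal-size empirical measures on R (sorted pairing)."""
    x_sorted = np.sort(np.asarray(x, dtype=float))
    y_sorted = np.sort(np.asarray(y, dtype=float))
    return float(np.mean((x_sorted - y_sorted) ** 2))

def ensemble_error(n_points, n_draws, rng):
    """Average the two-sample squared W2 error over draws of a random Gaussian measure."""
    errors = []
    for _ in range(n_draws):
        # One realization of the random measure: N(m, s^2) with random m and s.
        m = rng.normal()
        s = rng.uniform(0.5, 2.0)
        x = rng.normal(m, s, n_points)
        y = rng.normal(m, s, n_points)
        errors.append(w2_sq_empirical_1d(x, y))
    return float(np.mean(errors))

rng = np.random.default_rng(0)
errors = {n: ensemble_error(n, 200, rng) for n in (10, 100, 1000)}
for n, err in errors.items():
    print(f"n={n}: ensemble-averaged squared W2 error = {err:.5f}")
```

Plotting these errors against n on a log-log scale gives an observed rate to compare with whatever rate the paper predicts. The two-sample proxy is a design choice made here to avoid numerical integration of quantile functions.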
Load-bearing premise
Random probability measures are square-integrable with respect to the Wasserstein metric in a manner that permits the L² construction to inherit the full Riemannian structure without additional regularity or measurability conditions.
What would settle it
An explicit pair of random measures for which the distance in the L² over Wasserstein space deviates from the integrated squared Wasserstein distance between their realizations would falsify the claimed inheritance of the Riemannian structure.
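For orientation, the integrated quantity in this test can be computed by hand in a simple family. The worked example below assumes one-dimensional Gaussian random measures with random means $m, m'$ and random standard deviations $\sigma, \sigma'$ defined on a common probability space. These symbols are illustrative and do not come from the paper.

```latex
% Realizations: \mu(\omega) = N(m(\omega), \sigma(\omega)^2), \nu(\omega) = N(m'(\omega), \sigma'(\omega)^2) on \mathbb{R}.
W_2^2\bigl(\mu(\omega), \nu(\omega)\bigr)
  = \bigl(m(\omega) - m'(\omega)\bigr)^2 + \bigl(\sigma(\omega) - \sigma'(\omega)\bigr)^2,
\qquad
\mathbb{E}\bigl[ W_2^2(\mu, \nu) \bigr]
  = \mathbb{E}\bigl[ (m - m')^2 \bigr] + \mathbb{E}\bigl[ (\sigma - \sigma')^2 \bigr].
```

In this family the integrated squared Wasserstein distance is the squared $L^2(\Omega)$ distance between the random parameter vectors $(m, \sigma)$ and $(m', \sigma')$. A falsifying pair would have to make the distance in the L² over Wasserstein space differ from this kind of expectation.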
Original abstract
Optimal transport provides an inherently geometric and highly structured framework for studying spaces of probability measures, supplying a rich theoretical toolkit for contemporary statistics, machine learning, and generative modelling. In applications, however, the measures of interest are almost never known precisely, calling for a theory of optimal transport that accounts for statistical uncertainty. We construct such a framework, lifting the classical theory to the setting of random probability measures. We introduce the $L^2$ over Wasserstein space, establishing that it inherits the formal Riemannian structure of the Wasserstein space by characterising distances and geodesic geometry. The structure induces random flows with Wasserstein gradient flow sample paths, making it the natural extension of the Wasserstein space which allows for random gradient flow dynamics. We obtain ensemble statistical convergence results of the optimal transport machinery using the empirical measure within the $L^2$ over Wasserstein framework. Moreover, in the setting of Bayesian non-parametrics, we refine Schwartz's consistency theorem to the Wasserstein topology and deduce posterior convergence of the same machinery in the $L^2$ over Wasserstein space. We demonstrate that the growing theory of random token sampling for transformer models using self-attention flow paths can be embedded into the our framework. The results provide a unified treatment of random optimal transport and its consequences for principled inference and generative modelling under the statistical uncertainty of random sampling.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces the L² over Wasserstein space for random probability measures. It claims that this space inherits the formal Riemannian structure of the Wasserstein space through explicit characterizations of distances and geodesic geometry. The inherited structure is used to induce random flows whose sample paths are Wasserstein gradient flows. The paper further derives ensemble statistical convergence results for empirical measures inside this L² framework, refines Schwartz's consistency theorem to obtain posterior convergence in the Wasserstein topology, and embeds self-attention flow paths from transformer token sampling into the same setting.
Significance. If the claimed inheritance of the Riemannian structure is established with the necessary regularity, the construction supplies a unified geometric setting for optimal transport under statistical uncertainty. This would directly support rigorous analysis of random gradient flows, empirical convergence, and Bayesian posterior consistency in the Wasserstein metric, with immediate relevance to generative modeling and transformer dynamics.
major comments (1)
- [Section introducing the L² space and its Riemannian structure (distance and geodesic characterization)] The central claim that the L² over Wasserstein space inherits the full Riemannian structure (including the Otto metric on tangent spaces) rests on characterizing distances and geodesics. However, the lift of tangent vectors requires that almost-sure sample paths admit densities whose velocity fields solve the continuity equation in L²; without explicit integrability or finite-Fisher-information conditions on these paths, the inner product defined by expectation of base inner products may fail to reproduce the base geometry or gradient-flow dynamics. Please supply the precise statement of these conditions and the verification that they hold under the stated assumptions on the random measures.
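The objects named in this comment can be written out as follows. This is a sketch in standard optimal transport notation, assumed here rather than quoted from the manuscript: $\rho_t$ is the density of a sample path, $v_t$ its velocity field, and $V, W$ are random tangent vectors at a random measure $\mu$.

```latex
% Continuity equation along a sample path:
\partial_t \rho_t + \nabla \cdot (\rho_t v_t) = 0.

% Otto metric on the tangent space at \rho (v, w are gradient vector fields):
\langle v, w \rangle_{\rho} = \int_{\mathbb{R}^d} v(x) \cdot w(x) \, \rho(x) \, dx.

% Benamou-Brenier formula, minimizing over pairs (\rho_t, v_t) that solve the continuity equation:
W_2^2(\rho_0, \rho_1) = \min_{(\rho_t, v_t)} \int_0^1 \int_{\mathbb{R}^d} |v_t(x)|^2 \, \rho_t(x) \, dx \, dt.

% Fisher information:
I(\rho) = \int_{\mathbb{R}^d} |\nabla \log \rho(x)|^2 \, \rho(x) \, dx.

% Lifted inner product discussed in the comment (expectation of base inner products):
\langle V, W \rangle_{\mu} = \mathbb{E}\bigl[ \langle V(\omega), W(\omega) \rangle_{\mu(\omega)} \bigr].
```

The referee's concern is that the last expression is only well defined, and only reproduces the base geometry, when the integrals above are finite for almost every realization.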
minor comments (1)
- [Abstract] Abstract contains the phrase 'embedded into the our framework'; this should be corrected to 'embedded into our framework'.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive comments. The observation concerning regularity conditions for the tangent-space lift is well taken, and we have revised the manuscript to supply the requested precise statements and verifications while preserving the original claims.
Point-by-point responses
Referee: [Section introducing the L² space and its Riemannian structure (distance and geodesic characterization)] The central claim that the L² over Wasserstein space inherits the full Riemannian structure (including the Otto metric on tangent spaces) rests on characterizing distances and geodesics. However, the lift of tangent vectors requires that almost-sure sample paths admit densities whose velocity fields solve the continuity equation in L²; without explicit integrability or finite-Fisher-information conditions on these paths, the inner product defined by expectation of base inner products may fail to reproduce the base geometry or gradient-flow dynamics. Please supply the precise statement of these conditions and the verification that they hold under the stated assumptions on the random measures.
Authors: We agree that the full inheritance of the Otto metric on tangent spaces requires explicit regularity on the sample paths. In the revised manuscript we have inserted, immediately after the definition of the L² over Wasserstein space, the standing assumption that almost every realization is absolutely continuous with respect to Lebesgue measure, possesses a density in L² with finite Fisher information, and that the associated velocity fields lie in L² and satisfy the continuity equation. Under these conditions we prove (new Lemma 3.4) that the expectation of the base inner products reproduces the Otto metric almost surely and that the induced random flows remain Wasserstein gradient flows. The verification is now stated as a proposition with a short proof in the appendix; the main theorems are unaffected.
Revision: yes
Circularity Check
No circularity: L² over Wasserstein is a constructive lift via distance and geodesic characterization
Full rationale
The paper constructs the L² over Wasserstein space as a direct extension of the classical Wasserstein space to random probability measures. Inheritance of the Riemannian structure is claimed through explicit characterization of distances and geodesic geometry, which constitutes a definitional construction rather than a reduction to prior fitted parameters, self-citations, or renamed empirical patterns. Subsequent results on statistical convergence, refined Schwartz consistency, and embedding of transformer sampling flows are presented as applications of the framework, not as inputs that force the core claims. No load-bearing step in the derivation chain reduces by construction to the paper's own inputs or unverified self-references; the framework remains self-contained as a theoretical extension.
Axiom & Free-Parameter Ledger
invented entities (1)
- L² over Wasserstein space: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, theorem washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: We introduce the L² over Wasserstein space L²_W(R^d) … inherits the formal Riemannian structure … random gradient flow dynamics
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, theorem reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: constant-speed geodesics … Benamou-Brenier … displacement interpolation
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.