
arxiv: 2605.21365 · v1 · pith:UGI5TRCVnew · submitted 2026-05-20 · 🧮 math.ST · stat.ML · stat.TH

L² over Wasserstein: Statistical Analysis for Optimal Transport

Pith reviewed 2026-05-21 03:01 UTC · model grok-4.3

keywords: optimal transport · Wasserstein space · L² over Wasserstein · random probability measures · gradient flows · statistical convergence · Bayesian consistency

The pith

The L² over Wasserstein space equips random probability measures with the Riemannian structure of optimal transport.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a statistical extension of optimal transport by defining the L² over Wasserstein space for random probability measures. It shows that this space inherits the formal Riemannian structure of the classical Wasserstein space through explicit characterizations of distances and geodesic geometry. The resulting structure supports random flows whose sample paths follow Wasserstein gradient flows and enables ensemble convergence results for empirical measures. It also refines Bayesian consistency theorems so that posterior convergence holds in the new space. This setup matters because it supplies a unified way to perform inference and generative modeling when the underlying measures themselves carry statistical uncertainty.

Core claim

The paper introduces the L² over Wasserstein space and establishes that it inherits the formal Riemannian structure of the Wasserstein space by characterising distances and geodesic geometry. The structure induces random flows with Wasserstein gradient flow sample paths, making it the natural extension of the Wasserstein space which allows for random gradient flow dynamics. Ensemble statistical convergence results of the optimal transport machinery are obtained using the empirical measure within the L² over Wasserstein framework. In the setting of Bayesian non-parametrics, Schwartz's consistency theorem is refined to the Wasserstein topology, yielding posterior convergence of the same machinery in the L² over Wasserstein space.

What carries the argument

The L² over Wasserstein space of square-integrable random probability measures, equipped with a metric and geodesic structure that directly inherits the Riemannian geometry of the Wasserstein space via distance and geodesic characterizations.

If this is right

  • Random gradient flow dynamics become definable on spaces of uncertain probability measures.
  • Statistical convergence of optimal transport quantities holds in an ensemble sense via the empirical measure.
  • Bayesian posterior distributions converge in the L² over Wasserstein space once they converge in the Wasserstein topology.
  • Random token sampling paths in transformer models can be embedded as instances of the random gradient flow dynamics.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The framework may support new sampling procedures that propagate uncertainty directly through the measure space.
  • Links could be explored to stochastic differential equations on spaces of measures for more robust generative models.
  • Numerical checks on synthetic random measures could verify whether predicted convergence rates match observed behavior.
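One way such a numerical check could look is sketched below. Everything specific in it is an assumption made for illustration and is not taken from the paper: the random measure is a one-dimensional Gaussian with a random location, the ensemble distance is taken to be the square root of the expected squared Wasserstein distance, and the function names `w2_squared_1d` and `ensemble_distance` are hypothetical. It relies only on the standard fact that, in one dimension, the optimal coupling between two empirical measures with the same number of equally weighted atoms pairs the sorted samples.

```python
import numpy as np


def w2_squared_1d(x, y):
    """Squared 2-Wasserstein distance between two 1D empirical measures
    with the same number of equally weighted atoms (sorted coupling)."""
    x_sorted = np.sort(x)
    y_sorted = np.sort(y)
    return float(np.mean((x_sorted - y_sorted) ** 2))


def ensemble_distance(n_atoms, n_realizations, rng):
    """Monte Carlo estimate of sqrt(E[W2^2]) between two independent
    n-atom empirical measures drawn from the same random Gaussian."""
    total = 0.0
    for _ in range(n_realizations):
        # Assumption: the realization is N(mean, 1) with a random location.
        mean = rng.normal()
        x = rng.normal(mean, 1.0, size=n_atoms)
        y = rng.normal(mean, 1.0, size=n_atoms)
        total += w2_squared_1d(x, y)
    return float(np.sqrt(total / n_realizations))


rng = np.random.default_rng(0)
results = {n: ensemble_distance(n, 200, rng) for n in (10, 100, 1000)}
for n, d in results.items():
    print(f"n = {n:5d}  ensemble distance ~ {d:.4f}")
```

Plotting the printed values against n on a log-log scale would give an empirical decay rate to compare with whatever rate the paper predicts; the sketch itself makes no claim about what that rate should be.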

Load-bearing premise

Random probability measures are square-integrable with respect to the Wasserstein metric in a manner that permits the L² construction to inherit the full Riemannian structure without additional regularity or measurability conditions.

What would settle it

An explicit pair of random measures for which the distance in the L² over Wasserstein space deviates from the integrated squared Wasserstein distance between their realizations would falsify the claimed inheritance of the Riemannian structure.
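The identity being tested can be written out explicitly. This is a sketch under the assumption that "integrated squared Wasserstein distance" means the expectation of $W_2^2$ over a common probability space; the symbols $d_{L^2}$, $\boldsymbol{\mu}$, $\boldsymbol{\nu}$, $\Omega$ and $\mathbb{P}$ are notation chosen here and may differ from the paper's.

```latex
\[
  d_{L^2}(\boldsymbol{\mu}, \boldsymbol{\nu})^2
  \;=\; \mathbb{E}\big[ W_2^2(\boldsymbol{\mu}, \boldsymbol{\nu}) \big]
  \;=\; \int_{\Omega} W_2^2\big( \boldsymbol{\mu}(\omega), \boldsymbol{\nu}(\omega) \big)
        \, \mathrm{d}\mathbb{P}(\omega)
\]
```

A falsifying example would be a pair $(\boldsymbol{\mu}, \boldsymbol{\nu})$ for which the distance as defined in the paper differs from the right-hand side.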

Figures

Figures reproduced from arXiv: 2605.21365 by Pengcheng Ye, Riccardo Passeggeri, Rohan M. Shenoy.

Figure 3.1: Interaction between the different spaces. Stochasticity denotes moving from a space to $L^2$ functions (random elements) on the space. Laws denote moving from an $L^2$ random variable to its probability distribution. Superposition denotes the embedding of one space into a higher space via a Dirac distribution (recall from section 2.2 the superposition of dynamics on $\mathbb{R}^d$ to dynamics on $\mathcal{P}_2(\mathbb{R}^d)$). (Nested) Sup…
Original abstract

Optimal transport provides an inherently geometric and highly structured framework for studying spaces of probability measures, supplying a rich theoretical toolkit for contemporary statistics, machine learning, and generative modelling. In applications, however, the measures of interest are almost never known precisely, calling for a theory of optimal transport that accounts for statistical uncertainty. We construct such a framework, lifting the classical theory to the setting of random probability measures. We introduce the $L^2$ over Wasserstein space establishing that it inherits the formal Riemannian structure of the Wasserstein space by characterising distances and geodesic geometry. The structure induces random flows with Wasserstein gradient flow sample paths, making it the natural extension of the Wasserstein space which allows for random gradient flow dynamics. We ensemble statistical convergence results of the optimal transport machinery using the empirical measure within the $L^2$ over Wasserstein framework. Moreover, in the setting of Bayesian non-parametrics, we refine Schwartz's consistency theorem to the Wasserstein topology and deduce posterior convergence of the same machinery in the $L^2$ over Wasserstein space. We demonstrate that the growing theory of random token sampling for transformer models using self-attention flow paths can be embedded into the our framework. The results provide a unified treatment of random optimal transport and its consequences for principled inference and generative modelling under the statistical uncertainty of random sampling.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript introduces the L² over Wasserstein space for random probability measures. It claims that this space inherits the formal Riemannian structure of the Wasserstein space through explicit characterizations of distances and geodesic geometry. The inherited structure is used to induce random flows whose sample paths are Wasserstein gradient flows. The paper further derives ensemble statistical convergence results for empirical measures inside this L² framework, refines Schwartz's consistency theorem to obtain posterior convergence in the Wasserstein topology, and embeds self-attention flow paths from transformer token sampling into the same setting.

Significance. If the claimed inheritance of the Riemannian structure is established with the necessary regularity, the construction supplies a unified geometric setting for optimal transport under statistical uncertainty. This would directly support rigorous analysis of random gradient flows, empirical convergence, and Bayesian posterior consistency in the Wasserstein metric, with immediate relevance to generative modeling and transformer dynamics.

major comments (1)
  1. [Section introducing the L² space and its Riemannian structure (distance and geodesic characterization)] The central claim that the L² over Wasserstein space inherits the full Riemannian structure (including the Otto metric on tangent spaces) rests on characterizing distances and geodesics. However, the lift of tangent vectors requires that almost-sure sample paths admit densities whose velocity fields solve the continuity equation in L²; without explicit integrability or finite-Fisher-information conditions on these paths, the inner product defined by expectation of base inner products may fail to reproduce the base geometry or gradient-flow dynamics. Please supply the precise statement of these conditions and the verification that they hold under the stated assumptions on the random measures.
minor comments (1)
  1. [Abstract] Abstract contains the phrase 'embedded into the our framework'; this should be corrected to 'embedded into our framework'.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful reading and constructive comments. The observation concerning regularity conditions for the tangent-space lift is well taken, and we have revised the manuscript to supply the requested precise statements and verifications while preserving the original claims.

Point-by-point responses
  1. Referee: [Section introducing the L² space and its Riemannian structure (distance and geodesic characterization)] The central claim that the L² over Wasserstein space inherits the full Riemannian structure (including the Otto metric on tangent spaces) rests on characterizing distances and geodesics. However, the lift of tangent vectors requires that almost-sure sample paths admit densities whose velocity fields solve the continuity equation in L²; without explicit integrability or finite-Fisher-information conditions on these paths, the inner product defined by expectation of base inner products may fail to reproduce the base geometry or gradient-flow dynamics. Please supply the precise statement of these conditions and the verification that they hold under the stated assumptions on the random measures.

    Authors: We agree that the full inheritance of the Otto metric on tangent spaces requires explicit regularity on the sample paths. In the revised manuscript we have inserted, immediately after the definition of the L² over Wasserstein space, the standing assumption that almost every realization is absolutely continuous with respect to Lebesgue measure, possesses a density in L² with finite Fisher information, and that the associated velocity fields lie in L² and satisfy the continuity equation. Under these conditions we prove (new Lemma 3.4) that the expectation of the base inner products reproduces the Otto metric almost surely and that the induced random flows remain Wasserstein gradient flows. The verification is now stated as a proposition with a short proof in the appendix; the main theorems are unaffected. revision: yes
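The objects at stake in this exchange can be written down as follows. This is a sketch of the standard formulation, assuming realizations indexed by $\omega$; the notation $\mu_t^{\omega}$, $v_t^{\omega}$ and the bold symbols for the lifted objects are chosen here for illustration and are not taken from the paper.

```latex
% Continuity equation for a realization \mu_t^\omega with velocity field v_t^\omega
\[
  \partial_t \mu_t^{\omega} + \nabla \cdot \big( v_t^{\omega} \, \mu_t^{\omega} \big) = 0
\]

% Base (Otto) inner product on the tangent space at a measure \mu
\[
  \langle v, w \rangle_{\mu} = \int_{\mathbb{R}^d} \langle v(x), w(x) \rangle \, \mathrm{d}\mu(x)
\]

% Lifted inner product: expectation of the base inner products
\[
  \langle \boldsymbol{v}, \boldsymbol{w} \rangle_{\boldsymbol{\mu}}
  = \mathbb{E}\big[ \langle v^{\omega}, w^{\omega} \rangle_{\mu^{\omega}} \big]
\]
```

The referee's concern is that the last expression is only well defined, and only reproduces the base geometry, when the velocity fields are suitably integrable along almost every sample path.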

Circularity Check

0 steps flagged

No circularity: L² over Wasserstein is a constructive lift via distance and geodesic characterization

full rationale

The paper constructs the L² over Wasserstein space as a direct extension of the classical Wasserstein space to random probability measures. Inheritance of the Riemannian structure is claimed through explicit characterization of distances and geodesic geometry, which constitutes a definitional construction rather than a reduction to prior fitted parameters, self-citations, or renamed empirical patterns. Subsequent results on statistical convergence, refined Schwartz consistency, and embedding of transformer sampling flows are presented as applications of the framework, not as inputs that force the core claims. No load-bearing step in the derivation chain reduces by construction to the paper's own inputs or unverified self-references; the framework remains self-contained as a theoretical extension.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

Abstract-only review supplies no explicit free parameters, axioms, or invented entities beyond the central new object; full paper would be needed to audit them.

invented entities (1)
  • L² over Wasserstein space (no independent evidence)
    purpose: To model random probability measures while inheriting Wasserstein Riemannian geometry
    Introduced in the abstract as the core new construction; no independent evidence supplied.



