Another Look at Log-PCA for Probability Measures: A Dynamical Formulation and Statistical Convergence
Pith reviewed 2026-06-27 02:52 UTC · model grok-4.3
The pith
WT-PCA gives a variational, dynamical view of log-PCA and proves that its empirical version converges at a rate governed by the 2-Wasserstein distance between the population and empirical barycenters.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The WT-PCA captures the local principal modes of geodesic variations of a (weighted) probability measure on the Wasserstein space via its covariance operator at the barycenter, and the empirical WT-PCA converges to the population version at a general statistical rate expressed in terms of the 2-Wasserstein distance between the population and empirical barycenter reference measures.
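For orientation, the objects in this claim can be written in standard log-PCA notation. This is a sketch with our own labels ($\nu_i$ for the observed measures, $w_i$ for the weights, $T_i$ for the optimal transport maps from the barycenter), assuming the barycenter $\bar\mu$ is absolutely continuous so that each $T_i$ exists and is unique; it need not match the paper's exact definitions.

```latex
\begin{aligned}
\bar\mu &\in \operatorname*{arg\,min}_{\mu \in \mathcal{P}_2(\mathbb{R}^m)} \sum_{i=1}^{n} w_i\, W_2^2(\mu, \nu_i),
  \qquad w_i \ge 0,\ \ \sum_{i=1}^{n} w_i = 1,\\
v_i &= \log_{\bar\mu}(\nu_i) = T_i - \mathrm{id} \in L^2(\bar\mu; \mathbb{R}^m),
  \qquad (T_i)_{\#}\bar\mu = \nu_i,\\
C f &= \sum_{i=1}^{n} w_i\, \langle v_i, f \rangle_{L^2(\bar\mu)}\, v_i,
  \qquad f \in L^2(\bar\mu; \mathbb{R}^m).
\end{aligned}
```

The leading eigenfunctions $\phi$ of $C$ are the principal tangent directions, and $t \mapsto (\mathrm{id} + t\,\phi)_{\#}\bar\mu$ is the corresponding linearized variation around the barycenter.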
What carries the argument
Covariance operator at the barycenter, connected across tangent spaces by the parallel transport induced by optimal transport maps.
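One setting where the optimal maps behind this argument are fully explicit is the family of centered Gaussians: the optimal map from $N(0, A)$ to $N(0, B)$ is linear, $T = A^{-1/2}(A^{1/2} B A^{1/2})^{1/2} A^{-1/2}$. The following is a minimal NumPy sketch, assuming nondegenerate covariance matrices. The function names are hypothetical, and it illustrates only the maps and the log map $T - I$, not the paper's parallel-transport construction.

```python
import numpy as np

def psd_sqrt(M):
    """Symmetric square root of a symmetric positive semidefinite matrix."""
    vals, vecs = np.linalg.eigh(M)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T

def gaussian_ot_map(A, B):
    """Matrix T such that x -> T @ x is the optimal map from N(0, A) to N(0, B).

    T = A^{-1/2} (A^{1/2} B A^{1/2})^{1/2} A^{-1/2}; assumes A is positive definite.
    """
    A_half = psd_sqrt(A)
    A_half_inv = np.linalg.inv(A_half)
    return A_half_inv @ psd_sqrt(A_half @ B @ A_half) @ A_half_inv

def bures_wasserstein(A, B):
    """W2 distance between N(0, A) and N(0, B)."""
    A_half = psd_sqrt(A)
    val = np.trace(A) + np.trace(B) - 2.0 * np.trace(psd_sqrt(A_half @ B @ A_half))
    return float(np.sqrt(max(val, 0.0)))

A = np.array([[2.0, 0.3], [0.3, 1.0]])
B = np.array([[1.0, -0.2], [-0.2, 3.0]])
T = gaussian_ot_map(A, B)
V = T - np.eye(2)  # log map at N(0, A): the tangent vector x -> (T - I) x

print(np.allclose(T @ A @ T.T, B))  # T pushes N(0, A) onto N(0, B): True
# The L^2(N(0, A)) norm of the tangent vector equals the W2 distance
print(np.isclose(np.sqrt(np.trace(V @ A @ V.T)), bures_wasserstein(A, B)))  # True
```

Uniqueness of the map is automatic here because $A$ is nondegenerate. The referee report questions what happens for discrete or singular measures, where no such guarantee holds.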
If this is right
- The variational formulation directly yields a differentiable version of principal geodesic analysis on the Wasserstein space.
- Convergence holds for weighted measures and is expressed solely through the barycenter reference distance.
- No extra regularity is imposed that would break the variational interpretation or the parallel-transport argument.
- The rate applies to any estimator whose barycenter converges in 2-Wasserstein distance.
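Because the rate is expressed through W2 between barycenters, it helps to see how that distance is computed in the simplest case. On the real line the sorted (monotone) coupling is optimal, so for uniform empirical measures with the same number of atoms, both W2 and the weighted barycenter reduce to operations on sorted samples. The following is a minimal NumPy sketch under that equal-size assumption; the function names are hypothetical.

```python
import numpy as np

def w2_1d_equal_size(x, y):
    """Exact W2 between two uniform empirical measures on R with the same number of atoms.

    In 1D the monotone (sorted) coupling is optimal, so W2^2 is the mean squared
    difference of order statistics.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    return float(np.sqrt(np.mean((x - y) ** 2)))

def barycenter_1d_equal_size(samples, weights):
    """Weighted W2 barycenter of uniform empirical measures with equal atom counts.

    Averaging the sorted samples averages the quantile functions, which in 1D
    gives the quantile function of the barycenter.
    """
    S = np.sort(np.asarray(samples, dtype=float), axis=1)
    w = np.asarray(weights, dtype=float)
    return (w / w.sum()) @ S

x = np.array([0.0, 1.0, 2.0, 3.0])
print(w2_1d_equal_size(x, np.array([5.5, 2.5, 4.5, 3.5])))  # a pure shift by 2.5: 2.5
print(barycenter_1d_equal_size([[2.0, 0.0], [4.0, 2.0]], [1, 1]))  # [1. 3.]
```

The same two helpers could estimate how far an empirical barycenter sits from a reference barycenter in a 1D simulation. For unequal sample sizes one would instead integrate squared quantile differences over $(0, 1)$.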
Where Pith is reading between the lines
- The same dynamical perspective could be used to define tangential PCA on other spaces that admit a parallel-transport structure from optimal transport.
- Applications that already compute Wasserstein barycenters (image histograms, point-cloud summaries) could immediately plug in the WT-PCA estimator.
- Synthetic experiments with known low-dimensional geodesic variations would directly test whether the extracted modes recover the ground-truth directions.
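The synthetic check suggested in the last item can be prototyped on the real line. There, W2 is the L² distance between quantile functions, the weighted barycenter's quantile function is the weighted average of the quantile functions, and (when the barycenter has no atoms) log-PCA becomes functional PCA of the centered quantile functions. The following is a minimal sketch under these 1D assumptions: a uniform quantile grid, a Gaussian location-scale family as ground truth, and a hypothetical function `log_pca_1d`. It is plain log-PCA, not the paper's WT-PCA estimator. Since each centered quantile function lies in the span of $1$ and $\Phi^{-1}$, two modes should carry all of the variance.

```python
import numpy as np
from statistics import NormalDist

def log_pca_1d(quantiles, weights):
    """Weighted log-PCA of 1D measures given their quantile functions on a uniform grid.

    quantiles: (n, K) array; row i holds Q_i(u_k) with u_k = (k + 0.5) / K.
    weights:   (n,) nonnegative weights (normalized internally).
    Returns (barycenter quantile, eigenvalues, modes); modes are rows with unit L^2(0,1) norm.
    """
    Q = np.asarray(quantiles, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    K = Q.shape[1]
    Q_bar = w @ Q                      # in 1D the W2 barycenter averages quantile functions
    V = Q - Q_bar                      # tangent vectors T_i - id, in quantile coordinates
    C = (V.T * w) @ V / K              # discretized covariance operator, quadrature weight 1/K
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1]  # sort modes by decreasing variance
    modes = eigvecs[:, order].T * np.sqrt(K)  # rescale to unit L^2(0,1) norm
    return Q_bar, eigvals[order], modes

# Ground truth: N(m_i, s_i^2) with random means and scales, so variation is 2-dimensional
K = 400
u = (np.arange(K) + 0.5) / K
z = np.array([NormalDist().inv_cdf(p) for p in u])  # standard normal quantiles
rng = np.random.default_rng(0)
n = 30
means = rng.uniform(-1.0, 1.0, n)
scales = rng.uniform(0.5, 2.0, n)
weights = rng.uniform(0.1, 1.0, n)
Q = means[:, None] + scales[:, None] * z[None, :]

Q_bar, eigvals, modes = log_pca_1d(Q, weights)
ratio = eigvals / eigvals.sum()
print(np.round(ratio[:3], 6))  # first two ratios should sum to ~1
```

The linear span found here ignores the monotonicity constraint on quantile functions, which is the usual gap between log-PCA and exact geodesic PCA.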
Load-bearing premise
The parallel transport structure of the optimal transport problems is well-defined and can be leveraged to connect the covariance operators across tangent spaces at the barycenter without additional regularity conditions.
What would settle it
A numerical experiment in which the observed 2-Wasserstein error between population and empirical WT-PCA fails to decay at the predicted rate once the barycenter distance is increased while keeping sample size fixed.
Abstract
This paper is concerned with learning principal variations of random probability measures on $\mathbb{R}^m$ under the Wasserstein geometry. We introduce a new dynamical formulation that interprets log-PCA, a linearized principal geodesic analysis, as a variational approach. Our differentiable version, termed the Wasserstein Tangential PCA (WT-PCA), captures the local principal modes of geodesic variations of a (weighted) probability measure on the Wasserstein space via its covariance operator at the barycenter. Based on the dynamical perspective and leveraging the parallel transport structure of the optimal transport problems, we derive a general statistical convergence rate for the empirical WT-PCA estimated from data, in terms of the 2-Wasserstein distance between the population and empirical barycenter reference measures.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a dynamical formulation of log-PCA for random probability measures under Wasserstein geometry. It defines the Wasserstein Tangential PCA (WT-PCA) via the covariance operator of geodesic variations at the barycenter and uses parallel transport of optimal transport maps to derive a general statistical convergence rate of the empirical WT-PCA to the population version, expressed in terms of the 2-Wasserstein distance between the population and empirical barycenter reference measures; the result is claimed to hold without additional regularity conditions on the measures.
Significance. If the central derivation is valid, the work supplies a variational interpretation of linearized PCA on the Wasserstein space together with explicit statistical rates; the dynamical perspective and explicit use of parallel transport constitute a clear technical contribution that could support further reproducible analysis of distribution-valued data.
major comments (1)
- [Abstract / convergence derivation] Abstract and the derivation of the convergence rate (around the parallel-transport step): the claim that the result holds 'without additional regularity conditions' is load-bearing for the identification of the empirical covariance operator with its population counterpart via parallel transport. In Wasserstein geometry this transport is single-valued and isometric only when optimal maps exist and are unique, which fails for discrete or singular measures; the manuscript must either exhibit the precise conditions under which the transport remains well-defined or show that the rate derivation does not rely on uniqueness.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive feedback on the convergence derivation. We address the major comment below.
Referee: [Abstract / convergence derivation] Abstract and the derivation of the convergence rate (around the parallel-transport step): the claim that the result holds 'without additional regularity conditions' is load-bearing for the identification of the empirical covariance operator with its population counterpart via parallel transport. In Wasserstein geometry this transport is single-valued and isometric only when optimal maps exist and are unique, which fails for discrete or singular measures; the manuscript must either exhibit the precise conditions under which the transport remains well-defined or show that the rate derivation does not rely on uniqueness.
Authors: We appreciate the referee highlighting this important technical point regarding the parallel transport step. The derivation identifies the empirical covariance operator with its population counterpart by transporting the optimal maps, and the rate is expressed in terms of the 2-Wasserstein distance between barycenters. This step does rely on the transport map being single-valued, which requires uniqueness of the optimal map. The manuscript's claim that the result holds without additional regularity conditions is therefore imprecise for measures where uniqueness may fail (e.g., discrete or singular supports). We will revise the paper to state explicitly the assumption that the population and empirical measures admit unique optimal transport maps (for instance, when they are absolutely continuous with respect to Lebesgue measure on $\mathbb{R}^m$). This assumption will be added to the theorem statement, with a clarifying remark in the abstract and Section 3. We view this as a necessary clarification rather than a weakening of the result.
revision: yes
Circularity Check
No circularity: derivation self-contained via dynamical formulation and OT parallel transport
The abstract and description present WT-PCA as a new variational interpretation of log-PCA using covariance operators at the barycenter, with convergence derived from the dynamical perspective and parallel transport structure of OT problems. No equations, self-citations, or reductions are exhibited that make any prediction equivalent to its inputs by construction, fit a parameter then rename it as prediction, or rely on load-bearing self-citations. The central statistical rate claim is framed as following from the stated assumptions on the Wasserstein geometry without reduction to fitted quantities. This is the common honest non-finding for papers whose core derivation remains independent of the target result.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: The Wasserstein space on $\mathbb{R}^m$ admits a parallel transport structure for optimal transport problems that allows a consistent definition of covariance operators at the barycenter.
Reference graph
Works this paper leans on
[1] I. Jolliffe, “Principal component analysis,” in International Encyclopedia of Statistical Science. Berlin, Heidelberg: Springer Berlin Heidelberg, 2011, pp. 1094–1096, isbn: 978-3-642-04898-2. doi: 10.1007/978-3-642-04898-2_455
[2] V. Seguy and M. Cuturi, “Principal Geodesic Analysis for Probability Measures under the Optimal Transport Metric,” in Advances in Neural Information Processing Systems, C. Cortes, N. Lawrence, D. Lee, M. Sugiyama, and R. Garnett, Eds., vol. 28, Curran Associates, Inc., 2015. [Online]. Available: https://proceedings.neurips.cc/paper/2015/file ...
[3] J. Bigot, R. Gouet, T. Klein, and A. López, “Geodesic PCA in the Wasserstein space by convex PCA,” Annales de l’Institut Henri Poincaré, Probabilités et Statistiques, vol. 53, no. 1, pp. 1–26, 2017. doi: 10.1214/15-AIHP706. [Online]. Available: https://doi.org/10.1214/15-AIHP706
[4] E. Cazelles, V. Seguy, J. Bigot, M. Cuturi, and N. Papadakis, “Geodesic PCA versus Log-PCA of Histograms in the Wasserstein Space,” SIAM Journal on Scientific Computing, vol. 40, no. 2, pp. B429–B456, 2018. doi: 10.1137/17M1143459
[5] M. Agueh and G. Carlier, “Barycenters in the Wasserstein Space,” SIAM Journal on Mathematical Analysis, vol. 43, no. 2, pp. 904–924, 2011. doi: 10.1137/100805741
[6] K. Kim, R. Yao, C. Zhu, and X. Chen, “Optimal transport barycenter via nonconvex concave minimax optimization,” in International Conference on Machine Learning (ICML), Jul. 2025.
[7] K. Kim, B. Zhou, C. Zhu, and X. Chen, “Sobolev Gradient Ascent for Optimal Transport: Barycenter Optimization and Convergence Analysis,” in International Conference on Learning Representations (ICLR), 2026.
[8] P. Xu, C. Zhu, and X. Chen, A Unified Approach for Computing Wasserstein Barycenters of Discrete and Continuous Measures, 2026. arXiv: 2605.11270 [math.OC]. [Online]. Available: https://arxiv.org/abs/2605.11270
[9] L. Ambrosio, E. Brué, and D. Semola, Lectures on Optimal Transport (UNITEXT). Springer International Publishing, 2021, isbn: 9783030721626. [Online]. Available: https://books.google.com/books?id=vcI5EAAAQBAJ
[10] P. Fletcher, C. Lu, S. Pizer, and S. Joshi, “Principal geodesic analysis for the study of nonlinear statistics of shape,” IEEE Transactions on Medical Imaging, vol. 23, no. 8, pp. 995–1005, 2004. doi: 10.1109/TMI.2004.831793
[11] N. Vesseron, E. Cazelles, A. L. Brigant, and Klein, “On the Wasserstein Geodesic Principal Component Analysis of probability measures,” in The Fourteenth International Conference on Learning Representations, 2026. [Online]. Available: https://openreview.net/forum?id=OJupg4mDjS
[12] W. Wang, D. Slepčev, S. Basu, J. A. Ozolek, and G. K. Rohde, “A Linear Optimal Transportation Framework for Quantifying and Visualizing Variations in Sets of Images,” International Journal of Computer Vision, vol. 101, no. 2, pp. 254–269, 2013. doi: 10.1007/s11263-012-0566-z
[13] S. Sommer, F. Lauze, S. Hauberg, and M. Nielsen, “Manifold Valued Statistics, Exact Principal Geodesic Analysis and the Effect of Linear Approximations,” in Computer Vision – ECCV 2010, K. Daniilidis, P. Maragos, and N. Paragios, Eds., Berlin, Heidelberg: Springer Berlin Heidelberg, 2010, pp. 43–56, isbn: 978-3-642-15567-3
[14] L. V. Santoro and V. M. Panaretos, Statistical Inference for Bures-Wasserstein Flows, 2024. arXiv: 2310.13764 [stat.ME]. [Online]. Available: https://arxiv.org/abs/2310.13764
[15] Y.-H. Kim and B. Pass, “Wasserstein barycenters over Riemannian manifolds,” Advances in Mathematics, vol. 307, pp. 640–683, 2017, issn: 0001-8708. doi: 10.1016/j.aim.2016.11.026
[16] Y. Brenier, “Polar factorization and monotone rearrangement of vector-valued functions,” Communications on Pure and Applied Mathematics, vol. 44, no. 4, pp. 375–417, 1991. doi: 10.1002/cpa.3160440402. eprint: https://onlinelibrary.wiley.com/doi/pdf/10.1002/cpa.3160440402
[17] C. Villani, Topics in Optimal Transportation (Graduate studies in mathematics). American Mathematical Society, 2003, isbn: 9780821833124. [Online]. Available: https://books.google.com/books?id=R%5C_nWqjq89oEC
[18] F. Otto, “The Geometry of Dissipative Evolution Equations: The Porous Medium Equation,” Communications in Partial Differential Equations, vol. 26, no. 1-2, pp. 101–174, 2001. doi: 10.1081/PDE-100002243
[19] L. Ambrosio, N. Gigli, and G. Savaré, Gradient Flows in Metric Spaces and in the Space of Probability Measures (Lectures in Mathematics ETH Zürich), 2nd ed. Birkhäuser Basel, 2008.
[20] J. Lott and C. Villani, “Ricci curvature for metric-measure spaces via optimal transport,” Ann. of Math., vol. 169, no. 3, pp. 903–991, 2009.
[21] J.-D. Benamou and Y. Brenier, “A computational fluid mechanics solution to the Monge–Kantorovich mass transfer problem,” Numerische Mathematik, vol. 84, no. 3, pp. 375–393, 2000. doi: 10.1007/s002110050002
[22] L. Ambrosio and N. Gigli, “Construction of the Parallel Transport in the Wasserstein Space,” Methods and Applications of Analysis, vol. 15, no. 1, pp. 1–30, 2008.
[23] R. J. McCann, “A Convexity Principle for Interacting Gases,” Advances in Mathematics, vol. 128, no. 1, pp. 153–179, 1997, issn: 0001-8708. doi: 10.1006/aima.1997.1634
[24] F. Santambrogio, Optimal Transport for Applied Mathematicians: Calculus of Variations, PDEs, and Modeling (Progress in Nonlinear Differential Equations and Their Applications). Springer International Publishing, 2015, isbn: 978-3-319-20828-2. doi: 10.1007/978-3-319-20828-2
[25] N. Gigli, Second Order Analysis on $(\mathcal{P}_2(M), W_2)$ (Memoirs of the American Mathematical Society). American Mathematical Society, 2012, isbn: 978-0-8218-8529-1
[26] Y. Chen, Z. Lin, and H.-G. Müller, “Wasserstein regression,” Journal of the American Statistical Association, vol. 118, no. 542, pp. 869–882, 2023.
[27] T. Hsing and R. Eubank, Theoretical foundations of functional data analysis, with an introduction to linear operators. John Wiley & Sons, 2015, vol. 997.
[28] R. J. Berman, “Convergence rates for discretized Monge–Ampère equations and quantitative stability of optimal transport,” Foundations of Computational Mathematics, vol. 21, no. 4, pp. 1099–1140, 2021.
[29] S. Chewi, J. Niles-Weed, and P. Rigollet, Statistical optimal transport (Lecture Notes in Mathematics). Springer Cham, 2024, isbn: 978-3-031-85160-5
[30] A. Kroshnin, V. Spokoiny, and A. Suvorikova, “Statistical inference for Bures–Wasserstein barycenters,” The Annals of Applied Probability, vol. 31, no. 3, pp. 1264–1298, 2021.
[31] D. Bosq, Linear processes in function spaces: theory and applications. Springer Science & Business Media, 2000, vol. 149.
[32] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998. doi: 10.1109/5.726791
[33] WallpaperAccess.com, WallpaperAccess, 2026. Accessed: Jan. 28, 2026. [Online]. Available: https://wallpaperaccess.com/
[34] R. Bhatia, Positive definite matrices. Princeton University Press, 2009.
[35] R. Bhatia, T. Jain, and Y. Lim, “On the Bures–Wasserstein distance between positive definite matrices,” Expositiones Mathematicae, vol. 37, no. 2, pp. 165–191, 2019.