
arXiv: 2606.17196 · v1 · pith:Y47VMCABnew · submitted 2026-06-15 · 📊 stat.ML · cs.LG · stat.ME

Another Look at Log-PCA for Probability Measures: A Dynamical Formulation and Statistical Convergence

Pith reviewed 2026-06-27 02:52 UTC · model grok-4.3

classification 📊 stat.ML · cs.LG · stat.ME
keywords Wasserstein space · log-PCA · principal geodesic analysis · optimal transport · statistical convergence · probability measures · barycenter · tangential PCA

The pith

WT-PCA recasts log-PCA as a variational, dynamical problem, and the paper proves that the empirical version converges to the population version at a rate controlled by the 2-Wasserstein distance between the population and empirical barycenters.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a dynamical formulation that recasts log-PCA as a variational problem for identifying principal variations among random probability measures equipped with Wasserstein geometry. From this it defines the Wasserstein Tangential PCA, which extracts local principal modes through the covariance operator evaluated at the barycenter. The work then establishes a general convergence rate for the estimator computed from finite samples, expressed directly in terms of the 2-Wasserstein distance between the population and empirical barycenters. A reader would care because the result supplies a geometrically consistent way to reduce dimension for distribution-valued data while retaining the optimal-transport structure.

Core claim

The WT-PCA captures the local principal modes of geodesic variations of a (weighted) probability measure on the Wasserstein space via its covariance operator at barycenter, and the empirical WT-PCA converges to the population version at a general statistical rate in terms of the 2-Wasserstein distance between the population and empirical barycenter reference measures.
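To make the objects in this claim concrete, here is a minimal sketch of log-PCA in the one-dimensional special case. There the barycenter's quantile function is the weighted average of the input quantile functions, and the optimal map from the barycenter to each measure is explicit. This is an illustration under those assumptions, not the paper's algorithm for measures on $\mathbb{R}^m$; the names `gaussian_quantiles` and `log_pca_1d`, the grid size `k`, and the example parameters are hypothetical choices.

```python
import numpy as np
from statistics import NormalDist


def gaussian_quantiles(means, sds, k=200):
    """Quantile functions of N(m_i, s_i^2) on k midpoints of (0, 1); shape (n, k)."""
    u = (np.arange(k) + 0.5) / k
    z = np.array([NormalDist().inv_cdf(p) for p in u])  # standard normal quantiles
    means = np.asarray(means, dtype=float)[:, None]
    sds = np.asarray(sds, dtype=float)[:, None]
    return means + sds * z[None, :]


def log_pca_1d(quantiles, weights=None):
    """Log-PCA of 1D measures given by quantile functions on a shared grid.

    Returns the barycenter quantile function, the eigenvalues of the
    discretized covariance operator, and the principal modes (rows),
    normalized to unit norm in L2 of the barycenter.
    """
    n, k = quantiles.shape
    w = np.full(n, 1.0 / n) if weights is None else np.asarray(weights, dtype=float)
    bary = w @ quantiles      # in 1D the barycenter averages the quantile functions
    logs = quantiles - bary   # log maps T_i - id, evaluated at the barycenter quantiles
    # Each grid cell carries barycenter mass 1/k, hence the 1/sqrt(k) scaling.
    a = np.sqrt(w)[:, None] * logs / np.sqrt(k)
    _, s, vt = np.linalg.svd(a, full_matrices=False)
    return bary, s ** 2, vt * np.sqrt(k)


# Hypothetical example: five Gaussians with different means and spreads.
Q = gaussian_quantiles([-1.0, 0.0, 0.5, 2.0, 3.0], [0.5, 1.0, 1.5, 0.8, 1.2])
bary, eigvals, modes = log_pca_1d(Q)
print(eigvals)
```

For Gaussian inputs the log maps are affine in the barycenter coordinate, so the variation is spanned by a shift and a rescaling and only the first two eigenvalues should be numerically nonzero. The eigenvalues also sum to the weighted mean squared 2-Wasserstein distance from the inputs to the barycenter, up to discretization.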

What carries the argument

Covariance operator at the barycenter, connected across tangent spaces by the parallel transport induced by optimal transport maps.

If this is right

  • The variational formulation directly yields a differentiable version of principal geodesic analysis on the Wasserstein space.
  • Convergence holds for weighted measures and is expressed solely through the barycenter reference distance.
  • No extra regularity is imposed that would break the variational interpretation or the parallel-transport argument.
  • The rate applies to any estimator whose barycenter converges in 2-Wasserstein distance.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same dynamical perspective could be used to define tangential PCA on other spaces that admit a parallel-transport structure from optimal transport.
  • Applications that already compute Wasserstein barycenters (image histograms, point-cloud summaries) could immediately plug in the WT-PCA estimator.
  • Synthetic experiments with known low-dimensional geodesic variations would directly test whether the extracted modes recover the ground-truth directions.

Load-bearing premise

The parallel transport structure of the optimal transport problems is well-defined and can be leveraged to connect the covariance operators across tangent spaces at the barycenter without additional regularity conditions.

What would settle it

A numerical experiment in which the observed 2-Wasserstein error between population and empirical WT-PCA fails to decay at the predicted rate once the barycenter distance is increased while keeping sample size fixed.
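One way to set up a check of this kind, sketched here for a toy model rather than for the paper's setting: draw one-dimensional Gaussian measures N(m_i, s_i^2) with random parameters, for which the barycenter and the 2-Wasserstein distance have closed forms, and track how the barycenter error and the covariance error move together as the sample size changes. The sampling distributions, the function names, and the Frobenius norm as error measure are assumptions made for the illustration; the paper's predicted rate is not reproduced here.

```python
import numpy as np


def w2_gauss_1d(m1, s1, m2, s2):
    """2-Wasserstein distance between N(m1, s1^2) and N(m2, s2^2) on the real line."""
    return float(np.hypot(m1 - m2, s1 - s2))


def one_trial(n, rng):
    # Toy population (assumption): m ~ N(0, 1) and s ~ Uniform(0.5, 1.5), independent.
    m = rng.normal(0.0, 1.0, size=n)
    s = rng.uniform(0.5, 1.5, size=n)
    # In 1D the barycenter of Gaussians is N(mean of m, (mean of s)^2);
    # the population barycenter of this toy model is therefore N(0, 1).
    bary_err = w2_gauss_1d(m.mean(), s.mean(), 0.0, 1.0)
    # In quantile coordinates the log maps are (m_i - mean m) + (s_i - mean s) * z,
    # with {1, z} orthonormal, so the covariance operator reduces to the
    # 2x2 covariance matrix of (m, s).
    cov_emp = np.cov(np.vstack([m, s]), bias=True)
    cov_pop = np.diag([1.0, 1.0 / 12.0])
    cov_err = float(np.linalg.norm(cov_emp - cov_pop))  # Frobenius norm
    return bary_err, cov_err


def run(ns=(100, 1000, 10000), reps=50, seed=0):
    rng = np.random.default_rng(seed)
    out = {}
    for n in ns:
        trials = np.array([one_trial(n, rng) for _ in range(reps)])
        out[n] = trials.mean(axis=0)  # (mean barycenter error, mean covariance error)
    return out


results = run()
for n, (bary_err, cov_err) in results.items():
    print(f"n={n:6d}  W2(barycenters)={bary_err:.4f}  covariance error={cov_err:.4f}")
```

Plotting the covariance error against the barycenter distance on a log-log scale would show whether the two shrink together; a systematic departure from the rate stated in the paper would be the kind of evidence described above.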

Figures

Figures reproduced from arXiv: 2606.17196 by Changbo Zhu, Peng Xu, Xiaohui Chen, Young-Heon Kim.

Figure 1: First two principal modes of geodesic variation of the WT-PCA in the example of three bivariate [...]
Figure 2: Top 10 eigenvalues of the Wasserstein covariance in the example of three bivariate Gaussians.
Figure 3: The top 6 principal modes for digit 2. The histogram of the barycenter is shown in the middle column
Figure 5: Results of color palette averaging
Figure 6: Perturbations of the color-palette barycenter (center) along the first principal mode of variation.
Original abstract

This paper is concerned with learning principal variations of random probability measures on $\mathbb{R}^m$ under the Wasserstein geometry. We introduce a new dynamical formulation to interpret the log-PCA, a linearized principal geodesic analysis, as a variational approach. Our differentiable version, termed as the Wasserstein Tangential PCA (WT-PCA), captures the local principal modes of geodesic variations of a (weighted) probability measure on the Wasserstein space via its covariance operator at barycenter. Based on the dynamical perspective and leveraging parallel transport structure of the optimal transport problems, we derive a general statistical convergence rate of the empirical WT-PCA when estimated from data in terms of the 2-Wasserstein distance between the population and empirical barycenter reference measures.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper introduces a dynamical formulation of log-PCA for random probability measures under Wasserstein geometry. It defines the Wasserstein Tangential PCA (WT-PCA) via the covariance operator of geodesic variations at the barycenter and uses parallel transport of optimal transport maps to derive a general statistical convergence rate of the empirical WT-PCA to the population version, expressed in terms of the 2-Wasserstein distance between the population and empirical barycenter reference measures; the result is claimed to hold without additional regularity conditions on the measures.

Significance. If the central derivation is valid, the work supplies a variational interpretation of linearized PCA on the Wasserstein space together with explicit statistical rates; the dynamical perspective and explicit use of parallel transport constitute a clear technical contribution that could support further reproducible analysis of distribution-valued data.

major comments (1)
  1. [Abstract / convergence derivation] Abstract and the derivation of the convergence rate (around the parallel-transport step): the claim that the result holds 'without additional regularity conditions' is load-bearing for the identification of the empirical covariance operator with its population counterpart via parallel transport. In Wasserstein geometry this transport is single-valued and isometric only when optimal maps exist and are unique, which fails for discrete or singular measures; the manuscript must either exhibit the precise conditions under which the transport remains well-defined or show that the rate derivation does not rely on uniqueness.
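The non-uniqueness the referee points to can be seen in a two-atom example. The sketch below is an illustration chosen for this note, not taken from the paper: two uniform discrete measures in the plane for which every assignment of atoms has the same quadratic cost, so no single optimal map is singled out. The supports and the helper name `assignment_costs` are hypothetical.

```python
from itertools import permutations

import numpy as np

# mu and nu each put mass 1/2 on two atoms (hypothetical supports).
x = np.array([[-1.0, 0.0], [1.0, 0.0]])  # support of mu
y = np.array([[0.0, -1.0], [0.0, 1.0]])  # support of nu

# Squared Euclidean cost between every atom of mu and every atom of nu.
cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)


def assignment_costs(cost):
    """Transport cost of each one-to-one assignment between equally weighted atoms."""
    n = cost.shape[0]
    return {
        p: float(sum(cost[i, p[i]] for i in range(n)) / n)
        for p in permutations(range(n))
    }


costs = assignment_costs(cost)
best = min(costs.values())
optimal_maps = [p for p, c in costs.items() if np.isclose(c, best)]
print(optimal_maps)
```

Both assignments attain the minimum, and so does every mixture of them, so the "log map" from mu to nu is not a single vector field in this case. This is the situation in which the parallel-transport step needs an explicit assumption.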

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful reading and constructive feedback on the convergence derivation. We address the major comment below.

Point-by-point responses
  1. Referee: [Abstract / convergence derivation] Abstract and the derivation of the convergence rate (around the parallel-transport step): the claim that the result holds 'without additional regularity conditions' is load-bearing for the identification of the empirical covariance operator with its population counterpart via parallel transport. In Wasserstein geometry this transport is single-valued and isometric only when optimal maps exist and are unique, which fails for discrete or singular measures; the manuscript must either exhibit the precise conditions under which the transport remains well-defined or show that the rate derivation does not rely on uniqueness.

    Authors: We appreciate the referee highlighting this important technical point regarding the parallel transport step. The derivation identifies the empirical covariance operator with its population counterpart by transporting the optimal maps, and the rate is expressed in terms of the 2-Wasserstein distance between barycenters. This step does rely on the transport map being single-valued, which requires uniqueness of the optimal map. The manuscript's claim of holding without additional regularity conditions is therefore imprecise for measures where uniqueness may fail (e.g., discrete or singular supports). We will revise the paper to explicitly state the assumption that the population and empirical measures admit unique optimal transport maps (for instance, when they are absolutely continuous w.r.t. Lebesgue measure on R^m). This will be added to the theorem statement, with a clarifying remark in the abstract and Section 3. We view this as a necessary clarification rather than a weakening of the result.

    Revision: yes

Circularity Check

0 steps flagged

No circularity: derivation self-contained via dynamical formulation and OT parallel transport

Full rationale

The abstract and description present WT-PCA as a new variational interpretation of log-PCA using covariance operators at the barycenter, with convergence derived from the dynamical perspective and parallel transport structure of OT problems. No equations, self-citations, or reductions are exhibited that make any prediction equivalent to its inputs by construction, fit a parameter then rename it as prediction, or rely on load-bearing self-citations. The central statistical rate claim is framed as following from the stated assumptions on the Wasserstein geometry without reduction to fitted quantities. This is the common honest non-finding for papers whose core derivation remains independent of the target result.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The abstract invokes standard optimal transport structures without introducing new free parameters or entities; relies on domain assumptions about the Wasserstein space.

axioms (1)
  • domain assumption: The Wasserstein space on R^m admits a parallel transport structure for optimal transport problems that allows consistent definition of covariance operators at the barycenter.
    Invoked to derive the statistical convergence rate of empirical WT-PCA.


