Wasserstein Least Squares: A Canonical Regression Method for Probability Distributions

Jonathan Niles-Weed; Uriel Martínez León

arxiv: 2605.30266 · v1 · pith:XGITMKIFnew · submitted 2026-05-28 · 🧮 math.ST · stat.TH

Pith reviewed 2026-06-28 23:57 UTC · model grok-4.3

classification 🧮 math.ST stat.TH

keywords Wasserstein least squares · distributional regression · optimal transport · Wasserstein barycenters · template deformation model · convex analysis · parametric rates

The pith

Wasserstein least squares extends classical least squares to probability distributions as its canonical convex-analytic counterpart and attains root-n estimation rates under a deformation model.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Wasserstein least squares for regressing distribution-valued responses on vector covariates. It establishes this estimator as the direct analogue of Euclidean least squares by applying convex-analytic arguments that produce multimarginal and dual formulations. Under the template deformation model, where each observed distribution is a random push-forward of a fixed template, the method achieves the parametric n^{-1/2} convergence rate. As a special case, this rate yields an exponential improvement over existing bounds for Wasserstein barycenters.

Core claim

Wasserstein least squares is the canonical extension of Euclidean least squares to the space of probability distributions from the perspective of convex analysis; this viewpoint gives rise to multimarginal and dual formulations of the Wasserstein least squares problem, extending a similar theory for Wasserstein barycenters. Under the template deformation model, estimation is possible at the n^{-1/2} rate, which produces an exponential improvement over prior rates for Wasserstein barycenters.

What carries the argument

Wasserstein least squares problem, defined by minimizing expected squared Wasserstein distance to a linear predictor in the space of measures and shown to be the convex-analytic lift of Euclidean least squares.
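
For orientation, the problem can be written in symbols as follows. This is a sketch in our own notation, not a quotation from the paper: it assumes scalar responses, replaces the expectation by an empirical average over n observations (x_i, ν_i), and reads the linear predictor at covariate x as the law of x^⊤β for a random coefficient vector β ∼ Q, which is how the figure captions describe the fitted model.

```latex
\min_{Q \in \mathcal{P}(\mathbb{R}^p)} \;
\frac{1}{n} \sum_{i=1}^{n} W_2^2\bigl(\nu_i,\ (x_i^\top)_{\#} Q\bigr),
\qquad
(x^\top)_{\#} Q = \operatorname{Law}(x^\top \beta), \quad \beta \sim Q .
```

Taking x_i = 1 for every i leaves \min_Q (1/n) \sum_i W_2^2(\nu_i, Q), the Wasserstein barycenter objective, which is the intercept-only case.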

If this is right

The regression estimator converges at the parametric n^{-1/2} rate for the underlying map from covariates to distributions.
Wasserstein barycenters, recovered as the intercept-only case, inherit the same n^{-1/2} rate, an exponential improvement over previous bounds.
A particle-based heuristic permits computation on large data sets and yields new demographic insights from the RAND Health and Retirement Study.
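
The paper's particle method is not spelled out on this page. As a minimal sketch of what a particle representation makes computable, the snippet below evaluates an empirical Wasserstein least squares objective for scalar responses, using the fact that in one dimension W_2^2 is the squared L^2 distance between quantile functions. The function names, the quantile grid, and the reading of the predictor as the law of x^⊤β are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def w2_squared_1d(a, b, n_grid=200):
    """Approximate squared W2 distance between two 1D samples.

    Uses the quantile-function formula, evaluated on a midpoint grid
    of n_grid probability levels (the grid size is an arbitrary choice).
    """
    levels = (np.arange(n_grid) + 0.5) / n_grid
    qa = np.quantile(a, levels)
    qb = np.quantile(b, levels)
    return float(np.mean((qa - qb) ** 2))


def wls_objective(particles, covariates, responses):
    """Average W2^2 between each observed sample and the predicted law.

    particles  : array of shape (M, p), a particle cloud standing in for Q
    covariates : sequence of n arrays of shape (p,)
    responses  : sequence of n 1D arrays, samples from each observed nu_i
    """
    total = 0.0
    for x, y in zip(covariates, responses):
        predicted = particles @ x  # one draw of x^T beta per particle
        total += w2_squared_1d(predicted, y)
    return total / len(covariates)
```

An optimizer would then move the particles to decrease this value; that update step is exactly the part this sketch leaves out.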

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The random-effects interpretation may allow transfer of classical mixed-model diagnostics to the Wasserstein setting.
The multimarginal formulation could be adapted to other convex losses beyond squared distance in optimal transport regression.
Robustness checks on data lacking a clear template could quantify how much the parametric rate degrades outside the model.

Load-bearing premise

Observed distributions are generated as random deformations of one fixed template distribution.

What would settle it

A simulation in which distributions are generated without a common template yet the estimator still converges at the n^{-1/2} rate would falsify the necessity of the deformation model for the claimed rate.
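
The measuring half of such an experiment is easy to sketch. The toy below is an assumption-laden illustration, not the paper's simulation: the template is N(0, 1), each deformation is a random affine map T(y) = a·y + b with E[a] = 1 and E[b] = 0, and the estimator is the one-dimensional Wasserstein barycenter, for which closed forms are available. It estimates the log-log slope of the mean squared W_2 error against n. To run the falsification test itself, one would swap the generator for one with no common template and check whether the slope persists.

```python
import numpy as np


def barycenter_error(n, rng):
    """Squared W2 error of the empirical barycenter against the template N(0, 1).

    Each observation is the push-forward of N(0, 1) by T(y) = a*y + b,
    i.e. N(b, a^2) with a > 0. In one dimension the W2 barycenter of
    N(b_i, a_i^2) is N(mean(b), mean(a)^2), and W2^2 between two
    Gaussians is (difference of means)^2 + (difference of std devs)^2.
    """
    a = rng.uniform(0.5, 1.5, size=n)  # random scale, mean 1 (toy choice)
    b = rng.normal(0.0, 1.0, size=n)   # random shift, mean 0 (toy choice)
    return np.mean(b) ** 2 + (np.mean(a) - 1.0) ** 2


def loglog_slope(ns, reps=400, seed=0):
    """Least-squares slope of log(mean squared error) against log(n)."""
    rng = np.random.default_rng(seed)
    mean_err = [
        np.mean([barycenter_error(n, rng) for _ in range(reps)]) for n in ns
    ]
    slope, _intercept = np.polyfit(np.log(ns), np.log(mean_err), 1)
    return float(slope)


slope = loglog_slope([10, 30, 100, 300, 1000])
print(f"estimated log-log slope of squared W2 error: {slope:.2f}")
```

In this toy the expected squared error is (Var(a) + Var(b))/n, so the fitted slope should sit near -1; a slope of -1 in W_2^2 corresponds to n^{-1/2} in W_2 itself.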

Figures

Figures reproduced from arXiv: 2605.30266 by Jonathan Niles-Weed, Uriel Martínez León.

**Figure 1.** (Left) Corner plot: diagonal panels display the marginal distribution of each coefficient, and off-diagonal panels reveal genuine correlations among demographic effects. On the other hand, as we describe in Section A, Fréchet regression also implicitly yields a joint distribution over the coefficient vector: the predicted distribution corresponding to a covariate vector x has quantile function…

**Figure 2.** Sequential conditioning (BMI ∈ [29, 33] at age 50; three age-60 scenarios), cohorts 1935–39 and 1940–44, both sexes. Columns: worsening (BMI > 33, red), stable (BMI 29–33, blue), improved (BMI < 29, green). Rows: observed trajectories (top), Wasserstein least squares (middle), Fréchet (bottom). Shaded bands: 75% and 99% prediction intervals; box plots: matched empirical distribution. Wasserstein least s…

**Figure 3.** Random transport maps ∇ϕ_i from five noise families (coloured) applied to the template Q⋆_x (black dashed). Top row: the maps T(y); shaded band shows the C3 curvature bounds. Bottom row: deviation T(y) − y from the identity, illustrating C2 (E[T(y)] = y). The two templates are: (i) Univariate (d = 1, n = 50), with Q⋆_x = N(t, 1 + t^2) and covariate x = (1, t)^⊤, t ∼ U[−2, 2]; the variance is U-shaped, gr…

**Figure 4.** Univariate: estimated densities at t ∈ {−1.8, −0.5, 0.5, 1.8}. True template Q⋆_{x_i} (black dashed), noisy observations ν_i (grey), Wasserstein least squares (blue), Fréchet (red dashed). Wasserstein least squares recovers the true template across the covariate range under additive noise. Under the U-shaped variance (univariate), Fréchet regression underestimates spread at the extremes. Both templates share a fea…

**Figure 5.** Covariance trajectory t ↦ Σ(t) projected onto three coordinate pairs of the SPD cone. Wasserstein least squares (blue) traces the curved true path (black); Fréchet-GD (red) follows a straight line. Bivariate Gaussian experiment: Wasserstein least squares recovers the quadratic covariance trajectory Σ(t) = A + t(B + B^⊤) + t^2 C. The log-Euclidean view (right) makes the curvature explicit; Fréchet regressi…

**Figure 6.** Predictive parity between Wasserstein least squares and Fréchet regression on the BMI data. Left: LOO W_2 error histograms (top) and paired cell-level comparison (bottom); the median difference is 0.025 BMI units and 61% of cells favour Fréchet, confirming that neither method dominates on raw out-of-sample fit. Right: Fitted density heatmaps for male subjects, cohort 1935–1939; both models produce virt…

**Figure 7.** Comparison of coefficient distributions from Wasserstein least squares and Fréchet regression. (a) Marginal distributions of the random coefficient vector β ∼ Q̂ estimated via Wasserstein least squares. Each panel shows the empirical distribution of one coefficient component from the M = 2000 particle representation of Q̂. Histogram with kernel density estimate overlay; vertical black dashed line at z…

**Figure 8.** Comparison of joint distribution structures between Wasserstein least squares and Fréchet regression. Left: corner plot showing the joint distribution structure of β ∼ Q̂. Diagonal panels: marginal distributions (histogram with KDE) for each coefficient; black vertical line at zero, red vertical line at the mean. Lower triangle: bivariate scatter plots showing pairwise joint distributions with cont…

**Figure 9.** Marginal versus conditional standard deviation (cohort 1940, female). The shaded region represents the variance explained by knowing β_0.

**Figure 10.** Analysis of particles with β_{age^2} > 0 (accelerating BMI growth). Using M = 20,000 particles, approximately 10% exhibit positive quadratic age coefficients, representing individuals whose BMI continues to accelerate with age rather than plateau or decline. (a) Marginal distribution of β_{age^2}; vertical dashed line indicates zero, solid line shows the mean. Shaded region highlights particles with β_{age^2} > 0.…

**Figure 11.** Concave/monotone vs. convex BMI trajectories, cohort 1940–44, female. Top row: individual observed BMI trajectories (thin lines) with group median (thick line); dashed line at BMI = 30. Bottom row: detrended residuals after removing each individual's personal linear trend, with smoothed median and interquartile range (IQR) band (25th–75th percentile); dashed line at zero (no curvature). Left (blue): the…

**Figure 12.** Analysis of particles with β_cohort < 0 (reverse cohort effect). Using M = 20,000 particles, approximately 6% exhibit negative cohort coefficients, representing individuals for whom later birth cohorts have lower BMI, the opposite of the typical obesity epidemic trend. (a) Marginal distribution of β_cohort; vertical dashed line indicates zero. Shaded region highlights particles with β_cohort < 0. (b) Condition…

**Figure 13.** Joint distribution of coefficients colored by extreme values (M = 20,000 particles). (a) Scatter plot of (β_0, β_age) colored by β_{age^2}. Particles with β_{age^2} > 0 (red/orange) tend to cluster in specific regions of the (β_0, β_age) space, revealing the correlation structure among coefficients. (b) Same scatter plot colored by β_cohort. Particles with β_cohort < 0 (red) show distinct patterns in the joint distrib…

**Figure 14.** Observed BMI trajectories (thin lines) overlaid on model-implied 80% prediction bands (shaded) for individuals at the obesity threshold (BMI = 31 at age 50), ages 50–75. Prediction bands are the 10th–90th percentile of {x(age)^⊤ β_m}_{m=1}^{M′}, where β_m are the M′ ≤ M particles (Wasserstein least squares) or quantile-level coefficient vectors (Fréchet) that survive the conditioning criterion ŷ(50) ∈ [30, 32…

**Figure 15.** Probability of obesity (BMI ≥ 30) as a function of age for females in cohort 1945, comparing Wasserstein least squares and Fréchet regression under single and sequential conditioning. (Panel 1) Wasserstein least squares, single conditioning. Conditioning on BMI at age 50 produces smooth, clinically meaningful probability curves. Individuals starting below the threshold (BMI_50 = 28, green) show a modest…

**Figure 16.** Predicted BMI trajectories under single conditioning (BMI ∈ [29, 33] at age 50), cohorts 1935–39 and 1940–44, both sexes. The top row shows individual observed BMI trajectories from the matched HRS cohort for Wasserstein least squares (left, blue) and Fréchet (right, red). The bottom row shows the Wasserstein least squares (left) and Fréchet (right) prediction intervals together with empirical box plot…

**Figure 17.** Predicted BMI trajectories under sequential conditioning (BMI ∈ [29, 33] at age 50 followed by three scenarios at age 60), cohorts 1935–39 and 1940–44, both sexes. Columns correspond to three second-observation scenarios: worsening (BMI > 33 at 60, red), stable (BMI 29–33 at 60, blue), and improved (BMI < 29 at 60, green). The top row shows individual observed trajectories from the matched HRS cohort; th…

**Figure 18.** Template recovery under sinusoidal deformation (k = 1.2, n = 50). Density heatmaps of the true Q⋆_x (left), Wasserstein least squares fit (center), and Fréchet fit (right) over the covariate range t ∈ [−2, 2]. Wasserstein least squares reproduces the U-shaped variance of the true template; Fréchet produces nearly uniform spread across all t.

**Figure 19.** Wasserstein least squares recovers Q⋆ under affine and non-linear noise; Fréchet regression is structurally misspecified. DGP: ν_i = (∇ϕ_i)_# Q⋆_{x_i}, Q⋆_{x_i} ∼ N(t, 1 + t^2), n = 50, five noise models (see …

**Figure 20.** Wasserstein least squares recovers a quadratic covariance trajectory on the SPD manifold; Fréchet-GD is structurally misspecified. DGP: ν_i = (∇ϕ_i)_# Q⋆_{x_i}, Q⋆_{x_i} ∼ N(µ(t), Σ(t)) with Σ(t) = A + t(B + B^⊤) + t^2 C (quadratic on the SPD manifold), n = 50, three noise models (see …

**Figure 21.** Median W_2^2 error and interquartile range on log–log axes, together with n^{-1/2} and n^{-1} reference lines. All three curves decay monotonically. The location-scale and radial models display a rate closer to n^{-1} over this range. [Plot: x-axis n (number of observations), 10 to 500; y-axis average W_2^2 error over the x_i; series Additive, Radial, Location-Scale.]

Original abstract

We perform a mathematical and statistical analysis of the Wasserstein least squares problem, a regression method for vector-valued covariates and distribution-valued responses. Our proposal contrasts with other distributional regression methods by having a direct interpretation in terms of random variables, as a nonparametric analogue of the classic random-effects model. On the mathematical side, we use a strategy of Lavenant (2024) to show that Wasserstein least squares is the canonical extension of Euclidean least squares to the space of probability distributions from the perspective of convex analysis; this viewpoint gives rise to multimarginal and dual formulations of the Wasserstein least squares problem, extending a similar theory for Wasserstein barycenters. We perform a statistical analysis of the Wasserstein least squares problem under the template deformation model, showing, surprisingly, that estimation is possible at the n^{-1/2} rate. As a special case, we obtain improved rates of estimation for Wasserstein barycenters, which are an exponential improvement over those established by Ahidar-Coutrix, Le Gouic and Paris (2020). Finally, we propose a heuristic particle method for Wasserstein least squares and use it to conduct a novel analysis of large-scale demographic data from the RAND Health and Retirement Study.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper frames Wasserstein least squares as the convex-analysis canonical extension of ordinary least squares and obtains an n^{-1/2} rate plus barycenter improvement, but both statistical claims rest on the template deformation model.

The new pieces are the multimarginal and dual formulations that follow from applying Lavenant's convex-analysis approach, plus the claim that this setup recovers the usual parametric rate when the data are generated by random deformations of a single template. The barycenter rates improve exponentially over the earlier Ahidar-Coutrix et al. result under the same assumption. They also give a particle heuristic and apply it to RAND Health and Retirement Study data, which is a concrete step beyond pure theory.

The convex-analysis part looks like a clean extension of existing barycenter work and gives a direct random-variable reading that other distributional regression methods lack. That is useful framing even if one does not buy the rates.

The statistical claims are narrower than the abstract might suggest. The n^{-1/2} rate and the barycenter gain are obtained only after restricting to the template deformation model; outside that model the problem is nonparametric over Wasserstein space and the slower rates from prior literature are the expected ones. The paper states the modeling choice explicitly, so the improvement is real but conditional rather than general. The particle method is labeled heuristic, so it supplies no error guarantees.

This is for readers already working in optimal transport or distributional statistics who want the convex-analysis viewpoint or need to handle distribution-valued responses under a deformation assumption. It has enough new formulation and a real-data example to merit referee time rather than desk rejection, though any review should check whether the derivations survive without the template model and whether the particle scheme can be made rigorous.

Referee Report

2 major / 2 minor

Summary. The paper introduces Wasserstein least squares as a regression method mapping vector covariates to distribution-valued responses. It uses a convex-analysis strategy from Lavenant (2024) to establish this as the canonical extension of Euclidean least squares, yielding multimarginal and dual formulations that extend barycenter theory. Under the template deformation model, the estimator achieves the parametric n^{-1/2} rate; as a special case this yields exponentially faster rates for Wasserstein barycenters than those in Ahidar-Coutrix et al. (2020). A particle heuristic is proposed and applied to RAND Health and Retirement Study demographic data.

Significance. The convex-analytic characterization supplies a principled, optimization-based foundation for distributional regression that may unify existing approaches. The n^{-1/2} rate under the deformation model is a notable improvement in a structured setting and directly improves barycenter estimation when the model holds. The empirical section provides a concrete demonstration on large-scale data, though the practical scope is limited by the modeling assumption.

major comments (2)

[Statistical analysis section] Abstract and corresponding main-text section: the n^{-1/2} rate and the exponential improvement over Ahidar-Coutrix et al. (2020) are obtained exclusively under the template deformation model. The manuscript should state explicitly whether any general (nonparametric) rate is available or whether the model is indispensable for the claimed rate, as this assumption is load-bearing for the headline statistical result.
[Mathematical analysis section] The multimarginal and dual formulations are asserted to follow from Lavenant (2024). The paper should include a self-contained verification that the Wasserstein least squares functional is convex (or strictly convex under stated conditions) and that the dual problem recovers the same minimizer, citing the precise equation or proposition where this is shown.

minor comments (2)

[Application section] The abstract states that the n^{-1/2} claim rests on the template deformation model; the main text should add a short paragraph clarifying how this model is checked or motivated for the RAND data application.
Notation for the multimarginal formulation should be aligned with the barycenter literature to facilitate comparison; a short remark contrasting the two problems would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive assessment and the two constructive major comments. Both points can be addressed by targeted revisions that clarify the scope of the statistical results and strengthen the self-contained mathematical presentation. We outline our responses below.

Referee: [Statistical analysis section] Abstract and corresponding main-text section: the n^{-1/2} rate and the exponential improvement over Ahidar-Coutrix et al. (2020) are obtained exclusively under the template deformation model. The manuscript should state explicitly whether any general (nonparametric) rate is available or whether the model is indispensable for the claimed rate, as this assumption is load-bearing for the headline statistical result.

Authors: We agree that the n^{-1/2} rate (and the resulting exponential improvement for barycenters) is derived exclusively under the template deformation model; no general nonparametric rate is claimed or derived in the paper. The model is indispensable for the parametric rate. We will revise the abstract and the statistical analysis section to state this explicitly, making clear that the headline rate requires the deformation assumption and that the analysis does not provide rates outside this structured setting.

Revision: yes

Referee: [Mathematical analysis section] The multimarginal and dual formulations are asserted to follow from Lavenant (2024). The paper should include a self-contained verification that the Wasserstein least squares functional is convex (or strictly convex under stated conditions) and that the dual problem recovers the same minimizer, citing the precise equation or proposition where this is shown.

Authors: We will add a short self-contained verification in the mathematical analysis section. This will adapt the convex-analytic arguments of Lavenant (2024) to the Wasserstein least squares functional, explicitly showing convexity (and strict convexity under the stated conditions on the cost) and verifying that the dual recovers the same minimizer. The added text will cite the precise propositions from Lavenant (2024) that are being specialized, while keeping the argument self-contained for the reader.

Revision: yes

Circularity Check

0 steps flagged

No circularity: external strategy and explicit model assumption yield independent content

The paper invokes an external strategy from Lavenant (2024) for the convex-analysis claim that Wasserstein least squares is canonical, and derives the n^{-1/2} rate explicitly under the stated template deformation model. Neither step reduces a prediction to a fitted quantity by construction, and neither relies on a load-bearing self-citation or a smuggled-in ansatz. The comparison to Ahidar-Coutrix et al. (2020) is a benchmark contrast, not a definitional reduction. The derivation chain therefore contains independent mathematical and statistical content.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The work rests on the template deformation model for the statistical rates and on the convex-analysis strategy of Lavenant (2024) for the canonical characterization; no free parameters or invented entities are mentioned in the abstract.

axioms (2)

domain assumption: The template deformation model holds for the observed distributions. Invoked to obtain the n^{-1/2} estimation rate in the statistical analysis.
domain assumption: The Lavenant (2024) strategy applies to the Wasserstein least squares functional. Used to establish the canonical extension and multimarginal/dual formulations.

Reference graph

Works this paper leans on

68 extracted references · 42 canonical work pages · 1 internal anchor

[1] Martial Agueh and Guillaume Carlier. Barycenters in the Wasserstein space. SIAM Journal on Mathematical Analysis, 43(2):904–924, 2011. doi:10.1137/100805741
[2] Adil Ahidar-Coutrix, Thibaut Le Gouic, and Quentin Paris. Convergence rates for empirical barycenters in metric spaces: curvature, convexity and extendable geodesics. Probability Theory and Related Fields, 177(1–2):323–368, 2020. doi:10.1007/s00440-019-00950-0
[3] Stephanie Alexander, Vitali Kapovitch, and Anton Petrunin. Alexandrov Geometry: Foundations, volume 236 of Graduate Studies in Mathematics. American Mathematical Society, Providence, RI, 2024. ISBN 978-1-4704-7536-9
[4] Charalambos D. Aliprantis and Kim C. Border. Infinite Dimensional Analysis: A Hitchhiker's Guide. Springer, Berlin, 3rd edition, 2006. doi:10.1007/3-540-29587-9
[5] Jason M. Altschuler and Enric Boix-Adserà. Wasserstein barycenters can be computed in polynomial time in fixed dimension. Journal of Machine Learning Research, 22(44):1–19, 2021. URL https://jmlr.org/papers/v22/20-588.html
[6] Jason M. Altschuler, Sinho Chewi, Patrik R. Gerber, and Austin J. Stromme. Averaging on the Bures–Wasserstein manifold: dimension-free convergence of gradient descent. In Advances in Neural Information Processing Systems, volume 34, pages 22132–22145, 2021
[7] Luigi Ambrosio, Nicola Gigli, and Giuseppe Savaré. Gradient Flows in Metric Spaces and in the Space of Probability Measures. Lectures in Mathematics ETH Zürich. Birkhäuser, Basel, 2nd edition, 2008. ISBN 978-3-7643-8721-1
[8] Richard E. Barlow, David J. Bartholomew, Joan M. Bremner, and Hugh D. Brunk. Statistical Inference under Order Restrictions: The Theory and Application of Isotonic Regression. Wiley Series in Probability and Mathematical Statistics. John Wiley & Sons, London, 1972
[9] Rudolf Beran and Peter Hall. Estimating coefficient distributions in random coefficient regressions. The Annals of Statistics, 20(4):1970–1984, 1992. doi:10.1214/aos/1176348898
[10] Rajendra Bhatia, Tanvi Jain, and Yongdo Lim. On the Bures–Wasserstein distance between positive definite matrices. Expositiones Mathematicae, 37(2):165–191, 2019. doi:10.1016/j.exmath.2018.01.002
[11] Xin Bing, Florentina Bunea, and Jonathan Niles-Weed. Estimation and inference for the Wasserstein distance between mixing measures in topic models, 2022. Forthcoming, Bernoulli
[12] Emmanuel Boissard, Thibaut Le Gouic, and Jean-Michel Loubes. Distribution's template estimate with Wasserstein metrics. Bernoulli, 21(2):740–759, 2015. doi:10.3150/13-BEJ585
[13] Nicolas Boumal. An Introduction to Optimization on Smooth Manifolds. Cambridge University Press, 2023. doi:10.1017/9781009166164
[14] Charlotte Bunne, Stefan G. Stark, Gabriele Gut, Jacobo Sarabia del Castillo, Mitch Levesque, Kjong-Van Lehmann, Lucas Pelkmans, Andreas Krause, and Gunnar Rätsch. Learning single-cell perturbation responses using neural optimal transport. Nature Methods, 20(11):1759–1768, 2023. doi:10.1038/s41592-023-01969-x
[15] Emmanuel J. Candès and Benjamin Recht. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6):717–772, 2009. doi:10.1007/s10208-009-9045-5
[16] Centers for Disease Control and Prevention. About adult BMI, 2024. URL https://www.cdc.gov/bmi/about/index.html. Accessed: 2026-02-03
[17] Yaqing Chen, Zhenhua Lin, and Hans-Georg Müller. Wasserstein regression. Journal of the American Statistical Association, 118(542):869–882, 2023. doi:10.1080/01621459.2021.1956937
[18] Sinho Chewi, Tyler Maunu, Philippe Rigollet, and Austin J. Stromme. Gradient descent algorithms for Bures–Wasserstein barycenters. In Proceedings of the 33rd Conference on Learning Theory, volume 125 of Proceedings of Machine Learning Research, pages 1276–1304. PMLR, 2020
[19] Sinho Chewi, Jonathan Niles-Weed, and Philippe Rigollet. Statistical Optimal Transport, volume 2364 of Lecture Notes in Mathematics. Springer, Cham, 2025. ISBN 978-3-031-85159-9. doi:10.1007/978-3-031-85160-5. École d'Été de Probabilités de Saint-Flour XLIX – 2019
[20] Lénaïc Chizat, Pierre Roussillon, Flavien Léger, François-Xavier Vialard, and Gabriel Peyré. Faster Wasserstein distance estimation with the Sinkhorn divergence. In Advances in Neural Information Processing Systems, volume 33, pages 2257–2269, 2020
[21] Angus Deaton. Panel data from time series of cross-sections. Journal of Econometrics, 30(1–2):109–126, 1985. doi:10.1016/0304-4076(85)90134-4
[22] Vincent Divol. A short proof on the rate of convergence of the empirical measure for the Wasserstein distance, 2021
[23] Jiaojiao Fan, Isabel Haasler, Johan Karlsson, and Yongxin Chen. On the complexity of the optimal transport problem with graph-structured cost. In Proceedings of the 25th International Conference on Artificial Intelligence and Statistics, volume 151 of Proceedings of Machine Learning Research, pages 9147–9165. PMLR, 2022
[24] Nicolas Fournier and Arnaud Guillin. On the rate of convergence in Wasserstein distance of the empirical measure. Probability Theory and Related Fields, 162(3–4):707–738, 2015. doi:10.1007/s00440-014-0583-7
[25] Matthias Gelbrich. On a formula for the L^2 Wasserstein metric between measures on Euclidean and Hilbert spaces. Mathematische Nachrichten, 147:185–203, 1990. doi:10.1002/mana.19901470121
[26] Laya Ghodrati and Victor M. Panaretos. Distribution-on-distribution regression via optimal transport maps. Biometrika, 109(4):957–974, 2022. doi:10.1093/biomet/asac005
[27] Iliya Gutin. In BMI we trust: reframing the body mass index as a measure of health. Social Theory & Health, 16(3):256–271, 2018. doi:10.1057/s41285-017-0055-0
[28] Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Series in Statistics. Springer, New York, 2nd edition, 2009. doi:10.1007/978-0-387-84858-7
[29] Jean-Baptiste Hiriart-Urruty and Claude Lemaréchal. Convex Analysis and Minimization Algorithms I: Fundamentals, volume 305 of Grundlehren der mathematischen Wissenschaften. Springer, Berlin, 1993. doi:10.1007/978-3-662-02796-7
[30]

David C. Hoaglin and Roy E. Welsch. The hat matrix in regression and ANOVA. The American Statistician, 32(1):17--22, 1978. doi:10.1080/00031305.1978.10479237
[31]

Stefan Hoderlein, Jussi Klemelä, and Enno Mammen. Analyzing the random coefficient model nonparametrically. Econometric Theory, 26(3):804--837, 2010. doi:10.1017/S0266466609990119
[32]

Tailen Hsing and Randall Eubank. Theoretical Foundations of Functional Data Analysis, with an Introduction to Linear Operators. Wiley Series in Probability and Statistics. John Wiley & Sons, Chichester, 2015. doi:10.1002/9781118762547
[33]

Shayan Hundrieser, Benjamin Eltzner, and Stephan F. Huckemann. A lower bound for estimating Fréchet means, 2024
[34]

Amirhossein Karimi and Tryphon T. Georgiou. Regression analysis of distributional data through multi-marginal optimal transport, 2021
[35]

Amirhossein Karimi, Luigia Ripani, and Tryphon T. Georgiou. Statistical learning in Wasserstein space. IEEE Control Systems Letters, 5(3):899--904, 2021. doi:10.1109/LCSYS.2020.3006837
[36]

Hamed Karimi, Julie Nutini, and Mark Schmidt. Linear convergence of gradient and proximal-gradient methods under the Polyak--Łojasiewicz condition. In Machine Learning and Knowledge Discovery in Databases (ECML PKDD), volume 9851 of Lecture Notes in Computer Science, pages 795--811. Springer, 2016. doi:10.1007/978-3-319-46128-1_50
[37]

Nan M. Laird and James H. Ware. Random-effects models for longitudinal data. Biometrics, 38(4):963--974, 1982. doi:10.2307/2529876
[38]

Marc Lambert, Sinho Chewi, Francis Bach, Silvère Bonnabel, and Philippe Rigollet. Variational inference via Wasserstein gradient flows. In Advances in Neural Information Processing Systems, volume 35, 2022
[39]

Hugo Lavenant. Lifting functionals defined on maps to measure-valued maps via optimal transport. Annali della Scuola Normale Superiore di Pisa, Classe di Scienze, 2024. doi:10.2422/2036-2145.202309_034. Published online
[40]

Thibaut Le Gouic, Quentin Paris, Philippe Rigollet, and Austin J. Stromme. Fast convergence of empirical barycenters in Alexandrov spaces and the Wasserstein space. Journal of the European Mathematical Society, 25(6):2229--2250, 2023. doi:10.4171/jems/1234
[41]

Nicholas T. Longford. Random Coefficient Models, volume 11 of Oxford Statistical Science Series. Clarendon Press, Oxford University Press, New York, 1993. ISBN 0-19-852264-9
[42]

Tudor Manole and Jonathan Niles-Weed. Sharp convergence rates for empirical optimal transport with smooth costs. The Annals of Applied Probability, 34(1B):1108--1135, 2024. doi:10.1214/23-AAP1986
[43]

Tudor Manole, Sivaraman Balakrishnan, Jonathan Niles-Weed, and Larry Wasserman. Plugin estimation of smooth optimal transport maps. The Annals of Statistics, 52(3):966--998, 2024. doi:10.1214/24-AOS2379
[44]

Hans-Georg Müller. Peter Hall, functional data analysis and random objects. The Annals of Statistics, 44(5):1867--1887, 2016. doi:10.1214/16-AOS1492
[45]

Jonathan Niles-Weed and Quentin Berthet. Minimax estimation of smooth densities in Wasserstein distance. The Annals of Statistics, 50(3):1519--1540, 2022. doi:10.1214/21-AOS2161
[46]

Victor M. Panaretos and Yoav Zemel. An Invitation to Statistics in Wasserstein Space. SpringerBriefs in Probability and Mathematical Statistics. Springer, Cham, 2020. doi:10.1007/978-3-030-38438-8
[47]

Alexander Petersen and Hans-Georg Müller. Fréchet regression for random objects with Euclidean predictors. The Annals of Statistics, 47(2):691--719, 2019. doi:10.1214/17-AOS1624
[48]

Alexander Petersen, Xi Liu, and Afshin A. Divani. Wasserstein F-tests and confidence bands for the Fréchet regression of density response curves. The Annals of Statistics, 49(1):590--611, 2021. doi:10.1214/20-AOS1971
[49]

Alexander Petersen, Chao Zhang, and Piotr Kokoszka. Modeling probability density functions as data objects. Econometrics and Statistics, 21:159--178, 2022. doi:10.1016/j.ecosta.2021.04.004
[50]

J. O. Ramsay and B. W. Silverman. Functional Data Analysis. Springer Series in Statistics. Springer, New York, 2nd edition, 2005. ISBN 978-0-387-40080-8
[51]

RAND Center for the Study of Aging. RAND HRS longitudinal file 2022 (v1). Technical report, Institute for Social Research, University of Michigan, Ann Arbor, MI, 2025. URL https://hrsdata.isr.umich.edu/data-products/rand-hrs-longitudinal-file-2022. Funded by the National Institute on Aging (NIA U01AG009740) and the Social Security Administration
[52]

Philippe Rigollet and Jan-Christian Hütter. High-dimensional statistics, 2023. Lecture notes for MIT 18.657
[53]

R. Tyrrell Rockafellar. Convex Analysis. Number 28 in Princeton Mathematical Series. Princeton University Press, Princeton, NJ, 1970
[54]

Giuseppe Savaré and Giacomo Enrico Sodini. A simple relaxation approach to duality for optimal transport problems in completely regular spaces. Journal of Convex Analysis, 29(1):1--12, 2022
[55]

Shashank Singh and Barnabás Póczos. Minimax distribution estimation in Wasserstein distance, 2018
[56]

Wookyeong Song, Hang Zhou, Yidong Zhou, and Hans-Georg Müller. Non-Euclidean data analysis with metric statistics. Harvard Data Science Review, February 2026. URL https://hdsr.mitpress.mit.edu/pub/fi0cphkz
[57]

Stephen M. Stigler. The History of Statistics: The Measurement of Uncertainty Before 1900. Belknap Press of Harvard University Press, Cambridge, MA, 1986. ISBN 0-674-40340-1

[58]

Karl-Theodor Sturm. On the geometry of metric measure spaces. I. Acta Mathematica, 196(1):65--131, 2006. doi:10.1007/s11511-006-0002-8
[59]

James W. Vaupel, Kenneth G. Manton, and Eric Stallard. The impact of heterogeneity in individual frailty on the dynamics of mortality. Demography, 16(3):439--454, 1979. doi:10.2307/2061224
[60]

Marno Verbeek. A Guide to Modern Econometrics. John Wiley & Sons, Hoboken, NJ, 5th edition, 2017. ISBN 978-1-119-40115-5
[61]

Roman Vershynin. High-Dimensional Probability: An Introduction with Applications in Data Science, volume 47 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 2018. doi:10.1017/9781108231596
[62]

Cédric Villani. Optimal Transport: Old and New, volume 338 of Grundlehren der mathematischen Wissenschaften. Springer, Berlin, 2009. doi:10.1007/978-3-540-71050-9
[63]

Zachary J. Ward, Sara N. Bleich, Angie L. Cradock, Jessica L. Barrett, Catherine M. Giles, Chasmine Flax, Michael W. Long, and Steven L. Gortmaker. Projected U.S. state-level prevalence of adult obesity and severe obesity. New England Journal of Medicine, 381(25):2440--2450, 2019. doi:10.1056/NEJMsa1909301
[64]

Jonathan Weed and Francis Bach. Sharp asymptotic and finite-sample rates of convergence of empirical measures in Wasserstein distance. Bernoulli, 25(4A):2620--2648, 2019. doi:10.3150/18-BEJ1065
[65]

World Health Organization. Obesity: Preventing and managing the global epidemic. Report of a WHO consultation. Technical Report 894, World Health Organization, Geneva, 2000. PMID: 11234459
[66]

Haoshu Xu and Hongzhe Li. Wasserstein F-tests for Fréchet regression on Bures--Wasserstein manifolds. Journal of Machine Learning Research, 26(77):1--123, 2025. URL http://jmlr.org/papers/v26/24-0493.html
[67]

Yang Claire Yang, Christine E. Walsh, Moira P. Johnson, Daniel W. Belsky, Max Reason, Patrick Curran, Allison E. Aiello, Marianne Chanti-Ketterl, and Kathleen Mullan Harris. Life-course trajectories of body mass index from adolescence to old age: racial and educational disparities. Proceedings of the National Academy of Sciences, 118(17):e2020167118... doi:10.1073/pnas.2020167118, 2021
[68]

Yoav Zemel and Victor M. Panaretos. Fréchet means and Procrustes analysis in Wasserstein space. Bernoulli, 25(2):932--976, 2019. doi:10.3150/17-BEJ1009
