Shape-constrained density estimation with Wasserstein projection
Pith reviewed 2026-05-15 13:06 UTC · model grok-4.3
The pith
Wasserstein projection estimation yields a convex optimization problem for non-increasing and log-concave densities.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
When the shape constraint is a displacement convex subset of the Wasserstein space, Wasserstein projection estimation is a convex optimization problem. For non-increasing densities on R+ and log-concave densities on R, the paper proves structural properties of the estimator, proposes a discretization that can be implemented with off-the-shelf solvers, and compares the estimator with the maximum likelihood estimator.
What carries the argument
Displacement-convex subsets of the Wasserstein space, which ensure that the projection estimation problem is convex.
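In one dimension this mechanism can be made concrete through quantile functions. The identities below are standard background rather than a quotation from the paper; the notation $Q_\mu$ for the quantile function of $\mu$ is assumed here for illustration, and all measures are taken to have finite second moments.

$$
W_2^2(\mu,\nu) = \int_0^1 \bigl(Q_\mu(u) - Q_\nu(u)\bigr)^2\,du,
\qquad
Q_{\mu_t} = (1-t)\,Q_{\mu_0} + t\,Q_{\mu_1}, \quad t \in [0,1].
$$

The second identity describes the displacement interpolation $\mu_t$ between $\mu_0$ and $\mu_1$. A set of measures on the line is therefore displacement convex exactly when its set of quantile functions is convex, and projecting onto it amounts to minimizing a convex quadratic functional over a convex subset of $L^2([0,1])$.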
Load-bearing premise
The shape constraints of interest must correspond to displacement-convex subsets of the Wasserstein space.
What would settle it
Solving the discretized Wasserstein projection problem under the non-increasing density constraint, then repeating it with a constraint set that is not displacement convex, and checking whether the optimization problem remains convex.
Original abstract
Statistical inference based on optimal transport offers a different perspective from that of maximum likelihood, and has increasingly gained attention in recent years. In this paper, we study univariate nonparametric shape-constrained density estimation via projection with respect to the $p$-Wasserstein distance, with a focus on the quadratic case $p = 2$. By considering shape constraints given by displacement convex subsets of the Wasserstein space, Wasserstein projection estimation is a convex optimization problem. We focus on two fundamental examples, namely non-increasing densities on $\mathbb{R}_+ := [0, \infty)$ and log-concave densities on $\mathbb{R}$. In each case, we prove structural properties of the Wasserstein projection estimator, propose a discretization which can be implemented by off-the-shelf solvers, and compare the projection estimator with the corresponding maximum likelihood estimator.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript develops a framework for univariate nonparametric shape-constrained density estimation by projecting an empirical measure onto displacement-convex subsets of the Wasserstein space under the p-Wasserstein metric (with emphasis on p = 2). The central examples are non-increasing densities on [0,∞) and log-concave densities on R. The authors show that these constraints become convex sets when represented via quantile functions, which makes the projection a convex quadratic program. They also establish structural properties of the resulting estimators, propose a discretization solvable by off-the-shelf convex solvers, and compare the procedure to the corresponding maximum-likelihood estimators.
Significance. If the structural results and discretization analysis hold, the work supplies a computationally attractive, geometrically grounded alternative to maximum-likelihood estimation for two canonical shape constraints. The reduction to a convex quadratic program via the quantile-function representation of W2 geodesics is a clean application of displacement convexity and could be useful in settings where Wasserstein geometry is already natural. The explicit comparison with MLE also provides a concrete benchmark for practitioners.
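A minimal sketch of how such a projection could be discretized for the non-increasing case with p = 2. It is an illustration under stated assumptions, not the paper's Algorithm 1: the midpoint grid, the function name `project_nonincreasing`, and the reduction to non-negative least squares via `scipy.optimize.nnls` are choices made here. It relies on the fact that a non-increasing density on [0,∞) has a convex quantile function Q with Q(0) = 0.

```python
import numpy as np
from scipy.optimize import nnls


def project_nonincreasing(samples):
    """Midpoint-rule sketch of the W2 projection onto non-increasing densities.

    Hypothetical discretization (an assumption, not the paper's method):
    the fitted quantile function Q is piecewise linear through (0, 0) and
    (u_i, q_i), with u_i = (i - 1/2) / n, and is required to be convex.
    """
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    # Widths of the intervals between consecutive knots 0, u_1, ..., u_n.
    widths = np.full(n, 1.0 / n)
    widths[0] = 0.5 / n
    lower = np.tril(np.ones((n, n)))
    # Slopes are cumulative sums of increments d >= 0, hence non-negative
    # and non-decreasing, which encodes "Q convex with Q(0) = 0".
    # Knot values: q = lower @ diag(widths) @ lower @ d.
    design = lower @ (widths[:, None] * lower)
    # Least squares in q against the order statistics, subject to d >= 0.
    d, _ = nnls(design, x)
    slopes = np.cumsum(d)
    q = design @ d
    return q, slopes


rng = np.random.default_rng(0)
samples = rng.exponential(scale=1.0, size=50)
q, slopes = project_nonincreasing(samples)
```

Where a slope is positive, its reciprocal is the height of the fitted piecewise-constant density on the interval between consecutive values of `q`; these heights are non-increasing because the slopes are non-decreasing.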
major comments (2)
- [§3.2, Theorem 3.4] The uniqueness argument for the projection onto the log-concave set relies on strict convexity of the squared W2 distance, but the proof sketch does not address the case in which the empirical measure is supported on finitely many atoms; an explicit argument or counter-example would be needed to confirm that the optimizer remains unique.
- [§4.1, Algorithm 1 and Proposition 4.3] The discretization error bound is stated only in terms of the mesh size h without an explicit dependence on the number of samples n or the tail behavior of the target density; this makes it difficult to assess whether the reported computational gains remain valid for moderate n and heavy-tailed distributions.
minor comments (3)
- [§2 and §3.1] The notation for the quantile function and its inverse is introduced in §2 but reused with slightly different symbols in §3.1; a single consistent definition would improve readability.
- [Figure 2] Figure 2 (comparison of estimators) lacks error bars or variability measures across the Monte Carlo replications; adding these would make the visual comparison with MLE more informative.
- [References] The reference list omits several recent works on Wasserstein-based shape constraints (e.g., papers on convex-order projections); adding two or three key citations would better situate the contribution.
Simulated Author's Rebuttal
We thank the referee for the careful reading, positive assessment, and constructive suggestions. We address each major comment below and will revise the manuscript accordingly to improve clarity and completeness.
Point-by-point responses
- Referee: [§3.2, Theorem 3.4] The uniqueness argument for the projection onto the log-concave set relies on strict convexity of the squared W2 distance, but the proof sketch does not address the case in which the empirical measure is supported on finitely many atoms; an explicit argument or counter-example would be needed to confirm that the optimizer remains unique.
  Authors: We agree that the proof sketch of Theorem 3.4 would benefit from an explicit treatment of the atomic case. In one dimension, the 2-Wasserstein distance between two measures with finite second moments equals the L2 distance between their quantile functions, so the squared distance to a fixed measure is strictly convex along displacement interpolations, which are linear interpolations of quantile functions. Because the log-concave constraint set is displacement convex, any two minimizers must coincide, and this holds for every empirical measure, including those with finite support. We will revise the proof to include a short clarifying paragraph that invokes this fact and confirms that uniqueness requires no additional assumptions on the support of the empirical measure.
  Revision: yes
- Referee: [§4.1, Algorithm 1 and Proposition 4.3] The discretization error bound is stated only in terms of the mesh size h without an explicit dependence on the number of samples n or the tail behavior of the target density; this makes it difficult to assess whether the reported computational gains remain valid for moderate n and heavy-tailed distributions.
  Authors: We acknowledge that Proposition 4.3 currently expresses the discretization error solely in terms of the mesh size h. To address the concern, we will add a remark following the proposition that discusses the dependence on n and tail behavior. Under the finite-second-moment assumption already required for the W2 setting, the total error remains controlled for moderate n; for heavy-tailed densities we will note that the bound can be applied after suitable truncation (with an explicit tail-probability term) or under additional moment conditions. This clarification will make the practical scope of the computational gains easier to evaluate.
  Revision: yes
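The uniqueness claim in the first response can be checked with a short Hilbert-space identity. This is a sketch that assumes the one-dimensional quantile representation; the symbols $Q_n$ (empirical quantile function), $Q_0$, $Q_1$ (two candidate minimizers) and $\|\cdot\|$ (the $L^2([0,1])$ norm) are notation introduced here for illustration.

$$
\bigl\|(1-t)Q_0 + tQ_1 - Q_n\bigr\|^2
= (1-t)\,\|Q_0 - Q_n\|^2 + t\,\|Q_1 - Q_n\|^2 - t(1-t)\,\|Q_0 - Q_1\|^2 .
$$

If two distinct quantile functions $Q_0 \neq Q_1$ both attained the minimum over a convex constraint set, the midpoint at $t = 1/2$ would be feasible and strictly better, which is a contradiction. The identity places no condition on $Q_n$, so a finitely supported empirical measure is covered. Existence of a minimizer additionally needs the constraint set to be closed, which this sketch does not address.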
Circularity Check
No significant circularity detected
Full rationale
The derivation chain rests on external, established results from optimal transport theory (displacement convexity of subsets in Wasserstein space) and one-dimensional convex analysis (quantile-function convexity for monotone densities and analogous characterizations for log-concave densities). These are invoked as independent mathematical facts rather than derived internally or via self-citation chains. The projection estimator is formulated as a convex quadratic program directly from the geometry of W2 geodesics (linear interpolations of quantiles), with no step reducing by construction to a fitted parameter, renamed ansatz, or load-bearing self-reference. The paper remains self-contained against external benchmarks.
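The geometric fact used above, that W2 geodesics on the line are linear interpolations of quantiles, can be illustrated numerically for empirical measures with equally many atoms, where sorted samples play the role of the quantile function. This is a small sketch, and the function names `w2_empirical` and `displacement_interpolation` are hypothetical helpers introduced here.

```python
import numpy as np


def w2_empirical(x, y):
    """W2 distance between two empirical measures with the same number of atoms."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("both samples must have the same size")
    # In 1D the optimal coupling matches order statistics.
    return float(np.sqrt(np.mean((x - y) ** 2)))


def displacement_interpolation(x, y, t):
    """Atoms of the W2 geodesic at time t: interpolate the sorted samples."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    return (1.0 - t) * x + t * y


x = [2.0, 0.0, 1.0]
y = [3.0, 1.0, 2.0]
z = displacement_interpolation(x, y, 0.25)
total = w2_empirical(x, y)
first_leg = w2_empirical(x, z)
second_leg = w2_empirical(z, y)
```

Along the interpolation the distances split proportionally, so `first_leg` is a quarter of `total` and `second_leg` is three quarters of it.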
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Shape constraints correspond to displacement-convex subsets of the Wasserstein space
Author affiliations
Department of Mathematical Informatics, University of Tokyo & RIKEN Center for Brain Science. Email address: matsuda@mist.i.u-tokyo.ac.jp
Department of Statistical Sciences, University of Toronto. Email address: tkl.wong@utoronto.ca