pith. machine review for the scientific record.

arxiv: 2603.08939 · v2 · submitted 2026-03-09 · 🧮 math.ST · stat.TH


Shape-constrained density estimation with Wasserstein projection


Pith reviewed 2026-05-15 13:06 UTC · model grok-4.3

classification 🧮 math.ST stat.TH
keywords shape-constrained density estimation · Wasserstein projection · displacement convexity · non-increasing densities · log-concave densities · optimal transport · convex optimization

The pith

Wasserstein projection estimation reduces to a convex optimization problem for non-increasing and log-concave densities.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that shape-constrained density estimation can be formulated as a convex optimization problem by projecting onto displacement-convex sets in the Wasserstein space. It focuses on estimating non-increasing densities on the non-negative reals and log-concave densities on the entire real line using the quadratic Wasserstein distance. This provides an alternative to maximum likelihood estimation with provable structural properties and a practical discretization scheme implementable by standard solvers.

Core claim

By considering shape constraints given by displacement convex subsets of the Wasserstein space, Wasserstein projection estimation is a convex optimization problem. For non-increasing densities on R+ and log-concave densities on R, structural properties of the estimator are proved, a discretization is proposed for implementation with off-the-shelf solvers, and comparisons with the maximum likelihood estimator are made.
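One way to make the core claim concrete is the univariate quantile representation. The display below is a sketch in notation assumed here rather than taken from the paper: $\mu_n$ is the empirical measure, $\mathcal{F}$ the shape-constrained class, $Q_\mu$ the quantile function of $\mu$, and $\mu_t$ the displacement interpolation between $\mu_0$ and $\mu_1$.

```latex
\begin{align*}
\hat{\mu}_n &\in \operatorname*{arg\,min}_{\mu \in \mathcal{F}} W_2^2(\mu_n, \mu), \\
W_2^2(\mu, \nu) &= \int_0^1 \bigl( Q_\mu(u) - Q_\nu(u) \bigr)^2 \, du, \\
Q_{\mu_t} &= (1 - t)\, Q_{\mu_0} + t\, Q_{\mu_1}, \qquad t \in [0, 1].
\end{align*}
```

If $\mathcal{F}$ is displacement convex, the corresponding set of quantile functions is convex in $L^2(0,1)$, so the projection minimizes a strictly convex quadratic functional over a convex set.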

What carries the argument

Displacement-convex subsets of the Wasserstein space, which ensure that the projection estimation problem is convex.
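As a small numerical illustration (a sketch under assumptions, not the paper's code): on the real line the displacement interpolation averages quantile functions, and a non-increasing density on $\mathbb{R}_+$ corresponds to a convex quantile function starting at 0. The two example distributions (Exp(1) and Uniform[0, 2]), the grid, and the helper names are choices made here for illustration.

```python
import math

def quantile_exp(u):
    """Quantile function of Exp(1); its density exp(-x) is non-increasing."""
    return -math.log(1.0 - u)

def quantile_unif(u):
    """Quantile function of Uniform[0, 2]; its density is constant on [0, 2]."""
    return 2.0 * u

def displacement_interpolate(q0, q1, t):
    """Quantile values of the W2 geodesic at time t (pointwise average of quantiles)."""
    return [(1.0 - t) * a + t * b for a, b in zip(q0, q1)]

def is_convex(values, tol=1e-12):
    """Check non-negative second differences on an equally spaced grid."""
    return all(values[i - 1] - 2.0 * values[i] + values[i + 1] >= -tol
               for i in range(1, len(values) - 1))

grid = [i / 100 for i in range(100)]  # u = 0, 0.01, ..., 0.99
q_exp = [quantile_exp(u) for u in grid]
q_unif = [quantile_unif(u) for u in grid]
q_mid = displacement_interpolate(q_exp, q_unif, 0.5)

# Convexity of the quantile function survives the interpolation.
print(is_convex(q_exp), is_convex(q_unif), is_convex(q_mid))
```

The check only probes convexity on a finite grid, so it illustrates the closure property rather than proving it.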

Load-bearing premise

The shape constraints of interest must correspond to displacement-convex subsets of the Wasserstein space.

What would settle it

Solving the discretized Wasserstein projection problem for a non-increasing density constraint and checking whether the optimization remains convex when the constraint set violates displacement convexity.

Figures

Figures reproduced from arXiv: 2603.08939 by Takeru Matsuda, Ting-Kam Leonard Wong.

Figure 1. Left: Estimated quantile functions from the Wasserstein projection estimator, for the mixture distributions given in (5.6) (Example 5.1). Right: Densities of the Wasserstein projection estimator (shaded) and Grenander’s estimator (dashed), for two cases (λ = 0, 0.8) highlighted by thicker lines on the left panel. The support {0.2, 1} of the data is shown by the crosses.
Figure 2. Left: Data and estimated quantile functions in the context of Example 5.2. Right: True and estimated densities.
Figure 3. Estimated densities for the two-point distribution in Example 5.3. Left: λ = 0.4. Right: λ = 0.2.
Figure 4. Left: Empirical, true and estimated quantile functions in the context of Example 5.4. Right: True and estimated densities. The data points are shown by the crosses.
Original abstract

Statistical inference based on optimal transport offers a different perspective from that of maximum likelihood, and has increasingly gained attention in recent years. In this paper, we study univariate nonparametric shape-constrained density estimation via projection with respect to the $p$-Wasserstein distance, with a focus on the quadratic case $p = 2$. By considering shape constraints given by displacement convex subsets of the Wasserstein space, Wasserstein projection estimation is a convex optimization problem. We focus on two fundamental examples, namely non-increasing densities on $\mathbb{R}_+ := [0, \infty)$ and log-concave densities on $\mathbb{R}$. In each case, we prove structural properties of the Wasserstein projection estimator, propose a discretization which can be implemented by off-the-shelf solvers, and compare the projection estimator with the corresponding maximum likelihood estimator.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The manuscript develops a framework for univariate nonparametric shape-constrained density estimation by projecting an empirical measure onto displacement-convex subsets of the Wasserstein space under the p-Wasserstein metric (with emphasis on p=2). The central examples are non-increasing densities on [0,∞) and log-concave densities on R; the authors show that these constraints become convex sets when represented via quantile functions, render the projection a convex quadratic program, establish structural properties of the resulting estimators, propose a discretization solvable by off-the-shelf convex solvers, and compare the procedure to the corresponding maximum-likelihood estimators.
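The reduction to a quadratic program described in this summary can be sketched for the non-increasing case. The code below is a minimal illustration under stated assumptions and is not the paper's discretization or solver: it uses a right-endpoint grid $u_i = i/n$, parametrizes a piecewise-linear quantile function with $Q(0) = 0$ by its slopes, and swaps in projected gradient descent with a pool-adjacent-violators projection in place of an off-the-shelf QP solver so that the example has no dependencies. All function names are hypothetical.

```python
def project_monotone_nonneg(y):
    """Euclidean projection onto {0 <= s_1 <= ... <= s_n}:
    pool-adjacent-violators (isotonic regression), then clip at zero."""
    blocks = []  # each block is [sum, count]; block means stay non-decreasing
    for v in y:
        blocks.append([v, 1])
        # Merge while the previous block's mean exceeds the last block's mean.
        while (len(blocks) > 1 and
               blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    out = []
    for s, c in blocks:
        out.extend([max(0.0, s / c)] * c)
    return out

def quantiles_from_slopes(slopes, h):
    """q_i = h * (s_1 + ... + s_i): piecewise-linear quantile with Q(0) = 0."""
    q, total = [], 0.0
    for s in slopes:
        total += h * s
        q.append(total)
    return q

def wasserstein_projection_nonincreasing(samples, iters=20000):
    """Projected gradient descent on 0.5 * sum_i (q_i - x_(i))^2 over the slopes."""
    x = sorted(samples)
    n = len(x)
    h = 1.0 / n
    # Step size 1 / (squared Frobenius norm of the linear map slopes -> quantiles).
    step = 1.0 / (h * h * n * (n + 1) / 2.0)
    s = [0.0] * n
    for _ in range(iters):
        q = quantiles_from_slopes(s, h)
        # Gradient with respect to s_j is h * sum_{i >= j} (q_i - x_(i)).
        grad, tail = [0.0] * n, 0.0
        for j in range(n - 1, -1, -1):
            tail += q[j] - x[j]
            grad[j] = h * tail
        s = project_monotone_nonneg([sj - step * gj for sj, gj in zip(s, grad)])
    return quantiles_from_slopes(s, h), s

# Two-point data, loosely in the spirit of the examples with support {0.2, 1}.
q_hat, slopes = wasserstein_projection_nonincreasing([0.2, 0.2, 1.0, 1.0])
print([round(v, 4) for v in q_hat])
```

Non-decreasing, non-negative slopes make the fitted quantile function convex and non-decreasing; wherever a slope $s_i$ is positive, the implied density on the corresponding interval is $1/s_i$, which is non-increasing. In practice a dedicated QP solver would be the natural choice for larger $n$.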

Significance. If the structural results and discretization analysis hold, the work supplies a computationally attractive, geometrically grounded alternative to maximum-likelihood estimation for two canonical shape constraints. The reduction to a convex quadratic program via the quantile-function representation of W2 geodesics is a clean application of displacement convexity and could be useful in settings where Wasserstein geometry is already natural. The explicit comparison with MLE also provides a concrete benchmark for practitioners.

major comments (2)
  1. [§3.2, Theorem 3.4] The uniqueness argument for the projection onto the log-concave set relies on strict convexity of the squared W2 distance, but the proof sketch does not address the case in which the empirical measure is supported on finitely many atoms; an explicit argument or counterexample is needed to confirm that the optimizer remains unique.
  2. [§4.1, Algorithm 1 and Proposition 4.3] The discretization error bound is stated only in terms of the mesh size h, without explicit dependence on the number of samples n or the tail behavior of the target density; this makes it difficult to assess whether the reported computational gains remain valid for moderate n and heavy-tailed distributions.
minor comments (3)
  1. [§2 and §3.1] The notation for the quantile function and its inverse is introduced in §2 but reused with slightly different symbols in §3.1; a single consistent definition would improve readability.
  2. [Figure 2] Figure 2 (comparison of estimators) lacks error bars or variability measures across the Monte Carlo replications; adding these would make the visual comparison with MLE more informative.
  3. [References] The reference list omits several recent works on Wasserstein-based shape constraints (e.g., papers on convex-order projections); adding two or three key citations would better situate the contribution.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading, positive assessment, and constructive suggestions. We address each major comment below and will revise the manuscript accordingly to improve clarity and completeness.

Point-by-point responses
  1. Referee: [§3.2, Theorem 3.4] The uniqueness argument for the projection onto the log-concave set relies on strict convexity of the squared W2 distance, but the proof sketch does not address the case in which the empirical measure is supported on finitely many atoms; an explicit argument or counterexample is needed to confirm that the optimizer remains unique.

    Authors: We agree that the proof sketch of Theorem 3.4 would benefit from an explicit treatment of the atomic case. In the univariate setting, the squared 2-Wasserstein distance to a fixed measure with finite second moment equals the squared $L^2(0,1)$ distance between quantile functions, so it is strictly convex along displacement interpolations. Because the log-concave constraint set is displacement convex, uniqueness of the projection follows directly for any such target measure, including finitely supported empirical measures. We will revise the proof to include a short clarifying paragraph that invokes this fact and confirms that uniqueness holds without additional assumptions on the support of the empirical measure. revision: yes

  2. Referee: [§4.1, Algorithm 1 and Proposition 4.3] The discretization error bound is stated only in terms of the mesh size h, without explicit dependence on the number of samples n or the tail behavior of the target density; this makes it difficult to assess whether the reported computational gains remain valid for moderate n and heavy-tailed distributions.

    Authors: We acknowledge that Proposition 4.3 currently expresses the discretization error solely in terms of the mesh size h. To address the concern, we will add a remark following the proposition that discusses the dependence on n and tail behavior. Under the finite-second-moment assumption already required for the W2 setting, the total error remains controlled for moderate n; for heavy-tailed densities we will note that the bound can be applied after suitable truncation (with an explicit tail-probability term) or under additional moment conditions. This clarification will make the practical scope of the computational gains easier to evaluate. revision: yes
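The uniqueness step discussed in the first response can be spelled out in the quantile representation. The following is a sketch of the standard midpoint argument, in notation assumed here ($\mathcal{Q}$ for the convex set of quantile functions of the constraint class, $Q_n$ for the empirical quantile function, $m$ for the minimal value of the objective); it is not the paper's proof.

```latex
% Suppose Q_1 \neq Q_2 in \mathcal{Q} both attain the minimum value m.
\begin{align*}
\Bigl\| \tfrac{1}{2}(Q_1 + Q_2) - Q_n \Bigr\|_{L^2(0,1)}^2
  &= \tfrac{1}{2} \| Q_1 - Q_n \|_{L^2(0,1)}^2
   + \tfrac{1}{2} \| Q_2 - Q_n \|_{L^2(0,1)}^2
   - \tfrac{1}{4} \| Q_1 - Q_2 \|_{L^2(0,1)}^2 \\
  &= m - \tfrac{1}{4} \| Q_1 - Q_2 \|_{L^2(0,1)}^2 < m .
\end{align*}
```

Since $\tfrac{1}{2}(Q_1 + Q_2)$ lies in $\mathcal{Q}$ by convexity, this contradicts minimality. The argument only needs $Q_n \in L^2(0,1)$, which holds for an empirical measure because its quantile function is a bounded step function.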

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The derivation chain rests on external, established results from optimal transport theory (displacement convexity of subsets in Wasserstein space) and one-dimensional convex analysis (quantile-function convexity for monotone densities and analogous characterizations for log-concave densities). These are invoked as independent mathematical facts rather than derived internally or via self-citation chains. The projection estimator is formulated as a convex quadratic program directly from the geometry of W2 geodesics (linear interpolations of quantiles), with no step reducing by construction to a fitted parameter, renamed ansatz, or load-bearing self-reference. The paper remains self-contained against external benchmarks.
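To illustrate the quantile picture used in this rationale, the sketch below computes the squared 2-Wasserstein distance between two empirical measures with the same number of equally weighted atoms by matching sorted samples. The function name and the equal-sample-size restriction are assumptions made here for brevity.

```python
def w2_squared_empirical(xs, ys):
    """Squared W2 distance between two empirical measures with equally many,
    equally weighted atoms: in one dimension the optimal coupling matches
    order statistics."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    xs, ys = sorted(xs), sorted(ys)
    return sum((a - b) ** 2 for a, b in zip(xs, ys)) / len(xs)

# A pure shift by 1 gives squared distance 1.
print(w2_squared_empirical([0.0, 1.0, 2.0], [1.0, 2.0, 3.0]))  # 1.0
```

Unequal sample sizes would require integrating the squared difference of the two step-function quantiles over a merged grid, which is omitted here.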

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on displacement convexity of the chosen shape sets in Wasserstein space, a standard domain assumption from optimal-transport literature.

axioms (1)
  • domain assumption: Shape constraints correspond to displacement-convex subsets of the Wasserstein space
    Invoked to guarantee that the projection problem is convex.



Reference graph

Works this paper leans on

42 extracted references · 42 canonical work pages · 1 internal anchor

  1. Shun-ichi Amari and Takeru Matsuda. Wasserstein statistics in one-dimensional location scale models. Annals of the Institute of Statistical Mathematics, 74(1):33–47, 2022.
  2. Shun-ichi Amari and Takeru Matsuda. Information geometry of Wasserstein statistics on shapes and affine deformations. Information Geometry, 7(2):285–309, 2024.
  3. Nihat Ay. Information geometry of the Otto metric. Information Geometry, 2024.
  4. Richard E Barlow and Hugh D Brunk. The isotonic regression problem and its dual. Journal of the American Statistical Association, 67(337):140–147, 1972.
  5. Federico Bassetti, Antonella Bodini, and Eugenio Regazzini. On minimum Kantorovich distance estimators. Statistics & Probability Letters, 76(12):1298–1302, 2006.
  6. Sergey Bobkov and Michel Ledoux. One-dimensional Empirical Measures, Order Statistics, and Kantorovich Transport Distances. American Mathematical Society, 2019.
  7. Petr Anatolevich Borodin, Yu Yu Druzhinin, and Kseniya Vasil’evna Chesnokova. Finite-dimensional subspaces of $l_p$ with Lipschitz metric projection. Mathematical Notes, 102:465–474, 2017.
  8. Stephen P Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
  9. René Carmona and François Delarue. Probabilistic Theory of Mean Field Games with Applications I: Mean Field FBSDEs, Control, and Games. Springer, 2018.
  10. Lenaic Chizat, Gabriel Peyré, Bernhard Schmitzer, and François-Xavier Vialard. An interpolating distance between optimal transport and Fisher–Rao metrics. Foundations of Computational Mathematics, 18(1):1–44, 2018.
  11. Ryan Cumings-Menon. Shape-constrained density estimation via optimal transport. arXiv preprint arXiv:1710.09069, 2017.
  12. Lutz Dümbgen. Shape-constrained statistical inference. Annual Review of Statistics and Its Application, 11, 2024.
  13. Nicolas Fournier and Arnaud Guillin. On the rate of convergence in Wasserstein distance of the empirical measure. Probability Theory and Related Fields, 162(3):707–738, 2015.
  14. Ulf Grenander. On the theory of mortality measurement: part II. Scandinavian Actuarial Journal, 1956(2):125–153, 1956.
  15. Piet Groeneboom and Geurt Jongbloed. Nonparametric Estimation under Shape Constraints. Cambridge University Press, 2014.
  16. Rohan Hore, Ruodu Wang, and Aaditya Ramdas. Online monotone density estimation and log-optimal calibration. arXiv preprint arXiv:2602.08927, 2026.
  17. Steven G. Johnson. The NLopt nonlinear-optimization package, 2008.
  18. Bernd Klaus and Korbinian Strimmer. fdrtool: Estimation of (Local) False Discovery Rates and Higher Criticism, 2024. R package version 1.2.18.
  19. Hugo Lavenant, Jonas Luckhardt, Gilles Mordant, Bernhard Schmitzer, and Luca Tamanini. The Riemannian geometry of Sinkhorn divergences. arXiv preprint arXiv:2405.04987, 2024.
  20. Wuchen Li and Jiaxi Zhao. Wasserstein information matrix. Information Geometry, 6(1):203–255, 2023.
  21. Robert J McCann. A convexity principle for interacting gases. Advances in Mathematics, 128(1):153–179, 1997.
  22. Robert John McCann. A convexity theory for interacting gases and equilibrium crystals. ProQuest LLC, Ann Arbor, MI, 1994. Thesis (Ph.D.), Princeton University.
  23. Jonathan Niles-Weed and Quentin Berthet. Minimax estimation of smooth densities in Wasserstein distance. The Annals of Statistics, 50(3):1519–1540, 2022.
  24. Hayato Nishimori and Takeru Matsuda. On the attainment of the Wasserstein–Cramer–Rao lower bound. Information Geometry, 2025.
  25. Naoki Otani and Takeru Matsuda. Wasserstein projection estimators for circular distributions. arXiv preprint arXiv:2510.18367, 2025.
  26. Soumik Pal, Bodhisattva Sen, and Ting-Kam Leonard Wong. On the Wasserstein alignment problem. arXiv preprint arXiv:2503.06838, 2025.
  27. R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2024.
  28. Philippe Rigollet and Jonathan Weed. Entropic optimal transport is maximum-likelihood deconvolution. Comptes Rendus. Mathématique, 356(11-12):1228–1235, 2018.
  29. R Tyrrell Rockafellar. Convex Analysis. Princeton University Press, 1970.
  30. Kaspar Rufibach and Lutz Duembgen. logcondens: Estimate a Log-Concave Probability Density from iid Observations, 2023. R package version 2.1.8.
  31. Richard J Samworth. Recent progress in log-concave density estimation. Statistical Science, 33(4):493–509, 2018.
  32. Richard J Samworth and Bodhisattva Sen. Special issue on “Nonparametric Inference Under Shape Constraints”. Statistical Science, 33(4):469–472, 2018.
  33. Filippo Santambrogio. Optimal Transport for Applied Mathematicians. Birkhäuser, 2015.
  34. Filippo Santambrogio and Xu-Jia Wang. Convexity of the support of the displacement interpolation: Counterexamples. Applied Mathematics Letters, 58:152–158, 2016.
  35. Adrien Saumard and Jon A Wellner. Log-concavity and strong log-concavity: a review. Statistics Surveys, 8:45, 2014.
  36. Nicolás García Trillos, Adam Quinn Jaffe, and Bodhisattva Sen. Wasserstein-Cramér-Rao theory of unbiased estimation. arXiv preprint arXiv:2511.07414, 2025.
  37. Berwin A. Turlach and Andreas Weingessel. quadprog: Functions to Solve Quadratic Programming Problems, 2019. R package version 1.5-8.
  38. Nina Vesseron, Elsa Cazelles, Alice Le Brigant, and Thierry Klein. On the Wasserstein geodesic principal component analysis of probability measures. arXiv preprint arXiv:2506.04480, 2025.
  39. Cédric Villani. Topics in Optimal Transportation. American Mathematical Society, 2003.
  40. Cédric Villani. Optimal Transport: Old and New. Springer, 2008.
  41. Guenther Walther. Detecting the presence of mixing with multiscale maximum likelihood. Journal of the American Statistical Association, 97(458):508–513, 2002.
  42. Sven Wang and Youssef Marzouk. On minimax density estimation via measure transport. arXiv preprint arXiv:2207.10231, 2022.

Author addresses: Department of Mathematical Informatics, University of Tokyo & RIKEN Center for Brain Science (matsuda@mist.i.u-tokyo.ac.jp); Department of Statistical Sciences, University of Toronto (tkl.wong@utoronto.ca).