arxiv: 2603.08939 · v2 · submitted 2026-03-09 · 🧮 math.ST · stat.TH


Shape-constrained density estimation with Wasserstein projection

Takeru Matsuda , Ting-Kam Leonard Wong


Pith reviewed 2026-05-15 13:06 UTC · model grok-4.3


keywords shape-constrained density estimation · Wasserstein projection · displacement convexity · non-increasing densities · log-concave densities · optimal transport · convex optimization


The pith

Wasserstein projection estimation yields convex optimization for non-increasing and log-concave densities.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that shape-constrained density estimation can be formulated as a convex optimization problem by projecting onto displacement-convex sets in the Wasserstein space. It focuses on estimating non-increasing densities on the non-negative reals and log-concave densities on the entire real line using the quadratic Wasserstein distance. This provides an alternative to maximum likelihood estimation with provable structural properties and a practical discretization scheme implementable by standard solvers.

Core claim

By considering shape constraints given by displacement convex subsets of the Wasserstein space, Wasserstein projection estimation is a convex optimization problem. For non-increasing densities on R+ and log-concave densities on R, structural properties of the estimator are proved, a discretization is proposed for implementation with off-the-shelf solvers, and comparisons with the maximum likelihood estimator are made.
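To make the convexity concrete, it may help to recall the standard one-dimensional identity behind it. The notation below is chosen here for illustration and may differ from the paper's: $Q_\mu$ is the quantile function of $\mu$, $\mu_n$ is the empirical measure, $\mathcal{F}$ is the constraint set, and $\mathcal{Q}_{\mathcal{F}}$ is the set of quantile functions of measures in $\mathcal{F}$. For measures on $\mathbb{R}$ with finite second moments,

$$W_2^2(\mu, \nu) = \int_0^1 \bigl(Q_\mu(u) - Q_\nu(u)\bigr)^2 \, du,$$

so the projection problem can be restated in terms of quantile functions:

$$\hat\mu_n \in \operatorname*{arg\,min}_{\mu \in \mathcal{F}} W_2^2(\mu_n, \mu) \iff Q_{\hat\mu_n} \in \operatorname*{arg\,min}_{Q \in \mathcal{Q}_{\mathcal{F}}} \int_0^1 \bigl(Q_{\mu_n}(u) - Q(u)\bigr)^2 \, du.$$

In one dimension, displacement interpolation averages quantile functions, so $\mathcal{F}$ is displacement convex exactly when $\mathcal{Q}_{\mathcal{F}}$ is a convex subset of $L^2(0,1)$. The right-hand problem then minimizes a convex quadratic functional over a convex set.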

What carries the argument

Displacement-convex subsets of the Wasserstein space, which ensure that the projection estimation problem is convex.
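A small numerical sketch of this mechanism, assuming two samples of equal size with uniform weights, so that sorted samples play the role of quantile functions. The helper names `w2_squared`, `displacement_interpolation`, and `is_convex` and the example values are hypothetical and not taken from the paper.

```python
def w2_squared(xs, ys):
    """Squared 2-Wasserstein distance between two equal-size empirical measures."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    # In one dimension the optimal coupling matches order statistics.
    return sum((a - b) ** 2 for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def displacement_interpolation(xs, ys, t):
    """Atoms of the W2 geodesic at time t: a pointwise average of sorted samples."""
    return [(1 - t) * a + t * b for a, b in zip(sorted(xs), sorted(ys))]


def is_convex(q):
    """True if the sequence has non-negative second differences."""
    return all(q[i - 1] - 2 * q[i] + q[i + 1] >= 0 for i in range(1, len(q) - 1))


# Two discretized convex quantile functions (hypothetical values).
q0 = [0.0, 1.0, 4.0, 9.0]
q1 = [0.0, 2.0, 4.0, 8.0]
mid = displacement_interpolation(q0, q1, 0.5)

print(mid)                 # [0.0, 1.5, 4.0, 8.5]
print(is_convex(mid))      # True
print(w2_squared(q0, q1))  # 0.5
```

The midpoint of two convex quantile vectors is again convex, which is the discrete counterpart of the set of non-increasing densities being closed under displacement interpolation.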

Load-bearing premise

The shape constraints of interest must correspond to displacement-convex subsets of the Wasserstein space.

What would settle it

Solving the discretized Wasserstein projection problem for a non-increasing density constraint and checking whether the optimization remains convex when the constraint set violates displacement convexity.

Figures

Figures reproduced from arXiv: 2603.08939 by Takeru Matsuda, Ting-Kam Leonard Wong.

**Figure 1.** Left: Estimated quantile functions from the Wasserstein projection estimator, for the mixture distributions given in (5.6) (Example 5.1). Right: Densities of the Wasserstein projection estimator (shaded) and Grenander’s estimator (dashed), for two cases (λ = 0, 0.8) highlighted by thicker lines on the left panel. The support {0.2, 1} of the data is shown by the crosses.

**Figure 2.** Left: Data and estimated quantile functions in the context of Example 5.2. Right: True and estimated densities.

**Figure 3.** Estimated densities for the two-point distribution in Example 5.3. Left: λ = 0.4. Right: λ = 0.2.

**Figure 4.** Left: Empirical, true and estimated quantile functions in the context of Example 5.4. Right: True and estimated densities. The data points are shown by the crosses.

Original abstract

Statistical inference based on optimal transport offers a different perspective from that of maximum likelihood, and has increasingly gained attention in recent years. In this paper, we study univariate nonparametric shape-constrained density estimation via projection with respect to the $p$-Wasserstein distance, with a focus on the quadratic case $p = 2$. By considering shape constraints given by displacement convex subsets of the Wasserstein space, Wasserstein projection estimation is a convex optimization problem. We focus on two fundamental examples, namely non-increasing densities on $\mathbb{R}_+ := [0, \infty)$ and log-concave densities on $\mathbb{R}$. In each case, we prove structural properties of the Wasserstein projection estimator, propose a discretization which can be implemented by off-the-shelf solvers, and compare the projection estimator with the corresponding maximum likelihood estimator.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper turns Wasserstein projection onto non-increasing and log-concave densities into a convex quadratic program in 1D via quantile functions, with structural results and a usable discretization.


The core advance is showing that displacement-convex shape constraints become convex sets under the quantile representation for the 2-Wasserstein metric. Non-increasing densities on the non-negative half-line correspond to convex quantile functions, and log-concave densities admit an analogous convex characterization. This makes the projection estimator a convex optimization problem that can be discretized and solved with standard solvers. The authors prove structural properties of the resulting estimators and compare them to the maximum-likelihood versions for the same constraints.

That is the useful part: a different computational route to the same shape-constrained problems that people already solve with MLE or other methods. The 1D geometry is clean, and the reduction avoids the non-convexity issues that appear in higher dimensions or with other distances. The discretization step is the practical contribution, because it lets the method be implemented without custom code. The comparisons to MLE are the natural next check, though they are unsurprising in direction.

Soft spots are limited. The error analysis for the discretization and the finite-sample behavior would need careful reading to confirm they are tight enough for the claims. The paper stays in one dimension, so it does not address whether the same convexity carries over elsewhere. Overall the argument is internally consistent and rests on standard optimal-transport facts rather than circular reasoning.

This is for statisticians working on shape-constrained nonparametric estimation who already know optimal transport or want a convex alternative to likelihood methods. It is not a broad methodological overhaul, but the specific combination is new enough and the implementation is concrete enough that it deserves a serious referee. I would send it to peer review.
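As a rough illustration of the discretization idea for the non-increasing case, the sketch below projects grid values of an empirical quantile function onto discretized convex quantile functions anchored at Q(0) = 0. Everything specific here is an assumption made for illustration and is not the paper's scheme: the uniform grid u_j = j/(m+1), the sum-of-squares objective on that grid, the slope parametrization, and the use of projected gradient with pool-adjacent-violators in place of an off-the-shelf quadratic programming solver (chosen only so that the example needs nothing beyond the standard library). All function names are hypothetical.

```python
import math


def pava_nondecreasing(values):
    """Least-squares projection onto non-decreasing sequences (pool adjacent violators)."""
    blocks = []  # each block stores [sum of values, number of values]
    for v in values:
        blocks.append([v, 1])
        # Merge while the previous block mean exceeds the last block mean.
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


def empirical_quantile(sorted_data, u):
    """Empirical quantile Q_n(u) = x_(ceil(n u)) for 0 < u <= 1."""
    n = len(sorted_data)
    return sorted_data[min(max(math.ceil(n * u) - 1, 0), n - 1)]


def cumulative(slopes, h):
    """Grid values q_j = h * (s_1 + ... + s_j) of a piecewise linear function with q(0) = 0."""
    out, running = [], 0.0
    for s in slopes:
        running += h * s
        out.append(running)
    return out


def project_quantiles(target, iterations=5000):
    """Minimize sum_j (q_j - target_j)^2 over convex, non-decreasing q with q(0) = 0.

    Convexity is encoded through slopes 0 <= s_1 <= ... <= s_m on a uniform grid.
    """
    m = len(target)
    h = 1.0 / (m + 1)
    # Step 1/L, where L = h^2 * m * (m + 1) / 2 bounds the largest Hessian eigenvalue.
    step = 2.0 / (h * h * m * (m + 1))
    slopes = [0.0] * m
    for _ in range(iterations):
        q = cumulative(slopes, h)
        # Gradient of 0.5 * sum_j (q_j - target_j)^2 with respect to slope k.
        grad, tail = [0.0] * m, 0.0
        for k in range(m - 1, -1, -1):
            tail += q[k] - target[k]
            grad[k] = h * tail
        moved = [s - step * g for s, g in zip(slopes, grad)]
        # Projection onto {0 <= s_1 <= ... <= s_m}: isotonic fit, then clip at zero.
        slopes = [max(v, 0.0) for v in pava_nondecreasing(moved)]
    return cumulative(slopes, h), slopes


# Two-point data, as an example input (values are illustrative).
data = sorted([0.2, 1.0])
m = 10
grid = [j / (m + 1) for j in range(1, m + 1)]
target = [empirical_quantile(data, u) for u in grid]
q_hat, slopes = project_quantiles(target)
print([round(v, 3) for v in q_hat])
```

On each grid cell with positive slope s_k, the implied density height is 1/s_k, so non-decreasing slopes give a non-increasing density. In practice a quadratic programming solver would normally replace the hand-written loop, which converges slowly as m grows.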

Referee Report

2 major / 3 minor

Summary. The manuscript develops a framework for univariate nonparametric shape-constrained density estimation by projecting an empirical measure onto displacement-convex subsets of the Wasserstein space under the p-Wasserstein metric (with emphasis on p=2). The central examples are non-increasing densities on [0,∞) and log-concave densities on R; the authors show that these constraints become convex sets when represented via quantile functions, render the projection a convex quadratic program, establish structural properties of the resulting estimators, propose a discretization solvable by off-the-shelf convex solvers, and compare the procedure to the corresponding maximum-likelihood estimators.

Significance. If the structural results and discretization analysis hold, the work supplies a computationally attractive, geometrically grounded alternative to maximum-likelihood estimation for two canonical shape constraints. The reduction to a convex quadratic program via the quantile-function representation of W2 geodesics is a clean application of displacement convexity and could be useful in settings where Wasserstein geometry is already natural. The explicit comparison with MLE also provides a concrete benchmark for practitioners.
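For the non-increasing case, the maximum-likelihood benchmark is Grenander's estimator, which Figure 1 plots against the Wasserstein projection. A minimal sketch, assuming strictly positive observations: the estimator is the left derivative of the least concave majorant of the empirical distribution function. The function names `cross` and `grenander` are hypothetical, and this is not the implementation used in the paper.

```python
def cross(o, a, b):
    """Cross product of vectors o->a and o->b; negative means a clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def grenander(data):
    """Return (knots, heights): the density equals heights[i] on (knots[i], knots[i+1]]."""
    xs = sorted(data)
    n = len(xs)
    if n == 0 or xs[0] <= 0:
        raise ValueError("this sketch assumes non-empty, strictly positive data")
    # Vertices of the empirical CDF, keeping the highest value at tied observations.
    points = [(0.0, 0.0)]
    for i, x in enumerate(xs, start=1):
        if points[-1][0] == x:
            points[-1] = (x, i / n)
        else:
            points.append((x, i / n))
    # Upper convex hull of the vertices = least concave majorant.
    hull = []
    for p in points:
        # Drop the last vertex while it lies on or below the chord to the new point.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    knots = [x for x, _ in hull]
    heights = [
        (hull[i + 1][1] - hull[i][1]) / (hull[i + 1][0] - hull[i][0])
        for i in range(len(hull) - 1)
    ]
    return knots, heights


knots, heights = grenander([1.0, 3.0])
print(knots, heights)  # [0.0, 1.0, 3.0] [0.5, 0.25]
```

The resulting density is a step function supported on the interval from 0 to the largest observation, which gives a simple reference point when comparing supports with the Wasserstein projection estimator.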

major comments (2)

[§3.2, Theorem 3.4] The uniqueness argument for the projection onto the log-concave set relies on strict convexity of the squared W2 distance, but the proof sketch does not address the case in which the empirical measure is supported on finitely many atoms; an explicit argument or counter-example would be needed to confirm that the optimizer remains unique.
[§4.1, Algorithm 1 and Proposition 4.3] The discretization error bound is stated only in terms of the mesh size h, without an explicit dependence on the number of samples n or the tail behavior of the target density; this makes it difficult to assess whether the reported computational gains remain valid for moderate n and heavy-tailed distributions.

minor comments (3)

[§2 and §3.1] The notation for the quantile function and its inverse is introduced in §2 but reused with slightly different symbols in §3.1; a single consistent definition would improve readability.
[Figure 2] The comparison of estimators lacks error bars or variability measures across the Monte Carlo replications; adding these would make the visual comparison with MLE more informative.
[References] The reference list omits several recent works on Wasserstein-based shape constraints (e.g., papers on convex-order projections); adding two or three key citations would better situate the contribution.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading, positive assessment, and constructive suggestions. We address each major comment below and will revise the manuscript accordingly to improve clarity and completeness.


Referee: [§3.2, Theorem 3.4] The uniqueness argument for the projection onto the log-concave set relies on strict convexity of the squared W2 distance, but the proof sketch does not address the case in which the empirical measure is supported on finitely many atoms; an explicit argument or counter-example would be needed to confirm that the optimizer remains unique.

Authors: We agree that the proof sketch of Theorem 3.4 would benefit from an explicit treatment of the atomic case. In one dimension, the squared 2-Wasserstein distance between probability measures with finite second moments equals the squared L2 distance between their quantile functions, so it is strictly convex in the quantile function. Because the log-concave constraint set is displacement convex, and therefore convex in the quantile representation, a minimizer is unique whenever it exists, for any empirical measure, including those with finite support. We will revise the proof to include a short clarifying paragraph that invokes this fact and confirms that uniqueness holds without additional assumptions on the support of the empirical measure. revision: yes

Referee: [§4.1, Algorithm 1 and Proposition 4.3] The discretization error bound is stated only in terms of the mesh size h, without an explicit dependence on the number of samples n or the tail behavior of the target density; this makes it difficult to assess whether the reported computational gains remain valid for moderate n and heavy-tailed distributions.

Authors: We acknowledge that Proposition 4.3 currently expresses the discretization error solely in terms of the mesh size h. To address the concern, we will add a remark following the proposition that discusses the dependence on n and tail behavior. Under the finite-second-moment assumption already required for the W2 setting, the total error remains controlled for moderate n; for heavy-tailed densities we will note that the bound can be applied after suitable truncation (with an explicit tail-probability term) or under additional moment conditions. This clarification will make the practical scope of the computational gains easier to evaluate. revision: yes
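
The uniqueness question raised for Theorem 3.4 can be checked by a short, standard Hilbert-space argument in the quantile representation. This is a sketch under the assumption that the constraint set corresponds to a convex set $\mathcal{Q}$ of quantile functions in $L^2(0,1)$ (notation chosen here, not the paper's), and it does not address existence of a minimizer. Write $Q_n$ for the empirical quantile function and $\|\cdot\|$ for the $L^2(0,1)$ norm, and suppose $Q_1 \neq Q_2$ in $\mathcal{Q}$ both attain the minimal value $d^2 = \min_{Q \in \mathcal{Q}} \|Q_n - Q\|^2$. The midpoint $\tfrac{1}{2}(Q_1 + Q_2)$ lies in $\mathcal{Q}$ by convexity, and the parallelogram identity gives

$$\Bigl\| Q_n - \tfrac{Q_1 + Q_2}{2} \Bigr\|^2 = \tfrac{1}{2}\|Q_n - Q_1\|^2 + \tfrac{1}{2}\|Q_n - Q_2\|^2 - \tfrac{1}{4}\|Q_1 - Q_2\|^2 = d^2 - \tfrac{1}{4}\|Q_1 - Q_2\|^2 < d^2,$$

which contradicts minimality. Nothing in this step depends on whether the empirical measure has atoms; $Q_n$ only needs to be square-integrable.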

Circularity Check

0 steps flagged

No significant circularity detected


The derivation chain rests on external, established results from optimal transport theory (displacement convexity of subsets in Wasserstein space) and one-dimensional convex analysis (quantile-function convexity for monotone densities and analogous characterizations for log-concave densities). These are invoked as independent mathematical facts rather than derived internally or via self-citation chains. The projection estimator is formulated as a convex quadratic program directly from the geometry of W2 geodesics (linear interpolations of quantiles), with no step reducing by construction to a fitted parameter, renamed ansatz, or load-bearing self-reference. The paper remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on displacement convexity of the chosen shape sets in Wasserstein space, a standard domain assumption from optimal-transport literature.

axioms (1)

Domain assumption: shape constraints correspond to displacement-convex subsets of the Wasserstein space. Invoked to guarantee that the projection problem is convex.



Author affiliations

Department of Mathematical Informatics, University of Tokyo & RIKEN Center for Brain Science. Email address: matsuda@mist.i.u-tokyo.ac.jp
Department of Statistical Sciences, University of Toronto. Email address: tkl.wong@utoronto.ca