Optimizing Computational-Statistical Runtime for Wasserstein Distance Estimation

Jeff M. Phillips; Peter Matthew Jacobs

arxiv: 2605.20122 · v1 · pith:NNS7UZL4new · submitted 2026-05-19 · 📊 stat.ML · cs.CC· cs.LG

Optimizing Computational-Statistical Runtime for Wasserstein Distance Estimation

Peter Matthew Jacobs , Jeff M. Phillips This is my paper

Pith reviewed 2026-05-20 03:22 UTC · model grok-4.3

classification 📊 stat.ML cs.CCcs.LG

keywords Wasserstein distanceHolder smooth distributionsgrid sketchingcomputational statisticsoptimal transportsample complexitydistribution estimation

0 comments

The pith

A regular cartesian grid sketch of samples estimates squared Wasserstein distance to additive epsilon error in time epsilon to the power of minus max of 2 and (d plus 1 over 1 plus alpha) for alpha-Holder smooth distributions on the unit d-

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a Sample-Sketch-Solve method that first projects samples from two distributions onto a regular cartesian grid. For alpha-Holder smooth distributions this projection compresses the input while preserving the statistical error rate of the Wasserstein estimator. The resulting grid structure then admits faster exact optimal-transport algorithms. The combined procedure therefore achieves an overall runtime of epsilon to the power minus max of 2 and (d plus 1 plus o of 1) over (1 plus alpha) when the distributions are alpha-Holder smooth on the unit cube. In two dimensions with alpha greater than one half the runtime becomes the statistically optimal Theta of epsilon to the minus 2.

Core claim

By inserting a regular cartesian grid sketch into the estimation pipeline, the squared Wasserstein distance W two squared between alpha-Holder smooth measures P and Q supported on the unit cube can be approximated to additive error epsilon in expectation using a total computational-statistical runtime of epsilon to the power of minus max of 2 and (d plus 1 plus o of 1) over (1 plus alpha).

What carries the argument

The regular cartesian grid sketch of the empirical samples, which simultaneously compresses the data without asymptotically inflating statistical error and imposes a regular structure that permits faster exact Wasserstein solvers.

If this is right

When d equals 2 and alpha exceeds one half the runtime is statistically optimal at Theta of epsilon to the minus 2.
When d equals 3 and alpha approaches 1 the runtime becomes nearly optimal.
The sketching step costs linear time in the number of samples and can be performed with constant work per sample.
Exact Wasserstein algorithms can now be applied directly to the much smaller sketched measures.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same grid compression idea may extend to other integral probability metrics whose cost functions admit similar regularity.
Adaptive or non-uniform grids could be substituted to reduce the hidden constants while preserving the asymptotic scaling.
The approach supplies a concrete way to balance sample size against per-sample computation when Wasserstein estimation is embedded inside larger learning pipelines.

Load-bearing premise

The underlying distributions must be alpha-Holder smooth on the unit cube so that the grid sketch does not increase the statistical error beyond the target epsilon.

What would settle it

Running the sketched estimator on samples drawn from distributions that violate alpha-Holder smoothness on the cube and measuring whether the observed additive error exceeds epsilon.

read the original abstract

Squared Wasserstein distance is a frequently used tool to measure discrepancy between probability distributions. This distance is typically computed between empirical measures of size $n$ from two underlying random samples. Unfortunately, even in lower dimensional Euclidean space problems $\left( d \in \{2,3\} \right)$, algorithms for Wasserstein distance computation with approximate or exact precision guarantees scale poorly in the runtime as a function of $n$ and the desired precision. In response, we consider the computational-statistical runtime, where the goal is to estimate from samples the Wasserstein distance between potentially smooth measures up to $\epsilon$-additive error in expectation with respect to the sampling; we allow $O(1)$ computational cost for collecting a sample. Towards this, we develop a Sample-Sketch-Solve paradigm where we introduce a regular cartesian grid sketch of the samples. We show that (especially under $\alpha$-H\"older smooth distributions) this can compress the data without increasing asymptotic error, and also regularizes the structure which enables faster exact algorithms. Ultimately, we approximate $W_2^2(P,Q)$ within $\epsilon$ error in $\epsilon^{-\max(2,\frac{d+1+o(1)}{1+\alpha})}$ time for $0 < \alpha < 1$ H\"older smooth distributions $P,Q$ on $(0,1)^{d}$; an optimal $\Theta(\epsilon^{-2})$ for $\alpha > 1/2$ when $d=2$ and nearly optimal as $\alpha \to 1$ when $d = 3$.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Grid sketching under Holder smoothness gives a new runtime exponent for Wasserstein estimation that hits optimal rates in some low-d cases, but the bias control step needs close checking.

read the letter

Hi, the main takeaway is that this paper gets a runtime of epsilon to the minus max(2, (d+1+o(1))/(1+alpha)) for epsilon-error estimation of squared 2-Wasserstein distance between alpha-Holder smooth distributions on the unit cube. For d=2 and alpha over 1/2 it reaches the optimal quadratic rate, and it gets close to optimal for d=3 as alpha nears 1. That specific joint dependence on dimension and smoothness looks like the new quantitative piece not directly in prior abstracts they cite. They do a solid job framing the Sample-Sketch-Solve approach and showing how a regular Cartesian grid can compress samples while keeping the statistical error asymptotically the same, plus how the resulting structure speeds up the exact solver. This directly targets the practical scaling problem for Wasserstein computations in low dimensions like 2 and 3. The soft spot is exactly the one in the stress-test note: the whole exponent rests on the grid binning adding only o(epsilon) bias in expectation so the total error stays governed by the usual statistical rate. If the Holder condition only gives a bias that forces a finer grid or carries extra log factors, the claimed bound would slip. The abstract states the result cleanly but leaves the error analysis and any supporting experiments for the full text. This is aimed at researchers in statistical machine learning and computational optimal transport who work with smooth densities and need better runtime-statistical tradeoffs. A reader focused on efficient distance estimation in small dimensions would find the bounds useful. It deserves a serious referee because the claim is concrete and the approach is a reasonable combination of existing ideas with a new exponent. I would send it to peer review.

Referee Report

1 major / 1 minor

Summary. The paper introduces a Sample-Sketch-Solve framework for estimating the squared 2-Wasserstein distance W₂²(P, Q) between two α-Hölder smooth probability measures on the unit cube in d dimensions. Samples are sketched onto a regular Cartesian grid to compress the support while preserving the statistical error rate, after which an exact solver is applied to the resulting discrete measures. The central result is a computational-statistical runtime of ε^{-max(2, (d+1+o(1))/(1+α))} to achieve ε-additive error in expectation, which is optimal Θ(ε^{-2}) for α > 1/2 when d=2 and nearly optimal as α → 1 when d=3.

Significance. If the grid-sketch bias analysis is rigorous, the result would meaningfully improve the computational-statistical tradeoff for Wasserstein estimation in low dimensions by showing that Hölder smoothness permits aggressive compression without asymptotically inflating the statistical error, thereby enabling faster exact solvers on the reduced support.

major comments (1)

[Sample-Sketch-Solve paradigm and grid bias analysis] The grid-sketch bias control (the step that must ensure E[W₂²(μ_n, μ_n^h)] = o(ε) for the sketched empirical measure μ_n^h) is load-bearing for the claimed exponent. The manuscript must derive an explicit bound on this bias using the α-Hölder modulus of continuity, show that the required grid spacing h(ε, α) keeps the number of occupied cells compatible with the subsequent solver runtime, and confirm that no extra logarithmic factors appear that would alter the max(2, (d+1+o(1))/(1+α)) expression.

minor comments (1)

[Introduction and Section 3] Notation for the sketched measure and the precise definition of the Cartesian grid spacing h should be introduced earlier and used consistently throughout the runtime analysis.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the careful reading and for identifying the grid-sketch bias analysis as a central component of the argument. We address the major comment below and have revised the manuscript to make the relevant bounds fully explicit.

read point-by-point responses

Referee: [Sample-Sketch-Solve paradigm and grid bias analysis] The grid-sketch bias control (the step that must ensure E[W₂²(μ_n, μ_n^h)] = o(ε) for the sketched empirical measure μ_n^h) is load-bearing for the claimed exponent. The manuscript must derive an explicit bound on this bias using the α-Hölder modulus of continuity, show that the required grid spacing h(ε, α) keeps the number of occupied cells compatible with the subsequent solver runtime, and confirm that no extra logarithmic factors appear that would alter the max(2, (d+1+o(1))/(1+α)) expression.

Authors: We agree that an explicit bias bound is necessary for the claimed runtime. In the revised manuscript we have added Theorem 3.4 and the surrounding lemmas in Section 3.3. Using the α-Hölder modulus of continuity of the densities, we prove that E[W₂²(μ_n, μ_n^h)] ≤ C(d,α) h^{2α} whenever the grid spacing is h. Setting h = Θ(ε^{1/(2α)}) makes the bias at most ε/2. Because the underlying measures are α-Hölder, the number of occupied grid cells after sketching is O((1/h)^d) with high probability; substituting the chosen h yields an effective support size whose contribution to the exact solver runtime is absorbed into the (d+1+o(1))/(1+α) term. The o(1) already accounts for the polylogarithmic factors arising from concentration and from the exact discrete solver; no additional logarithmic factors are introduced that would change the max(2,·) expression. The revised proof therefore confirms that the grid-sketch step preserves the target computational-statistical rate. revision: yes

Circularity Check

0 steps flagged

No significant circularity; runtime follows from grid-sketch bias analysis under Hölder smoothness

full rationale

The claimed runtime exponent is derived from the Sample-Sketch-Solve paradigm: a Cartesian grid of spacing chosen to keep discretization bias o(ε) under the α-Hölder assumption, followed by an exact solver on the compressed support. The abstract states that the sketch 'can compress the data without increasing asymptotic error' and 'regularizes the structure which enables faster exact algorithms,' with the final bound stated as a direct consequence for distributions on (0,1)^d. No equation or step is shown to equate the output runtime to a fitted parameter, a self-citation chain, or a renamed empirical pattern; the Hölder modulus supplies an independent bias bound that is driven below ε while controlling the number of occupied cells. The derivation therefore remains self-contained against the stated smoothness assumption and does not reduce to its inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the assumption that the target distributions are α-Hölder smooth; the grid sketch is introduced as a new compression step whose error analysis is not detailed in the abstract.

axioms (1)

domain assumption Distributions P and Q are α-Hölder smooth on (0,1)^d
Invoked to ensure the Cartesian grid sketch does not increase asymptotic statistical error.

pith-pipeline@v0.9.0 · 5817 in / 1279 out tokens · 45686 ms · 2026-05-20T03:22:42.168428+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We approximate W₂²(P,Q) within ε error in ε^{-max(2,(d+1+o(1))/(1+α))} time for 0<α<1 Hölder smooth distributions... regular cartesian grid sketch... compress the data without increasing asymptotic error
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Theorem 2.3... |W₂²(Gh#P,Gh#Q)-W₂²(P,Q)|≤K h^{1+α}... relies on... Collins and Tong regularity of Brenier map

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

36 extracted references · 36 canonical work pages · 2 internal anchors

[1]

Bernoulli , volume=

Estimation of wasserstein distances in the spiked transport model , author=. Bernoulli , volume=. 2022 , publisher=

work page 2022
[2]

Advances in Neural Information Processing Systems , volume=

Faster Wasserstein distance estimation with the Sinkhorn divergence , author=. Advances in Neural Information Processing Systems , volume=

work page
[3]

Proceedings of the Eighteenth Canadian Conference on Computational Geometry’, Kingston, Canada , pages=

On bipartite matching under the RMS distance , author=. Proceedings of the Eighteenth Canadian Conference on Computational Geometry’, Kingston, Canada , pages=. 2006 , organization=

work page 2006
[4]

Advances in Neural Information Processing Systems , volume=

A graph theoretic additive approximation of optimal transport , author=. Advances in Neural Information Processing Systems , volume=

work page
[5]

Approximating the Quadratic Transportation Metric in Near-Linear Time

Approximating the quadratic transportation metric in near-linear time , author=. arXiv preprint arXiv:1810.10046 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[6]

Advances in neural information processing systems , volume=

Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration , author=. Advances in neural information processing systems , volume=

work page
[7]

International conference on machine learning , pages=

Computational optimal transport: Complexity by accelerated gradient descent is better than by Sinkhorn’s algorithm , author=. International conference on machine learning , pages=. 2018 , organization=

work page 2018
[8]

International Conference on Machine Learning , pages=

On efficient optimal transport: An analysis of greedy and accelerated mirror descent algorithms , author=. International Conference on Machine Learning , pages=. 2019 , organization=

work page 2019
[9]

Approximating optimal transport with linear programs

Approximating optimal transport with linear programs , author=. arXiv preprint arXiv:1810.05957 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[10]

Journal of the ACM (JACM) , volume=

Theoretical improvements in algorithmic efficiency for network flow problems , author=. Journal of the ACM (JACM) , volume=. 1972 , publisher=

work page 1972
[11]

Advances in Neural Information Processing Systems , volume=

A robust exact algorithm for the euclidean bipartite matching problem , author=. Advances in Neural Information Processing Systems , volume=

work page
[12]

Combinatorica , volume=

On optimal matchings , author=. Combinatorica , volume=. 1984 , publisher=

work page 1984
[13]

Proceedings of the 2021 ACM-SIAM Symposium on Discrete Algorithms (SODA) , pages=

An O (n 5/4) Time -Approximation Algorithm for RMS Matching in a Plane , author=. Proceedings of the 2021 ACM-SIAM Symposium on Discrete Algorithms (SODA) , pages=. 2021 , organization=

work page 2021
[14]

Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing , pages=

Deterministic, near-linear ��-approximation algorithm for geometric bipartite matching , author=. Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing , pages=

work page
[15]

Annual review of statistics and its application , volume=

Statistical aspects of Wasserstein distances , author=. Annual review of statistics and its application , volume=. 2019 , publisher=

work page 2019
[16]

2024 IEEE International Symposium on Information Theory (ISIT) , pages=

An extension of McDiarmid's inequality , author=. 2024 IEEE International Symposium on Information Theory (ISIT) , pages=. 2024 , organization=

work page 2024
[17]

Minimax Posterior Contraction Rates for Unconstrained Distribution Estimation on [0, 1]\^

Jacobs, Peter Matthew and Patel, Lekha and Bhattacharya, Anirban and Pati, Debdeep , journal=. Minimax Posterior Contraction Rates for Unconstrained Distribution Estimation on [0, 1]\^

work page
[18]

2021 IEEE International Symposium on Information Theory (ISIT) , pages=

Two-sample test using projected wasserstein distance , author=. 2021 IEEE International Symposium on Information Theory (ISIT) , pages=. 2021 , organization=

work page 2021
[19]

Bernoulli , volume=

Sharp asymptotic and finite-sample rates of convergence of empirical measures in Wasserstein distance , author=. Bernoulli , volume=. 2019 , publisher=

work page 2019
[20]

The Annals of Statistics , volume=

Minimax estimation of smooth densities in Wasserstein distance , author=. The Annals of Statistics , volume=. 2022 , publisher=

work page 2022
[21]

Statistical optimal transport,

Statistical optimal transport , author=. arXiv preprint arXiv:2407.18163 , volume=. 2024 , publisher=

work page arXiv 2024
[22]

Communications on pure and applied mathematics , volume=

Polar factorization and monotone rearrangement of vector-valued functions , author=. Communications on pure and applied mathematics , volume=. 1991 , publisher=

work page 1991
[23]

Journal of the American Mathematical Society , volume=

The regularity of mappings with a convex potential , author=. Journal of the American Mathematical Society , volume=. 1992 , publisher=

work page 1992
[24]

Communications on pure and applied mathematics , volume=

Boundary regularity of maps with convex potentials , author=. Communications on pure and applied mathematics , volume=. 1992 , publisher=

work page 1992
[25]

Annals of mathematics , volume=

Boundary regularity of maps with convex potentials--II , author=. Annals of mathematics , volume=. 1996 , publisher=

work page 1996
[26]

arXiv preprint arXiv:2507.05395 , year=

Boundary regularity of optimal transport maps on convex domains , author=. arXiv preprint arXiv:2507.05395 , year=

work page arXiv
[27]

SIAM Journal on Mathematical Analysis , volume=

Barycenters in the Wasserstein space , author=. SIAM Journal on Mathematical Analysis , volume=. 2011 , publisher=

work page 2011
[28]

Information and Inference: A Journal of the IMA , volume=

On parameter estimation with the Wasserstein distance , author=. Information and Inference: A Journal of the IMA , volume=. 2019 , publisher=

work page 2019
[29]

Entropy , volume=

On wasserstein two-sample testing and related families of nonparametric tests , author=. Entropy , volume=. 2017 , publisher=

work page 2017
[30]

Naval research logistics quarterly , volume=

Variants of the Hungarian method for assignment problems , author=. Naval research logistics quarterly , volume=. 1956 , publisher=

work page 1956
[31]

URL: https://https://web

Lecture notes on statistics and information theory , author=. URL: https://https://web. stanford. edu/class/stats311/lecture-notes. pdf , year=

work page
[32]

Akash Mittal , title =

work page
[33]

Computing Kantorovich-Wasserstein Distances on

Auricchio, Gennaro and Bassetti, Federico and Gualandi, Stefano and Veneroni, Marco , booktitle =. Computing Kantorovich-Wasserstein Distances on

work page
[34]

and Peng, Richard and Probst Gutenberg, Maximilian and Sachdeva, Sushant and Sidford, Aaron , journal =

van den Brand, Jan and Chen, Li and Kyng, Rasmus and Liu, Yang P. and Peng, Richard and Probst Gutenberg, Maximilian and Sachdeva, Sushant and Sidford, Aaron , journal =. A Deterministic Almost-Linear Time Algorithm for Minimum-Cost Flow , year =

work page
[35]

Introduction to Nonparametric Estimation , pages=

Nonparametric estimators , author=. Introduction to Nonparametric Estimation , pages=. 2008 , publisher=

work page 2008
[36]

Advances in Neural Information Processing Systems , volume=

Nonparametric density estimation under adversarial losses , author=. Advances in Neural Information Processing Systems , volume=

work page

[1] [1]

Bernoulli , volume=

Estimation of wasserstein distances in the spiked transport model , author=. Bernoulli , volume=. 2022 , publisher=

work page 2022

[2] [2]

Advances in Neural Information Processing Systems , volume=

Faster Wasserstein distance estimation with the Sinkhorn divergence , author=. Advances in Neural Information Processing Systems , volume=

work page

[3] [3]

Proceedings of the Eighteenth Canadian Conference on Computational Geometry’, Kingston, Canada , pages=

On bipartite matching under the RMS distance , author=. Proceedings of the Eighteenth Canadian Conference on Computational Geometry’, Kingston, Canada , pages=. 2006 , organization=

work page 2006

[4] [4]

Advances in Neural Information Processing Systems , volume=

A graph theoretic additive approximation of optimal transport , author=. Advances in Neural Information Processing Systems , volume=

work page

[5] [5]

Approximating the Quadratic Transportation Metric in Near-Linear Time

Approximating the quadratic transportation metric in near-linear time , author=. arXiv preprint arXiv:1810.10046 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[6] [6]

Advances in neural information processing systems , volume=

Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration , author=. Advances in neural information processing systems , volume=

work page

[7] [7]

International conference on machine learning , pages=

Computational optimal transport: Complexity by accelerated gradient descent is better than by Sinkhorn’s algorithm , author=. International conference on machine learning , pages=. 2018 , organization=

work page 2018

[8] [8]

International Conference on Machine Learning , pages=

On efficient optimal transport: An analysis of greedy and accelerated mirror descent algorithms , author=. International Conference on Machine Learning , pages=. 2019 , organization=

work page 2019

[9] [9]

Approximating optimal transport with linear programs

Approximating optimal transport with linear programs , author=. arXiv preprint arXiv:1810.05957 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[10] [10]

Journal of the ACM (JACM) , volume=

Theoretical improvements in algorithmic efficiency for network flow problems , author=. Journal of the ACM (JACM) , volume=. 1972 , publisher=

work page 1972

[11] [11]

Advances in Neural Information Processing Systems , volume=

A robust exact algorithm for the euclidean bipartite matching problem , author=. Advances in Neural Information Processing Systems , volume=

work page

[12] [12]

Combinatorica , volume=

On optimal matchings , author=. Combinatorica , volume=. 1984 , publisher=

work page 1984

[13] [13]

Proceedings of the 2021 ACM-SIAM Symposium on Discrete Algorithms (SODA) , pages=

An O (n 5/4) Time -Approximation Algorithm for RMS Matching in a Plane , author=. Proceedings of the 2021 ACM-SIAM Symposium on Discrete Algorithms (SODA) , pages=. 2021 , organization=

work page 2021

[14] [14]

Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing , pages=

Deterministic, near-linear ��-approximation algorithm for geometric bipartite matching , author=. Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing , pages=

work page

[15] [15]

Annual review of statistics and its application , volume=

Statistical aspects of Wasserstein distances , author=. Annual review of statistics and its application , volume=. 2019 , publisher=

work page 2019

[16] [16]

2024 IEEE International Symposium on Information Theory (ISIT) , pages=

An extension of McDiarmid's inequality , author=. 2024 IEEE International Symposium on Information Theory (ISIT) , pages=. 2024 , organization=

work page 2024

[17] [17]

Minimax Posterior Contraction Rates for Unconstrained Distribution Estimation on [0, 1]\^

Jacobs, Peter Matthew and Patel, Lekha and Bhattacharya, Anirban and Pati, Debdeep , journal=. Minimax Posterior Contraction Rates for Unconstrained Distribution Estimation on [0, 1]\^

work page

[18] [18]

2021 IEEE International Symposium on Information Theory (ISIT) , pages=

Two-sample test using projected wasserstein distance , author=. 2021 IEEE International Symposium on Information Theory (ISIT) , pages=. 2021 , organization=

work page 2021

[19] [19]

Bernoulli , volume=

Sharp asymptotic and finite-sample rates of convergence of empirical measures in Wasserstein distance , author=. Bernoulli , volume=. 2019 , publisher=

work page 2019

[20] [20]

The Annals of Statistics , volume=

Minimax estimation of smooth densities in Wasserstein distance , author=. The Annals of Statistics , volume=. 2022 , publisher=

work page 2022

[21] [21]

Statistical optimal transport,

Statistical optimal transport , author=. arXiv preprint arXiv:2407.18163 , volume=. 2024 , publisher=

work page arXiv 2024

[22] [22]

Communications on pure and applied mathematics , volume=

Polar factorization and monotone rearrangement of vector-valued functions , author=. Communications on pure and applied mathematics , volume=. 1991 , publisher=

work page 1991

[23] [23]

Journal of the American Mathematical Society , volume=

The regularity of mappings with a convex potential , author=. Journal of the American Mathematical Society , volume=. 1992 , publisher=

work page 1992

[24] [24]

Communications on pure and applied mathematics , volume=

Boundary regularity of maps with convex potentials , author=. Communications on pure and applied mathematics , volume=. 1992 , publisher=

work page 1992

[25] [25]

Annals of mathematics , volume=

Boundary regularity of maps with convex potentials--II , author=. Annals of mathematics , volume=. 1996 , publisher=

work page 1996

[26] [26]

arXiv preprint arXiv:2507.05395 , year=

Boundary regularity of optimal transport maps on convex domains , author=. arXiv preprint arXiv:2507.05395 , year=

work page arXiv

[27] [27]

SIAM Journal on Mathematical Analysis , volume=

Barycenters in the Wasserstein space , author=. SIAM Journal on Mathematical Analysis , volume=. 2011 , publisher=

work page 2011

[28] [28]

Information and Inference: A Journal of the IMA , volume=

On parameter estimation with the Wasserstein distance , author=. Information and Inference: A Journal of the IMA , volume=. 2019 , publisher=

work page 2019

[29] [29]

Entropy , volume=

On wasserstein two-sample testing and related families of nonparametric tests , author=. Entropy , volume=. 2017 , publisher=

work page 2017

[30] [30]

Naval research logistics quarterly , volume=

Variants of the Hungarian method for assignment problems , author=. Naval research logistics quarterly , volume=. 1956 , publisher=

work page 1956

[31] [31]

URL: https://https://web

Lecture notes on statistics and information theory , author=. URL: https://https://web. stanford. edu/class/stats311/lecture-notes. pdf , year=

work page

[32] [32]

Akash Mittal , title =

work page

[33] [33]

Computing Kantorovich-Wasserstein Distances on

Auricchio, Gennaro and Bassetti, Federico and Gualandi, Stefano and Veneroni, Marco , booktitle =. Computing Kantorovich-Wasserstein Distances on

work page

[34] [34]

and Peng, Richard and Probst Gutenberg, Maximilian and Sachdeva, Sushant and Sidford, Aaron , journal =

van den Brand, Jan and Chen, Li and Kyng, Rasmus and Liu, Yang P. and Peng, Richard and Probst Gutenberg, Maximilian and Sachdeva, Sushant and Sidford, Aaron , journal =. A Deterministic Almost-Linear Time Algorithm for Minimum-Cost Flow , year =

work page

[35] [35]

Introduction to Nonparametric Estimation , pages=

Nonparametric estimators , author=. Introduction to Nonparametric Estimation , pages=. 2008 , publisher=

work page 2008

[36] [36]

Advances in Neural Information Processing Systems , volume=

Nonparametric density estimation under adversarial losses , author=. Advances in Neural Information Processing Systems , volume=

work page