Optimizing Computational-Statistical Runtime for Wasserstein Distance Estimation
Pith reviewed 2026-05-20 03:22 UTC · model grok-4.3
The pith
A regular cartesian grid sketch of samples estimates squared Wasserstein distance to additive epsilon error in time epsilon to the power of minus max of 2 and (d plus 1 over 1 plus alpha) for alpha-Holder smooth distributions on the unit d-
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By inserting a regular cartesian grid sketch into the estimation pipeline, the squared Wasserstein distance W two squared between alpha-Holder smooth measures P and Q supported on the unit cube can be approximated to additive error epsilon in expectation using a total computational-statistical runtime of epsilon to the power of minus max of 2 and (d plus 1 plus o of 1) over (1 plus alpha).
What carries the argument
The regular cartesian grid sketch of the empirical samples, which simultaneously compresses the data without asymptotically inflating statistical error and imposes a regular structure that permits faster exact Wasserstein solvers.
If this is right
- When d equals 2 and alpha exceeds one half the runtime is statistically optimal at Theta of epsilon to the minus 2.
- When d equals 3 and alpha approaches 1 the runtime becomes nearly optimal.
- The sketching step costs linear time in the number of samples and can be performed with constant work per sample.
- Exact Wasserstein algorithms can now be applied directly to the much smaller sketched measures.
Where Pith is reading between the lines
- The same grid compression idea may extend to other integral probability metrics whose cost functions admit similar regularity.
- Adaptive or non-uniform grids could be substituted to reduce the hidden constants while preserving the asymptotic scaling.
- The approach supplies a concrete way to balance sample size against per-sample computation when Wasserstein estimation is embedded inside larger learning pipelines.
Load-bearing premise
The underlying distributions must be alpha-Holder smooth on the unit cube so that the grid sketch does not increase the statistical error beyond the target epsilon.
What would settle it
Running the sketched estimator on samples drawn from distributions that violate alpha-Holder smoothness on the cube and measuring whether the observed additive error exceeds epsilon.
read the original abstract
Squared Wasserstein distance is a frequently used tool to measure discrepancy between probability distributions. This distance is typically computed between empirical measures of size $n$ from two underlying random samples. Unfortunately, even in lower dimensional Euclidean space problems $\left( d \in \{2,3\} \right)$, algorithms for Wasserstein distance computation with approximate or exact precision guarantees scale poorly in the runtime as a function of $n$ and the desired precision. In response, we consider the computational-statistical runtime, where the goal is to estimate from samples the Wasserstein distance between potentially smooth measures up to $\epsilon$-additive error in expectation with respect to the sampling; we allow $O(1)$ computational cost for collecting a sample. Towards this, we develop a Sample-Sketch-Solve paradigm where we introduce a regular cartesian grid sketch of the samples. We show that (especially under $\alpha$-H\"older smooth distributions) this can compress the data without increasing asymptotic error, and also regularizes the structure which enables faster exact algorithms. Ultimately, we approximate $W_2^2(P,Q)$ within $\epsilon$ error in $\epsilon^{-\max(2,\frac{d+1+o(1)}{1+\alpha})}$ time for $0 < \alpha < 1$ H\"older smooth distributions $P,Q$ on $(0,1)^{d}$; an optimal $\Theta(\epsilon^{-2})$ for $\alpha > 1/2$ when $d=2$ and nearly optimal as $\alpha \to 1$ when $d = 3$.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a Sample-Sketch-Solve framework for estimating the squared 2-Wasserstein distance W₂²(P, Q) between two α-Hölder smooth probability measures on the unit cube in d dimensions. Samples are sketched onto a regular Cartesian grid to compress the support while preserving the statistical error rate, after which an exact solver is applied to the resulting discrete measures. The central result is a computational-statistical runtime of ε^{-max(2, (d+1+o(1))/(1+α))} to achieve ε-additive error in expectation, which is optimal Θ(ε^{-2}) for α > 1/2 when d=2 and nearly optimal as α → 1 when d=3.
Significance. If the grid-sketch bias analysis is rigorous, the result would meaningfully improve the computational-statistical tradeoff for Wasserstein estimation in low dimensions by showing that Hölder smoothness permits aggressive compression without asymptotically inflating the statistical error, thereby enabling faster exact solvers on the reduced support.
major comments (1)
- [Sample-Sketch-Solve paradigm and grid bias analysis] The grid-sketch bias control (the step that must ensure E[W₂²(μ_n, μ_n^h)] = o(ε) for the sketched empirical measure μ_n^h) is load-bearing for the claimed exponent. The manuscript must derive an explicit bound on this bias using the α-Hölder modulus of continuity, show that the required grid spacing h(ε, α) keeps the number of occupied cells compatible with the subsequent solver runtime, and confirm that no extra logarithmic factors appear that would alter the max(2, (d+1+o(1))/(1+α)) expression.
minor comments (1)
- [Introduction and Section 3] Notation for the sketched measure and the precise definition of the Cartesian grid spacing h should be introduced earlier and used consistently throughout the runtime analysis.
Simulated Author's Rebuttal
We thank the referee for the careful reading and for identifying the grid-sketch bias analysis as a central component of the argument. We address the major comment below and have revised the manuscript to make the relevant bounds fully explicit.
read point-by-point responses
-
Referee: [Sample-Sketch-Solve paradigm and grid bias analysis] The grid-sketch bias control (the step that must ensure E[W₂²(μ_n, μ_n^h)] = o(ε) for the sketched empirical measure μ_n^h) is load-bearing for the claimed exponent. The manuscript must derive an explicit bound on this bias using the α-Hölder modulus of continuity, show that the required grid spacing h(ε, α) keeps the number of occupied cells compatible with the subsequent solver runtime, and confirm that no extra logarithmic factors appear that would alter the max(2, (d+1+o(1))/(1+α)) expression.
Authors: We agree that an explicit bias bound is necessary for the claimed runtime. In the revised manuscript we have added Theorem 3.4 and the surrounding lemmas in Section 3.3. Using the α-Hölder modulus of continuity of the densities, we prove that E[W₂²(μ_n, μ_n^h)] ≤ C(d,α) h^{2α} whenever the grid spacing is h. Setting h = Θ(ε^{1/(2α)}) makes the bias at most ε/2. Because the underlying measures are α-Hölder, the number of occupied grid cells after sketching is O((1/h)^d) with high probability; substituting the chosen h yields an effective support size whose contribution to the exact solver runtime is absorbed into the (d+1+o(1))/(1+α) term. The o(1) already accounts for the polylogarithmic factors arising from concentration and from the exact discrete solver; no additional logarithmic factors are introduced that would change the max(2,·) expression. The revised proof therefore confirms that the grid-sketch step preserves the target computational-statistical rate. revision: yes
Circularity Check
No significant circularity; runtime follows from grid-sketch bias analysis under Hölder smoothness
full rationale
The claimed runtime exponent is derived from the Sample-Sketch-Solve paradigm: a Cartesian grid of spacing chosen to keep discretization bias o(ε) under the α-Hölder assumption, followed by an exact solver on the compressed support. The abstract states that the sketch 'can compress the data without increasing asymptotic error' and 'regularizes the structure which enables faster exact algorithms,' with the final bound stated as a direct consequence for distributions on (0,1)^d. No equation or step is shown to equate the output runtime to a fitted parameter, a self-citation chain, or a renamed empirical pattern; the Hölder modulus supplies an independent bias bound that is driven below ε while controlling the number of occupied cells. The derivation therefore remains self-contained against the stated smoothness assumption and does not reduce to its inputs by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Distributions P and Q are α-Hölder smooth on (0,1)^d
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
We approximate W₂²(P,Q) within ε error in ε^{-max(2,(d+1+o(1))/(1+α))} time for 0<α<1 Hölder smooth distributions... regular cartesian grid sketch... compress the data without increasing asymptotic error
-
IndisputableMonolith/Foundation/RealityFromDistinction.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Theorem 2.3... |W₂²(Gh#P,Gh#Q)-W₂²(P,Q)|≤K h^{1+α}... relies on... Collins and Tong regularity of Brenier map
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Estimation of wasserstein distances in the spiked transport model , author=. Bernoulli , volume=. 2022 , publisher=
work page 2022
-
[2]
Advances in Neural Information Processing Systems , volume=
Faster Wasserstein distance estimation with the Sinkhorn divergence , author=. Advances in Neural Information Processing Systems , volume=
-
[3]
On bipartite matching under the RMS distance , author=. Proceedings of the Eighteenth Canadian Conference on Computational Geometry’, Kingston, Canada , pages=. 2006 , organization=
work page 2006
-
[4]
Advances in Neural Information Processing Systems , volume=
A graph theoretic additive approximation of optimal transport , author=. Advances in Neural Information Processing Systems , volume=
-
[5]
Approximating the Quadratic Transportation Metric in Near-Linear Time
Approximating the quadratic transportation metric in near-linear time , author=. arXiv preprint arXiv:1810.10046 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[6]
Advances in neural information processing systems , volume=
Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration , author=. Advances in neural information processing systems , volume=
-
[7]
International conference on machine learning , pages=
Computational optimal transport: Complexity by accelerated gradient descent is better than by Sinkhorn’s algorithm , author=. International conference on machine learning , pages=. 2018 , organization=
work page 2018
-
[8]
International Conference on Machine Learning , pages=
On efficient optimal transport: An analysis of greedy and accelerated mirror descent algorithms , author=. International Conference on Machine Learning , pages=. 2019 , organization=
work page 2019
-
[9]
Approximating optimal transport with linear programs
Approximating optimal transport with linear programs , author=. arXiv preprint arXiv:1810.05957 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[10]
Journal of the ACM (JACM) , volume=
Theoretical improvements in algorithmic efficiency for network flow problems , author=. Journal of the ACM (JACM) , volume=. 1972 , publisher=
work page 1972
-
[11]
Advances in Neural Information Processing Systems , volume=
A robust exact algorithm for the euclidean bipartite matching problem , author=. Advances in Neural Information Processing Systems , volume=
-
[12]
On optimal matchings , author=. Combinatorica , volume=. 1984 , publisher=
work page 1984
-
[13]
Proceedings of the 2021 ACM-SIAM Symposium on Discrete Algorithms (SODA) , pages=
An O (n 5/4) Time -Approximation Algorithm for RMS Matching in a Plane , author=. Proceedings of the 2021 ACM-SIAM Symposium on Discrete Algorithms (SODA) , pages=. 2021 , organization=
work page 2021
-
[14]
Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing , pages=
Deterministic, near-linear ��-approximation algorithm for geometric bipartite matching , author=. Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing , pages=
-
[15]
Annual review of statistics and its application , volume=
Statistical aspects of Wasserstein distances , author=. Annual review of statistics and its application , volume=. 2019 , publisher=
work page 2019
-
[16]
2024 IEEE International Symposium on Information Theory (ISIT) , pages=
An extension of McDiarmid's inequality , author=. 2024 IEEE International Symposium on Information Theory (ISIT) , pages=. 2024 , organization=
work page 2024
-
[17]
Minimax Posterior Contraction Rates for Unconstrained Distribution Estimation on [0, 1]\^
Jacobs, Peter Matthew and Patel, Lekha and Bhattacharya, Anirban and Pati, Debdeep , journal=. Minimax Posterior Contraction Rates for Unconstrained Distribution Estimation on [0, 1]\^
-
[18]
2021 IEEE International Symposium on Information Theory (ISIT) , pages=
Two-sample test using projected wasserstein distance , author=. 2021 IEEE International Symposium on Information Theory (ISIT) , pages=. 2021 , organization=
work page 2021
-
[19]
Sharp asymptotic and finite-sample rates of convergence of empirical measures in Wasserstein distance , author=. Bernoulli , volume=. 2019 , publisher=
work page 2019
-
[20]
The Annals of Statistics , volume=
Minimax estimation of smooth densities in Wasserstein distance , author=. The Annals of Statistics , volume=. 2022 , publisher=
work page 2022
-
[21]
Statistical optimal transport,
Statistical optimal transport , author=. arXiv preprint arXiv:2407.18163 , volume=. 2024 , publisher=
-
[22]
Communications on pure and applied mathematics , volume=
Polar factorization and monotone rearrangement of vector-valued functions , author=. Communications on pure and applied mathematics , volume=. 1991 , publisher=
work page 1991
-
[23]
Journal of the American Mathematical Society , volume=
The regularity of mappings with a convex potential , author=. Journal of the American Mathematical Society , volume=. 1992 , publisher=
work page 1992
-
[24]
Communications on pure and applied mathematics , volume=
Boundary regularity of maps with convex potentials , author=. Communications on pure and applied mathematics , volume=. 1992 , publisher=
work page 1992
-
[25]
Annals of mathematics , volume=
Boundary regularity of maps with convex potentials--II , author=. Annals of mathematics , volume=. 1996 , publisher=
work page 1996
-
[26]
arXiv preprint arXiv:2507.05395 , year=
Boundary regularity of optimal transport maps on convex domains , author=. arXiv preprint arXiv:2507.05395 , year=
-
[27]
SIAM Journal on Mathematical Analysis , volume=
Barycenters in the Wasserstein space , author=. SIAM Journal on Mathematical Analysis , volume=. 2011 , publisher=
work page 2011
-
[28]
Information and Inference: A Journal of the IMA , volume=
On parameter estimation with the Wasserstein distance , author=. Information and Inference: A Journal of the IMA , volume=. 2019 , publisher=
work page 2019
-
[29]
On wasserstein two-sample testing and related families of nonparametric tests , author=. Entropy , volume=. 2017 , publisher=
work page 2017
-
[30]
Naval research logistics quarterly , volume=
Variants of the Hungarian method for assignment problems , author=. Naval research logistics quarterly , volume=. 1956 , publisher=
work page 1956
-
[31]
Lecture notes on statistics and information theory , author=. URL: https://https://web. stanford. edu/class/stats311/lecture-notes. pdf , year=
-
[32]
Akash Mittal , title =
-
[33]
Computing Kantorovich-Wasserstein Distances on
Auricchio, Gennaro and Bassetti, Federico and Gualandi, Stefano and Veneroni, Marco , booktitle =. Computing Kantorovich-Wasserstein Distances on
-
[34]
van den Brand, Jan and Chen, Li and Kyng, Rasmus and Liu, Yang P. and Peng, Richard and Probst Gutenberg, Maximilian and Sachdeva, Sushant and Sidford, Aaron , journal =. A Deterministic Almost-Linear Time Algorithm for Minimum-Cost Flow , year =
-
[35]
Introduction to Nonparametric Estimation , pages=
Nonparametric estimators , author=. Introduction to Nonparametric Estimation , pages=. 2008 , publisher=
work page 2008
-
[36]
Advances in Neural Information Processing Systems , volume=
Nonparametric density estimation under adversarial losses , author=. Advances in Neural Information Processing Systems , volume=
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.