pith. sign in

arxiv: 2605.20122 · v1 · pith:NNS7UZL4new · submitted 2026-05-19 · 📊 stat.ML · cs.CC· cs.LG

Optimizing Computational-Statistical Runtime for Wasserstein Distance Estimation

Pith reviewed 2026-05-20 03:22 UTC · model grok-4.3

classification 📊 stat.ML cs.CCcs.LG
keywords Wasserstein distanceHolder smooth distributionsgrid sketchingcomputational statisticsoptimal transportsample complexitydistribution estimation
0
0 comments X

The pith

A regular cartesian grid sketch of samples estimates squared Wasserstein distance to additive epsilon error in time epsilon to the power of minus max of 2 and (d plus 1 over 1 plus alpha) for alpha-Holder smooth distributions on the unit d-

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a Sample-Sketch-Solve method that first projects samples from two distributions onto a regular cartesian grid. For alpha-Holder smooth distributions this projection compresses the input while preserving the statistical error rate of the Wasserstein estimator. The resulting grid structure then admits faster exact optimal-transport algorithms. The combined procedure therefore achieves an overall runtime of epsilon to the power minus max of 2 and (d plus 1 plus o of 1) over (1 plus alpha) when the distributions are alpha-Holder smooth on the unit cube. In two dimensions with alpha greater than one half the runtime becomes the statistically optimal Theta of epsilon to the minus 2.

Core claim

By inserting a regular cartesian grid sketch into the estimation pipeline, the squared Wasserstein distance W two squared between alpha-Holder smooth measures P and Q supported on the unit cube can be approximated to additive error epsilon in expectation using a total computational-statistical runtime of epsilon to the power of minus max of 2 and (d plus 1 plus o of 1) over (1 plus alpha).

What carries the argument

The regular cartesian grid sketch of the empirical samples, which simultaneously compresses the data without asymptotically inflating statistical error and imposes a regular structure that permits faster exact Wasserstein solvers.

If this is right

  • When d equals 2 and alpha exceeds one half the runtime is statistically optimal at Theta of epsilon to the minus 2.
  • When d equals 3 and alpha approaches 1 the runtime becomes nearly optimal.
  • The sketching step costs linear time in the number of samples and can be performed with constant work per sample.
  • Exact Wasserstein algorithms can now be applied directly to the much smaller sketched measures.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same grid compression idea may extend to other integral probability metrics whose cost functions admit similar regularity.
  • Adaptive or non-uniform grids could be substituted to reduce the hidden constants while preserving the asymptotic scaling.
  • The approach supplies a concrete way to balance sample size against per-sample computation when Wasserstein estimation is embedded inside larger learning pipelines.

Load-bearing premise

The underlying distributions must be alpha-Holder smooth on the unit cube so that the grid sketch does not increase the statistical error beyond the target epsilon.

What would settle it

Running the sketched estimator on samples drawn from distributions that violate alpha-Holder smoothness on the cube and measuring whether the observed additive error exceeds epsilon.

read the original abstract

Squared Wasserstein distance is a frequently used tool to measure discrepancy between probability distributions. This distance is typically computed between empirical measures of size $n$ from two underlying random samples. Unfortunately, even in lower dimensional Euclidean space problems $\left( d \in \{2,3\} \right)$, algorithms for Wasserstein distance computation with approximate or exact precision guarantees scale poorly in the runtime as a function of $n$ and the desired precision. In response, we consider the computational-statistical runtime, where the goal is to estimate from samples the Wasserstein distance between potentially smooth measures up to $\epsilon$-additive error in expectation with respect to the sampling; we allow $O(1)$ computational cost for collecting a sample. Towards this, we develop a Sample-Sketch-Solve paradigm where we introduce a regular cartesian grid sketch of the samples. We show that (especially under $\alpha$-H\"older smooth distributions) this can compress the data without increasing asymptotic error, and also regularizes the structure which enables faster exact algorithms. Ultimately, we approximate $W_2^2(P,Q)$ within $\epsilon$ error in $\epsilon^{-\max(2,\frac{d+1+o(1)}{1+\alpha})}$ time for $0 < \alpha < 1$ H\"older smooth distributions $P,Q$ on $(0,1)^{d}$; an optimal $\Theta(\epsilon^{-2})$ for $\alpha > 1/2$ when $d=2$ and nearly optimal as $\alpha \to 1$ when $d = 3$.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper introduces a Sample-Sketch-Solve framework for estimating the squared 2-Wasserstein distance W₂²(P, Q) between two α-Hölder smooth probability measures on the unit cube in d dimensions. Samples are sketched onto a regular Cartesian grid to compress the support while preserving the statistical error rate, after which an exact solver is applied to the resulting discrete measures. The central result is a computational-statistical runtime of ε^{-max(2, (d+1+o(1))/(1+α))} to achieve ε-additive error in expectation, which is optimal Θ(ε^{-2}) for α > 1/2 when d=2 and nearly optimal as α → 1 when d=3.

Significance. If the grid-sketch bias analysis is rigorous, the result would meaningfully improve the computational-statistical tradeoff for Wasserstein estimation in low dimensions by showing that Hölder smoothness permits aggressive compression without asymptotically inflating the statistical error, thereby enabling faster exact solvers on the reduced support.

major comments (1)
  1. [Sample-Sketch-Solve paradigm and grid bias analysis] The grid-sketch bias control (the step that must ensure E[W₂²(μ_n, μ_n^h)] = o(ε) for the sketched empirical measure μ_n^h) is load-bearing for the claimed exponent. The manuscript must derive an explicit bound on this bias using the α-Hölder modulus of continuity, show that the required grid spacing h(ε, α) keeps the number of occupied cells compatible with the subsequent solver runtime, and confirm that no extra logarithmic factors appear that would alter the max(2, (d+1+o(1))/(1+α)) expression.
minor comments (1)
  1. [Introduction and Section 3] Notation for the sketched measure and the precise definition of the Cartesian grid spacing h should be introduced earlier and used consistently throughout the runtime analysis.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the careful reading and for identifying the grid-sketch bias analysis as a central component of the argument. We address the major comment below and have revised the manuscript to make the relevant bounds fully explicit.

read point-by-point responses
  1. Referee: [Sample-Sketch-Solve paradigm and grid bias analysis] The grid-sketch bias control (the step that must ensure E[W₂²(μ_n, μ_n^h)] = o(ε) for the sketched empirical measure μ_n^h) is load-bearing for the claimed exponent. The manuscript must derive an explicit bound on this bias using the α-Hölder modulus of continuity, show that the required grid spacing h(ε, α) keeps the number of occupied cells compatible with the subsequent solver runtime, and confirm that no extra logarithmic factors appear that would alter the max(2, (d+1+o(1))/(1+α)) expression.

    Authors: We agree that an explicit bias bound is necessary for the claimed runtime. In the revised manuscript we have added Theorem 3.4 and the surrounding lemmas in Section 3.3. Using the α-Hölder modulus of continuity of the densities, we prove that E[W₂²(μ_n, μ_n^h)] ≤ C(d,α) h^{2α} whenever the grid spacing is h. Setting h = Θ(ε^{1/(2α)}) makes the bias at most ε/2. Because the underlying measures are α-Hölder, the number of occupied grid cells after sketching is O((1/h)^d) with high probability; substituting the chosen h yields an effective support size whose contribution to the exact solver runtime is absorbed into the (d+1+o(1))/(1+α) term. The o(1) already accounts for the polylogarithmic factors arising from concentration and from the exact discrete solver; no additional logarithmic factors are introduced that would change the max(2,·) expression. The revised proof therefore confirms that the grid-sketch step preserves the target computational-statistical rate. revision: yes

Circularity Check

0 steps flagged

No significant circularity; runtime follows from grid-sketch bias analysis under Hölder smoothness

full rationale

The claimed runtime exponent is derived from the Sample-Sketch-Solve paradigm: a Cartesian grid of spacing chosen to keep discretization bias o(ε) under the α-Hölder assumption, followed by an exact solver on the compressed support. The abstract states that the sketch 'can compress the data without increasing asymptotic error' and 'regularizes the structure which enables faster exact algorithms,' with the final bound stated as a direct consequence for distributions on (0,1)^d. No equation or step is shown to equate the output runtime to a fitted parameter, a self-citation chain, or a renamed empirical pattern; the Hölder modulus supplies an independent bias bound that is driven below ε while controlling the number of occupied cells. The derivation therefore remains self-contained against the stated smoothness assumption and does not reduce to its inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the assumption that the target distributions are α-Hölder smooth; the grid sketch is introduced as a new compression step whose error analysis is not detailed in the abstract.

axioms (1)
  • domain assumption Distributions P and Q are α-Hölder smooth on (0,1)^d
    Invoked to ensure the Cartesian grid sketch does not increase asymptotic statistical error.

pith-pipeline@v0.9.0 · 5817 in / 1279 out tokens · 45686 ms · 2026-05-20T03:22:42.168428+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

36 extracted references · 36 canonical work pages · 2 internal anchors

  1. [1]

    Bernoulli , volume=

    Estimation of wasserstein distances in the spiked transport model , author=. Bernoulli , volume=. 2022 , publisher=

  2. [2]

    Advances in Neural Information Processing Systems , volume=

    Faster Wasserstein distance estimation with the Sinkhorn divergence , author=. Advances in Neural Information Processing Systems , volume=

  3. [3]

    Proceedings of the Eighteenth Canadian Conference on Computational Geometry’, Kingston, Canada , pages=

    On bipartite matching under the RMS distance , author=. Proceedings of the Eighteenth Canadian Conference on Computational Geometry’, Kingston, Canada , pages=. 2006 , organization=

  4. [4]

    Advances in Neural Information Processing Systems , volume=

    A graph theoretic additive approximation of optimal transport , author=. Advances in Neural Information Processing Systems , volume=

  5. [5]

    Approximating the Quadratic Transportation Metric in Near-Linear Time

    Approximating the quadratic transportation metric in near-linear time , author=. arXiv preprint arXiv:1810.10046 , year=

  6. [6]

    Advances in neural information processing systems , volume=

    Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration , author=. Advances in neural information processing systems , volume=

  7. [7]

    International conference on machine learning , pages=

    Computational optimal transport: Complexity by accelerated gradient descent is better than by Sinkhorn’s algorithm , author=. International conference on machine learning , pages=. 2018 , organization=

  8. [8]

    International Conference on Machine Learning , pages=

    On efficient optimal transport: An analysis of greedy and accelerated mirror descent algorithms , author=. International Conference on Machine Learning , pages=. 2019 , organization=

  9. [9]

    Approximating optimal transport with linear programs

    Approximating optimal transport with linear programs , author=. arXiv preprint arXiv:1810.05957 , year=

  10. [10]

    Journal of the ACM (JACM) , volume=

    Theoretical improvements in algorithmic efficiency for network flow problems , author=. Journal of the ACM (JACM) , volume=. 1972 , publisher=

  11. [11]

    Advances in Neural Information Processing Systems , volume=

    A robust exact algorithm for the euclidean bipartite matching problem , author=. Advances in Neural Information Processing Systems , volume=

  12. [12]

    Combinatorica , volume=

    On optimal matchings , author=. Combinatorica , volume=. 1984 , publisher=

  13. [13]

    Proceedings of the 2021 ACM-SIAM Symposium on Discrete Algorithms (SODA) , pages=

    An O (n 5/4) Time -Approximation Algorithm for RMS Matching in a Plane , author=. Proceedings of the 2021 ACM-SIAM Symposium on Discrete Algorithms (SODA) , pages=. 2021 , organization=

  14. [14]

    Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing , pages=

    Deterministic, near-linear ��-approximation algorithm for geometric bipartite matching , author=. Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing , pages=

  15. [15]

    Annual review of statistics and its application , volume=

    Statistical aspects of Wasserstein distances , author=. Annual review of statistics and its application , volume=. 2019 , publisher=

  16. [16]

    2024 IEEE International Symposium on Information Theory (ISIT) , pages=

    An extension of McDiarmid's inequality , author=. 2024 IEEE International Symposium on Information Theory (ISIT) , pages=. 2024 , organization=

  17. [17]

    Minimax Posterior Contraction Rates for Unconstrained Distribution Estimation on [0, 1]\^

    Jacobs, Peter Matthew and Patel, Lekha and Bhattacharya, Anirban and Pati, Debdeep , journal=. Minimax Posterior Contraction Rates for Unconstrained Distribution Estimation on [0, 1]\^

  18. [18]

    2021 IEEE International Symposium on Information Theory (ISIT) , pages=

    Two-sample test using projected wasserstein distance , author=. 2021 IEEE International Symposium on Information Theory (ISIT) , pages=. 2021 , organization=

  19. [19]

    Bernoulli , volume=

    Sharp asymptotic and finite-sample rates of convergence of empirical measures in Wasserstein distance , author=. Bernoulli , volume=. 2019 , publisher=

  20. [20]

    The Annals of Statistics , volume=

    Minimax estimation of smooth densities in Wasserstein distance , author=. The Annals of Statistics , volume=. 2022 , publisher=

  21. [21]

    Statistical optimal transport,

    Statistical optimal transport , author=. arXiv preprint arXiv:2407.18163 , volume=. 2024 , publisher=

  22. [22]

    Communications on pure and applied mathematics , volume=

    Polar factorization and monotone rearrangement of vector-valued functions , author=. Communications on pure and applied mathematics , volume=. 1991 , publisher=

  23. [23]

    Journal of the American Mathematical Society , volume=

    The regularity of mappings with a convex potential , author=. Journal of the American Mathematical Society , volume=. 1992 , publisher=

  24. [24]

    Communications on pure and applied mathematics , volume=

    Boundary regularity of maps with convex potentials , author=. Communications on pure and applied mathematics , volume=. 1992 , publisher=

  25. [25]

    Annals of mathematics , volume=

    Boundary regularity of maps with convex potentials--II , author=. Annals of mathematics , volume=. 1996 , publisher=

  26. [26]

    arXiv preprint arXiv:2507.05395 , year=

    Boundary regularity of optimal transport maps on convex domains , author=. arXiv preprint arXiv:2507.05395 , year=

  27. [27]

    SIAM Journal on Mathematical Analysis , volume=

    Barycenters in the Wasserstein space , author=. SIAM Journal on Mathematical Analysis , volume=. 2011 , publisher=

  28. [28]

    Information and Inference: A Journal of the IMA , volume=

    On parameter estimation with the Wasserstein distance , author=. Information and Inference: A Journal of the IMA , volume=. 2019 , publisher=

  29. [29]

    Entropy , volume=

    On wasserstein two-sample testing and related families of nonparametric tests , author=. Entropy , volume=. 2017 , publisher=

  30. [30]

    Naval research logistics quarterly , volume=

    Variants of the Hungarian method for assignment problems , author=. Naval research logistics quarterly , volume=. 1956 , publisher=

  31. [31]

    URL: https://https://web

    Lecture notes on statistics and information theory , author=. URL: https://https://web. stanford. edu/class/stats311/lecture-notes. pdf , year=

  32. [32]

    Akash Mittal , title =

  33. [33]

    Computing Kantorovich-Wasserstein Distances on

    Auricchio, Gennaro and Bassetti, Federico and Gualandi, Stefano and Veneroni, Marco , booktitle =. Computing Kantorovich-Wasserstein Distances on

  34. [34]

    and Peng, Richard and Probst Gutenberg, Maximilian and Sachdeva, Sushant and Sidford, Aaron , journal =

    van den Brand, Jan and Chen, Li and Kyng, Rasmus and Liu, Yang P. and Peng, Richard and Probst Gutenberg, Maximilian and Sachdeva, Sushant and Sidford, Aaron , journal =. A Deterministic Almost-Linear Time Algorithm for Minimum-Cost Flow , year =

  35. [35]

    Introduction to Nonparametric Estimation , pages=

    Nonparametric estimators , author=. Introduction to Nonparametric Estimation , pages=. 2008 , publisher=

  36. [36]

    Advances in Neural Information Processing Systems , volume=

    Nonparametric density estimation under adversarial losses , author=. Advances in Neural Information Processing Systems , volume=