Finite-sample bounds for regularized optimal transport
Pith reviewed 2026-06-25 19:14 UTC · model grok-4.3
The pith
Quadratically regularized optimal transport estimates the unregularized cost at rate n to the power minus 2 over d plus 4.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
For general convex regularizations the regularized empirical optimal transport cost admits bias and variance bounds that depend explicitly on the regularization parameter and the intrinsic dimension d of the marginals; when the cost is quadratic, quadratic regularization therefore estimates the unregularized optimal transport cost at rate n^{-2/(d+4)}, which is the fastest non-asymptotic rate known for any regularized estimator.
What carries the argument
Non-asymptotic bias and variance bounds on the empirical regularized optimal transport cost, with explicit dependence on regularization parameter and intrinsic dimension.
If this is right
- The same bounds apply simultaneously to Kullback-Leibler and to L^p regularizations with 1<p<infty.
- The results improve all prior non-asymptotic guarantees for entropic optimal transport.
- They supply the first explicit quantitative finite-sample statements for L^p regularization.
- The derived rate n^{-2/(d+4)} is the fastest non-asymptotic rate currently available for any estimator based on regularized optimal transport.
Where Pith is reading between the lines
- The explicit dimension dependence suggests that the method remains useful when data lie on low-dimensional manifolds even if ambient dimension is large.
- The general convex-regularization framework could be used to compare the statistical efficiency of different regularizers for a fixed computational budget.
- The bias-variance decomposition may guide adaptive choice of the regularization parameter from data.
Load-bearing premise
The marginal distributions have a well-defined finite intrinsic dimension d and the regularization is a general convex function.
What would settle it
A numerical example with known finite d and quadratic cost in which the estimation error for the unregularized cost decays slower than n^{-2/(d+4)} would falsify the rate claim.
read the original abstract
We study the sample complexity of regularized optimal transport for general convex regularizations including the Kullback--Leibler divergence and $L^p$ penalties. Our main results are non-asymptotic bias and variance bounds for the empirical cost, with explicit dependence on the regularization parameter and on the intrinsic dimension of the marginals. Our approach simultaneously improves, unifies, and extends existing finite-sample bounds. In particular, we improve the state of the art for entropic optimal transport, and we obtain the first fully quantitative results for $L^p$ regularization with $1<p<\infty$. For the quadratic transport cost, we deduce that quadratically regularized optimal transport (i.e., $L^2$ regularization) estimates the unregularized optimal transport cost at rate $n^{-2/(d+4)}$, the fastest non-asymptotic rate currently available for any estimator based on regularized optimal transport.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper derives non-asymptotic upper bounds on the bias and variance of the empirical regularized optimal transport cost for general convex regularizers (including KL and L^p penalties). The bounds are stated with explicit dependence on the regularization strength λ and the intrinsic dimension d of the marginals. Special cases recover improved rates for entropic OT and yield the first fully quantitative guarantees for 1 < p < ∞. For quadratic cost with L^2 regularization the paper obtains the rate n^{-2/(d+4)} by balancing bias and variance terms, claimed to be the fastest non-asymptotic rate among regularized OT estimators.
Significance. If the derivations are correct, the work unifies and strengthens finite-sample theory for regularized OT by controlling variance through the entropy of the function class induced by the regularizer. The explicit dimension dependence and the specialized n^{-2/(d+4)} rate for quadratic regularization constitute a concrete advance over prior non-asymptotic results. The paper's approach of balancing an explicit bias term against a variance term controlled by entropy is a methodological strength that yields parameter-dependent rates without hidden uniformity assumptions.
minor comments (3)
- [§1] §1 (Introduction): the claim that the n^{-2/(d+4)} rate is 'the fastest non-asymptotic rate currently available' would benefit from an explicit side-by-side comparison (in a table or paragraph) with the best previously published rates for entropic and other regularized estimators.
- [Main theorems] Theorem statements (e.g., the general bias-variance bound): the dependence on the intrinsic dimension d is stated via covering numbers or entropy integrals; a short remark clarifying whether d is the ambient or the manifold dimension would prevent misinterpretation.
- [Notation] Notation section: the definition of the regularized empirical cost uses both λ and the regularizer φ; a single displayed equation collecting all symbols would improve readability for readers tracking the bias-variance tradeoff.
Simulated Author's Rebuttal
We thank the referee for their positive assessment of the manuscript and for recommending minor revision. The report does not list any specific major comments.
Circularity Check
No significant circularity; derivation is self-contained
full rationale
The paper derives non-asymptotic bias-variance bounds for the empirical regularized OT cost under finite intrinsic dimension d and general convex regularizers. The rate n^{-2/(d+4)} for quadratic cost with L^2 regularization follows from explicitly balancing the regularization bias term against the variance term whose dimension dependence enters via entropy of the induced function class. This balancing uses only the stated assumptions and contains no reduction of a claimed prediction to a fitted input, no self-definitional steps, and no load-bearing self-citations. The work is a direct theoretical derivation with explicit quantitative dependence on the premises; the central claim does not collapse to its inputs by construction.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption The regularization is a general convex function.
- domain assumption The marginal distributions have finite intrinsic dimension d.
Reference graph
Works this paper leans on
-
[1]
S. Balakrishnan, T. Manole, and L. Wasserman. Statistical inference for optimal transport maps: Recent advances and perspectives.arXiv:2506.19025, 2025
arXiv 2025
-
[2]
Bayraktar, S
E. Bayraktar, S. Eckstein, and X. Zhang. Stability and sample complexity of divergence regu- larized optimal transport.Bernoulli, 31(1):213–239, 2025
2025
-
[3]
Bigot, E
J. Bigot, E. Cazelles, and N. Papadakis. Central limit theorems for entropy-regularized op- timal transport on finite spaces and statistical applications.Electronic Journal of Statistics, 13(2):5120–5150, 2019
2019
-
[4]
Blondel, V
M. Blondel, V. Seguy, and A. Rolet. Smooth and sparse optimal transport. volume 84 of Proceedings of Machine Learning Research, pages 880–889, 2018
2018
-
[5]
Boucheron, G
S. Boucheron, G. Lugosi, and P. Massart.Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press, Feb. 2013
2013
-
[6]
G. Carlier. On the linear convergence of the multimarginal Sinkhorn algorithm.SIAM J. Optim., 32(2):786–794, 2022
2022
-
[7]
Chewi, J
S. Chewi, J. Niles-Weed, and P. Rigollet.Statistical Optimal Transport. Springer, 2025
2025
-
[8]
Chizat, A
L. Chizat, A. Delalande, and T. Vaškevičius. Sharper exponential convergence rates for Sinkhorn’s algorithm in continuous settings.Math. Program., 215(1-2):809–858, 2026
2026
-
[9]
M. Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. In C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Weinberger, editors,Advances in Neural Infor- mation Processing Systems, volume 26. Curran Associates, Inc., 2013
2013
-
[10]
del Barrio, A
E. del Barrio, A. González-Sanz, and J.-M. Loubes. Central limit theorems for general trans- portation costs.Ann. Inst. Henri Poincaré Probab. Stat., 60(2):847–873, 2024
2024
-
[11]
del Barrio, A
E. del Barrio, A. González-Sanz, and J.-M. Loubes. Central limit theorems for semi-discrete Wasserstein distances.Bernoulli, 30(1):554–580, 2024
2024
-
[12]
del Barrio, A
E. del Barrio, A. González-Sanz, J.-M. Loubes, and J. Niles-Weed. An improved central limit theorem and fast convergence rates for entropic transportation costs.SIAM Journal on Mathe- matics of Data Science, 5(3):639–669, 2023
2023
-
[13]
E. del Barrio, A. González-Sanz, J.-M. Loubes, and D. Rodríguez-Vítores. Distributional limit theory for optimal transport.arXiv:2505.19104, 2025
arXiv 2025
-
[14]
del Barrio and J.-M
E. del Barrio and J.-M. Loubes. Central limit theorems for empirical transportation cost in general dimension.Ann. Probab., 47(2):926–951, 2019
2019
-
[15]
R. M. Dudley. The speed of mean Glivenko-Cantelli convergence.Ann. Math. Statist., 40:40–50, 1968
1968
-
[16]
Eckstein and M
S. Eckstein and M. Nutz. Convergence rates for regularized optimal transport via quantization. Math. Oper. Res., 49(2):1223–1240, 2024. 20
2024
-
[17]
Essid and J
M. Essid and J. Solomon. Quadratically regularized optimal transport on graphs.SIAM J. Sci. Comput., 40(4):A1961–A1986, 2018
2018
-
[18]
Fournier and A
N. Fournier and A. Guillin. On the rate of convergence in Wasserstein distance of the empirical measure.Probab. Theory Related Fields, 162(3-4):707–738, 2015
2015
-
[19]
Franklin and J
J. Franklin and J. Lorenz. On the scaling of multidimensional matrices.Linear Algebra Appl., 114/115:717–735, 1989
1989
-
[20]
García Trillos, M
N. García Trillos, M. Gerlach, M. Hein, and D. Slepčev. Error estimates for spectral convergence of the graph Laplacian on random geometric graphs toward the Laplace-Beltrami operator. Found. Comput. Math., 20(4):827–887, 2020
2020
-
[21]
García Trillos and D
N. García Trillos and D. Slepčev. On the rate of convergence of empirical measures in∞- transportation distance.Canad. J. Math., 67(6):1358–1383, 2015
2015
-
[22]
Genevay, L
A. Genevay, L. Chizat, F. Bach, M. Cuturi, and G. Peyré. Sample complexity of Sinkhorn divergences. In K. Chaudhuri and M. Sugiyama, editors,Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, volume 89 ofProceedings of Machine Learning Research, pages 1574–1583. PMLR, 16–18 Apr 2019
2019
-
[23]
Goldfeld, K
Z. Goldfeld, K. Kato, G. Rioux, and R. Sadhu. Limit theorems for entropic optimal transport maps and Sinkhorn divergence.Electron. J. Statist., 18:980–1041, 2024
2024
-
[24]
Goldfeld, K
Z. Goldfeld, K. Kato, G. Rioux, and R. Sadhu. Statistical inference with regularized optimal transport.Inf. Inference J. IMA, 13(1):iaad056, 2024
2024
-
[25]
A. González-Sanz, E. del Barrio, and M. Nutz. Sample complexity of quadratically regularized optimal transport.arXiv:2511.09807, 2025
arXiv 2025
-
[26]
A. González-Sanz, R. S. Gvalani, and L. Koch. Sharp local sparsity of regularized optimal transport.arXiv:2604.00843, 2026
Pith/arXiv arXiv 2026
-
[27]
González-Sanz and M
A. González-Sanz and M. Nutz. Sparsity of quadratically regularized optimal transport: Scalar case, 2024. SIAM J. Math. Anal., forthcoming
2024
-
[28]
A. González-Sanz, M. Nutz, and A. Riveros Valdevenito. Linear convergence of gradient descent for quadratically regularized optimal transport.Preprint arXiv:2509.08547, 2025
Pith/arXiv arXiv 2025
-
[29]
A. González-Sanz, M. Nutz, and A. Riveros Valdevenito. Polyak–Łojasiewicz inequality for quadratically regularized optimal transport.SIAM J. Optim., 2026. forthcoming, arXiv:2605.27175
Pith/arXiv arXiv 2026
-
[30]
A. González-Sanz, S. Eckstein, and M. Nutz. Sparse regularized optimal transport without curse of dimensionality.Preprint arXiv:2505.04721, 2025
arXiv 2025
-
[31]
A. González-Sanz and S. Hundrieser. Weak limits for empirical entropic optimal transport: Beyond smooth costs.arXiv:2305.09745, 2023
arXiv 2023
-
[32]
A. González-Sanz, J.-M. Loubes, and J. Niles-Weed. Weak limits of entropy regularized optimal transport; potentials, plans and divergences.arXiv:2207.07427, 2024
arXiv 2024
-
[33]
Graf and H
S. Graf and H. Luschgy.Foundations of Quantization for Probability Distributions. Springer Berlin Heidelberg, 2000
2000
-
[34]
Groppe and S
M. Groppe and S. Hundrieser. Lower complexity adaptation for empirical entropic optimal transport.J. Mach. Learn. Res., 25:Paper No. 344, 55, 2024
2024
-
[35]
Hundrieser, T
S. Hundrieser, T. Staudt, and A. Munk. Empirical optimal transport between different measures adapts to lower complexity.Ann. Inst. Henri Poincaré Probab. Stat., 60(2):824–846, 2024
2024
- [36]
-
[37]
Liu, J.-G
A. Liu, J.-G. Liu, and Y. Lu. On the rate of convergence of empirical measure in∞-Wasserstein distance for unbounded density function.Quart. Appl. Math., 77(4):811–829, 2019
2019
-
[38]
D. A. Lorenz, P. Manns, and C. Meyer. Quadratically regularized optimal transport.Appl. Math. Optim., 83(3):1919–1949, 2021
1919
-
[39]
Mena and J
G. Mena and J. Niles-Weed. Statistical bounds for entropic optimal transport: sample complex- ity and the central limit theorem. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors,Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019
2019
-
[40]
Muzellec, R
B. Muzellec, R. Nock, G. Patrini, and F. Nielsen. Tsallis regularized optimal transport and ecological inference.Proceedings of the AAAI Conference on Artificial Intelligence, 31(1), Feb. 2017
2017
-
[41]
Niles-Weed and P
J. Niles-Weed and P. Rigollet. Estimation of Wasserstein distances in the spiked transport model.Bernoulli, 28(4):2663–2688, 2022
2022
-
[42]
M. Nutz. Quadratically regularized optimal transport: existence and multiplicity of potentials. SIAM J. Math. Anal., 57(3):2622–2649, 2025
2025
-
[43]
Peyré and M
G. Peyré and M. Cuturi. Computational optimal transport: With applications to data science. Foundations and Trends in Machine Learning, 11(5-6):355–607, 2019
2019
-
[44]
Rigollet and A
P. Rigollet and A. J. Stromme. On the sample complexity of entropic optimal transport.Ann. Statist., 53(1):61–90, 2025
2025
-
[45]
Sadhu, Z
R. Sadhu, Z. Goldfeld, and K. Kato. Stability and statistical inference for semidiscrete optimal transport maps.Ann. Appl. Probab., 34(6):5694–5736, 2024
2024
-
[46]
Santambrogio.Optimal transport for applied mathematicians, volume 87 ofProgress in Non- linear Differential Equations and their Applications
F. Santambrogio.Optimal transport for applied mathematicians, volume 87 ofProgress in Non- linear Differential Equations and their Applications. Birkhäuser/Springer, Cham, 2015. Calculus of variations, PDEs, and modeling
2015
-
[47]
A. J. Stromme. Minimum intrinsic dimension scaling for entropic optimal transport. arXiv:2306.03398, 2023
arXiv 2023
-
[48]
A. W. van der Vaart and J. A. Wellner.Weak Convergence and Empirical Processes. Springer, 1996
1996
-
[49]
Villani.Optimal Transport: Old and New
C. Villani.Optimal Transport: Old and New. Springer-Verlag, Berlin, 2009
2009
-
[50]
Wiesel and X
J. Wiesel and X. Xu. Sparsity of quadratically regularized optimal transport: Bounds on con- centration and bias.SIAM J. Math. Anal., 57(6):6498–6521, 2025. A Proofs of Section 2 Proof of Proposition 2.3.Thisfollowsfromstandardargumentsasintheproofsof[30, Propo- sition 2.3, Theorem 3.2] (which are in turn partially based on [2]). Note that while [30] as- ...
2025
-
[51]
Sinceψis convex, it follows that bΦϕ,ε(bf ,bg)−bΦ(1) ϕ,ε(bf (1),bg(1))≤ bf(X 1)− bf(X ′ 1) n + 1 n2 nX j=1 n bf(X ′ 1)− bf(X 1)−c(X ′ 1, Yj) +c(X 1, Yj) o ψ′ bf(X ′
+bg(Yj)−c(X ′ 1, Yj) ε −ψ bf(X 1) +bg(Yj)−c(X 1, Yj) ε o + bf(X 1)− bf(X ′ 1) n . Sinceψis convex, it follows that bΦϕ,ε(bf ,bg)−bΦ(1) ϕ,ε(bf (1),bg(1))≤ bf(X 1)− bf(X ′ 1) n + 1 n2 nX j=1 n bf(X ′ 1)− bf(X 1)−c(X ′ 1, Yj) +c(X 1, Yj) o ψ′ bf(X ′
-
[52]
24 Choosing the pointwise extension ofbfgiven by Proposition 2.3(vi), the empirical first-order condition holds atX′
+bg(Yj)−c(X ′ 1, Yj) ε . 24 Choosing the pointwise extension ofbfgiven by Proposition 2.3(vi), the empirical first-order condition holds atX′
-
[53]
Using this optimality condition and the boundedness ofc(see Propo- sition 2.3), it follows that bΦϕ,ε(bf ,bg)−bΦ(1) ϕ,ε(bf (1),bg(1)) ≤ 1 n2 nX j=1 ψ′ bf(X ′
-
[54]
bf(X i) +bg(Yj)−c(X i, Yj) ε q−2 + + 1 # . 43 Consequently, 1 n nX i=1 π(1) i,j ≤(q−1)C q
+bg(Yj)−c(X ′ 1, Yj) ε (c(X1, Yj)−c(X ′ 1, Yj))≤ 2∥c∥∞ n . By symmetry, this impliesbΦϕ,ε(bf ,bg)−bΦ(1) ϕ,ε(bf (1),bg(1))≥ − 2∥c∥∞ n . We conclude thatA≤ 4∥c∥2 ∞ n2 , and analogously,B≤ 4∥c∥2 ∞ n2 . The claim follows. Proof of Lemma 3.6.Fix(x 0, y0)∈Ω×Ω ′. By Proposition 2.3, the function x7→f ε(x) +g ε(y0)−c(x, y 0) is Lipschitz with constant2L. Hence, f...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.