Polyak-Lojasiewicz Inequality for Quadratically Regularized Optimal Transport

Alberto Gonz\'alez-Sanz; Andr\'es Riveros Valdevenito; Marcel Nutz

arxiv: 2605.27175 · v1 · pith:V7ZOC2T2new · submitted 2026-05-26 · 🧮 math.OC · math.AP· math.FA

Polyak-Lojasiewicz Inequality for Quadratically Regularized Optimal Transport

Alberto Gonz\'alez-Sanz , Marcel Nutz , Andr\'es Riveros Valdevenito This is my paper

Pith reviewed 2026-06-29 15:35 UTC · model grok-4.3

classification 🧮 math.OC math.APmath.FA

keywords optimal transportquadratic regularizationPolyak-Lojasiewicz inequalitylinear convergencedual problemerror boundssemi-discrete transport

0 comments

The pith

The dual of quadratically regularized optimal transport satisfies a local Polyak-Lojasiewicz inequality near the optimum with explicit constants.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Quadratically regularized optimal transport yields sparse couplings and avoids exponential instabilities, yet its dual objective includes a positive part that blocks standard strong-concavity arguments. The paper establishes that near the optimum this positive part behaves sufficiently well to produce a local error bound and Polyak-Lojasiewicz inequality. The results cover both continuous and semi-discrete settings and supply constants that depend only on the cost and marginals. These curvature properties immediately imply linear convergence for gradient ascent, coordinate ascent, and coordinate gradient ascent on the dual, each with an explicit contraction factor.

Core claim

Under mild assumptions covering both continuous and semi-discrete transport problems, the dual QOT objective satisfies a local error bound and a Polyak-Lojasiewicz inequality, with explicit constants depending only on the problem primitives. These results are obtained by functional-analytic techniques exploiting that near the optimum, the argument of the positive part function is positive on the interior of the support of the optimal coupling.

What carries the argument

Local positivity of the argument inside the positive part function on the interior of the optimal coupling support, which supplies the quantitative curvature needed for the error bound and PL inequality.

If this is right

Gradient ascent on the dual converges linearly with an explicit contraction rate that depends only on problem primitives.
Coordinate ascent on the dual converges linearly with an explicit rate depending only on problem primitives.
Coordinate gradient ascent on the dual converges linearly with an explicit rate depending only on problem primitives.
The same curvature constants apply uniformly to both continuous and semi-discrete formulations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Quadratic regularization may therefore be algorithmically comparable to entropic regularization in terms of guaranteed linear rates.
The explicit constants could be used to select step sizes or stopping tolerances in practical QOT solvers.
Analogous positivity arguments might extend the PL property to other non-smooth regularizers in optimal transport.

Load-bearing premise

Near the optimum the argument of the positive part function stays positive on the interior of the support of the optimal coupling.

What would settle it

A concrete QOT instance in which the dual objective violates the stated local error bound or in which gradient ascent on the dual exhibits only sublinear convergence.

read the original abstract

Quadratically regularized optimal transport (QOT) is an alternative to entropic regularization that yields sparse couplings and avoids numerical instabilities due to exponential scaling. From an optimization viewpoint, the dual QOT objective is concave but features a positive part function which prevents strong concavity and reduces smoothness of optimizers. Consequently, standard arguments for linear convergence of algorithms do not apply. In this paper, we nevertheless establish a quantitative curvature property for the QOT dual. Under mild assumptions covering both continuous and semi-discrete transport problems, we prove a local error bound and a Polyak-Lojasiewicz (PL) inequality, with explicit constants depending only on the problem primitives. These results are obtained by functional-analytic techniques exploiting that near the optimum, the argument of the positive part function is positive on the interior of the support of the optimal coupling. As applications, we derive linear convergence of the gradient ascent, coordinate ascent, and coordinate gradient ascent algorithms on the dual problem, with explicit contraction rates.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives a local PL inequality and error bound for the QOT dual via functional analysis on a positivity condition, yielding explicit linear rates for three dual algorithms.

read the letter

The main result is a local Polyak-Lojasiewicz inequality and error bound for the dual of quadratically regularized optimal transport, with constants that depend only on the problem data. This is new because the positive-part term blocks the usual strong-concavity arguments that work for entropic OT, so the authors supply a tailored local curvature property instead.

They obtain the result by functional-analytic estimates that exploit positivity of the positive-part argument on the interior support near the optimum. The same property then gives linear convergence for gradient ascent, coordinate ascent, and coordinate gradient ascent on the dual, again with explicit contraction factors. The coverage of both continuous and semi-discrete problems under one set of mild assumptions is a practical plus.

The soft spot is exactly the positivity step flagged in the stress test. The abstract presents it as following from the mild assumptions rather than as an extra hypothesis, yet if the argument of the positive part fails to stay positive near the optimum in some semi-discrete or boundary-touching cases, the local curvature and the explicit constants would not hold. The paper needs to show clearly whether this is proved or merely assumed; otherwise the rates rest on an unverified local condition.

The work is aimed at researchers who optimize non-entropic regularized transport problems and want convergence guarantees beyond the entropic case. A reader already working in that subfield will find the explicit rates and the functional-analytic route useful. The paper deserves peer review so the positivity argument and the constant derivations can be checked in detail.

Referee Report

1 major / 0 minor

Summary. The paper establishes a local error bound and Polyak-Łojasiewicz inequality for the dual objective of quadratically regularized optimal transport (QOT). Under mild assumptions applicable to both continuous and semi-discrete problems, it derives explicit constants depending only on problem primitives via functional-analytic techniques that exploit positivity of the positive-part argument on the interior support of the optimal coupling near the optimum. This yields linear convergence guarantees (with explicit rates) for gradient ascent, coordinate ascent, and coordinate gradient ascent on the dual.

Significance. If the local positivity property holds under the stated mild assumptions, the result supplies the first explicit curvature analysis for the non-strongly-concave QOT dual, enabling rigorous linear-rate guarantees for first-order methods in a setting where entropic regularization is avoided. The explicit dependence of constants solely on primitives is a clear strength.

major comments (1)

[Abstract] Abstract (and the functional-analytic argument described therein): the derivation of the local error bound and PL inequality with explicit constants rests on the claim that, near the optimum, the argument of the positive-part function is positive on the interior of the support of the optimal coupling. The paper presents this as following from the mild assumptions covering semi-discrete cases, yet atoms or boundary mass in the optimal plan could violate interior positivity; without an explicit verification or additional hypothesis in the main argument, the claimed constants and local curvature do not necessarily follow.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their careful reading and for identifying a point that requires clarification in the justification of the interior positivity property. We address the major comment below and will strengthen the manuscript accordingly.

read point-by-point responses

Referee: [Abstract] Abstract (and the functional-analytic argument described therein): the derivation of the local error bound and PL inequality with explicit constants rests on the claim that, near the optimum, the argument of the positive-part function is positive on the interior of the support of the optimal coupling. The paper presents this as following from the mild assumptions covering semi-discrete cases, yet atoms or boundary mass in the optimal plan could violate interior positivity; without an explicit verification or additional hypothesis in the main argument, the claimed constants and local curvature do not necessarily follow.

Authors: We agree that the current presentation does not contain an explicit verification that the mild assumptions suffice to guarantee the required interior positivity near the optimum, particularly when the optimal plan may involve atoms (as in semi-discrete settings) or boundary mass. While the assumptions are chosen to cover both continuous and semi-discrete problems, an additional lemma or remark making this step rigorous would eliminate any ambiguity in the derivation of the explicit constants. In the revised manuscript we will insert such a verification (based on the problem primitives and the definition of the support) immediately before the main functional-analytic argument, thereby confirming that the local error bound and PL inequality hold with the stated constants under the given hypotheses. revision: yes

Circularity Check

0 steps flagged

No circularity: derivation uses independent functional analysis under stated assumptions

full rationale

The paper proves a local error bound and PL inequality for the dual QOT objective via functional-analytic techniques that exploit positivity of the positive-part argument near the optimum on the interior support of the optimal coupling. This positivity is presented as following from the mild assumptions covering continuous and semi-discrete problems, not as a self-referential definition or fitted input. No equations reduce by construction to prior results via self-citation chains, uniqueness theorems imported from the same authors, or ansatzes smuggled through citations. The explicit constants depend only on problem primitives, and the derivation chain is self-contained without renaming known empirical patterns or calling fitted quantities predictions. This is the normal case of a proof paper whose central claims rest on external mathematical techniques rather than internal reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities beyond the stated mild assumptions and the local positivity condition on the positive-part argument.

axioms (1)

domain assumption Mild assumptions covering continuous and semi-discrete transport problems
Invoked to guarantee the local positivity property near the optimum.

pith-pipeline@v0.9.1-grok · 5712 in / 1156 out tokens · 27870 ms · 2026-06-29T15:35:13.984034+00:00 · methodology

discussion (0)

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Finite-sample bounds for regularized optimal transport
math.ST 2026-06 unverdicted novelty 7.0

Establishes explicit finite-sample bias and variance bounds for regularized OT costs that improve prior entropic results, deliver the first quantitative bounds for L^p regularization, and yield an n^{-2/(d+4)} rate fo...
Stability of Quadratically Regularized Optimal Transport
math.OC 2026-05 unverdicted novelty 7.0

Establishes L^∞-stability of dual potentials in QOT, yielding local Lipschitz stability of the optimal coupling support in Hausdorff distance for quadratic cost under marginal perturbations.

Reference graph

Works this paper leans on

33 extracted references · 7 canonical work pages · cited by 2 Pith papers · 1 internal anchor

[1]

Bayraktar, S

E. Bayraktar, S. Eckstein, and X. Zhang. Stability and sample complexity of divergence regu- larized optimal transport.Bernoulli, 31(1):213–239, 2025

2025
[2]

Blondel, V

M. Blondel, V. Seguy, and A. Rolet. Smooth and sparse optimal transport. volume 84 of Proceedings of Machine Learning Research, pages 880–889, 2018

2018
[3]

J. F. Bonnans and A. Shapiro.Perturbation analysis of optimization problems. Springer Series in Operations Research. Springer-Verlag, New York, 2000

2000
[4]

Bourgain, H

J. Bourgain, H. Brezis, and P. Mironescu. Another look at Sobolev spaces. InOptimal Control and Partial Differential Equations, pages 439–455. IOS, Amsterdam, 2001

2001
[5]

Brezis.Functional analysis, Sobolev spaces and partial differential equations

H. Brezis.Functional analysis, Sobolev spaces and partial differential equations. Universitext. Springer, New York, 2011

2011
[6]

G. Carlier. On the linear convergence of the multimarginal Sinkhorn algorithm.SIAM J. Optim., 32(2):786–794, 2022

2022
[7]

M. Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. In C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Weinberger, editors,Advances in Neural Infor- mation Processing Systems, volume 26. Curran Associates, Inc., 2013

2013
[8]

Eckstein and M

S. Eckstein and M. Kupper. Computation of optimal transport and related hedging problems via penalization and neural networks.Appl. Math. Optim., 83(2):639–667, 2021

2021
[9]

Eckstein and M

S. Eckstein and M. Nutz. Convergence rates for regularized optimal transport via quantization. Math. Oper. Res., 49(2):1223–1240, 2024

2024
[10]

Essid and J

M. Essid and J. Solomon. Quadratically regularized optimal transport on graphs.SIAM J. Sci. Comput., 40(4):A1961–A1986, 2018

2018
[11]

G. B. Folland.Real analysis. Pure and Applied Mathematics. John Wiley & Sons, New York, second edition, 1999

1999
[12]

Genevay, M

A. Genevay, M. Cuturi, G. Peyré, and F. Bach. Stochastic optimization for large-scale optimal transport. InAdvances in Neural Information Processing Systems 29, pages 3440–3448, 2016

2016
[13]

Sample complexity of quadratically regularized optimal transport

A. González-Sanz, E. del Barrio, and M. Nutz. Sample complexity of quadratically regularized optimal transport.arXiv:2511.09807, 2025. 31

work page arXiv 2025
[14]

González-Sanz, S

A. González-Sanz, S. Eckstein, and M. Nutz. Sparse regularized optimal transport without curse of dimensionality.arXiv:2505.04721, 2025

work page arXiv 2025
[15]

González-Sanz and M

A. González-Sanz and M. Nutz. Stability of quadratically regularized optimal transport.In preparation, 2026

2026
[16]

Linear Convergence of Gradient Descent for Quadratically Regularized Optimal Transport

A. González-Sanz, M. Nutz, and A. Riveros Valdevenito. Linear convergence of gradient descent for quadratically regularized optimal transport.arXiv:2509.08547, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[17]

and Nutz, M

A. González-Sanz and M. Nutz. Sparsity of quadratically regularized optimal transport: Scalar case.SIAM J. Math. Anal., forthcoming, 2024. Preprint arXiv:2410.03353

work page arXiv 2024
[18]

Grisvard.Elliptic Problems in Nonsmooth Domains, volume 24 ofMonographs and Studies in Mathematics

P. Grisvard.Elliptic Problems in Nonsmooth Domains, volume 24 ofMonographs and Studies in Mathematics. Pitman, Boston, 1985

1985
[19]

Gulrajani, F

I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. Courville. Improved training of Wasserstein GANs. InProceedings of the 31st International Conference on Neural Information Processing Systems, pages 5769–5779, 2017

2017
[20]

L. Li, A. Genevay, M. Yurochkin, and J. Solomon. Continuous regularized Wasserstein barycen- ters. In H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, and H. Lin, editors,Advances in Neural Information Processing Systems, volume 33, pages 17755–17765. Curran Associates, Inc., 2020

2020
[21]

D. A. Lorenz and H. Mahler. Orlicz-space regularization for optimal transport and algorithms for quadratic regularization.arXiv:1909.06082, 2019

work page arXiv 1909
[22]

D. A. Lorenz and H. Mahler. Orlicz space regularization of continuous optimal transport prob- lems.Appl. Math. Optim., 85(2):Paper No. 14, 33, 2022

2022
[23]

D. A. Lorenz, P. Manns, and C. Meyer. Quadratically regularized optimal transport.Appl. Math. Optim., 83(3):1919–1949, 2021

1919
[24]

Muzellec, R

B. Muzellec, R. Nock, G. Patrini, and F. Nielsen. Tsallis regularized optimal transport and ecological inference.Proceedings of the AAAI Conference on Artificial Intelligence, 31(1), 2017

2017
[25]

M. Nutz. Introduction to entropic optimal transport, 2021. Lecture notes, Columbia University, https://www.math.columbia.edu/~mnutz/docs/EOT_lecture_notes.pdf

2021
[26]

M. Nutz. Quadratically regularized optimal transport: existence and multiplicity of potentials. SIAM J. Math. Anal., 57(3):2622–2649, 2025

2025
[27]

Peyré and M

G. Peyré and M. Cuturi. Computational optimal transport: With applications to data science. Foundations and Trends®in Machine Learning, 11(5–6):355–607, 2019

2019
[28]

Rüschendorf and W

L. Rüschendorf and W. Thomsen. Note on the Schrödinger equation andI-projections.Statist. Probab. Lett., 17(5):369–375, 1993

1993
[29]

Schmitzer

B. Schmitzer. Stabilized sparse scaling algorithms for entropy regularized transport problems. SIAM J. Sci. Comput., 41(3):A1443–A1481, 2019

2019
[30]

Seguy, B

V. Seguy, B. B. Damodaran, R. Flamary, N. Courty, A. Rolet, and M. Blondel. Large scale optimal transport and mapping estimation. InInternational Conference on Learning Represen- tations, 2018

2018
[31]

A. J. Stromme. Minimum intrinsic dimension scaling for entropic optimal transport. arXiv:2306.03398, 2023

work page arXiv 2023
[32]

Wiesel and X

J. Wiesel and X. Xu. Sparsity of quadratically regularized optimal transport: Bounds on con- centration and bias.SIAM J. Math. Anal., 57(6):6498–6521, 2025

2025
[33]

Zhang, G

S. Zhang, G. Mordant, T. Matsumoto, and G. Schiebinger. Manifold learning with sparse regularised optimal transport.arXiv:2307.09816, 2023. 32

work page arXiv 2023

[1] [1]

Bayraktar, S

E. Bayraktar, S. Eckstein, and X. Zhang. Stability and sample complexity of divergence regu- larized optimal transport.Bernoulli, 31(1):213–239, 2025

2025

[2] [2]

Blondel, V

M. Blondel, V. Seguy, and A. Rolet. Smooth and sparse optimal transport. volume 84 of Proceedings of Machine Learning Research, pages 880–889, 2018

2018

[3] [3]

J. F. Bonnans and A. Shapiro.Perturbation analysis of optimization problems. Springer Series in Operations Research. Springer-Verlag, New York, 2000

2000

[4] [4]

Bourgain, H

J. Bourgain, H. Brezis, and P. Mironescu. Another look at Sobolev spaces. InOptimal Control and Partial Differential Equations, pages 439–455. IOS, Amsterdam, 2001

2001

[5] [5]

Brezis.Functional analysis, Sobolev spaces and partial differential equations

H. Brezis.Functional analysis, Sobolev spaces and partial differential equations. Universitext. Springer, New York, 2011

2011

[6] [6]

G. Carlier. On the linear convergence of the multimarginal Sinkhorn algorithm.SIAM J. Optim., 32(2):786–794, 2022

2022

[7] [7]

M. Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. In C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Weinberger, editors,Advances in Neural Infor- mation Processing Systems, volume 26. Curran Associates, Inc., 2013

2013

[8] [8]

Eckstein and M

S. Eckstein and M. Kupper. Computation of optimal transport and related hedging problems via penalization and neural networks.Appl. Math. Optim., 83(2):639–667, 2021

2021

[9] [9]

Eckstein and M

S. Eckstein and M. Nutz. Convergence rates for regularized optimal transport via quantization. Math. Oper. Res., 49(2):1223–1240, 2024

2024

[10] [10]

Essid and J

M. Essid and J. Solomon. Quadratically regularized optimal transport on graphs.SIAM J. Sci. Comput., 40(4):A1961–A1986, 2018

2018

[11] [11]

G. B. Folland.Real analysis. Pure and Applied Mathematics. John Wiley & Sons, New York, second edition, 1999

1999

[12] [12]

Genevay, M

A. Genevay, M. Cuturi, G. Peyré, and F. Bach. Stochastic optimization for large-scale optimal transport. InAdvances in Neural Information Processing Systems 29, pages 3440–3448, 2016

2016

[13] [13]

Sample complexity of quadratically regularized optimal transport

A. González-Sanz, E. del Barrio, and M. Nutz. Sample complexity of quadratically regularized optimal transport.arXiv:2511.09807, 2025. 31

work page arXiv 2025

[14] [14]

González-Sanz, S

A. González-Sanz, S. Eckstein, and M. Nutz. Sparse regularized optimal transport without curse of dimensionality.arXiv:2505.04721, 2025

work page arXiv 2025

[15] [15]

González-Sanz and M

A. González-Sanz and M. Nutz. Stability of quadratically regularized optimal transport.In preparation, 2026

2026

[16] [16]

Linear Convergence of Gradient Descent for Quadratically Regularized Optimal Transport

A. González-Sanz, M. Nutz, and A. Riveros Valdevenito. Linear convergence of gradient descent for quadratically regularized optimal transport.arXiv:2509.08547, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[17] [17]

and Nutz, M

A. González-Sanz and M. Nutz. Sparsity of quadratically regularized optimal transport: Scalar case.SIAM J. Math. Anal., forthcoming, 2024. Preprint arXiv:2410.03353

work page arXiv 2024

[18] [18]

Grisvard.Elliptic Problems in Nonsmooth Domains, volume 24 ofMonographs and Studies in Mathematics

P. Grisvard.Elliptic Problems in Nonsmooth Domains, volume 24 ofMonographs and Studies in Mathematics. Pitman, Boston, 1985

1985

[19] [19]

Gulrajani, F

I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. Courville. Improved training of Wasserstein GANs. InProceedings of the 31st International Conference on Neural Information Processing Systems, pages 5769–5779, 2017

2017

[20] [20]

L. Li, A. Genevay, M. Yurochkin, and J. Solomon. Continuous regularized Wasserstein barycen- ters. In H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, and H. Lin, editors,Advances in Neural Information Processing Systems, volume 33, pages 17755–17765. Curran Associates, Inc., 2020

2020

[21] [21]

D. A. Lorenz and H. Mahler. Orlicz-space regularization for optimal transport and algorithms for quadratic regularization.arXiv:1909.06082, 2019

work page arXiv 1909

[22] [22]

D. A. Lorenz and H. Mahler. Orlicz space regularization of continuous optimal transport prob- lems.Appl. Math. Optim., 85(2):Paper No. 14, 33, 2022

2022

[23] [23]

D. A. Lorenz, P. Manns, and C. Meyer. Quadratically regularized optimal transport.Appl. Math. Optim., 83(3):1919–1949, 2021

1919

[24] [24]

Muzellec, R

B. Muzellec, R. Nock, G. Patrini, and F. Nielsen. Tsallis regularized optimal transport and ecological inference.Proceedings of the AAAI Conference on Artificial Intelligence, 31(1), 2017

2017

[25] [25]

M. Nutz. Introduction to entropic optimal transport, 2021. Lecture notes, Columbia University, https://www.math.columbia.edu/~mnutz/docs/EOT_lecture_notes.pdf

2021

[26] [26]

M. Nutz. Quadratically regularized optimal transport: existence and multiplicity of potentials. SIAM J. Math. Anal., 57(3):2622–2649, 2025

2025

[27] [27]

Peyré and M

G. Peyré and M. Cuturi. Computational optimal transport: With applications to data science. Foundations and Trends®in Machine Learning, 11(5–6):355–607, 2019

2019

[28] [28]

Rüschendorf and W

L. Rüschendorf and W. Thomsen. Note on the Schrödinger equation andI-projections.Statist. Probab. Lett., 17(5):369–375, 1993

1993

[29] [29]

Schmitzer

B. Schmitzer. Stabilized sparse scaling algorithms for entropy regularized transport problems. SIAM J. Sci. Comput., 41(3):A1443–A1481, 2019

2019

[30] [30]

Seguy, B

V. Seguy, B. B. Damodaran, R. Flamary, N. Courty, A. Rolet, and M. Blondel. Large scale optimal transport and mapping estimation. InInternational Conference on Learning Represen- tations, 2018

2018

[31] [31]

A. J. Stromme. Minimum intrinsic dimension scaling for entropic optimal transport. arXiv:2306.03398, 2023

work page arXiv 2023

[32] [32]

Wiesel and X

J. Wiesel and X. Xu. Sparsity of quadratically regularized optimal transport: Bounds on con- centration and bias.SIAM J. Math. Anal., 57(6):6498–6521, 2025

2025

[33] [33]

Zhang, G

S. Zhang, G. Mordant, T. Matsumoto, and G. Schiebinger. Manifold learning with sparse regularised optimal transport.arXiv:2307.09816, 2023. 32

work page arXiv 2023