arxiv: 2602.01372 · v2 · submitted 2026-02-01 · 🧮 math.OC · cs.LG

Recognition: 2 theorem links

· Lean Theorem

Robust Sublinear Convergence Rates for Iterative Bregman Projections

Gabriel Peyr\'e

Authors on Pith no claims yet

Pith reviewed 2026-05-16 08:31 UTC · model grok-4.3

classification 🧮 math.OC cs.LG

keywords entropic regularizationBregman projectionsSinkhorn algorithmconvergence ratesoptimal transportWasserstein distancequotient norms

0 comments

The pith

A general blueprint establishes O(1/k) dual convergence for cyclic KL projections on entropically regularized split linear programs, with the constant scaling only linearly in 1/γ.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper supplies a reusable method to prove sublinear dual convergence for alternating Bregman projections that solve entropically regularized linear programs whose constraints split into tractable blocks. It isolates the dependence on the regularization parameter γ so that the iteration count grows only linearly with 1/γ rather than worse. This mild scaling lets the same algorithm approximate the original unregularized problem by driving γ to zero without the total work exploding. The method reduces the entire proof to two verifiable inputs: a uniform bound on the primal iterates and a bound on the dual gap measured in a quotient norm induced by the constraint split. Two helper lemmas convert non-expansiveness of the dual steps into the needed quotient-norm control.

Core claim

The paper shows that robust O(1/k) dual convergence follows once a uniform primal bound and a dual bound in the quotient norm are available; non-expansiveness of the KL projection steps then yields the quotient bound automatically. Instantiating the blueprint on graph-structured transport produces a flow-Sinkhorn algorithm whose arithmetic complexity for ε-accurate transshipment cost is O(p diameter³ / ε⁴) up to logs, where p is the number of edges.

What carries the argument

The quotient dual norm induced by the constraint split, controlled via non-expansiveness of the dual iterations.

Load-bearing premise

The primal iterates stay bounded by a constant that does not grow with the regularization strength γ.

What would settle it

For a concrete split constraint such as matrix scaling or graph transshipment, run the projections while decreasing γ and check whether the number of iterations needed for fixed dual accuracy grows faster than linearly in 1/γ.

read the original abstract

Entropic regularization provides a simple way to approximate linear programs whose constraints split into two or more tractable blocks. The resulting objectives are amenable to cyclic Kullback-Leibler (KL) Bregman projections, with Sinkhorn-type algorithms for optimal transport, matrix scaling, and barycenters as canonical examples. This paper gives a general blueprint for proving $O(1/k)$ dual convergence rate with a constant that scales only linearly in $1/\gamma$, where $\gamma$ is the entropic regularization parameter. We call such rates "robust", because this mild dependence on $\gamma$ underpins favorable complexity bounds for approximating the unregularized problem via alternating KL projections. The blueprint reduces the proof to a uniform primal bound and a dual bound for a quotient norm induced by the constraint split. To make these inputs usable, we propose two helper results, which rely on the non-expansiveness of the dual iterations in this quotient dual norm. Instantiating this blueprint for graph-structured transport yields a new flow-Sinkhorn algorithm for the Wasserstein-1 distance on graphs. It achieves $\varepsilon$-additive accuracy on the transshipment cost in $O(p\,\mathrm{diameter}^3/\varepsilon^{4})$ arithmetic operations (up to logarithmic factors), where $p$ is the number of edges. We also provide a machine-checked Lean formalization of the core blueprint and its graph-$\mathrm{W}_1$ instantiation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper supplies a reusable blueprint for proving robust O(1/k) dual rates in cyclic KL projections that scale only linearly in 1/γ, backed by Lean and a concrete flow-Sinkhorn algorithm for graph W1.

read the letter

The main thing to know is that this work reduces the proof of robust sublinear convergence for entropic Bregman projections to a uniform primal bound plus a quotient-norm dual bound, both obtained from non-expansiveness, and then verifies the whole thing in Lean for the graph-W1 case. That linear-in-1/γ dependence is the practical payoff, since it lets you use these methods to approximate unregularized problems without the rate deteriorating badly as γ shrinks. The flow-Sinkhorn instantiation gives an explicit O(p * diameter^3 / ε^4) arithmetic complexity for ε-accurate transshipment cost on graphs, which is a usable bound for network problems. The Lean formalization of the core blueprint and its instantiation is genuine evidence that the logic is tight and not circular. The helper lemmas based on non-expansiveness look standard but are placed exactly where needed to make the inputs usable. The citation pattern stays within the Bregman-projection and Sinkhorn literature without obvious omissions. The soft spot is that the uniform primal bound still has to be checked for any new constraint split, and while the paper supplies general lemmas, that verification step is not automatic and could be the main effort in fresh applications. The graph-W1 example works cleanly, but broader usefulness hinges on how often the bound comes easily elsewhere. No internal contradictions or γ-dependent leakage show up in the argument. This is for researchers who need complexity guarantees for regularized block-structured LPs or graph transport and who value formal checks. It deserves peer review; the formalization and explicit rate are strong enough to justify referee time even if the generality claims need more examples.

Referee Report

2 major / 2 minor

Summary. The paper presents a general blueprint for proving O(1/k) dual convergence rates for cyclic KL Bregman projections applied to entropically regularized linear programs with split constraints. The rates are robust in that the leading constant scales linearly with 1/γ (γ the regularization parameter). The blueprint reduces the proof to a uniform primal bound plus a dual bound on a quotient norm induced by the constraint split; two helper results establish these bounds from non-expansiveness of the dual map. The blueprint is instantiated on graph-structured optimal transport to obtain a flow-Sinkhorn algorithm for the Wasserstein-1 distance whose arithmetic complexity is O(p·diameter³/ε⁴) (up to logs). A machine-checked Lean formalization of the core statements and the graph-W₁ instantiation is supplied.

Significance. If the central claims hold, the work supplies a reusable, parameter-light framework for obtaining robust sublinear rates that remain useful when γ must be driven to zero to recover the unregularized problem. The explicit graph-W₁ instantiation yields a concrete new algorithm with a fully explicit rate, while the Lean formalization provides machine-checked verification of the blueprint and its instantiation—both of which are genuine strengths.

major comments (2)

[§3] §3, Theorem 3.1 and Assumption 3.2: the O(1/k) dual rate is stated to hold once a uniform primal bound (independent of γ) is available; the paper supplies helper lemmas that verify this bound via non-expansiveness for the graph case, but the general blueprint would benefit from an explicit statement of the minimal conditions under which the primal bound remains uniform across arbitrary constraint splits.
[§5.3] §5.3, complexity derivation: the final O(p·diameter³/ε⁴) bound for the flow-Sinkhorn algorithm combines the robust dual rate with a specific quotient-norm estimate; it is not immediately clear whether the cubic diameter dependence is an artifact of the chosen norm or whether a tighter analysis (perhaps via a different quotient) could reduce the exponent.

minor comments (2)

[Abstract] The abstract claims the complexity holds “up to logarithmic factors”; the precise logarithmic terms (arising from the number of outer iterations or from the Lean-checked constants) should be stated explicitly in the main theorem statement.
[§2] Notation for the quotient dual norm is introduced in §2 but used heavily in §4; an early displayed definition or a short table of norms would improve readability for readers unfamiliar with the constraint-split construction.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading, positive assessment, and constructive suggestions. We address each major comment below and indicate the planned revisions.

read point-by-point responses

Referee: [§3] §3, Theorem 3.1 and Assumption 3.2: the O(1/k) dual rate is stated to hold once a uniform primal bound (independent of γ) is available; the paper supplies helper lemmas that verify this bound via non-expansiveness for the graph case, but the general blueprint would benefit from an explicit statement of the minimal conditions under which the primal bound remains uniform across arbitrary constraint splits.

Authors: We agree that an explicit delineation of the minimal conditions would improve the reusability of the blueprint. In the revised manuscript we will add a short remark immediately after Assumption 3.2 that states the precise requirements on the constraint split (linearity of the blocks and boundedness of the dual map) and on the uniform primal bound that together guarantee the O(1/k) dual rate is independent of γ. The helper lemmas for the graph case will remain unchanged, as they already satisfy these conditions. revision: yes
Referee: [§5.3] §5.3, complexity derivation: the final O(p·diameter³/ε⁴) bound for the flow-Sinkhorn algorithm combines the robust dual rate with a specific quotient-norm estimate; it is not immediately clear whether the cubic diameter dependence is an artifact of the chosen norm or whether a tighter analysis (perhaps via a different quotient) could reduce the exponent.

Authors: The cubic diameter factor arises because the quotient norm is taken with respect to the natural splitting of the flow-conservation constraints on the graph; the diameter enters when bounding the dual variables in this norm via shortest-path distances. This choice is required to obtain the non-expansiveness of the dual map that underpins the helper results. While a different quotient could be considered, it would necessitate a separate non-expansiveness proof and, on general graphs, is unlikely to remove the cubic term without additional assumptions (e.g., bounded degree or planarity). We will insert a clarifying paragraph in §5.3 that explains this dependence and notes that improving the exponent remains an open direction. revision: partial

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The derivation reduces the O(1/k) dual rate to a uniform primal bound plus a quotient-norm dual bound, both obtained directly from non-expansiveness of the dual map (standard for Bregman projections). These inputs are independent of γ and are covered by an explicit machine-checked Lean formalization of the core statements and the graph-W1 instantiation. No step reduces by construction to a fitted parameter, self-definition, or load-bearing self-citation; the rate constant is derived from the norm bound rather than assumed or renamed.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The paper rests on standard convex-analysis facts about Bregman projections and KL divergence together with the non-expansiveness property in the quotient dual norm; no new free parameters or invented entities are introduced.

axioms (2)

standard math Bregman projections onto convex sets are non-expansive in the dual quotient norm induced by the constraint split
Invoked to obtain the dual bound independent of γ
domain assumption The primal objective admits a uniform bound independent of the regularization parameter γ
Required for the O(1/k) rate to remain robust when γ → 0

pith-pipeline@v0.9.0 · 5549 in / 1385 out tokens · 32650 ms · 2026-05-16T08:31:02.284826+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Theorem 3.1 (Sub-linear dual rate) ... Δ_k ≤ 8 X_γ U_γ² ∥A∥_{1→1}² / (γ k)
IndisputableMonolith/Foundation/ArithmeticFromLogic.lean embed_strictMono_of_one_lt unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Proposition 4.2 ... non-expansiveness of Ψ w.r.t. ∥·∥_V1

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

40 extracted references · 40 canonical work pages · 1 internal anchor

[1]

Ahuja, Thomas L

Ravindra K. Ahuja, Thomas L. Magnanti, and James B. Orlin.Network Flows: Theory, Algorithms, and Applications. Prentice Hall, 1993. 21

work page 1993
[2]

Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration.Advances in Neural Information Processing Systems, 30, 2017

Jason Altschuler, Jonathan Niles-Weed, and Philippe Rigollet. Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration.Advances in Neural Information Processing Systems, 30, 2017

work page 2017
[3]

Mirror descent with relative smoothness in measure spaces, with application to Sinkhorn and EM.Advances in Neural Information Processing Systems, 35:17263–17275, 2022

Pierre-Cyril Aubin-Frankowski, Anna Korba, and Flavien L´ eger. Mirror descent with relative smoothness in measure spaces, with application to Sinkhorn and EM.Advances in Neural Information Processing Systems, 35:17263–17275, 2022

work page 2022
[4]

A continuous model of transportation.Econometrica, 20:643–660, 1952

Martin Beckmann. A continuous model of transportation.Econometrica, 20:643–660, 1952

work page 1952
[5]

Entropy minimization, dad problems, and doubly stochastic kernels.Journal of Functional Analysis, 123(2):264–307, 1994

Jonathan Borwein, Adrian Lewis, and Roger Nussbaum. Entropy minimization, dad problems, and doubly stochastic kernels.Journal of Functional Analysis, 123(2):264–307, 1994

work page 1994
[6]

Lev M. Bregman. The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming.USSR Computational Mathematics and Mathematical Physics, 7(3):200–217, 1967

work page 1967
[7]

Some relations between non expansive and order preserving maps.Proceedings of the AMS, 78(3):385–390, 1980

Michael Candrall and Luc Tartar. Some relations between non expansive and order preserving maps.Proceedings of the AMS, 78(3):385–390, 1980

work page 1980
[8]

On the linear convergence of the multimarginal sinkhorn algorithm.SIAM Journal on Optimization, 32(2):786–794, 2022

Guillaume Carlier. On the linear convergence of the multimarginal sinkhorn algorithm.SIAM Journal on Optimization, 32(2):786–794, 2022

work page 2022
[9]

A continuous theory of traffic congestion and wardrop equilibria.Journal of Mathematical Sciences, 181(6):792–804, 2012

Guillaume Carlier and Filippo Santambrogio. A continuous theory of traffic congestion and wardrop equilibria.Journal of Mathematical Sciences, 181(6):792–804, 2012

work page 2012
[10]

Iterative projection methods in structured optimization.Op- timization, 64(11):2343–2361, 2015

Yair Censor and Michal Rezaˇ c. Iterative projection methods in structured optimization.Op- timization, 64(11):2343–2361, 2015

work page 2015
[11]

Better and simpler error analysis of the sinkhorn–knopp algorithm for matrix scaling.Mathematical Programming, 188(1):395–407, 2021

Deeparnab Chakrabarty and Sanjeev Khanna. Better and simpler error analysis of the sinkhorn–knopp algorithm for matrix scaling.Mathematical Programming, 188(1):395–407, 2021

work page 2021
[12]

Maximum flow and minimum-cost flow in almost-linear time.Journal of the ACM, 72(3):1–103, 2025

Li Chen, Rasmus Kyng, Yang Liu, Richard Peng, Maximilian Probst Gutenberg, and Sushant Sachdeva. Maximum flow and minimum-cost flow in almost-linear time.Journal of the ACM, 72(3):1–103, 2025

work page 2025
[13]

Entropic and displacement interpola- tion: A computational approach using the hilbert metric.SIAM Journal on Applied Mathe- matics, 76(6):2375–2396, 2016

Yongxin Chen, Tryphon Georgiou, and Michele Pavon. Entropic and displacement interpola- tion: A computational approach using the hilbert metric.SIAM Journal on Applied Mathe- matics, 76(6):2375–2396, 2016

work page 2016
[14]

Sharper exponential convergence rates for sinkhorn’s algorithm in continuous settings.Mathematical Programming, pages 1–50, 2025

L´ ena¨ ıc Chizat, Alex Delalande, and Tomas Vaˇ skeviˇ cius. Sharper exponential convergence rates for sinkhorn’s algorithm in continuous settings.Mathematical Programming, pages 1–50, 2025

work page 2025
[15]

Faster Wasserstein distance estimation with the Sinkhorn divergence.Advances in Neural Information Processing Systems, 33:2257–2269, 2020

L´ ena¨ ıc Chizat, Pierre Roussillon, Flavien L´ eger, Fran¸ cois-Xavier Vialard, and Gabriel Peyr´ e. Faster Wasserstein distance estimation with the Sinkhorn divergence.Advances in Neural Information Processing Systems, 33:2257–2269, 2020

work page 2020
[16]

Quantitative contraction rates for Sinkhorn's algorithm: beyond bounded costs and compact marginals

Giovanni Conforti, Alain Durmus, and Giacomo Greco. Quantitative contraction rates for Sinkhorn algorithm: beyond bounded costs and compact marginals.arXiv preprint arXiv:2304.04451, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[17]

Information geometry and alternating minimization proce- dures.Statistics & Decisions, Supplement 1:205–237, 1984

Imre Csisz´ ar and G´ abor Tusn´ ady. Information geometry and alternating minimization proce- dures.Statistics & Decisions, Supplement 1:205–237, 1984. 22

work page 1984
[18]

Sinkhorn distances: Lightspeed computation of optimal transport

Marco Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. InAdvances in Neural Information Processing Systems, volume 26, pages 2292–2300, 2013

work page 2013
[19]

Faster approximate lossy generalized flow via in- terior point algorithms

Samuel I Daitch and Daniel A Spielman. Faster approximate lossy generalized flow via in- terior point algorithms. InProceedings of the fortieth annual ACM symposium on Theory of computing, pages 451–460, 2008

work page 2008
[20]

Quantitative uniform stability of the iterative proportional fitting procedure.The Annals of Applied Probability, 34(1A):501– 516, 2024

George Deligiannidis, Valentin de Bortoli, and Arnaud Doucet. Quantitative uniform stability of the iterative proportional fitting procedure.The Annals of Applied Probability, 34(1A):501– 516, 2024

work page 2024
[21]

William Edwards Deming and Frederick F. Stephan. On a least squares adjustment of a sampled frequency table when the expected marginal totals are known.The Annals of Math- ematical Statistics, 11(4):427–444, 1940

work page 1940
[22]

Nested dissection meets ipms: Planar min-cost flow in nearly-linear time

Sally Dong, Yu Gao, Gramoz Goranci, Yin Tat Lee, Sushant Sachdeva, Richard Peng, and Guanghao Ye. Nested dissection meets ipms: Planar min-cost flow in nearly-linear time. Journal of the ACM, 72(4):1–75, 2025

work page 2025
[23]

Computational optimal trans- port: Complexity by accelerated gradient descent is better than by Sinkhorn’s algorithm

Pavel Dvurechensky, Alexander Gasnikov, and Alexey Kroshnin. Computational optimal trans- port: Complexity by accelerated gradient descent is better than by Sinkhorn’s algorithm. In International conference on machine learning, pages 1367–1376, 2018

work page 2018
[24]

Hilbert’s projective metric for functions of bounded growth and exponential convergence of sinkhorn’s algorithm, 2025

Stephan Eckstein. Hilbert’s projective metric for functions of bounded growth and exponential convergence of sinkhorn’s algorithm, 2025

work page 2025
[25]

The phylogenetic Kantorovich–Rubinstein metric for environmental sequence samples.Journal of the Royal Statistical Society Series B: Statistical Methodology, 74(3):569–592, 2012

Steven N Evans and Frederick A Matsen. The phylogenetic Kantorovich–Rubinstein metric for environmental sequence samples.Journal of the Royal Statistical Society Series B: Statistical Methodology, 74(3):569–592, 2012

work page 2012
[26]

Franklin and Jens Lorenz

Joel N. Franklin and Jens Lorenz. On the scaling of multidimensional matrices.Linear Algebra and Its Applications, 114–115:717–735, 1989

work page 1989
[27]

On certain inequalities and characteristic symmetric bilinear forms.Mathe- matische Annalen, 115:249–290, 1938

Kurt Friedrichs. On certain inequalities and characteristic symmetric bilinear forms.Mathe- matische Annalen, 115:249–290, 1938

work page 1938
[28]

Fast contour matching using approximate earth mover’s distance

Kristen Grauman and Trevor Darrell. Fast contour matching using approximate earth mover’s distance. InProceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004., volume 1, pages I–I. IEEE, 2004

work page 2004
[29]

Non-asymptotic con- vergence bounds for sinkhorn iterates and their gradients: a coupling approach

Giacomo Greco, Maxence Noble, Giovanni Conforti, and Alain Durmus. Non-asymptotic con- vergence bounds for sinkhorn iterates and their gradients: a coupling approach. InProceedings of Thirty Sixth Conference on Learning Theory, volume 195, pages 716–746, 2023

work page 2023
[30]

On the complexity of general matrix scaling and entropy minimization via the RAS algorithm.Mathematical Programming, 112(2):371–401, 2008

Bahman Kalantari, Isabella Lari, Federica Ricca, and Bruno Simeone. On the complexity of general matrix scaling and entropy minimization via the RAS algorithm.Mathematical Programming, 112(2):371–401, 2008

work page 2008
[31]

From word embeddings to document distances

Matt Kusner, Yu Sun, Nicholas Kolkin, and Kilian Weinberger. From word embeddings to document distances. InInternational conference on machine learning, pages 957–966. PMLR, 2015. 23

work page 2015
[32]

A gradient descent perspective on Sinkhorn.Applied Mathematics & Optimiza- tion, 84(2):1843–1855, 2021

Flavien L´ eger. A gradient descent perspective on Sinkhorn.Applied Mathematics & Optimiza- tion, 84(2):1843–1855, 2021

work page 2021
[33]

An optimal transport approach for the schr¨ odinger bridge problem and convergence of sinkhorn algorithm.Journal of Scientific Computing, 85(2):27, 2020

Simone Di Marino and Augusto Gerolin. An optimal transport approach for the schr¨ odinger bridge problem and convergence of sinkhorn algorithm.Journal of Scientific Computing, 85(2):27, 2020

work page 2020
[34]

A metric for distributions with appli- cations to image databases

Yossi Rubner, Carlo Tomasi, and Leonidas J Guibas. A metric for distributions with appli- cations to image databases. InSixth international conference on computer vision (IEEE Cat. No. 98CH36271), pages 59–66. IEEE, 1998

work page 1998
[35]

Graph curvature for differentiating cancer networks.Sci- entific reports, 5(1):12323, 2015

Romeil Sandhu, Tryphon Georgiou, Ed Reznik, Liangjia Zhu, Ivan Kolesov, Yasin Sen- babaoglu, and Allen Tannenbaum. Graph curvature for differentiating cancer networks.Sci- entific reports, 5(1):12323, 2015

work page 2015
[36]

Springer, 2015

Filippo Santambrogio.Optimal transport for applied mathematicians. Springer, 2015

work page 2015
[37]

Ollivier-Ricci curvature-based method to community detection in complex networks.Scientific reports, 9(1):9800, 2019

Jayson Sia, Edmond Jonckheere, and Paul Bogdan. Ollivier-Ricci curvature-based method to community detection in complex networks.Scientific reports, 9(1):9800, 2019

work page 2019
[38]

Sinkhorn

R. Sinkhorn. A relationship between arbitrary positive matrices and doubly stochastic matri- ces.Ann. Math. Statist., 35:876–879, 1964

work page 1964
[39]

Earth mover’s dis- tances on discrete surfaces.ACM Transactions on Graphics (ToG), 33(4):1–12, 2014

Justin Solomon, Raif Rustamov, Leonidas Guibas, and Adrian Butscher. Earth mover’s dis- tances on discrete surfaces.ACM Transactions on Graphics (ToG), 33(4):1–12, 2014

work page 2014
[40]

topical maps

Udny Yule. On the methods of measuring association between two attributes.Journal of the Royal Statistical Society, 75(6):579–652, 1912. A Non-expansiveness in Variation Semi-norm A.1 Topical maps and non-expansiveness We first recall a classical result of so-called “topical maps” in the nonlinear Perron–Frobenius/max–plus theory [7]. This ensures the non...

work page 1912