Recognition: 2 theorem links
· Lean TheoremRobust Sublinear Convergence Rates for Iterative Bregman Projections
Pith reviewed 2026-05-16 08:31 UTC · model grok-4.3
The pith
A general blueprint establishes O(1/k) dual convergence for cyclic KL projections on entropically regularized split linear programs, with the constant scaling only linearly in 1/γ.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper shows that robust O(1/k) dual convergence follows once a uniform primal bound and a dual bound in the quotient norm are available; non-expansiveness of the KL projection steps then yields the quotient bound automatically. Instantiating the blueprint on graph-structured transport produces a flow-Sinkhorn algorithm whose arithmetic complexity for ε-accurate transshipment cost is O(p diameter³ / ε⁴) up to logs, where p is the number of edges.
What carries the argument
The quotient dual norm induced by the constraint split, controlled via non-expansiveness of the dual iterations.
Load-bearing premise
The primal iterates stay bounded by a constant that does not grow with the regularization strength γ.
What would settle it
For a concrete split constraint such as matrix scaling or graph transshipment, run the projections while decreasing γ and check whether the number of iterations needed for fixed dual accuracy grows faster than linearly in 1/γ.
read the original abstract
Entropic regularization provides a simple way to approximate linear programs whose constraints split into two or more tractable blocks. The resulting objectives are amenable to cyclic Kullback-Leibler (KL) Bregman projections, with Sinkhorn-type algorithms for optimal transport, matrix scaling, and barycenters as canonical examples. This paper gives a general blueprint for proving $O(1/k)$ dual convergence rate with a constant that scales only linearly in $1/\gamma$, where $\gamma$ is the entropic regularization parameter. We call such rates "robust", because this mild dependence on $\gamma$ underpins favorable complexity bounds for approximating the unregularized problem via alternating KL projections. The blueprint reduces the proof to a uniform primal bound and a dual bound for a quotient norm induced by the constraint split. To make these inputs usable, we propose two helper results, which rely on the non-expansiveness of the dual iterations in this quotient dual norm. Instantiating this blueprint for graph-structured transport yields a new flow-Sinkhorn algorithm for the Wasserstein-1 distance on graphs. It achieves $\varepsilon$-additive accuracy on the transshipment cost in $O(p\,\mathrm{diameter}^3/\varepsilon^{4})$ arithmetic operations (up to logarithmic factors), where $p$ is the number of edges. We also provide a machine-checked Lean formalization of the core blueprint and its graph-$\mathrm{W}_1$ instantiation.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents a general blueprint for proving O(1/k) dual convergence rates for cyclic KL Bregman projections applied to entropically regularized linear programs with split constraints. The rates are robust in that the leading constant scales linearly with 1/γ (γ the regularization parameter). The blueprint reduces the proof to a uniform primal bound plus a dual bound on a quotient norm induced by the constraint split; two helper results establish these bounds from non-expansiveness of the dual map. The blueprint is instantiated on graph-structured optimal transport to obtain a flow-Sinkhorn algorithm for the Wasserstein-1 distance whose arithmetic complexity is O(p·diameter³/ε⁴) (up to logs). A machine-checked Lean formalization of the core statements and the graph-W₁ instantiation is supplied.
Significance. If the central claims hold, the work supplies a reusable, parameter-light framework for obtaining robust sublinear rates that remain useful when γ must be driven to zero to recover the unregularized problem. The explicit graph-W₁ instantiation yields a concrete new algorithm with a fully explicit rate, while the Lean formalization provides machine-checked verification of the blueprint and its instantiation—both of which are genuine strengths.
major comments (2)
- [§3] §3, Theorem 3.1 and Assumption 3.2: the O(1/k) dual rate is stated to hold once a uniform primal bound (independent of γ) is available; the paper supplies helper lemmas that verify this bound via non-expansiveness for the graph case, but the general blueprint would benefit from an explicit statement of the minimal conditions under which the primal bound remains uniform across arbitrary constraint splits.
- [§5.3] §5.3, complexity derivation: the final O(p·diameter³/ε⁴) bound for the flow-Sinkhorn algorithm combines the robust dual rate with a specific quotient-norm estimate; it is not immediately clear whether the cubic diameter dependence is an artifact of the chosen norm or whether a tighter analysis (perhaps via a different quotient) could reduce the exponent.
minor comments (2)
- [Abstract] The abstract claims the complexity holds “up to logarithmic factors”; the precise logarithmic terms (arising from the number of outer iterations or from the Lean-checked constants) should be stated explicitly in the main theorem statement.
- [§2] Notation for the quotient dual norm is introduced in §2 but used heavily in §4; an early displayed definition or a short table of norms would improve readability for readers unfamiliar with the constraint-split construction.
Simulated Author's Rebuttal
We thank the referee for the careful reading, positive assessment, and constructive suggestions. We address each major comment below and indicate the planned revisions.
read point-by-point responses
-
Referee: [§3] §3, Theorem 3.1 and Assumption 3.2: the O(1/k) dual rate is stated to hold once a uniform primal bound (independent of γ) is available; the paper supplies helper lemmas that verify this bound via non-expansiveness for the graph case, but the general blueprint would benefit from an explicit statement of the minimal conditions under which the primal bound remains uniform across arbitrary constraint splits.
Authors: We agree that an explicit delineation of the minimal conditions would improve the reusability of the blueprint. In the revised manuscript we will add a short remark immediately after Assumption 3.2 that states the precise requirements on the constraint split (linearity of the blocks and boundedness of the dual map) and on the uniform primal bound that together guarantee the O(1/k) dual rate is independent of γ. The helper lemmas for the graph case will remain unchanged, as they already satisfy these conditions. revision: yes
-
Referee: [§5.3] §5.3, complexity derivation: the final O(p·diameter³/ε⁴) bound for the flow-Sinkhorn algorithm combines the robust dual rate with a specific quotient-norm estimate; it is not immediately clear whether the cubic diameter dependence is an artifact of the chosen norm or whether a tighter analysis (perhaps via a different quotient) could reduce the exponent.
Authors: The cubic diameter factor arises because the quotient norm is taken with respect to the natural splitting of the flow-conservation constraints on the graph; the diameter enters when bounding the dual variables in this norm via shortest-path distances. This choice is required to obtain the non-expansiveness of the dual map that underpins the helper results. While a different quotient could be considered, it would necessitate a separate non-expansiveness proof and, on general graphs, is unlikely to remove the cubic term without additional assumptions (e.g., bounded degree or planarity). We will insert a clarifying paragraph in §5.3 that explains this dependence and notes that improving the exponent remains an open direction. revision: partial
Circularity Check
No significant circularity identified
full rationale
The derivation reduces the O(1/k) dual rate to a uniform primal bound plus a quotient-norm dual bound, both obtained directly from non-expansiveness of the dual map (standard for Bregman projections). These inputs are independent of γ and are covered by an explicit machine-checked Lean formalization of the core statements and the graph-W1 instantiation. No step reduces by construction to a fitted parameter, self-definition, or load-bearing self-citation; the rate constant is derived from the norm bound rather than assumed or renamed.
Axiom & Free-Parameter Ledger
axioms (2)
- standard math Bregman projections onto convex sets are non-expansive in the dual quotient norm induced by the constraint split
- domain assumption The primal objective admits a uniform bound independent of the regularization parameter γ
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Theorem 3.1 (Sub-linear dual rate) ... Δ_k ≤ 8 X_γ U_γ² ∥A∥_{1→1}² / (γ k)
-
IndisputableMonolith/Foundation/ArithmeticFromLogic.leanembed_strictMono_of_one_lt unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Proposition 4.2 ... non-expansiveness of Ψ w.r.t. ∥·∥_V1
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Ravindra K. Ahuja, Thomas L. Magnanti, and James B. Orlin.Network Flows: Theory, Algorithms, and Applications. Prentice Hall, 1993. 21
work page 1993
-
[2]
Jason Altschuler, Jonathan Niles-Weed, and Philippe Rigollet. Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration.Advances in Neural Information Processing Systems, 30, 2017
work page 2017
-
[3]
Pierre-Cyril Aubin-Frankowski, Anna Korba, and Flavien L´ eger. Mirror descent with relative smoothness in measure spaces, with application to Sinkhorn and EM.Advances in Neural Information Processing Systems, 35:17263–17275, 2022
work page 2022
-
[4]
A continuous model of transportation.Econometrica, 20:643–660, 1952
Martin Beckmann. A continuous model of transportation.Econometrica, 20:643–660, 1952
work page 1952
-
[5]
Jonathan Borwein, Adrian Lewis, and Roger Nussbaum. Entropy minimization, dad problems, and doubly stochastic kernels.Journal of Functional Analysis, 123(2):264–307, 1994
work page 1994
-
[6]
Lev M. Bregman. The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming.USSR Computational Mathematics and Mathematical Physics, 7(3):200–217, 1967
work page 1967
-
[7]
Michael Candrall and Luc Tartar. Some relations between non expansive and order preserving maps.Proceedings of the AMS, 78(3):385–390, 1980
work page 1980
-
[8]
Guillaume Carlier. On the linear convergence of the multimarginal sinkhorn algorithm.SIAM Journal on Optimization, 32(2):786–794, 2022
work page 2022
-
[9]
Guillaume Carlier and Filippo Santambrogio. A continuous theory of traffic congestion and wardrop equilibria.Journal of Mathematical Sciences, 181(6):792–804, 2012
work page 2012
-
[10]
Iterative projection methods in structured optimization.Op- timization, 64(11):2343–2361, 2015
Yair Censor and Michal Rezaˇ c. Iterative projection methods in structured optimization.Op- timization, 64(11):2343–2361, 2015
work page 2015
-
[11]
Deeparnab Chakrabarty and Sanjeev Khanna. Better and simpler error analysis of the sinkhorn–knopp algorithm for matrix scaling.Mathematical Programming, 188(1):395–407, 2021
work page 2021
-
[12]
Maximum flow and minimum-cost flow in almost-linear time.Journal of the ACM, 72(3):1–103, 2025
Li Chen, Rasmus Kyng, Yang Liu, Richard Peng, Maximilian Probst Gutenberg, and Sushant Sachdeva. Maximum flow and minimum-cost flow in almost-linear time.Journal of the ACM, 72(3):1–103, 2025
work page 2025
-
[13]
Yongxin Chen, Tryphon Georgiou, and Michele Pavon. Entropic and displacement interpola- tion: A computational approach using the hilbert metric.SIAM Journal on Applied Mathe- matics, 76(6):2375–2396, 2016
work page 2016
-
[14]
L´ ena¨ ıc Chizat, Alex Delalande, and Tomas Vaˇ skeviˇ cius. Sharper exponential convergence rates for sinkhorn’s algorithm in continuous settings.Mathematical Programming, pages 1–50, 2025
work page 2025
-
[15]
L´ ena¨ ıc Chizat, Pierre Roussillon, Flavien L´ eger, Fran¸ cois-Xavier Vialard, and Gabriel Peyr´ e. Faster Wasserstein distance estimation with the Sinkhorn divergence.Advances in Neural Information Processing Systems, 33:2257–2269, 2020
work page 2020
-
[16]
Quantitative contraction rates for Sinkhorn's algorithm: beyond bounded costs and compact marginals
Giovanni Conforti, Alain Durmus, and Giacomo Greco. Quantitative contraction rates for Sinkhorn algorithm: beyond bounded costs and compact marginals.arXiv preprint arXiv:2304.04451, 2023
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[17]
Imre Csisz´ ar and G´ abor Tusn´ ady. Information geometry and alternating minimization proce- dures.Statistics & Decisions, Supplement 1:205–237, 1984. 22
work page 1984
-
[18]
Sinkhorn distances: Lightspeed computation of optimal transport
Marco Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. InAdvances in Neural Information Processing Systems, volume 26, pages 2292–2300, 2013
work page 2013
-
[19]
Faster approximate lossy generalized flow via in- terior point algorithms
Samuel I Daitch and Daniel A Spielman. Faster approximate lossy generalized flow via in- terior point algorithms. InProceedings of the fortieth annual ACM symposium on Theory of computing, pages 451–460, 2008
work page 2008
-
[20]
George Deligiannidis, Valentin de Bortoli, and Arnaud Doucet. Quantitative uniform stability of the iterative proportional fitting procedure.The Annals of Applied Probability, 34(1A):501– 516, 2024
work page 2024
-
[21]
William Edwards Deming and Frederick F. Stephan. On a least squares adjustment of a sampled frequency table when the expected marginal totals are known.The Annals of Math- ematical Statistics, 11(4):427–444, 1940
work page 1940
-
[22]
Nested dissection meets ipms: Planar min-cost flow in nearly-linear time
Sally Dong, Yu Gao, Gramoz Goranci, Yin Tat Lee, Sushant Sachdeva, Richard Peng, and Guanghao Ye. Nested dissection meets ipms: Planar min-cost flow in nearly-linear time. Journal of the ACM, 72(4):1–75, 2025
work page 2025
-
[23]
Pavel Dvurechensky, Alexander Gasnikov, and Alexey Kroshnin. Computational optimal trans- port: Complexity by accelerated gradient descent is better than by Sinkhorn’s algorithm. In International conference on machine learning, pages 1367–1376, 2018
work page 2018
-
[24]
Stephan Eckstein. Hilbert’s projective metric for functions of bounded growth and exponential convergence of sinkhorn’s algorithm, 2025
work page 2025
-
[25]
Steven N Evans and Frederick A Matsen. The phylogenetic Kantorovich–Rubinstein metric for environmental sequence samples.Journal of the Royal Statistical Society Series B: Statistical Methodology, 74(3):569–592, 2012
work page 2012
-
[26]
Joel N. Franklin and Jens Lorenz. On the scaling of multidimensional matrices.Linear Algebra and Its Applications, 114–115:717–735, 1989
work page 1989
-
[27]
Kurt Friedrichs. On certain inequalities and characteristic symmetric bilinear forms.Mathe- matische Annalen, 115:249–290, 1938
work page 1938
-
[28]
Fast contour matching using approximate earth mover’s distance
Kristen Grauman and Trevor Darrell. Fast contour matching using approximate earth mover’s distance. InProceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004., volume 1, pages I–I. IEEE, 2004
work page 2004
-
[29]
Non-asymptotic con- vergence bounds for sinkhorn iterates and their gradients: a coupling approach
Giacomo Greco, Maxence Noble, Giovanni Conforti, and Alain Durmus. Non-asymptotic con- vergence bounds for sinkhorn iterates and their gradients: a coupling approach. InProceedings of Thirty Sixth Conference on Learning Theory, volume 195, pages 716–746, 2023
work page 2023
-
[30]
Bahman Kalantari, Isabella Lari, Federica Ricca, and Bruno Simeone. On the complexity of general matrix scaling and entropy minimization via the RAS algorithm.Mathematical Programming, 112(2):371–401, 2008
work page 2008
-
[31]
From word embeddings to document distances
Matt Kusner, Yu Sun, Nicholas Kolkin, and Kilian Weinberger. From word embeddings to document distances. InInternational conference on machine learning, pages 957–966. PMLR, 2015. 23
work page 2015
-
[32]
Flavien L´ eger. A gradient descent perspective on Sinkhorn.Applied Mathematics & Optimiza- tion, 84(2):1843–1855, 2021
work page 2021
-
[33]
Simone Di Marino and Augusto Gerolin. An optimal transport approach for the schr¨ odinger bridge problem and convergence of sinkhorn algorithm.Journal of Scientific Computing, 85(2):27, 2020
work page 2020
-
[34]
A metric for distributions with appli- cations to image databases
Yossi Rubner, Carlo Tomasi, and Leonidas J Guibas. A metric for distributions with appli- cations to image databases. InSixth international conference on computer vision (IEEE Cat. No. 98CH36271), pages 59–66. IEEE, 1998
work page 1998
-
[35]
Graph curvature for differentiating cancer networks.Sci- entific reports, 5(1):12323, 2015
Romeil Sandhu, Tryphon Georgiou, Ed Reznik, Liangjia Zhu, Ivan Kolesov, Yasin Sen- babaoglu, and Allen Tannenbaum. Graph curvature for differentiating cancer networks.Sci- entific reports, 5(1):12323, 2015
work page 2015
-
[36]
Filippo Santambrogio.Optimal transport for applied mathematicians. Springer, 2015
work page 2015
-
[37]
Jayson Sia, Edmond Jonckheere, and Paul Bogdan. Ollivier-Ricci curvature-based method to community detection in complex networks.Scientific reports, 9(1):9800, 2019
work page 2019
- [38]
-
[39]
Earth mover’s dis- tances on discrete surfaces.ACM Transactions on Graphics (ToG), 33(4):1–12, 2014
Justin Solomon, Raif Rustamov, Leonidas Guibas, and Adrian Butscher. Earth mover’s dis- tances on discrete surfaces.ACM Transactions on Graphics (ToG), 33(4):1–12, 2014
work page 2014
-
[40]
Udny Yule. On the methods of measuring association between two attributes.Journal of the Royal Statistical Society, 75(6):579–652, 1912. A Non-expansiveness in Variation Semi-norm A.1 Topical maps and non-expansiveness We first recall a classical result of so-called “topical maps” in the nonlinear Perron–Frobenius/max–plus theory [7]. This ensures the non...
work page 1912
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.