pith. machine review for the scientific record. sign in

arxiv: 2602.01372 · v2 · submitted 2026-02-01 · 🧮 math.OC · cs.LG

Recognition: 2 theorem links

· Lean Theorem

Robust Sublinear Convergence Rates for Iterative Bregman Projections

Authors on Pith no claims yet

Pith reviewed 2026-05-16 08:31 UTC · model grok-4.3

classification 🧮 math.OC cs.LG
keywords entropic regularizationBregman projectionsSinkhorn algorithmconvergence ratesoptimal transportWasserstein distancequotient norms
0
0 comments X

The pith

A general blueprint establishes O(1/k) dual convergence for cyclic KL projections on entropically regularized split linear programs, with the constant scaling only linearly in 1/γ.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper supplies a reusable method to prove sublinear dual convergence for alternating Bregman projections that solve entropically regularized linear programs whose constraints split into tractable blocks. It isolates the dependence on the regularization parameter γ so that the iteration count grows only linearly with 1/γ rather than worse. This mild scaling lets the same algorithm approximate the original unregularized problem by driving γ to zero without the total work exploding. The method reduces the entire proof to two verifiable inputs: a uniform bound on the primal iterates and a bound on the dual gap measured in a quotient norm induced by the constraint split. Two helper lemmas convert non-expansiveness of the dual steps into the needed quotient-norm control.

Core claim

The paper shows that robust O(1/k) dual convergence follows once a uniform primal bound and a dual bound in the quotient norm are available; non-expansiveness of the KL projection steps then yields the quotient bound automatically. Instantiating the blueprint on graph-structured transport produces a flow-Sinkhorn algorithm whose arithmetic complexity for ε-accurate transshipment cost is O(p diameter³ / ε⁴) up to logs, where p is the number of edges.

What carries the argument

The quotient dual norm induced by the constraint split, controlled via non-expansiveness of the dual iterations.

Load-bearing premise

The primal iterates stay bounded by a constant that does not grow with the regularization strength γ.

What would settle it

For a concrete split constraint such as matrix scaling or graph transshipment, run the projections while decreasing γ and check whether the number of iterations needed for fixed dual accuracy grows faster than linearly in 1/γ.

read the original abstract

Entropic regularization provides a simple way to approximate linear programs whose constraints split into two or more tractable blocks. The resulting objectives are amenable to cyclic Kullback-Leibler (KL) Bregman projections, with Sinkhorn-type algorithms for optimal transport, matrix scaling, and barycenters as canonical examples. This paper gives a general blueprint for proving $O(1/k)$ dual convergence rate with a constant that scales only linearly in $1/\gamma$, where $\gamma$ is the entropic regularization parameter. We call such rates "robust", because this mild dependence on $\gamma$ underpins favorable complexity bounds for approximating the unregularized problem via alternating KL projections. The blueprint reduces the proof to a uniform primal bound and a dual bound for a quotient norm induced by the constraint split. To make these inputs usable, we propose two helper results, which rely on the non-expansiveness of the dual iterations in this quotient dual norm. Instantiating this blueprint for graph-structured transport yields a new flow-Sinkhorn algorithm for the Wasserstein-1 distance on graphs. It achieves $\varepsilon$-additive accuracy on the transshipment cost in $O(p\,\mathrm{diameter}^3/\varepsilon^{4})$ arithmetic operations (up to logarithmic factors), where $p$ is the number of edges. We also provide a machine-checked Lean formalization of the core blueprint and its graph-$\mathrm{W}_1$ instantiation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents a general blueprint for proving O(1/k) dual convergence rates for cyclic KL Bregman projections applied to entropically regularized linear programs with split constraints. The rates are robust in that the leading constant scales linearly with 1/γ (γ the regularization parameter). The blueprint reduces the proof to a uniform primal bound plus a dual bound on a quotient norm induced by the constraint split; two helper results establish these bounds from non-expansiveness of the dual map. The blueprint is instantiated on graph-structured optimal transport to obtain a flow-Sinkhorn algorithm for the Wasserstein-1 distance whose arithmetic complexity is O(p·diameter³/ε⁴) (up to logs). A machine-checked Lean formalization of the core statements and the graph-W₁ instantiation is supplied.

Significance. If the central claims hold, the work supplies a reusable, parameter-light framework for obtaining robust sublinear rates that remain useful when γ must be driven to zero to recover the unregularized problem. The explicit graph-W₁ instantiation yields a concrete new algorithm with a fully explicit rate, while the Lean formalization provides machine-checked verification of the blueprint and its instantiation—both of which are genuine strengths.

major comments (2)
  1. [§3] §3, Theorem 3.1 and Assumption 3.2: the O(1/k) dual rate is stated to hold once a uniform primal bound (independent of γ) is available; the paper supplies helper lemmas that verify this bound via non-expansiveness for the graph case, but the general blueprint would benefit from an explicit statement of the minimal conditions under which the primal bound remains uniform across arbitrary constraint splits.
  2. [§5.3] §5.3, complexity derivation: the final O(p·diameter³/ε⁴) bound for the flow-Sinkhorn algorithm combines the robust dual rate with a specific quotient-norm estimate; it is not immediately clear whether the cubic diameter dependence is an artifact of the chosen norm or whether a tighter analysis (perhaps via a different quotient) could reduce the exponent.
minor comments (2)
  1. [Abstract] The abstract claims the complexity holds “up to logarithmic factors”; the precise logarithmic terms (arising from the number of outer iterations or from the Lean-checked constants) should be stated explicitly in the main theorem statement.
  2. [§2] Notation for the quotient dual norm is introduced in §2 but used heavily in §4; an early displayed definition or a short table of norms would improve readability for readers unfamiliar with the constraint-split construction.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading, positive assessment, and constructive suggestions. We address each major comment below and indicate the planned revisions.

read point-by-point responses
  1. Referee: [§3] §3, Theorem 3.1 and Assumption 3.2: the O(1/k) dual rate is stated to hold once a uniform primal bound (independent of γ) is available; the paper supplies helper lemmas that verify this bound via non-expansiveness for the graph case, but the general blueprint would benefit from an explicit statement of the minimal conditions under which the primal bound remains uniform across arbitrary constraint splits.

    Authors: We agree that an explicit delineation of the minimal conditions would improve the reusability of the blueprint. In the revised manuscript we will add a short remark immediately after Assumption 3.2 that states the precise requirements on the constraint split (linearity of the blocks and boundedness of the dual map) and on the uniform primal bound that together guarantee the O(1/k) dual rate is independent of γ. The helper lemmas for the graph case will remain unchanged, as they already satisfy these conditions. revision: yes

  2. Referee: [§5.3] §5.3, complexity derivation: the final O(p·diameter³/ε⁴) bound for the flow-Sinkhorn algorithm combines the robust dual rate with a specific quotient-norm estimate; it is not immediately clear whether the cubic diameter dependence is an artifact of the chosen norm or whether a tighter analysis (perhaps via a different quotient) could reduce the exponent.

    Authors: The cubic diameter factor arises because the quotient norm is taken with respect to the natural splitting of the flow-conservation constraints on the graph; the diameter enters when bounding the dual variables in this norm via shortest-path distances. This choice is required to obtain the non-expansiveness of the dual map that underpins the helper results. While a different quotient could be considered, it would necessitate a separate non-expansiveness proof and, on general graphs, is unlikely to remove the cubic term without additional assumptions (e.g., bounded degree or planarity). We will insert a clarifying paragraph in §5.3 that explains this dependence and notes that improving the exponent remains an open direction. revision: partial

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The derivation reduces the O(1/k) dual rate to a uniform primal bound plus a quotient-norm dual bound, both obtained directly from non-expansiveness of the dual map (standard for Bregman projections). These inputs are independent of γ and are covered by an explicit machine-checked Lean formalization of the core statements and the graph-W1 instantiation. No step reduces by construction to a fitted parameter, self-definition, or load-bearing self-citation; the rate constant is derived from the norm bound rather than assumed or renamed.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The paper rests on standard convex-analysis facts about Bregman projections and KL divergence together with the non-expansiveness property in the quotient dual norm; no new free parameters or invented entities are introduced.

axioms (2)
  • standard math Bregman projections onto convex sets are non-expansive in the dual quotient norm induced by the constraint split
    Invoked to obtain the dual bound independent of γ
  • domain assumption The primal objective admits a uniform bound independent of the regularization parameter γ
    Required for the O(1/k) rate to remain robust when γ → 0

pith-pipeline@v0.9.0 · 5549 in / 1385 out tokens · 32650 ms · 2026-05-16T08:31:02.284826+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

40 extracted references · 40 canonical work pages · 1 internal anchor

  1. [1]

    Ahuja, Thomas L

    Ravindra K. Ahuja, Thomas L. Magnanti, and James B. Orlin.Network Flows: Theory, Algorithms, and Applications. Prentice Hall, 1993. 21

  2. [2]

    Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration.Advances in Neural Information Processing Systems, 30, 2017

    Jason Altschuler, Jonathan Niles-Weed, and Philippe Rigollet. Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration.Advances in Neural Information Processing Systems, 30, 2017

  3. [3]

    Mirror descent with relative smoothness in measure spaces, with application to Sinkhorn and EM.Advances in Neural Information Processing Systems, 35:17263–17275, 2022

    Pierre-Cyril Aubin-Frankowski, Anna Korba, and Flavien L´ eger. Mirror descent with relative smoothness in measure spaces, with application to Sinkhorn and EM.Advances in Neural Information Processing Systems, 35:17263–17275, 2022

  4. [4]

    A continuous model of transportation.Econometrica, 20:643–660, 1952

    Martin Beckmann. A continuous model of transportation.Econometrica, 20:643–660, 1952

  5. [5]

    Entropy minimization, dad problems, and doubly stochastic kernels.Journal of Functional Analysis, 123(2):264–307, 1994

    Jonathan Borwein, Adrian Lewis, and Roger Nussbaum. Entropy minimization, dad problems, and doubly stochastic kernels.Journal of Functional Analysis, 123(2):264–307, 1994

  6. [6]

    Lev M. Bregman. The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming.USSR Computational Mathematics and Mathematical Physics, 7(3):200–217, 1967

  7. [7]

    Some relations between non expansive and order preserving maps.Proceedings of the AMS, 78(3):385–390, 1980

    Michael Candrall and Luc Tartar. Some relations between non expansive and order preserving maps.Proceedings of the AMS, 78(3):385–390, 1980

  8. [8]

    On the linear convergence of the multimarginal sinkhorn algorithm.SIAM Journal on Optimization, 32(2):786–794, 2022

    Guillaume Carlier. On the linear convergence of the multimarginal sinkhorn algorithm.SIAM Journal on Optimization, 32(2):786–794, 2022

  9. [9]

    A continuous theory of traffic congestion and wardrop equilibria.Journal of Mathematical Sciences, 181(6):792–804, 2012

    Guillaume Carlier and Filippo Santambrogio. A continuous theory of traffic congestion and wardrop equilibria.Journal of Mathematical Sciences, 181(6):792–804, 2012

  10. [10]

    Iterative projection methods in structured optimization.Op- timization, 64(11):2343–2361, 2015

    Yair Censor and Michal Rezaˇ c. Iterative projection methods in structured optimization.Op- timization, 64(11):2343–2361, 2015

  11. [11]

    Better and simpler error analysis of the sinkhorn–knopp algorithm for matrix scaling.Mathematical Programming, 188(1):395–407, 2021

    Deeparnab Chakrabarty and Sanjeev Khanna. Better and simpler error analysis of the sinkhorn–knopp algorithm for matrix scaling.Mathematical Programming, 188(1):395–407, 2021

  12. [12]

    Maximum flow and minimum-cost flow in almost-linear time.Journal of the ACM, 72(3):1–103, 2025

    Li Chen, Rasmus Kyng, Yang Liu, Richard Peng, Maximilian Probst Gutenberg, and Sushant Sachdeva. Maximum flow and minimum-cost flow in almost-linear time.Journal of the ACM, 72(3):1–103, 2025

  13. [13]

    Entropic and displacement interpola- tion: A computational approach using the hilbert metric.SIAM Journal on Applied Mathe- matics, 76(6):2375–2396, 2016

    Yongxin Chen, Tryphon Georgiou, and Michele Pavon. Entropic and displacement interpola- tion: A computational approach using the hilbert metric.SIAM Journal on Applied Mathe- matics, 76(6):2375–2396, 2016

  14. [14]

    Sharper exponential convergence rates for sinkhorn’s algorithm in continuous settings.Mathematical Programming, pages 1–50, 2025

    L´ ena¨ ıc Chizat, Alex Delalande, and Tomas Vaˇ skeviˇ cius. Sharper exponential convergence rates for sinkhorn’s algorithm in continuous settings.Mathematical Programming, pages 1–50, 2025

  15. [15]

    Faster Wasserstein distance estimation with the Sinkhorn divergence.Advances in Neural Information Processing Systems, 33:2257–2269, 2020

    L´ ena¨ ıc Chizat, Pierre Roussillon, Flavien L´ eger, Fran¸ cois-Xavier Vialard, and Gabriel Peyr´ e. Faster Wasserstein distance estimation with the Sinkhorn divergence.Advances in Neural Information Processing Systems, 33:2257–2269, 2020

  16. [16]

    Quantitative contraction rates for Sinkhorn's algorithm: beyond bounded costs and compact marginals

    Giovanni Conforti, Alain Durmus, and Giacomo Greco. Quantitative contraction rates for Sinkhorn algorithm: beyond bounded costs and compact marginals.arXiv preprint arXiv:2304.04451, 2023

  17. [17]

    Information geometry and alternating minimization proce- dures.Statistics & Decisions, Supplement 1:205–237, 1984

    Imre Csisz´ ar and G´ abor Tusn´ ady. Information geometry and alternating minimization proce- dures.Statistics & Decisions, Supplement 1:205–237, 1984. 22

  18. [18]

    Sinkhorn distances: Lightspeed computation of optimal transport

    Marco Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. InAdvances in Neural Information Processing Systems, volume 26, pages 2292–2300, 2013

  19. [19]

    Faster approximate lossy generalized flow via in- terior point algorithms

    Samuel I Daitch and Daniel A Spielman. Faster approximate lossy generalized flow via in- terior point algorithms. InProceedings of the fortieth annual ACM symposium on Theory of computing, pages 451–460, 2008

  20. [20]

    Quantitative uniform stability of the iterative proportional fitting procedure.The Annals of Applied Probability, 34(1A):501– 516, 2024

    George Deligiannidis, Valentin de Bortoli, and Arnaud Doucet. Quantitative uniform stability of the iterative proportional fitting procedure.The Annals of Applied Probability, 34(1A):501– 516, 2024

  21. [21]

    William Edwards Deming and Frederick F. Stephan. On a least squares adjustment of a sampled frequency table when the expected marginal totals are known.The Annals of Math- ematical Statistics, 11(4):427–444, 1940

  22. [22]

    Nested dissection meets ipms: Planar min-cost flow in nearly-linear time

    Sally Dong, Yu Gao, Gramoz Goranci, Yin Tat Lee, Sushant Sachdeva, Richard Peng, and Guanghao Ye. Nested dissection meets ipms: Planar min-cost flow in nearly-linear time. Journal of the ACM, 72(4):1–75, 2025

  23. [23]

    Computational optimal trans- port: Complexity by accelerated gradient descent is better than by Sinkhorn’s algorithm

    Pavel Dvurechensky, Alexander Gasnikov, and Alexey Kroshnin. Computational optimal trans- port: Complexity by accelerated gradient descent is better than by Sinkhorn’s algorithm. In International conference on machine learning, pages 1367–1376, 2018

  24. [24]

    Hilbert’s projective metric for functions of bounded growth and exponential convergence of sinkhorn’s algorithm, 2025

    Stephan Eckstein. Hilbert’s projective metric for functions of bounded growth and exponential convergence of sinkhorn’s algorithm, 2025

  25. [25]

    The phylogenetic Kantorovich–Rubinstein metric for environmental sequence samples.Journal of the Royal Statistical Society Series B: Statistical Methodology, 74(3):569–592, 2012

    Steven N Evans and Frederick A Matsen. The phylogenetic Kantorovich–Rubinstein metric for environmental sequence samples.Journal of the Royal Statistical Society Series B: Statistical Methodology, 74(3):569–592, 2012

  26. [26]

    Franklin and Jens Lorenz

    Joel N. Franklin and Jens Lorenz. On the scaling of multidimensional matrices.Linear Algebra and Its Applications, 114–115:717–735, 1989

  27. [27]

    On certain inequalities and characteristic symmetric bilinear forms.Mathe- matische Annalen, 115:249–290, 1938

    Kurt Friedrichs. On certain inequalities and characteristic symmetric bilinear forms.Mathe- matische Annalen, 115:249–290, 1938

  28. [28]

    Fast contour matching using approximate earth mover’s distance

    Kristen Grauman and Trevor Darrell. Fast contour matching using approximate earth mover’s distance. InProceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004., volume 1, pages I–I. IEEE, 2004

  29. [29]

    Non-asymptotic con- vergence bounds for sinkhorn iterates and their gradients: a coupling approach

    Giacomo Greco, Maxence Noble, Giovanni Conforti, and Alain Durmus. Non-asymptotic con- vergence bounds for sinkhorn iterates and their gradients: a coupling approach. InProceedings of Thirty Sixth Conference on Learning Theory, volume 195, pages 716–746, 2023

  30. [30]

    On the complexity of general matrix scaling and entropy minimization via the RAS algorithm.Mathematical Programming, 112(2):371–401, 2008

    Bahman Kalantari, Isabella Lari, Federica Ricca, and Bruno Simeone. On the complexity of general matrix scaling and entropy minimization via the RAS algorithm.Mathematical Programming, 112(2):371–401, 2008

  31. [31]

    From word embeddings to document distances

    Matt Kusner, Yu Sun, Nicholas Kolkin, and Kilian Weinberger. From word embeddings to document distances. InInternational conference on machine learning, pages 957–966. PMLR, 2015. 23

  32. [32]

    A gradient descent perspective on Sinkhorn.Applied Mathematics & Optimiza- tion, 84(2):1843–1855, 2021

    Flavien L´ eger. A gradient descent perspective on Sinkhorn.Applied Mathematics & Optimiza- tion, 84(2):1843–1855, 2021

  33. [33]

    An optimal transport approach for the schr¨ odinger bridge problem and convergence of sinkhorn algorithm.Journal of Scientific Computing, 85(2):27, 2020

    Simone Di Marino and Augusto Gerolin. An optimal transport approach for the schr¨ odinger bridge problem and convergence of sinkhorn algorithm.Journal of Scientific Computing, 85(2):27, 2020

  34. [34]

    A metric for distributions with appli- cations to image databases

    Yossi Rubner, Carlo Tomasi, and Leonidas J Guibas. A metric for distributions with appli- cations to image databases. InSixth international conference on computer vision (IEEE Cat. No. 98CH36271), pages 59–66. IEEE, 1998

  35. [35]

    Graph curvature for differentiating cancer networks.Sci- entific reports, 5(1):12323, 2015

    Romeil Sandhu, Tryphon Georgiou, Ed Reznik, Liangjia Zhu, Ivan Kolesov, Yasin Sen- babaoglu, and Allen Tannenbaum. Graph curvature for differentiating cancer networks.Sci- entific reports, 5(1):12323, 2015

  36. [36]

    Springer, 2015

    Filippo Santambrogio.Optimal transport for applied mathematicians. Springer, 2015

  37. [37]

    Ollivier-Ricci curvature-based method to community detection in complex networks.Scientific reports, 9(1):9800, 2019

    Jayson Sia, Edmond Jonckheere, and Paul Bogdan. Ollivier-Ricci curvature-based method to community detection in complex networks.Scientific reports, 9(1):9800, 2019

  38. [38]

    Sinkhorn

    R. Sinkhorn. A relationship between arbitrary positive matrices and doubly stochastic matri- ces.Ann. Math. Statist., 35:876–879, 1964

  39. [39]

    Earth mover’s dis- tances on discrete surfaces.ACM Transactions on Graphics (ToG), 33(4):1–12, 2014

    Justin Solomon, Raif Rustamov, Leonidas Guibas, and Adrian Butscher. Earth mover’s dis- tances on discrete surfaces.ACM Transactions on Graphics (ToG), 33(4):1–12, 2014

  40. [40]

    topical maps

    Udny Yule. On the methods of measuring association between two attributes.Journal of the Royal Statistical Society, 75(6):579–652, 1912. A Non-expansiveness in Variation Semi-norm A.1 Topical maps and non-expansiveness We first recall a classical result of so-called “topical maps” in the nonlinear Perron–Frobenius/max–plus theory [7]. This ensures the non...