Multilevel Bregman Proximal Gradient Descent
Pith reviewed 2026-05-19 11:15 UTC · model grok-4.3
The pith
The Multilevel Bregman Proximal Gradient Descent method achieves global linear convergence for constrained convex optimization problems with relative Lipschitz smoothness.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors claim that the ML BPGD algorithm, when analyzed for multiple coarse levels, exhibits a global linear convergence rate for constrained convex problems under the relative Lipschitz smoothness assumption. This establishes the well-posedness of the multilevel framework and is supported by numerical experiments in image reconstruction.
What carries the argument
The ML BPGD method, which uses Bregman proximal gradient steps combined with multilevel corrections to accelerate convergence on constrained domains.
Load-bearing premise
The problems being solved are constrained convex optimization problems satisfying relative Lipschitz smoothness.
What would settle it
Running the algorithm on a constrained convex problem known to satisfy relative Lipschitz smoothness and observing that the convergence rate is not linear would falsify the claim.
Figures
read the original abstract
We present the Multilevel Bregman Proximal Gradient Descent (ML BPGD) method, a novel multilevel optimization framework tailored to constrained convex problems with relative Lipschitz smoothness. Our approach extends the classical multilevel optimization framework (MGOPT) to handle Bregman-based geometries and constrained domains. We provide a rigorous analysis of ML BPGD for multiple coarse levels and establish a global linear convergence rate. We demonstrate the effectiveness of ML BPGD in the context of image reconstruction, providing theoretical guarantees for the well-posedness of the multilevel framework and validating its performance through numerical experiments.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces the Multilevel Bregman Proximal Gradient Descent (ML BPGD) algorithm, extending the classical MGOPT multilevel framework to Bregman proximal gradient methods on constrained convex problems that satisfy relative Lipschitz smoothness. It claims to deliver a rigorous convergence analysis establishing a global linear rate for hierarchies with arbitrarily many coarse levels, along with well-posedness guarantees and numerical validation on image-reconstruction tasks.
Significance. Should the uniform linear-rate analysis hold, the work would meaningfully advance multilevel optimization by accommodating Bregman geometries and explicit constraints, potentially yielding faster practical solvers for large-scale convex problems where Euclidean assumptions fail.
major comments (1)
- [Abstract and main convergence theorem] The central claim of a global linear convergence rate independent of the number of coarse levels (stated in the abstract and presumably proved in the main analysis section) requires that the relative smoothness constants remain bounded uniformly across the hierarchy or that the coarse-grid contraction factor stays strictly below 1 without degradation. If the proof instead treats each level's smoothness parameter as independent without supplying an explicit uniform upper bound or scaling argument, the composite contraction can approach 1 as levels are added, reducing the guarantee to level-dependent or only locally linear convergence. This is load-bearing for the multilevel claim.
minor comments (2)
- [Section 2] Notation for the Bregman distance and the relative smoothness modulus should be introduced with explicit definitions before first use to aid readability.
- [Numerical results] The numerical experiments section would benefit from a table comparing iteration counts or CPU time against single-level BPGD and standard MGOPT baselines for the same tolerance.
Simulated Author's Rebuttal
We thank the referee for their careful reading of the manuscript and for identifying this important point about the uniformity of the linear convergence rate. We address the concern directly below and are prepared to revise the exposition for greater clarity.
read point-by-point responses
-
Referee: [Abstract and main convergence theorem] The central claim of a global linear convergence rate independent of the number of coarse levels (stated in the abstract and presumably proved in the main analysis section) requires that the relative smoothness constants remain bounded uniformly across the hierarchy or that the coarse-grid contraction factor stays strictly below 1 without degradation. If the proof instead treats each level's smoothness parameter as independent without supplying an explicit uniform upper bound or scaling argument, the composite contraction can approach 1 as levels are added, reducing the guarantee to level-dependent or only locally linear convergence. This is load-bearing for the multilevel claim.
Authors: We appreciate the referee highlighting the need for an explicit uniformity argument. In the manuscript, Assumption 2.1 posits that the objective satisfies relative Lipschitz smoothness with a constant L that is independent of the discretization level; this is inherited by all coarse-grid problems through the standard restriction/prolongation operators used to construct the hierarchy (see the paragraph following Assumption 2.1 and the definition of the coarse-grid Bregman proximal operators in Section 2.3). Theorem 3.2 then proves global linear convergence by induction on the number of levels, showing that the per-level contraction factor is bounded by a fixed ρ < 1 that depends only on L, the strong convexity modulus, and the step-size parameters, none of which degrade with additional coarse levels. The composite rate therefore remains linear with a level-independent factor. We acknowledge that the current write-up could state this uniformity more explicitly and will add a short clarifying remark (and a reference back to Assumption 2.1) immediately before the induction in the revised Section 3. revision: partial
Circularity Check
No circularity: convergence rate derived from independent analysis
full rationale
The paper presents a rigorous analysis establishing global linear convergence for ML BPGD on constrained convex problems under relative Lipschitz smoothness, extending MGOPT to Bregman geometries. No quoted steps reduce the claimed rate to a fitted parameter, self-definition, or load-bearing self-citation chain; the result is positioned as an outcome of the proof rather than equivalent to its inputs by construction. The derivation remains self-contained against external benchmarks, with the skeptic concern addressing proof validity rather than circular reduction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption The optimization problems are constrained convex problems satisfying relative Lipschitz smoothness.
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
We provide a rigorous analysis of ML BPGD for multiple coarse levels and establish a global linear convergence rate... under relative Lipschitz smoothness
-
IndisputableMonolith/Foundation/AlphaCoordinateFixation.leanJ_uniquely_calibrated_via_higher_derivative unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Df(x,y) ≤ L Dφ(x,y) ... Assumption B (Polyak-Łojasiewicz-like condition)
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
A.H. Andersen and A.C. Kak. Simultaneous algebraic reconstruction technique (sart): A superior implementation of the art algorithm.Ultrasonic Imaging, 6(1):81– 94, 1984
work page 1984
-
[2]
Andersen Ang, Hans De Sterck, and Stephen Vavasis. Mgprox: a nonsmooth multigrid proximal gradient method with adaptive restriction for strongly convex optimization. SIAM J. Optim., 34(3):2788–2820, 2024
work page 2024
-
[3]
Callum Atkinson and Julio Soria. An efficient simultaneous reconstruction technique for tomographic particle image velocimetry.Experiments in Fluids, 47(4):553–568, October 2009
work page 2009
-
[4]
Corwin L. Atwood. Optimal and efficient designs of experiments.Ann. Math. Statist., 40:1570–1602, 1969
work page 1969
-
[5]
Mirror descent with relative smoothness in measure spaces, with application to sinkhorn and EM
Pierre-Cyril Aubin-Frankowski, Anna Korba, and Flavien Léger. Mirror descent with relative smoothness in measure spaces, with application to sinkhorn and EM. In Advances in Neural Information Processing Systems, 2022
work page 2022
-
[6]
Bauschke, Jérôme Bolte, Jiawei Chen, Marc Teboulle, and Xianfu Wang
Heinz H. Bauschke, Jérôme Bolte, Jiawei Chen, Marc Teboulle, and Xianfu Wang. On linear convergence of non-Euclidean gradient methods without strong convexity and Lipschitz gradient continuity.J. Optim. Theory Appl., 182(3):1068–1087, 2019
work page 2019
-
[7]
Bauschke, Jérôme Bolte, and Marc Teboulle
Heinz H. Bauschke, Jérôme Bolte, and Marc Teboulle. A descent lemma beyond Lipschitz gradient continuity: first-order methods revisited and applications.Math. Oper. Res., 42(2):330–348, 2017
work page 2017
-
[8]
Heinz H. Bauschke and Jonathan M. Borwein. Legendre functions and the method of random Bregman projections.J. Convex Anal., 4(1):27–67, 1997
work page 1997
-
[9]
Mirror descent and nonlinear projected subgradient methods for convex optimization.Oper
Amir Beck and Marc Teboulle. Mirror descent and nonlinear projected subgradient methods for convex optimization.Oper. Res. Lett., 31(3):167–175, 2003
work page 2003
-
[10]
Briggs, Van Emden Henson, and Steve F
William L. Briggs, Van Emden Henson, and Steve F. McCormick.A multigrid tutorial. SIAM, Philadelphia, PA, second edition, 2000
work page 2000
-
[11]
Convergence analysis of a proximal-like min- imization algorithm using Bregman functions
Gong Chen and Marc Teboulle. Convergence analysis of a proximal-like min- imization algorithm using Bregman functions. SIAM J. Optim., 3(3):538–543, 1993. 26 YARA ELSHIATY AND STEF ANIA PETRA
work page 1993
-
[12]
Andrew R. Conn, Nicholas I. M. Gould, and Philippe L. Toint.Trust-region methods. MPS/SIAM Series on Optimization. SIAM, Philadelphia, PA; MPS, Philadelphia, PA, 2000
work page 2000
-
[13]
Taylor, Alexandre d’Aspremont, and Jérôme Bolte
Radu-Alexandru Dragomir, Adrien B. Taylor, Alexandre d’Aspremont, and Jérôme Bolte. Optimal complexity and certification of Bregman first-order methods. Mathematical Programming, 194(1-2):41–83, 2022
work page 2022
-
[14]
Hamid Fathi, Alexander Skorikov, and Tristan van Leeuwen. Bi-level optimization and implicit differentiation as a framework for optimal experimental design in tomography. InScale Space and Variational Methods in Computer Vision, pages 123–135. Springer Nature Switzerland, 2025
work page 2025
-
[15]
E. Gelman and J. Mandel. On multilevel iterative methods for optimization problems. Math. Programming, 48(1):1–17, 1990
work page 1990
-
[16]
Stuart Geman and Donald Geman. Stochastic relaxation, gibbs distributions, and the bayesian restoration of images.IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-6(6):721–741, 1984
work page 1984
-
[17]
Exponentiated gradient meets gradient descent
Udaya Ghai, Elad Hazan, and Yoram Singer. Exponentiated gradient meets gradient descent. InProceedings of the 31st International Conference on Algorithmic Learning Theory, volume 117 ofProceedings of Machine Learning Research, pages 386–407. PMLR, 2020
work page 2020
-
[18]
Toint, and Melissa Weber-Mendon¸ca
Serge Gratton, Mélodie Mouffe, Philippe L. Toint, and Melissa Weber-Mendon¸ca. A recursive l∞-trust-region method for bound-constrained nonlinear optimization. IMA J. Numer. Anal., 28(4):827–861, 2008
work page 2008
-
[19]
Wolfgang Hackbusch.Multi-Grid Methods and Applications. Springer, 1985
work page 1985
-
[20]
Justin P. Haldar and Daeun Kim. Oedipus: An experiment design framework for sparsity-constrained mri.IEEE Transactions on Medical Imaging, 38:1545–1558, 2018
work page 2018
-
[21]
Tomographic x-ray data of a walnut, 2015
Keijo Hämäläinen, Lauri Harhanen, Aki Kallonen, Antti Kujanpää, Esa Niemi, and Samuli Siltanen. Tomographic x-ray data of a walnut, 2015
work page 2015
-
[22]
Accelerated Bregman proximal gradient methods for relatively smooth convex optimization.Comput
Filip Hanzely, Peter Richtárik, and Lin Xiao. Accelerated Bregman proximal gradient methods for relatively smooth convex optimization.Comput. Optim. Appl., 79(2):405–440, 2021
work page 2021
-
[23]
Vahan Hovhannisyan, Panos Parpas, and Stefanos Zafeiriou. MAGMA: Multi-level accelerated gradient mirror descent algorithm for large-scale convex composite minimization. SIAM J. Imaging Sci., 9(4):1829–1857, 2016
work page 2016
-
[24]
IML FISTA: a multilevel framework for inexact and inertial forward-backward
Guillaume Lauga, Elisa Riccietti, Nelly Pustelnik, and Paulo Gon¸calves. IML FISTA: a multilevel framework for inexact and inertial forward-backward. Applica- tion to image restoration.SIAM J. Imaging Sci., 17(3):1347–1376, 2024
work page 2024
-
[25]
Haihao Lu, Robert M. Freund, and Yurii Nesterov. Relatively smooth convex opti- mization by first-order methods, and applications.SIAM Journal on Optimization, 28(1):333–354, 2018
work page 2018
-
[26]
Multilevel geometric op- timization for regularised constrained linear inverse problems.Pure Appl
Sebastian Müller, Stefania Petra, and Matthias Zisler. Multilevel geometric op- timization for regularised constrained linear inverse problems.Pure Appl. Funct. Anal., 8(3):855–880, 2023
work page 2023
-
[27]
Stephen G. Nash. A multigrid approach to discretized optimization problems. Optim. Methods Softw., 14(1-2):99–116, 2000
work page 2000
-
[28]
A. S. Nemirovsky and D. B. Yudin.Problem complexity and method efficiency in optimization. Wiley-Interscience Series in Discrete Mathematics. John Wiley & Sons, Inc., New York, 1983. Translated from the Russian and with a preface by E. R. Dawson, A Wiley-Interscience Publication. MULTILEVEL BREGMAN PROXIMAL GRADIENT DESCENT 27
work page 1983
-
[29]
A multilevel proximal gradient algorithm for a class of composite optimization problems
Panos Parpas. A multilevel proximal gradient algorithm for a class of composite optimization problems. SIAM J. Sci. Comput., 39(5):S681–S701, 2017
work page 2017
- [30]
-
[31]
The information geometry of mirror descent
Garvesh Raskutti and Sayan Mukherjee. The information geometry of mirror descent. IEEE Trans. Inform. Theory, 61(3):1451–1457, 2015
work page 2015
-
[32]
Accelerated Bregman divergence optimization with SMART: an information geometric point of view.J
Maren Raus, Yara Elshiaty, and Stefania Petra. Accelerated Bregman divergence optimization with SMART: an information geometric point of view.J. Appl. Numer. Optim., 6(1):1–40, 2024
work page 2024
-
[33]
Tyrrell Rockafellar.Convex analysis
R. Tyrrell Rockafellar.Convex analysis. Princeton Landmarks in Mathematics. Princeton University Press, Princeton, NJ, 1997
work page 1997
-
[34]
A simplified view of first order methods for optimization.Math
Marc Teboulle. A simplified view of first order methods for optimization.Math. Program., 170(1):67–96, 2018
work page 2018
-
[35]
Ulrich Trottenberg, Cornelis Oosterlee, and Anton Schüller.Multigrid. Academic Press, 2001
work page 2001
- [36]
- [37]
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.