pith. sign in

arxiv: 2506.03950 · v3 · submitted 2025-06-04 · 🧮 math.OC

Multilevel Bregman Proximal Gradient Descent

Pith reviewed 2026-05-19 11:15 UTC · model grok-4.3

classification 🧮 math.OC
keywords multilevel optimizationBregman proximal gradient descentlinear convergenceconvex optimizationrelative Lipschitz smoothnessimage reconstruction
0
0 comments X

The pith

The Multilevel Bregman Proximal Gradient Descent method achieves global linear convergence for constrained convex optimization problems with relative Lipschitz smoothness.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper presents the Multilevel Bregman Proximal Gradient Descent method as a way to solve constrained convex optimization problems that meet a relative Lipschitz smoothness condition. It builds on classical multilevel optimization by incorporating Bregman-based geometries and handling constraints explicitly. The main contribution is a proof of global linear convergence when the method uses several coarse levels in the hierarchy. If correct, this would allow more efficient solving of large problems like those in image reconstruction by leveraging coarser approximations while maintaining fast convergence.

Core claim

The authors claim that the ML BPGD algorithm, when analyzed for multiple coarse levels, exhibits a global linear convergence rate for constrained convex problems under the relative Lipschitz smoothness assumption. This establishes the well-posedness of the multilevel framework and is supported by numerical experiments in image reconstruction.

What carries the argument

The ML BPGD method, which uses Bregman proximal gradient steps combined with multilevel corrections to accelerate convergence on constrained domains.

Load-bearing premise

The problems being solved are constrained convex optimization problems satisfying relative Lipschitz smoothness.

What would settle it

Running the algorithm on a constrained convex problem known to satisfy relative Lipschitz smoothness and observing that the convergence rate is not linear would falsify the claim.

Figures

Figures reproduced from arXiv: 2506.03950 by Stefania Petra, Yara Elshiaty.

Figure 3.1
Figure 3.1. Figure 3.1: Flowchart of Algorithm 1. Cooler and lighter colors refer to bigger function values. 3.2.1. Coarse model. At iteration k, the coarse constrained problem is constructed similarly to the unconstrained setting, while ensuring consistency between the coarse and fine feasibility sets. Specifically, we define closed and convex sets C k H ⊆ R nH , referred to as coarse constraints, such that they prolong back t… view at source ↗
Figure 4.1
Figure 4.1. Figure 4.1: The reference images for the three experi￾ments. Left: The Crater Tycho on the Moon, taken by the Hubble Space Telescope, https://science.nasa.gov/image-detail/ tycho-crater/, for Poisson-noisy deconvolution. Center: The Walnut Phantom, [21], for tomographic reconstruction. Right: Jumping Mario, for D-optimal design for tomography. 4.1. Deconvolution. In astronomical image processing, the goal is to reco… view at source ↗
Figure 4.2
Figure 4.2. Figure 4.2: Normalized function value vs CPU time (in sec￾onds). Deblurring performance across the various blur and noise condi￾tions specified in [PITH_FULL_IMAGE:figures/full_fig_p016_4_2.png] view at source ↗
Figure 4.3
Figure 4.3. Figure 4.3: Deblurring results for the Crater Tycho image under varying noise and blur levels, as specified in [PITH_FULL_IMAGE:figures/full_fig_p017_4_3.png] view at source ↗
Figure 4.4
Figure 4.4. Figure 4.4: Comparison of multilevel vs. single-level BPGD for tomographic reconstruction. Results are shown for the 1023 × 1023 walnut phantom from 20% undersampled data. We use three discretization levels. Left: Normalized objective function values vs. cumulative CPU time (in seconds). The multilevel BPGD (ML-BPGD, blue, with markers at each multilevel iteration) rapidly catches up to the already fast single-level… view at source ↗
Figure 4.5
Figure 4.5. Figure 4.5: Results for the D-optimal problem for the experi￾mental setup for tomographic reconstruction. Left: Comparison of Multilevel vs. Single-Level BPGD. The ML-BPGD variant outper￾forms its single-level counterpart initially. Right: Reconstruction of pixelated mario using 15 equidistant angles (upper left) and the 15 Fisher-information maximizing angles (upper right). The bottom row depicts the images downsam… view at source ↗
read the original abstract

We present the Multilevel Bregman Proximal Gradient Descent (ML BPGD) method, a novel multilevel optimization framework tailored to constrained convex problems with relative Lipschitz smoothness. Our approach extends the classical multilevel optimization framework (MGOPT) to handle Bregman-based geometries and constrained domains. We provide a rigorous analysis of ML BPGD for multiple coarse levels and establish a global linear convergence rate. We demonstrate the effectiveness of ML BPGD in the context of image reconstruction, providing theoretical guarantees for the well-posedness of the multilevel framework and validating its performance through numerical experiments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript introduces the Multilevel Bregman Proximal Gradient Descent (ML BPGD) algorithm, extending the classical MGOPT multilevel framework to Bregman proximal gradient methods on constrained convex problems that satisfy relative Lipschitz smoothness. It claims to deliver a rigorous convergence analysis establishing a global linear rate for hierarchies with arbitrarily many coarse levels, along with well-posedness guarantees and numerical validation on image-reconstruction tasks.

Significance. Should the uniform linear-rate analysis hold, the work would meaningfully advance multilevel optimization by accommodating Bregman geometries and explicit constraints, potentially yielding faster practical solvers for large-scale convex problems where Euclidean assumptions fail.

major comments (1)
  1. [Abstract and main convergence theorem] The central claim of a global linear convergence rate independent of the number of coarse levels (stated in the abstract and presumably proved in the main analysis section) requires that the relative smoothness constants remain bounded uniformly across the hierarchy or that the coarse-grid contraction factor stays strictly below 1 without degradation. If the proof instead treats each level's smoothness parameter as independent without supplying an explicit uniform upper bound or scaling argument, the composite contraction can approach 1 as levels are added, reducing the guarantee to level-dependent or only locally linear convergence. This is load-bearing for the multilevel claim.
minor comments (2)
  1. [Section 2] Notation for the Bregman distance and the relative smoothness modulus should be introduced with explicit definitions before first use to aid readability.
  2. [Numerical results] The numerical experiments section would benefit from a table comparing iteration counts or CPU time against single-level BPGD and standard MGOPT baselines for the same tolerance.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their careful reading of the manuscript and for identifying this important point about the uniformity of the linear convergence rate. We address the concern directly below and are prepared to revise the exposition for greater clarity.

read point-by-point responses
  1. Referee: [Abstract and main convergence theorem] The central claim of a global linear convergence rate independent of the number of coarse levels (stated in the abstract and presumably proved in the main analysis section) requires that the relative smoothness constants remain bounded uniformly across the hierarchy or that the coarse-grid contraction factor stays strictly below 1 without degradation. If the proof instead treats each level's smoothness parameter as independent without supplying an explicit uniform upper bound or scaling argument, the composite contraction can approach 1 as levels are added, reducing the guarantee to level-dependent or only locally linear convergence. This is load-bearing for the multilevel claim.

    Authors: We appreciate the referee highlighting the need for an explicit uniformity argument. In the manuscript, Assumption 2.1 posits that the objective satisfies relative Lipschitz smoothness with a constant L that is independent of the discretization level; this is inherited by all coarse-grid problems through the standard restriction/prolongation operators used to construct the hierarchy (see the paragraph following Assumption 2.1 and the definition of the coarse-grid Bregman proximal operators in Section 2.3). Theorem 3.2 then proves global linear convergence by induction on the number of levels, showing that the per-level contraction factor is bounded by a fixed ρ < 1 that depends only on L, the strong convexity modulus, and the step-size parameters, none of which degrade with additional coarse levels. The composite rate therefore remains linear with a level-independent factor. We acknowledge that the current write-up could state this uniformity more explicitly and will add a short clarifying remark (and a reference back to Assumption 2.1) immediately before the induction in the revised Section 3. revision: partial

Circularity Check

0 steps flagged

No circularity: convergence rate derived from independent analysis

full rationale

The paper presents a rigorous analysis establishing global linear convergence for ML BPGD on constrained convex problems under relative Lipschitz smoothness, extending MGOPT to Bregman geometries. No quoted steps reduce the claimed rate to a fitted parameter, self-definition, or load-bearing self-citation chain; the result is positioned as an outcome of the proof rather than equivalent to its inputs by construction. The derivation remains self-contained against external benchmarks, with the skeptic concern addressing proof validity rather than circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption of relative Lipschitz smoothness for the objective and the well-posedness of the multilevel extension to constrained Bregman settings; no free parameters or invented entities are mentioned in the abstract.

axioms (1)
  • domain assumption The optimization problems are constrained convex problems satisfying relative Lipschitz smoothness.
    This condition is explicitly stated as the setting for which ML BPGD is tailored and analyzed.

pith-pipeline@v0.9.0 · 5613 in / 1189 out tokens · 58083 ms · 2026-05-19T11:15:20.264237+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

37 extracted references · 37 canonical work pages

  1. [1]

    Andersen and A.C

    A.H. Andersen and A.C. Kak. Simultaneous algebraic reconstruction technique (sart): A superior implementation of the art algorithm.Ultrasonic Imaging, 6(1):81– 94, 1984

  2. [2]

    Mgprox: a nonsmooth multigrid proximal gradient method with adaptive restriction for strongly convex optimization

    Andersen Ang, Hans De Sterck, and Stephen Vavasis. Mgprox: a nonsmooth multigrid proximal gradient method with adaptive restriction for strongly convex optimization. SIAM J. Optim., 34(3):2788–2820, 2024

  3. [3]

    An efficient simultaneous reconstruction technique for tomographic particle image velocimetry.Experiments in Fluids, 47(4):553–568, October 2009

    Callum Atkinson and Julio Soria. An efficient simultaneous reconstruction technique for tomographic particle image velocimetry.Experiments in Fluids, 47(4):553–568, October 2009

  4. [4]

    Corwin L. Atwood. Optimal and efficient designs of experiments.Ann. Math. Statist., 40:1570–1602, 1969

  5. [5]

    Mirror descent with relative smoothness in measure spaces, with application to sinkhorn and EM

    Pierre-Cyril Aubin-Frankowski, Anna Korba, and Flavien Léger. Mirror descent with relative smoothness in measure spaces, with application to sinkhorn and EM. In Advances in Neural Information Processing Systems, 2022

  6. [6]

    Bauschke, Jérôme Bolte, Jiawei Chen, Marc Teboulle, and Xianfu Wang

    Heinz H. Bauschke, Jérôme Bolte, Jiawei Chen, Marc Teboulle, and Xianfu Wang. On linear convergence of non-Euclidean gradient methods without strong convexity and Lipschitz gradient continuity.J. Optim. Theory Appl., 182(3):1068–1087, 2019

  7. [7]

    Bauschke, Jérôme Bolte, and Marc Teboulle

    Heinz H. Bauschke, Jérôme Bolte, and Marc Teboulle. A descent lemma beyond Lipschitz gradient continuity: first-order methods revisited and applications.Math. Oper. Res., 42(2):330–348, 2017

  8. [8]

    Bauschke and Jonathan M

    Heinz H. Bauschke and Jonathan M. Borwein. Legendre functions and the method of random Bregman projections.J. Convex Anal., 4(1):27–67, 1997

  9. [9]

    Mirror descent and nonlinear projected subgradient methods for convex optimization.Oper

    Amir Beck and Marc Teboulle. Mirror descent and nonlinear projected subgradient methods for convex optimization.Oper. Res. Lett., 31(3):167–175, 2003

  10. [10]

    Briggs, Van Emden Henson, and Steve F

    William L. Briggs, Van Emden Henson, and Steve F. McCormick.A multigrid tutorial. SIAM, Philadelphia, PA, second edition, 2000

  11. [11]

    Convergence analysis of a proximal-like min- imization algorithm using Bregman functions

    Gong Chen and Marc Teboulle. Convergence analysis of a proximal-like min- imization algorithm using Bregman functions. SIAM J. Optim., 3(3):538–543, 1993. 26 YARA ELSHIATY AND STEF ANIA PETRA

  12. [12]

    Conn, Nicholas I

    Andrew R. Conn, Nicholas I. M. Gould, and Philippe L. Toint.Trust-region methods. MPS/SIAM Series on Optimization. SIAM, Philadelphia, PA; MPS, Philadelphia, PA, 2000

  13. [13]

    Taylor, Alexandre d’Aspremont, and Jérôme Bolte

    Radu-Alexandru Dragomir, Adrien B. Taylor, Alexandre d’Aspremont, and Jérôme Bolte. Optimal complexity and certification of Bregman first-order methods. Mathematical Programming, 194(1-2):41–83, 2022

  14. [14]

    Bi-level optimization and implicit differentiation as a framework for optimal experimental design in tomography

    Hamid Fathi, Alexander Skorikov, and Tristan van Leeuwen. Bi-level optimization and implicit differentiation as a framework for optimal experimental design in tomography. InScale Space and Variational Methods in Computer Vision, pages 123–135. Springer Nature Switzerland, 2025

  15. [15]

    Gelman and J

    E. Gelman and J. Mandel. On multilevel iterative methods for optimization problems. Math. Programming, 48(1):1–17, 1990

  16. [16]

    Stochastic relaxation, gibbs distributions, and the bayesian restoration of images.IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-6(6):721–741, 1984

    Stuart Geman and Donald Geman. Stochastic relaxation, gibbs distributions, and the bayesian restoration of images.IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-6(6):721–741, 1984

  17. [17]

    Exponentiated gradient meets gradient descent

    Udaya Ghai, Elad Hazan, and Yoram Singer. Exponentiated gradient meets gradient descent. InProceedings of the 31st International Conference on Algorithmic Learning Theory, volume 117 ofProceedings of Machine Learning Research, pages 386–407. PMLR, 2020

  18. [18]

    Toint, and Melissa Weber-Mendon¸ca

    Serge Gratton, Mélodie Mouffe, Philippe L. Toint, and Melissa Weber-Mendon¸ca. A recursive l∞-trust-region method for bound-constrained nonlinear optimization. IMA J. Numer. Anal., 28(4):827–861, 2008

  19. [19]

    Springer, 1985

    Wolfgang Hackbusch.Multi-Grid Methods and Applications. Springer, 1985

  20. [20]

    Haldar and Daeun Kim

    Justin P. Haldar and Daeun Kim. Oedipus: An experiment design framework for sparsity-constrained mri.IEEE Transactions on Medical Imaging, 38:1545–1558, 2018

  21. [21]

    Tomographic x-ray data of a walnut, 2015

    Keijo Hämäläinen, Lauri Harhanen, Aki Kallonen, Antti Kujanpää, Esa Niemi, and Samuli Siltanen. Tomographic x-ray data of a walnut, 2015

  22. [22]

    Accelerated Bregman proximal gradient methods for relatively smooth convex optimization.Comput

    Filip Hanzely, Peter Richtárik, and Lin Xiao. Accelerated Bregman proximal gradient methods for relatively smooth convex optimization.Comput. Optim. Appl., 79(2):405–440, 2021

  23. [23]

    MAGMA: Multi-level accelerated gradient mirror descent algorithm for large-scale convex composite minimization

    Vahan Hovhannisyan, Panos Parpas, and Stefanos Zafeiriou. MAGMA: Multi-level accelerated gradient mirror descent algorithm for large-scale convex composite minimization. SIAM J. Imaging Sci., 9(4):1829–1857, 2016

  24. [24]

    IML FISTA: a multilevel framework for inexact and inertial forward-backward

    Guillaume Lauga, Elisa Riccietti, Nelly Pustelnik, and Paulo Gon¸calves. IML FISTA: a multilevel framework for inexact and inertial forward-backward. Applica- tion to image restoration.SIAM J. Imaging Sci., 17(3):1347–1376, 2024

  25. [25]

    Freund, and Yurii Nesterov

    Haihao Lu, Robert M. Freund, and Yurii Nesterov. Relatively smooth convex opti- mization by first-order methods, and applications.SIAM Journal on Optimization, 28(1):333–354, 2018

  26. [26]

    Multilevel geometric op- timization for regularised constrained linear inverse problems.Pure Appl

    Sebastian Müller, Stefania Petra, and Matthias Zisler. Multilevel geometric op- timization for regularised constrained linear inverse problems.Pure Appl. Funct. Anal., 8(3):855–880, 2023

  27. [27]

    Stephen G. Nash. A multigrid approach to discretized optimization problems. Optim. Methods Softw., 14(1-2):99–116, 2000

  28. [28]

    A. S. Nemirovsky and D. B. Yudin.Problem complexity and method efficiency in optimization. Wiley-Interscience Series in Discrete Mathematics. John Wiley & Sons, Inc., New York, 1983. Translated from the Russian and with a preface by E. R. Dawson, A Wiley-Interscience Publication. MULTILEVEL BREGMAN PROXIMAL GRADIENT DESCENT 27

  29. [29]

    A multilevel proximal gradient algorithm for a class of composite optimization problems

    Panos Parpas. A multilevel proximal gradient algorithm for a class of composite optimization problems. SIAM J. Sci. Comput., 39(5):S681–S701, 2017

  30. [30]

    Petra, C

    S. Petra, C. Schnörr, F. Becker, and F. Lenzen. B-SMART: Bregman-Based First- Order Algorithms for Non-Negative Compressed Sensing Problems. InProc. SSVM, LNCS, volume 7893, pages 110–124. Springer, 2013

  31. [31]

    The information geometry of mirror descent

    Garvesh Raskutti and Sayan Mukherjee. The information geometry of mirror descent. IEEE Trans. Inform. Theory, 61(3):1451–1457, 2015

  32. [32]

    Accelerated Bregman divergence optimization with SMART: an information geometric point of view.J

    Maren Raus, Yara Elshiaty, and Stefania Petra. Accelerated Bregman divergence optimization with SMART: an information geometric point of view.J. Appl. Numer. Optim., 6(1):1–40, 2024

  33. [33]

    Tyrrell Rockafellar.Convex analysis

    R. Tyrrell Rockafellar.Convex analysis. Princeton Landmarks in Mathematics. Princeton University Press, Princeton, NJ, 1997

  34. [34]

    A simplified view of first order methods for optimization.Math

    Marc Teboulle. A simplified view of first order methods for optimization.Math. Program., 170(1):67–96, 2018

  35. [35]

    Academic Press, 2001

    Ulrich Trottenberg, Cornelis Oosterlee, and Anton Schüller.Multigrid. Academic Press, 2001

  36. [36]

    Vardi, L

    Y. Vardi, L. Shepp, and L. Kaufman. A Statistical Model for Positron Emission Tomography. Journal of the American Statistical Association, 80:8–37, 1985

  37. [37]

    Wen and D

    Z. Wen and D. Goldfarb. A line search multigrid method for large-scale nonlinear optimization. SIAM J. Optim., 20(3):1478–1503, 2009