Multilevel Bregman Proximal Gradient Descent

Stefania Petra; Yara Elshiaty

arxiv: 2506.03950 · v3 · submitted 2025-06-04 · 🧮 math.OC

Multilevel Bregman Proximal Gradient Descent

Yara Elshiaty , Stefania Petra This is my paper

Pith reviewed 2026-05-19 11:15 UTC · model grok-4.3

classification 🧮 math.OC

keywords multilevel optimizationBregman proximal gradient descentlinear convergenceconvex optimizationrelative Lipschitz smoothnessimage reconstruction

0 comments

The pith

The Multilevel Bregman Proximal Gradient Descent method achieves global linear convergence for constrained convex optimization problems with relative Lipschitz smoothness.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper presents the Multilevel Bregman Proximal Gradient Descent method as a way to solve constrained convex optimization problems that meet a relative Lipschitz smoothness condition. It builds on classical multilevel optimization by incorporating Bregman-based geometries and handling constraints explicitly. The main contribution is a proof of global linear convergence when the method uses several coarse levels in the hierarchy. If correct, this would allow more efficient solving of large problems like those in image reconstruction by leveraging coarser approximations while maintaining fast convergence.

Core claim

The authors claim that the ML BPGD algorithm, when analyzed for multiple coarse levels, exhibits a global linear convergence rate for constrained convex problems under the relative Lipschitz smoothness assumption. This establishes the well-posedness of the multilevel framework and is supported by numerical experiments in image reconstruction.

What carries the argument

The ML BPGD method, which uses Bregman proximal gradient steps combined with multilevel corrections to accelerate convergence on constrained domains.

Load-bearing premise

The problems being solved are constrained convex optimization problems satisfying relative Lipschitz smoothness.

What would settle it

Running the algorithm on a constrained convex problem known to satisfy relative Lipschitz smoothness and observing that the convergence rate is not linear would falsify the claim.

Figures

Figures reproduced from arXiv: 2506.03950 by Stefania Petra, Yara Elshiaty.

**Figure 3.1.** Figure 3.1: Flowchart of Algorithm 1. Cooler and lighter colors refer to bigger function values. 3.2.1. Coarse model. At iteration k, the coarse constrained problem is constructed similarly to the unconstrained setting, while ensuring consistency between the coarse and fine feasibility sets. Specifically, we define closed and convex sets C k H ⊆ R nH , referred to as coarse constraints, such that they prolong back t… view at source ↗

**Figure 4.1.** Figure 4.1: The reference images for the three experiments. Left: The Crater Tycho on the Moon, taken by the Hubble Space Telescope, https://science.nasa.gov/image-detail/ tycho-crater/, for Poisson-noisy deconvolution. Center: The Walnut Phantom, [21], for tomographic reconstruction. Right: Jumping Mario, for D-optimal design for tomography. 4.1. Deconvolution. In astronomical image processing, the goal is to reco… view at source ↗

**Figure 4.2.** Figure 4.2: Normalized function value vs CPU time (in seconds). Deblurring performance across the various blur and noise conditions specified in [PITH_FULL_IMAGE:figures/full_fig_p016_4_2.png] view at source ↗

**Figure 4.3.** Figure 4.3: Deblurring results for the Crater Tycho image under varying noise and blur levels, as specified in [PITH_FULL_IMAGE:figures/full_fig_p017_4_3.png] view at source ↗

**Figure 4.4.** Figure 4.4: Comparison of multilevel vs. single-level BPGD for tomographic reconstruction. Results are shown for the 1023 × 1023 walnut phantom from 20% undersampled data. We use three discretization levels. Left: Normalized objective function values vs. cumulative CPU time (in seconds). The multilevel BPGD (ML-BPGD, blue, with markers at each multilevel iteration) rapidly catches up to the already fast single-level… view at source ↗

**Figure 4.5.** Figure 4.5: Results for the D-optimal problem for the experimental setup for tomographic reconstruction. Left: Comparison of Multilevel vs. Single-Level BPGD. The ML-BPGD variant outperforms its single-level counterpart initially. Right: Reconstruction of pixelated mario using 15 equidistant angles (upper left) and the 15 Fisher-information maximizing angles (upper right). The bottom row depicts the images downsam… view at source ↗

read the original abstract

We present the Multilevel Bregman Proximal Gradient Descent (ML BPGD) method, a novel multilevel optimization framework tailored to constrained convex problems with relative Lipschitz smoothness. Our approach extends the classical multilevel optimization framework (MGOPT) to handle Bregman-based geometries and constrained domains. We provide a rigorous analysis of ML BPGD for multiple coarse levels and establish a global linear convergence rate. We demonstrate the effectiveness of ML BPGD in the context of image reconstruction, providing theoretical guarantees for the well-posedness of the multilevel framework and validating its performance through numerical experiments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives a multilevel Bregman proximal gradient method with claimed global linear convergence on constrained relatively smooth problems, but the rate may degrade with added levels unless smoothness constants are uniformly controlled.

read the letter

The core contribution is an extension of the MGOPT multilevel framework to Bregman proximal gradient descent on constrained convex problems that satisfy relative Lipschitz smoothness. They adapt the coarse-grid correction to Bregman distances and prove global linear convergence for multiple levels, then show results on image reconstruction tasks. That combination of Bregman geometry, constraints, and multilevel structure is the actual new piece rather than a routine tweak of existing proximal methods. The numerical examples appear to confirm practical speed-ups, and the well-posedness argument for the hierarchy is a useful supporting step. Credit is due for moving beyond Euclidean settings while keeping the analysis explicit. The stress-test concern about uniformity across levels is worth checking in the proofs. If the relative smoothness constants grow with coarsening and are not bounded independently of the number of levels, the contraction factor can approach 1, turning the claimed global linear rate into something that depends on the depth of the hierarchy. The abstract states the rate holds for multiple coarse levels, but without seeing the explicit bound or induction step it is hard to tell whether they close this gap or simply assume the constants stay controlled. Minor gaps like missing parameter counts in the experiments or limited comparison to single-level Bregman methods are easy to fix and do not touch the main claim. This work is aimed at researchers in convex optimization who already use multilevel or Bregman techniques for imaging or large-scale problems. A reader who cares about convergence rates in non-Euclidean geometries will find concrete material to build on. It is solid enough to deserve a serious referee rather than a desk reject; the combination of theory and application is worth the time even if revisions are needed on the uniformity of the rate.

Referee Report

1 major / 2 minor

Summary. The manuscript introduces the Multilevel Bregman Proximal Gradient Descent (ML BPGD) algorithm, extending the classical MGOPT multilevel framework to Bregman proximal gradient methods on constrained convex problems that satisfy relative Lipschitz smoothness. It claims to deliver a rigorous convergence analysis establishing a global linear rate for hierarchies with arbitrarily many coarse levels, along with well-posedness guarantees and numerical validation on image-reconstruction tasks.

Significance. Should the uniform linear-rate analysis hold, the work would meaningfully advance multilevel optimization by accommodating Bregman geometries and explicit constraints, potentially yielding faster practical solvers for large-scale convex problems where Euclidean assumptions fail.

major comments (1)

[Abstract and main convergence theorem] The central claim of a global linear convergence rate independent of the number of coarse levels (stated in the abstract and presumably proved in the main analysis section) requires that the relative smoothness constants remain bounded uniformly across the hierarchy or that the coarse-grid contraction factor stays strictly below 1 without degradation. If the proof instead treats each level's smoothness parameter as independent without supplying an explicit uniform upper bound or scaling argument, the composite contraction can approach 1 as levels are added, reducing the guarantee to level-dependent or only locally linear convergence. This is load-bearing for the multilevel claim.

minor comments (2)

[Section 2] Notation for the Bregman distance and the relative smoothness modulus should be introduced with explicit definitions before first use to aid readability.
[Numerical results] The numerical experiments section would benefit from a table comparing iteration counts or CPU time against single-level BPGD and standard MGOPT baselines for the same tolerance.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their careful reading of the manuscript and for identifying this important point about the uniformity of the linear convergence rate. We address the concern directly below and are prepared to revise the exposition for greater clarity.

read point-by-point responses

Referee: [Abstract and main convergence theorem] The central claim of a global linear convergence rate independent of the number of coarse levels (stated in the abstract and presumably proved in the main analysis section) requires that the relative smoothness constants remain bounded uniformly across the hierarchy or that the coarse-grid contraction factor stays strictly below 1 without degradation. If the proof instead treats each level's smoothness parameter as independent without supplying an explicit uniform upper bound or scaling argument, the composite contraction can approach 1 as levels are added, reducing the guarantee to level-dependent or only locally linear convergence. This is load-bearing for the multilevel claim.

Authors: We appreciate the referee highlighting the need for an explicit uniformity argument. In the manuscript, Assumption 2.1 posits that the objective satisfies relative Lipschitz smoothness with a constant L that is independent of the discretization level; this is inherited by all coarse-grid problems through the standard restriction/prolongation operators used to construct the hierarchy (see the paragraph following Assumption 2.1 and the definition of the coarse-grid Bregman proximal operators in Section 2.3). Theorem 3.2 then proves global linear convergence by induction on the number of levels, showing that the per-level contraction factor is bounded by a fixed ρ < 1 that depends only on L, the strong convexity modulus, and the step-size parameters, none of which degrade with additional coarse levels. The composite rate therefore remains linear with a level-independent factor. We acknowledge that the current write-up could state this uniformity more explicitly and will add a short clarifying remark (and a reference back to Assumption 2.1) immediately before the induction in the revised Section 3. revision: partial

Circularity Check

0 steps flagged

No circularity: convergence rate derived from independent analysis

full rationale

The paper presents a rigorous analysis establishing global linear convergence for ML BPGD on constrained convex problems under relative Lipschitz smoothness, extending MGOPT to Bregman geometries. No quoted steps reduce the claimed rate to a fitted parameter, self-definition, or load-bearing self-citation chain; the result is positioned as an outcome of the proof rather than equivalent to its inputs by construction. The derivation remains self-contained against external benchmarks, with the skeptic concern addressing proof validity rather than circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption of relative Lipschitz smoothness for the objective and the well-posedness of the multilevel extension to constrained Bregman settings; no free parameters or invented entities are mentioned in the abstract.

axioms (1)

domain assumption The optimization problems are constrained convex problems satisfying relative Lipschitz smoothness.
This condition is explicitly stated as the setting for which ML BPGD is tailored and analyzed.

pith-pipeline@v0.9.0 · 5613 in / 1189 out tokens · 58083 ms · 2026-05-19T11:15:20.264237+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We provide a rigorous analysis of ML BPGD for multiple coarse levels and establish a global linear convergence rate... under relative Lipschitz smoothness
IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean J_uniquely_calibrated_via_higher_derivative unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Df(x,y) ≤ L Dφ(x,y) ... Assumption B (Polyak-Łojasiewicz-like condition)

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

37 extracted references · 37 canonical work pages

[1]

Andersen and A.C

A.H. Andersen and A.C. Kak. Simultaneous algebraic reconstruction technique (sart): A superior implementation of the art algorithm.Ultrasonic Imaging, 6(1):81– 94, 1984

work page 1984
[2]

Mgprox: a nonsmooth multigrid proximal gradient method with adaptive restriction for strongly convex optimization

Andersen Ang, Hans De Sterck, and Stephen Vavasis. Mgprox: a nonsmooth multigrid proximal gradient method with adaptive restriction for strongly convex optimization. SIAM J. Optim., 34(3):2788–2820, 2024

work page 2024
[3]

An efficient simultaneous reconstruction technique for tomographic particle image velocimetry.Experiments in Fluids, 47(4):553–568, October 2009

Callum Atkinson and Julio Soria. An efficient simultaneous reconstruction technique for tomographic particle image velocimetry.Experiments in Fluids, 47(4):553–568, October 2009

work page 2009
[4]

Corwin L. Atwood. Optimal and efficient designs of experiments.Ann. Math. Statist., 40:1570–1602, 1969

work page 1969
[5]

Mirror descent with relative smoothness in measure spaces, with application to sinkhorn and EM

Pierre-Cyril Aubin-Frankowski, Anna Korba, and Flavien Léger. Mirror descent with relative smoothness in measure spaces, with application to sinkhorn and EM. In Advances in Neural Information Processing Systems, 2022

work page 2022
[6]

Bauschke, Jérôme Bolte, Jiawei Chen, Marc Teboulle, and Xianfu Wang

Heinz H. Bauschke, Jérôme Bolte, Jiawei Chen, Marc Teboulle, and Xianfu Wang. On linear convergence of non-Euclidean gradient methods without strong convexity and Lipschitz gradient continuity.J. Optim. Theory Appl., 182(3):1068–1087, 2019

work page 2019
[7]

Bauschke, Jérôme Bolte, and Marc Teboulle

Heinz H. Bauschke, Jérôme Bolte, and Marc Teboulle. A descent lemma beyond Lipschitz gradient continuity: first-order methods revisited and applications.Math. Oper. Res., 42(2):330–348, 2017

work page 2017
[8]

Bauschke and Jonathan M

Heinz H. Bauschke and Jonathan M. Borwein. Legendre functions and the method of random Bregman projections.J. Convex Anal., 4(1):27–67, 1997

work page 1997
[9]

Mirror descent and nonlinear projected subgradient methods for convex optimization.Oper

Amir Beck and Marc Teboulle. Mirror descent and nonlinear projected subgradient methods for convex optimization.Oper. Res. Lett., 31(3):167–175, 2003

work page 2003
[10]

Briggs, Van Emden Henson, and Steve F

William L. Briggs, Van Emden Henson, and Steve F. McCormick.A multigrid tutorial. SIAM, Philadelphia, PA, second edition, 2000

work page 2000
[11]

Convergence analysis of a proximal-like min- imization algorithm using Bregman functions

Gong Chen and Marc Teboulle. Convergence analysis of a proximal-like min- imization algorithm using Bregman functions. SIAM J. Optim., 3(3):538–543, 1993. 26 YARA ELSHIATY AND STEF ANIA PETRA

work page 1993
[12]

Conn, Nicholas I

Andrew R. Conn, Nicholas I. M. Gould, and Philippe L. Toint.Trust-region methods. MPS/SIAM Series on Optimization. SIAM, Philadelphia, PA; MPS, Philadelphia, PA, 2000

work page 2000
[13]

Taylor, Alexandre d’Aspremont, and Jérôme Bolte

Radu-Alexandru Dragomir, Adrien B. Taylor, Alexandre d’Aspremont, and Jérôme Bolte. Optimal complexity and certification of Bregman first-order methods. Mathematical Programming, 194(1-2):41–83, 2022

work page 2022
[14]

Bi-level optimization and implicit differentiation as a framework for optimal experimental design in tomography

Hamid Fathi, Alexander Skorikov, and Tristan van Leeuwen. Bi-level optimization and implicit differentiation as a framework for optimal experimental design in tomography. InScale Space and Variational Methods in Computer Vision, pages 123–135. Springer Nature Switzerland, 2025

work page 2025
[15]

Gelman and J

E. Gelman and J. Mandel. On multilevel iterative methods for optimization problems. Math. Programming, 48(1):1–17, 1990

work page 1990
[16]

Stochastic relaxation, gibbs distributions, and the bayesian restoration of images.IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-6(6):721–741, 1984

Stuart Geman and Donald Geman. Stochastic relaxation, gibbs distributions, and the bayesian restoration of images.IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-6(6):721–741, 1984

work page 1984
[17]

Exponentiated gradient meets gradient descent

Udaya Ghai, Elad Hazan, and Yoram Singer. Exponentiated gradient meets gradient descent. InProceedings of the 31st International Conference on Algorithmic Learning Theory, volume 117 ofProceedings of Machine Learning Research, pages 386–407. PMLR, 2020

work page 2020
[18]

Toint, and Melissa Weber-Mendon¸ca

Serge Gratton, Mélodie Mouffe, Philippe L. Toint, and Melissa Weber-Mendon¸ca. A recursive l∞-trust-region method for bound-constrained nonlinear optimization. IMA J. Numer. Anal., 28(4):827–861, 2008

work page 2008
[19]

Springer, 1985

Wolfgang Hackbusch.Multi-Grid Methods and Applications. Springer, 1985

work page 1985
[20]

Haldar and Daeun Kim

Justin P. Haldar and Daeun Kim. Oedipus: An experiment design framework for sparsity-constrained mri.IEEE Transactions on Medical Imaging, 38:1545–1558, 2018

work page 2018
[21]

Tomographic x-ray data of a walnut, 2015

Keijo Hämäläinen, Lauri Harhanen, Aki Kallonen, Antti Kujanpää, Esa Niemi, and Samuli Siltanen. Tomographic x-ray data of a walnut, 2015

work page 2015
[22]

Accelerated Bregman proximal gradient methods for relatively smooth convex optimization.Comput

Filip Hanzely, Peter Richtárik, and Lin Xiao. Accelerated Bregman proximal gradient methods for relatively smooth convex optimization.Comput. Optim. Appl., 79(2):405–440, 2021

work page 2021
[23]

MAGMA: Multi-level accelerated gradient mirror descent algorithm for large-scale convex composite minimization

Vahan Hovhannisyan, Panos Parpas, and Stefanos Zafeiriou. MAGMA: Multi-level accelerated gradient mirror descent algorithm for large-scale convex composite minimization. SIAM J. Imaging Sci., 9(4):1829–1857, 2016

work page 2016
[24]

IML FISTA: a multilevel framework for inexact and inertial forward-backward

Guillaume Lauga, Elisa Riccietti, Nelly Pustelnik, and Paulo Gon¸calves. IML FISTA: a multilevel framework for inexact and inertial forward-backward. Applica- tion to image restoration.SIAM J. Imaging Sci., 17(3):1347–1376, 2024

work page 2024
[25]

Freund, and Yurii Nesterov

Haihao Lu, Robert M. Freund, and Yurii Nesterov. Relatively smooth convex opti- mization by first-order methods, and applications.SIAM Journal on Optimization, 28(1):333–354, 2018

work page 2018
[26]

Multilevel geometric op- timization for regularised constrained linear inverse problems.Pure Appl

Sebastian Müller, Stefania Petra, and Matthias Zisler. Multilevel geometric op- timization for regularised constrained linear inverse problems.Pure Appl. Funct. Anal., 8(3):855–880, 2023

work page 2023
[27]

Stephen G. Nash. A multigrid approach to discretized optimization problems. Optim. Methods Softw., 14(1-2):99–116, 2000

work page 2000
[28]

A. S. Nemirovsky and D. B. Yudin.Problem complexity and method efficiency in optimization. Wiley-Interscience Series in Discrete Mathematics. John Wiley & Sons, Inc., New York, 1983. Translated from the Russian and with a preface by E. R. Dawson, A Wiley-Interscience Publication. MULTILEVEL BREGMAN PROXIMAL GRADIENT DESCENT 27

work page 1983
[29]

A multilevel proximal gradient algorithm for a class of composite optimization problems

Panos Parpas. A multilevel proximal gradient algorithm for a class of composite optimization problems. SIAM J. Sci. Comput., 39(5):S681–S701, 2017

work page 2017
[30]

Petra, C

S. Petra, C. Schnörr, F. Becker, and F. Lenzen. B-SMART: Bregman-Based First- Order Algorithms for Non-Negative Compressed Sensing Problems. InProc. SSVM, LNCS, volume 7893, pages 110–124. Springer, 2013

work page 2013
[31]

The information geometry of mirror descent

Garvesh Raskutti and Sayan Mukherjee. The information geometry of mirror descent. IEEE Trans. Inform. Theory, 61(3):1451–1457, 2015

work page 2015
[32]

Accelerated Bregman divergence optimization with SMART: an information geometric point of view.J

Maren Raus, Yara Elshiaty, and Stefania Petra. Accelerated Bregman divergence optimization with SMART: an information geometric point of view.J. Appl. Numer. Optim., 6(1):1–40, 2024

work page 2024
[33]

Tyrrell Rockafellar.Convex analysis

R. Tyrrell Rockafellar.Convex analysis. Princeton Landmarks in Mathematics. Princeton University Press, Princeton, NJ, 1997

work page 1997
[34]

A simplified view of first order methods for optimization.Math

Marc Teboulle. A simplified view of first order methods for optimization.Math. Program., 170(1):67–96, 2018

work page 2018
[35]

Academic Press, 2001

Ulrich Trottenberg, Cornelis Oosterlee, and Anton Schüller.Multigrid. Academic Press, 2001

work page 2001
[36]

Vardi, L

Y. Vardi, L. Shepp, and L. Kaufman. A Statistical Model for Positron Emission Tomography. Journal of the American Statistical Association, 80:8–37, 1985

work page 1985
[37]

Wen and D

Z. Wen and D. Goldfarb. A line search multigrid method for large-scale nonlinear optimization. SIAM J. Optim., 20(3):1478–1503, 2009

work page 2009

[1] [1]

Andersen and A.C

A.H. Andersen and A.C. Kak. Simultaneous algebraic reconstruction technique (sart): A superior implementation of the art algorithm.Ultrasonic Imaging, 6(1):81– 94, 1984

work page 1984

[2] [2]

Mgprox: a nonsmooth multigrid proximal gradient method with adaptive restriction for strongly convex optimization

Andersen Ang, Hans De Sterck, and Stephen Vavasis. Mgprox: a nonsmooth multigrid proximal gradient method with adaptive restriction for strongly convex optimization. SIAM J. Optim., 34(3):2788–2820, 2024

work page 2024

[3] [3]

An efficient simultaneous reconstruction technique for tomographic particle image velocimetry.Experiments in Fluids, 47(4):553–568, October 2009

Callum Atkinson and Julio Soria. An efficient simultaneous reconstruction technique for tomographic particle image velocimetry.Experiments in Fluids, 47(4):553–568, October 2009

work page 2009

[4] [4]

Corwin L. Atwood. Optimal and efficient designs of experiments.Ann. Math. Statist., 40:1570–1602, 1969

work page 1969

[5] [5]

Mirror descent with relative smoothness in measure spaces, with application to sinkhorn and EM

Pierre-Cyril Aubin-Frankowski, Anna Korba, and Flavien Léger. Mirror descent with relative smoothness in measure spaces, with application to sinkhorn and EM. In Advances in Neural Information Processing Systems, 2022

work page 2022

[6] [6]

Bauschke, Jérôme Bolte, Jiawei Chen, Marc Teboulle, and Xianfu Wang

Heinz H. Bauschke, Jérôme Bolte, Jiawei Chen, Marc Teboulle, and Xianfu Wang. On linear convergence of non-Euclidean gradient methods without strong convexity and Lipschitz gradient continuity.J. Optim. Theory Appl., 182(3):1068–1087, 2019

work page 2019

[7] [7]

Bauschke, Jérôme Bolte, and Marc Teboulle

Heinz H. Bauschke, Jérôme Bolte, and Marc Teboulle. A descent lemma beyond Lipschitz gradient continuity: first-order methods revisited and applications.Math. Oper. Res., 42(2):330–348, 2017

work page 2017

[8] [8]

Bauschke and Jonathan M

Heinz H. Bauschke and Jonathan M. Borwein. Legendre functions and the method of random Bregman projections.J. Convex Anal., 4(1):27–67, 1997

work page 1997

[9] [9]

Mirror descent and nonlinear projected subgradient methods for convex optimization.Oper

Amir Beck and Marc Teboulle. Mirror descent and nonlinear projected subgradient methods for convex optimization.Oper. Res. Lett., 31(3):167–175, 2003

work page 2003

[10] [10]

Briggs, Van Emden Henson, and Steve F

William L. Briggs, Van Emden Henson, and Steve F. McCormick.A multigrid tutorial. SIAM, Philadelphia, PA, second edition, 2000

work page 2000

[11] [11]

Convergence analysis of a proximal-like min- imization algorithm using Bregman functions

Gong Chen and Marc Teboulle. Convergence analysis of a proximal-like min- imization algorithm using Bregman functions. SIAM J. Optim., 3(3):538–543, 1993. 26 YARA ELSHIATY AND STEF ANIA PETRA

work page 1993

[12] [12]

Conn, Nicholas I

Andrew R. Conn, Nicholas I. M. Gould, and Philippe L. Toint.Trust-region methods. MPS/SIAM Series on Optimization. SIAM, Philadelphia, PA; MPS, Philadelphia, PA, 2000

work page 2000

[13] [13]

Taylor, Alexandre d’Aspremont, and Jérôme Bolte

Radu-Alexandru Dragomir, Adrien B. Taylor, Alexandre d’Aspremont, and Jérôme Bolte. Optimal complexity and certification of Bregman first-order methods. Mathematical Programming, 194(1-2):41–83, 2022

work page 2022

[14] [14]

Bi-level optimization and implicit differentiation as a framework for optimal experimental design in tomography

Hamid Fathi, Alexander Skorikov, and Tristan van Leeuwen. Bi-level optimization and implicit differentiation as a framework for optimal experimental design in tomography. InScale Space and Variational Methods in Computer Vision, pages 123–135. Springer Nature Switzerland, 2025

work page 2025

[15] [15]

Gelman and J

E. Gelman and J. Mandel. On multilevel iterative methods for optimization problems. Math. Programming, 48(1):1–17, 1990

work page 1990

[16] [16]

Stochastic relaxation, gibbs distributions, and the bayesian restoration of images.IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-6(6):721–741, 1984

Stuart Geman and Donald Geman. Stochastic relaxation, gibbs distributions, and the bayesian restoration of images.IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-6(6):721–741, 1984

work page 1984

[17] [17]

Exponentiated gradient meets gradient descent

Udaya Ghai, Elad Hazan, and Yoram Singer. Exponentiated gradient meets gradient descent. InProceedings of the 31st International Conference on Algorithmic Learning Theory, volume 117 ofProceedings of Machine Learning Research, pages 386–407. PMLR, 2020

work page 2020

[18] [18]

Toint, and Melissa Weber-Mendon¸ca

Serge Gratton, Mélodie Mouffe, Philippe L. Toint, and Melissa Weber-Mendon¸ca. A recursive l∞-trust-region method for bound-constrained nonlinear optimization. IMA J. Numer. Anal., 28(4):827–861, 2008

work page 2008

[19] [19]

Springer, 1985

Wolfgang Hackbusch.Multi-Grid Methods and Applications. Springer, 1985

work page 1985

[20] [20]

Haldar and Daeun Kim

Justin P. Haldar and Daeun Kim. Oedipus: An experiment design framework for sparsity-constrained mri.IEEE Transactions on Medical Imaging, 38:1545–1558, 2018

work page 2018

[21] [21]

Tomographic x-ray data of a walnut, 2015

Keijo Hämäläinen, Lauri Harhanen, Aki Kallonen, Antti Kujanpää, Esa Niemi, and Samuli Siltanen. Tomographic x-ray data of a walnut, 2015

work page 2015

[22] [22]

Accelerated Bregman proximal gradient methods for relatively smooth convex optimization.Comput

Filip Hanzely, Peter Richtárik, and Lin Xiao. Accelerated Bregman proximal gradient methods for relatively smooth convex optimization.Comput. Optim. Appl., 79(2):405–440, 2021

work page 2021

[23] [23]

MAGMA: Multi-level accelerated gradient mirror descent algorithm for large-scale convex composite minimization

Vahan Hovhannisyan, Panos Parpas, and Stefanos Zafeiriou. MAGMA: Multi-level accelerated gradient mirror descent algorithm for large-scale convex composite minimization. SIAM J. Imaging Sci., 9(4):1829–1857, 2016

work page 2016

[24] [24]

IML FISTA: a multilevel framework for inexact and inertial forward-backward

Guillaume Lauga, Elisa Riccietti, Nelly Pustelnik, and Paulo Gon¸calves. IML FISTA: a multilevel framework for inexact and inertial forward-backward. Applica- tion to image restoration.SIAM J. Imaging Sci., 17(3):1347–1376, 2024

work page 2024

[25] [25]

Freund, and Yurii Nesterov

Haihao Lu, Robert M. Freund, and Yurii Nesterov. Relatively smooth convex opti- mization by first-order methods, and applications.SIAM Journal on Optimization, 28(1):333–354, 2018

work page 2018

[26] [26]

Multilevel geometric op- timization for regularised constrained linear inverse problems.Pure Appl

Sebastian Müller, Stefania Petra, and Matthias Zisler. Multilevel geometric op- timization for regularised constrained linear inverse problems.Pure Appl. Funct. Anal., 8(3):855–880, 2023

work page 2023

[27] [27]

Stephen G. Nash. A multigrid approach to discretized optimization problems. Optim. Methods Softw., 14(1-2):99–116, 2000

work page 2000

[28] [28]

A. S. Nemirovsky and D. B. Yudin.Problem complexity and method efficiency in optimization. Wiley-Interscience Series in Discrete Mathematics. John Wiley & Sons, Inc., New York, 1983. Translated from the Russian and with a preface by E. R. Dawson, A Wiley-Interscience Publication. MULTILEVEL BREGMAN PROXIMAL GRADIENT DESCENT 27

work page 1983

[29] [29]

A multilevel proximal gradient algorithm for a class of composite optimization problems

Panos Parpas. A multilevel proximal gradient algorithm for a class of composite optimization problems. SIAM J. Sci. Comput., 39(5):S681–S701, 2017

work page 2017

[30] [30]

Petra, C

S. Petra, C. Schnörr, F. Becker, and F. Lenzen. B-SMART: Bregman-Based First- Order Algorithms for Non-Negative Compressed Sensing Problems. InProc. SSVM, LNCS, volume 7893, pages 110–124. Springer, 2013

work page 2013

[31] [31]

The information geometry of mirror descent

Garvesh Raskutti and Sayan Mukherjee. The information geometry of mirror descent. IEEE Trans. Inform. Theory, 61(3):1451–1457, 2015

work page 2015

[32] [32]

Accelerated Bregman divergence optimization with SMART: an information geometric point of view.J

Maren Raus, Yara Elshiaty, and Stefania Petra. Accelerated Bregman divergence optimization with SMART: an information geometric point of view.J. Appl. Numer. Optim., 6(1):1–40, 2024

work page 2024

[33] [33]

Tyrrell Rockafellar.Convex analysis

R. Tyrrell Rockafellar.Convex analysis. Princeton Landmarks in Mathematics. Princeton University Press, Princeton, NJ, 1997

work page 1997

[34] [34]

A simplified view of first order methods for optimization.Math

Marc Teboulle. A simplified view of first order methods for optimization.Math. Program., 170(1):67–96, 2018

work page 2018

[35] [35]

Academic Press, 2001

Ulrich Trottenberg, Cornelis Oosterlee, and Anton Schüller.Multigrid. Academic Press, 2001

work page 2001

[36] [36]

Vardi, L

Y. Vardi, L. Shepp, and L. Kaufman. A Statistical Model for Positron Emission Tomography. Journal of the American Statistical Association, 80:8–37, 1985

work page 1985

[37] [37]

Wen and D

Z. Wen and D. Goldfarb. A line search multigrid method for large-scale nonlinear optimization. SIAM J. Optim., 20(3):1478–1503, 2009

work page 2009