pith. sign in

arxiv: 2604.04551 · v1 · submitted 2026-04-06 · 🧮 math.OC

A Near-Optimal Total Complexity for the Inexact Accelerated Proximal Gradient Method via Quadratic Growth

Pith reviewed 2026-05-10 19:57 UTC · model grok-4.3

classification 🧮 math.OC
keywords conic polyhedral functioninexact accelerated proximal gradientquadratic growthtotal complexityproximal operatordouble-loop methodconvex optimization
0
0 comments X

The pith

For conic polyhedral regularizers, the inexact accelerated proximal gradient method achieves O(ln(1/ε)/√ε) total complexity.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proves that the inexact accelerated proximal gradient method run in a double-loop structure attains ε-optimality in function value with a total of O(ln(1/ε)/√ε) calls to the proximal operator of the convex conjugate and the gradient of the smooth term, when the regularizer is conic polyhedral. This improves prior complexity bounds for the inexact method. The result follows from a quadratic growth condition on the dual of the inexact proximal subproblem that forces linear convergence of the inner loop. Readers would care because the bound is near-optimal and applies to problems where exact proximal steps are unavailable.

Core claim

We prove that when ω is a conic polyhedral function, the inexact accelerated proximal gradient method (IAPG), employed in a double-loop structure, achieves a total complexity of O(ln(1/ε)/√ε) measured by the total number of calls to the proximal operator of the convex conjugate ω* and the gradient of f to achieve ε-optimality in function value. The key theoretical ingredient is a quadratic growth condition on the dual of the inexact proximal problem, which arises from the conic polyhedral structure of ω and implies linear convergence of the inner proximal gradient loop.

What carries the argument

the quadratic growth condition on the dual of the inexact proximal subproblem that arises from the conic polyhedral structure of ω and drives linear convergence of the inner proximal gradient loop

If this is right

  • The double-loop IAPG requires only O(ln(1/ε)/√ε) total proximal and gradient evaluations to reach ε-optimality.
  • The inner loop converges linearly once the quadratic growth property holds.
  • The bound applies to concrete problems such as robust TV-ℓ2 signal recovery.
  • The total complexity improves on all previously known rates for IAPG.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If quadratic growth can be established for wider classes of regularizers, the same complexity would extend beyond conic polyhedral cases.
  • The double-loop structure suggests that tuning inner-loop accuracy tolerances could further reduce practical runtime.
  • Similar dual-growth arguments might improve complexity analyses for other inexact first-order methods.

Load-bearing premise

ω must be a conic polyhedral function to guarantee the quadratic growth condition on the dual of the inexact proximal subproblem.

What would settle it

A concrete conic polyhedral ω for which the inner proximal gradient loop converges only sublinearly, or for which the observed total number of proximal and gradient calls exceeds O(ln(1/ε)/√ε) on a problem instance.

Figures

Figures reproduced from arXiv: 2604.04551 by Hongda Li, Xianfu Wang.

Figure 1
Figure 1. Figure 1: Five-number summary of the smallest inner loop iteration [PITH_FULL_IMAGE:figures/full_fig_p055_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Comparing the observed recovered signal with the observed signal ˜x [PITH_FULL_IMAGE:figures/full_fig_p057_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: (a): The model for the reference line is [PITH_FULL_IMAGE:figures/full_fig_p058_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: (a): The model we fitted for the reference line is [PITH_FULL_IMAGE:figures/full_fig_p059_4.png] view at source ↗
read the original abstract

We consider the optimization problem $\min_{x\in \mathbb R^n}{F(x):=f(x)+\omega(Ax)}$, where $f$ is an $L$-Lipschitz smooth function, and $\omega$ is a proper, lower semicontinuous, and convex function. We prove in this paper that when $\omega$ is a conic polyhedral function, the inexact accelerated proximal gradient method (IAPG), employed in a double-loop structure, achieves a total complexity of $\mathcal O(\ln(1/\varepsilon)/\sqrt{\varepsilon})$ measured by the total number of calls to the proximal operator of the convex conjugate $\omega^\star$ and the gradient of $f$ to achieve $\varepsilon$-optimality in function value. To the best of our knowledge, this improves upon the best-known complexity for IAPG. The key theoretical ingredient is a quadratic growth condition on the dual of the inexact proximal problem, which arises from the conic polyhedral structure of $\omega$ and implies linear convergence of the inner proximal gradient loop. To validate these findings, we conduct numerical experiments on a robust TV-$\ell_2$ signal recovery problem, demonstrating fast convergence.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript analyzes the inexact accelerated proximal gradient method (IAPG) in a double-loop framework for the composite problem min_x f(x) + ω(Ax), with f L-smooth and ω a conic polyhedral convex function. It claims that this yields a total complexity of O(ln(1/ε)/√ε) measured by the aggregate number of proximal evaluations of ω* and gradient evaluations of f to reach ε-optimality in function value. The central technical step is the derivation of a quadratic growth condition on the dual of the inexact proximal subproblem, which is asserted to follow from the conic polyhedral structure of ω and to imply linear convergence of the inner proximal gradient loop.

Significance. If the claimed complexity holds, the result improves upon existing bounds for inexact accelerated proximal methods by achieving a near-optimal total iteration count that accounts for both outer acceleration and inner-loop cost. The exploitation of conic polyhedrality to obtain a usable quadratic growth modulus on the dual subproblem is a potentially useful structural insight. The numerical experiments on robust TV-ℓ2 signal recovery provide empirical evidence of practical performance, though they are not the primary contribution.

major comments (1)
  1. [Analysis of inner-loop linear convergence (quadratic growth lemma and its application to the double-loop complexity)] The proof that the quadratic growth modulus on the dual of the inexact proximal subproblem is bounded away from zero uniformly in the outer iteration index k and in the inexactness tolerance δ_k (which must decrease as O(1/k²) for acceleration) is not established. If the modulus depends on the distance of the current dual point to the solution set or shrinks with δ_k, the inner-loop contraction factor approaches 1 and the per-outer-step inner iteration count exceeds O(log(1/δ_k)), invalidating the total O(ln(1/ε)/√ε) bound. An explicit uniform lower bound on the growth constant across the outer sequence is required to support the central complexity claim.
minor comments (2)
  1. [Numerical experiments] The numerical section provides only a high-level description of the TV-ℓ2 recovery experiments; inclusion of problem dimensions, specific parameter choices for the double-loop tolerances, and quantitative comparison against prior IAPG variants would make the empirical support more concrete.
  2. [Abstract] The abstract states that the result 'improves upon the best-known complexity for IAPG' but does not cite the precise prior bound being improved; adding a short comparison of the new versus previous rates would clarify the advance.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their thorough review and for pinpointing the critical requirement of uniformity in the quadratic growth modulus. This directly impacts the validity of the double-loop complexity bound, and we address the concern in detail below while committing to a revision that makes the uniformity explicit.

read point-by-point responses
  1. Referee: The proof that the quadratic growth modulus on the dual of the inexact proximal subproblem is bounded away from zero uniformly in the outer iteration index k and in the inexactness tolerance δ_k (which must decrease as O(1/k²) for acceleration) is not established. If the modulus depends on the distance of the current dual point to the solution set or shrinks with δ_k, the inner-loop contraction factor approaches 1 and the per-outer-step inner iteration count exceeds O(log(1/δ_k)), invalidating the total O(ln(1/ε)/√ε) bound. An explicit uniform lower bound on the growth constant across the outer sequence is required to support the central complexity claim.

    Authors: We thank the referee for this precise and important observation. In Lemma 3.2, the quadratic growth modulus μ for the dual of the inexact proximal subproblem is derived exclusively from the conic polyhedral structure of ω (via its finite facial decomposition and recession cone properties). This yields a global lower bound μ > 0 that depends only on the fixed problem data (the matrix A, the Lipschitz constant L, and the polyhedral constants of ω) and is invariant to the outer iteration index k as well as to the inexactness tolerance δ_k. The term δ_k appears as an additive perturbation in the dual objective but does not alter the growth constant itself; the distance of the current dual iterate to the solution set influences the function value but not μ. Consequently, the inner proximal gradient loop retains a uniform contraction factor strictly less than 1, requiring only O(log(1/δ_k)) inner iterations per outer step and preserving the overall O(ln(1/ε)/√ε) total complexity. We will revise the manuscript by adding an explicit corollary immediately after Lemma 3.2 that states the uniform lower bound on μ together with a short proof sketch emphasizing its independence from k and δ_k. revision: yes

Circularity Check

0 steps flagged

No circularity: complexity bound follows from stated quadratic growth assumption

full rationale

The derivation proceeds from the assumption that ω is conic polyhedral, which the paper states yields a quadratic growth condition on the dual of the inexact proximal subproblem. This growth implies linear convergence of the inner proximal gradient loop (with rate independent of outer iterates under the stated structure). The outer accelerated scheme then requires δ_k ~ 1/k² inexactness, and the summed inner iterations produce the claimed O(ln(1/ε)/√ε) total proximal/gradient calls. No equation reduces to a prior fitted quantity by construction, no parameter is renamed as a prediction, and no load-bearing step rests on a self-citation whose content is unverified. The central claim is a direct consequence of the quadratic-growth hypothesis plus standard accelerated proximal analysis; the paper is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The proof relies on standard convexity and smoothness of f, proper lower-semicontinuity of ω, and the conic polyhedral structure of ω to obtain quadratic growth; no free parameters or invented entities are introduced in the abstract.

axioms (2)
  • domain assumption f is L-Lipschitz smooth and ω is proper, lower semicontinuous, and convex
    Stated in the problem formulation; required for the proximal gradient framework.
  • domain assumption ω is conic polyhedral
    This is the key structural assumption that produces the quadratic growth condition on the dual proximal subproblem.

pith-pipeline@v0.9.0 · 5515 in / 1346 out tokens · 37602 ms · 2026-05-10T19:57:49.381372+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

34 extracted references · 34 canonical work pages

  1. [1]

    Alamo, P

    T. Alamo, P. Krupa, and D. Limon , Gradient based restart FISTA, in 2019 IEEE 58th Conference on Decision and Control (CDC), Dec. 2019, pp. 3936–3941

  2. [2]

    A. Y. Aravkin, J. V. Burke, and G. Pillonetto , Sparse/Robust estimation and Kalman smoothing with nonsmooth log-concave densities: modeling, computation, and theory, Journal of Machine Learning Research, 14 (2013), pp. 2689–2728

  3. [3]

    Attouch, Z

    H. Attouch, Z. Chbani, J. F adili, and H. Riahi , First-order optimization algo- rithms via inertial systems with Hessian driven damping , Mathematical Programming, 193 (2022), pp. 113–155

  4. [4]

    H. H. Bauschke and P. L. Combettes , Convex Analysis and Monotone Operator Theory in Hilbert Spaces, CMS Books in Mathematics, Springer International Publish- ing, Cham, 2017

  5. [5]

    Beck , First-order Methods in Optimization , MOS-SIAM Series in Optimization, SIAM, 2017

    A. Beck , First-order Methods in Optimization , MOS-SIAM Series in Optimization, SIAM, 2017

  6. [6]

    Beck and M

    A. Beck and M. Teboulle , A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM Journal on Imaging Sciences, 2 (2009), pp. 183–202. 63

  7. [7]

    Bello-Cruz, M

    Y. Bello-Cruz, M. L. N. Gonccalves, and N. Krislock , On Inexact Acceler- ated Proximal Gradient Methods with Relative Error Rules , arXiv: Optimization and Control, (2020)

  8. [8]

    Bezanson, A

    J. Bezanson, A. Edelman, S. Karpinski, and V. B. Shah , Julia: A fresh ap- proach to numerical computing, SIAM Review, 59 (2017), pp. 65–98

  9. [9]

    Calatroni and A

    L. Calatroni and A. Chambolle , Backtracking strategies for accelerated descent methods with smooth composite objectives , SIAM Journal on Optimization, 29 (2019), pp. 1772–1798

  10. [10]

    Chambolle and T

    A. Chambolle and T. Pock , A first-order primal-dual algorithm for convex problems with applications to Imaging , Journal of Mathematical Imaging and Vision, 40 (2011), pp. 120–145

  11. [11]

    , An introduction to continuous optimization for imaging , Acta Numerica, 25 (2016), pp. 161–319

  12. [12]

    Christou and M

    E. Christou and M. Grabchak , Risk estimation with composite quantile regression, Econometrics and Statistics, 33 (2025), pp. 166–179

  13. [13]

    M. J. Ehrhardt and M. M. Betcke , Multicontrast MRI reconstruction with structure-guided total variation, SIAM Journal on Imaging Sciences, 9 (2016), pp. 1084– 1106

  14. [14]

    G ¨uler, New proximal point algorithms for convex minimization , SIAM Journal on Optimization, 2 (1992), pp

    O. G ¨uler, New proximal point algorithms for convex minimization , SIAM Journal on Optimization, 2 (1992), pp. 649–664

  15. [15]

    S. H. Joshi, A. Marquina, S. J. Osher, I. Dinov, J. D. V an Horn, and A. W. Toga , MRI resolution enhencement using total variation regularization , IEEE International Symposium on Biomedical Imaging, 2009 (2009), pp. 161–164

  16. [16]

    P. D. Khanh, B. S. Mordukhovich, V. T. Phat, and D. B. Tran , Inexact prox- imal methods for weakly convex functions , Journal of Global Optimization, 91 (2025), pp. 611–646

  17. [17]

    H. Lin, J. Mairal, and Z. Harchaoui , Catalyst acceleration for first-order convex optimization: from theory to practice, Journal of Machine Learning Research, 18 (2018), pp. 1–54

  18. [18]

    Lin and Y

    Q. Lin and Y. Xu , Reducing the complexity of two classes of optimization problems by inexact accelerated proximal gradient method , SIAM Journal on Optimization, 33 (2023), pp. 1–35. 64

  19. [19]

    Mukherjee, S

    S. Mukherjee, S. Dittmer, Z. Shumaylov, S. Lunz, O. ¨Oktem, and C.- B. Sch ¨onlieb, Data-driven convex regularizers for inverse problems , in ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Apr. 2024, pp. 13386–13390

  20. [20]

    Necoara, Y

    I. Necoara, Y. Nesterov, and F. Glineur , Linear convergence of first order methods for non-strongly convex optimization, Mathematical Programming, 175 (2019), pp. 69–107

  21. [21]

    Nesterov, A method for solving the convex programming problem with convergence rate O(1/kˆ2), Proceedings of the USSR Academy of Sciences, (1983), pp

    Y. Nesterov, A method for solving the convex programming problem with convergence rate O(1/kˆ2), Proceedings of the USSR Academy of Sciences, (1983), pp. 543–547

  22. [22]

    Nesterov , Lectures on Convex Optimization , Springer Optimization and Its Ap- plications, Springer International Publishing, 2018

    Y. Nesterov , Lectures on Convex Optimization , Springer Optimization and Its Ap- plications, Springer International Publishing, 2018

  23. [23]

    Rasch and A

    J. Rasch and A. Chambolle , Inexact first-order primal–dual algorithms, Computa- tional Optimization and Applications, 76 (2020), pp. 381–430

  24. [24]

    R. T. Rockafellar , Monotone operators and the proximal point algorithm , SIAM Journal on Control and Optimization, 14 (1976), pp. 877–898

  25. [25]

    R. T. Rockafellar and R. J. B. Wets , Variational Analysis, Grundlehren der mathematischen Wissenschaften, Springer, Berlin, Heidelberg, 1998

  26. [26]

    Sawatzky, C

    A. Sawatzky, C. Brune, T. K ¨osters, F. W ¨ubbeling, and M. Burger , EM-TV methods for inverse problems with poisson noise , in Level Set and PDE Based Recon- struction Methods in Imaging, M. Burger, A. C. Mennucci, S. Osher, and M. Rumpf, eds., Springer International Publishing, Cham, 2013, pp. 71–142

  27. [27]

    Scherzer, M

    O. Scherzer, M. Grasmair, H. Grossauer, M. Haltmeier, and F. Lenzen , Variational Methods in Imaging , Applied Mathematical Sciences, Springer, New York, NY, 2009

  28. [28]

    Schmidt, N

    M. Schmidt, N. Roux, and F. Bach , Convergence rates of inexact proximal-gradient methods for convex optimization , in Advances in Neural Information Processing Sys- tems, vol. 24, Curran Associates, Inc., 2011

  29. [29]

    Tang and P

    L. Tang and P. X. Song , Fused lasso approach in regression coefficients cluster- ing learning rarameter heterogeneity in data integration , Journal of Machine Learning Research, 17 (2016), p. 113

  30. [30]

    Villa, S

    S. Villa, S. Salzo, L. Baldassarre, and A. Verri , Accelerated and inexact forward-backward algorithms, SIAM Journal on Optimization, 23 (2013), pp. 1607–1633. 65

  31. [31]

    Hjorth Larsen, J

    J. Xu and F. Noo , Convex optimization algorithms in medical image reconstruc- tion in the age of AI , Physics in Medicine and Biology, 67 (2022), pp. 10.1088/1361– 6560/ac3842

  32. [32]

    W. Yin, S. Osher, D. Goldfarb, and J. Darbon , Bregman iterative algorithms for L1-minimization with applications to compressed sensing, SIAM Journal on Imaging Sciences, 1 (2008), pp. 143–168

  33. [33]

    Zalinescu, Convex Analysis in General Vector Spaces, World Scientific, River Edge, N.J

    C. Zalinescu, Convex Analysis in General Vector Spaces, World Scientific, River Edge, N.J. ; London, 2002

  34. [34]

    Zhang, M

    M. Zhang, M. Zhang, F. Zhang, A. Chaddad, and A. Evans , Robust brain MR image compressive sensing via re-weighted total variation and sparse regression , Magnetic Resonance Imaging, 85 (2022), pp. 271–286. 66