A Near-Optimal Total Complexity for the Inexact Accelerated Proximal Gradient Method via Quadratic Growth
Pith reviewed 2026-05-10 19:57 UTC · model grok-4.3
The pith
For conic polyhedral regularizers, the inexact accelerated proximal gradient method achieves O(ln(1/ε)/√ε) total complexity.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We prove that when ω is a conic polyhedral function, the inexact accelerated proximal gradient method (IAPG), employed in a double-loop structure, achieves a total complexity of O(ln(1/ε)/√ε) measured by the total number of calls to the proximal operator of the convex conjugate ω* and the gradient of f to achieve ε-optimality in function value. The key theoretical ingredient is a quadratic growth condition on the dual of the inexact proximal problem, which arises from the conic polyhedral structure of ω and implies linear convergence of the inner proximal gradient loop.
What carries the argument
the quadratic growth condition on the dual of the inexact proximal subproblem that arises from the conic polyhedral structure of ω and drives linear convergence of the inner proximal gradient loop
If this is right
- The double-loop IAPG requires only O(ln(1/ε)/√ε) total proximal and gradient evaluations to reach ε-optimality.
- The inner loop converges linearly once the quadratic growth property holds.
- The bound applies to concrete problems such as robust TV-ℓ2 signal recovery.
- The total complexity improves on all previously known rates for IAPG.
Where Pith is reading between the lines
- If quadratic growth can be established for wider classes of regularizers, the same complexity would extend beyond conic polyhedral cases.
- The double-loop structure suggests that tuning inner-loop accuracy tolerances could further reduce practical runtime.
- Similar dual-growth arguments might improve complexity analyses for other inexact first-order methods.
Load-bearing premise
ω must be a conic polyhedral function to guarantee the quadratic growth condition on the dual of the inexact proximal subproblem.
What would settle it
A concrete conic polyhedral ω for which the inner proximal gradient loop converges only sublinearly, or for which the observed total number of proximal and gradient calls exceeds O(ln(1/ε)/√ε) on a problem instance.
Figures
read the original abstract
We consider the optimization problem $\min_{x\in \mathbb R^n}{F(x):=f(x)+\omega(Ax)}$, where $f$ is an $L$-Lipschitz smooth function, and $\omega$ is a proper, lower semicontinuous, and convex function. We prove in this paper that when $\omega$ is a conic polyhedral function, the inexact accelerated proximal gradient method (IAPG), employed in a double-loop structure, achieves a total complexity of $\mathcal O(\ln(1/\varepsilon)/\sqrt{\varepsilon})$ measured by the total number of calls to the proximal operator of the convex conjugate $\omega^\star$ and the gradient of $f$ to achieve $\varepsilon$-optimality in function value. To the best of our knowledge, this improves upon the best-known complexity for IAPG. The key theoretical ingredient is a quadratic growth condition on the dual of the inexact proximal problem, which arises from the conic polyhedral structure of $\omega$ and implies linear convergence of the inner proximal gradient loop. To validate these findings, we conduct numerical experiments on a robust TV-$\ell_2$ signal recovery problem, demonstrating fast convergence.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript analyzes the inexact accelerated proximal gradient method (IAPG) in a double-loop framework for the composite problem min_x f(x) + ω(Ax), with f L-smooth and ω a conic polyhedral convex function. It claims that this yields a total complexity of O(ln(1/ε)/√ε) measured by the aggregate number of proximal evaluations of ω* and gradient evaluations of f to reach ε-optimality in function value. The central technical step is the derivation of a quadratic growth condition on the dual of the inexact proximal subproblem, which is asserted to follow from the conic polyhedral structure of ω and to imply linear convergence of the inner proximal gradient loop.
Significance. If the claimed complexity holds, the result improves upon existing bounds for inexact accelerated proximal methods by achieving a near-optimal total iteration count that accounts for both outer acceleration and inner-loop cost. The exploitation of conic polyhedrality to obtain a usable quadratic growth modulus on the dual subproblem is a potentially useful structural insight. The numerical experiments on robust TV-ℓ2 signal recovery provide empirical evidence of practical performance, though they are not the primary contribution.
major comments (1)
- [Analysis of inner-loop linear convergence (quadratic growth lemma and its application to the double-loop complexity)] The proof that the quadratic growth modulus on the dual of the inexact proximal subproblem is bounded away from zero uniformly in the outer iteration index k and in the inexactness tolerance δ_k (which must decrease as O(1/k²) for acceleration) is not established. If the modulus depends on the distance of the current dual point to the solution set or shrinks with δ_k, the inner-loop contraction factor approaches 1 and the per-outer-step inner iteration count exceeds O(log(1/δ_k)), invalidating the total O(ln(1/ε)/√ε) bound. An explicit uniform lower bound on the growth constant across the outer sequence is required to support the central complexity claim.
minor comments (2)
- [Numerical experiments] The numerical section provides only a high-level description of the TV-ℓ2 recovery experiments; inclusion of problem dimensions, specific parameter choices for the double-loop tolerances, and quantitative comparison against prior IAPG variants would make the empirical support more concrete.
- [Abstract] The abstract states that the result 'improves upon the best-known complexity for IAPG' but does not cite the precise prior bound being improved; adding a short comparison of the new versus previous rates would clarify the advance.
Simulated Author's Rebuttal
We thank the referee for their thorough review and for pinpointing the critical requirement of uniformity in the quadratic growth modulus. This directly impacts the validity of the double-loop complexity bound, and we address the concern in detail below while committing to a revision that makes the uniformity explicit.
read point-by-point responses
-
Referee: The proof that the quadratic growth modulus on the dual of the inexact proximal subproblem is bounded away from zero uniformly in the outer iteration index k and in the inexactness tolerance δ_k (which must decrease as O(1/k²) for acceleration) is not established. If the modulus depends on the distance of the current dual point to the solution set or shrinks with δ_k, the inner-loop contraction factor approaches 1 and the per-outer-step inner iteration count exceeds O(log(1/δ_k)), invalidating the total O(ln(1/ε)/√ε) bound. An explicit uniform lower bound on the growth constant across the outer sequence is required to support the central complexity claim.
Authors: We thank the referee for this precise and important observation. In Lemma 3.2, the quadratic growth modulus μ for the dual of the inexact proximal subproblem is derived exclusively from the conic polyhedral structure of ω (via its finite facial decomposition and recession cone properties). This yields a global lower bound μ > 0 that depends only on the fixed problem data (the matrix A, the Lipschitz constant L, and the polyhedral constants of ω) and is invariant to the outer iteration index k as well as to the inexactness tolerance δ_k. The term δ_k appears as an additive perturbation in the dual objective but does not alter the growth constant itself; the distance of the current dual iterate to the solution set influences the function value but not μ. Consequently, the inner proximal gradient loop retains a uniform contraction factor strictly less than 1, requiring only O(log(1/δ_k)) inner iterations per outer step and preserving the overall O(ln(1/ε)/√ε) total complexity. We will revise the manuscript by adding an explicit corollary immediately after Lemma 3.2 that states the uniform lower bound on μ together with a short proof sketch emphasizing its independence from k and δ_k. revision: yes
Circularity Check
No circularity: complexity bound follows from stated quadratic growth assumption
full rationale
The derivation proceeds from the assumption that ω is conic polyhedral, which the paper states yields a quadratic growth condition on the dual of the inexact proximal subproblem. This growth implies linear convergence of the inner proximal gradient loop (with rate independent of outer iterates under the stated structure). The outer accelerated scheme then requires δ_k ~ 1/k² inexactness, and the summed inner iterations produce the claimed O(ln(1/ε)/√ε) total proximal/gradient calls. No equation reduces to a prior fitted quantity by construction, no parameter is renamed as a prediction, and no load-bearing step rests on a self-citation whose content is unverified. The central claim is a direct consequence of the quadratic-growth hypothesis plus standard accelerated proximal analysis; the paper is self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption f is L-Lipschitz smooth and ω is proper, lower semicontinuous, and convex
- domain assumption ω is conic polyhedral
Reference graph
Works this paper leans on
- [1]
-
[2]
A. Y. Aravkin, J. V. Burke, and G. Pillonetto , Sparse/Robust estimation and Kalman smoothing with nonsmooth log-concave densities: modeling, computation, and theory, Journal of Machine Learning Research, 14 (2013), pp. 2689–2728
work page 2013
-
[3]
H. Attouch, Z. Chbani, J. F adili, and H. Riahi , First-order optimization algo- rithms via inertial systems with Hessian driven damping , Mathematical Programming, 193 (2022), pp. 113–155
work page 2022
-
[4]
H. H. Bauschke and P. L. Combettes , Convex Analysis and Monotone Operator Theory in Hilbert Spaces, CMS Books in Mathematics, Springer International Publish- ing, Cham, 2017
work page 2017
-
[5]
Beck , First-order Methods in Optimization , MOS-SIAM Series in Optimization, SIAM, 2017
A. Beck , First-order Methods in Optimization , MOS-SIAM Series in Optimization, SIAM, 2017
work page 2017
-
[6]
A. Beck and M. Teboulle , A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM Journal on Imaging Sciences, 2 (2009), pp. 183–202. 63
work page 2009
-
[7]
Y. Bello-Cruz, M. L. N. Gonccalves, and N. Krislock , On Inexact Acceler- ated Proximal Gradient Methods with Relative Error Rules , arXiv: Optimization and Control, (2020)
work page 2020
-
[8]
J. Bezanson, A. Edelman, S. Karpinski, and V. B. Shah , Julia: A fresh ap- proach to numerical computing, SIAM Review, 59 (2017), pp. 65–98
work page 2017
-
[9]
L. Calatroni and A. Chambolle , Backtracking strategies for accelerated descent methods with smooth composite objectives , SIAM Journal on Optimization, 29 (2019), pp. 1772–1798
work page 2019
-
[10]
A. Chambolle and T. Pock , A first-order primal-dual algorithm for convex problems with applications to Imaging , Journal of Mathematical Imaging and Vision, 40 (2011), pp. 120–145
work page 2011
-
[11]
, An introduction to continuous optimization for imaging , Acta Numerica, 25 (2016), pp. 161–319
work page 2016
-
[12]
E. Christou and M. Grabchak , Risk estimation with composite quantile regression, Econometrics and Statistics, 33 (2025), pp. 166–179
work page 2025
-
[13]
M. J. Ehrhardt and M. M. Betcke , Multicontrast MRI reconstruction with structure-guided total variation, SIAM Journal on Imaging Sciences, 9 (2016), pp. 1084– 1106
work page 2016
-
[14]
O. G ¨uler, New proximal point algorithms for convex minimization , SIAM Journal on Optimization, 2 (1992), pp. 649–664
work page 1992
-
[15]
S. H. Joshi, A. Marquina, S. J. Osher, I. Dinov, J. D. V an Horn, and A. W. Toga , MRI resolution enhencement using total variation regularization , IEEE International Symposium on Biomedical Imaging, 2009 (2009), pp. 161–164
work page 2009
-
[16]
P. D. Khanh, B. S. Mordukhovich, V. T. Phat, and D. B. Tran , Inexact prox- imal methods for weakly convex functions , Journal of Global Optimization, 91 (2025), pp. 611–646
work page 2025
-
[17]
H. Lin, J. Mairal, and Z. Harchaoui , Catalyst acceleration for first-order convex optimization: from theory to practice, Journal of Machine Learning Research, 18 (2018), pp. 1–54
work page 2018
- [18]
-
[19]
S. Mukherjee, S. Dittmer, Z. Shumaylov, S. Lunz, O. ¨Oktem, and C.- B. Sch ¨onlieb, Data-driven convex regularizers for inverse problems , in ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Apr. 2024, pp. 13386–13390
work page 2024
-
[20]
I. Necoara, Y. Nesterov, and F. Glineur , Linear convergence of first order methods for non-strongly convex optimization, Mathematical Programming, 175 (2019), pp. 69–107
work page 2019
-
[21]
Y. Nesterov, A method for solving the convex programming problem with convergence rate O(1/kˆ2), Proceedings of the USSR Academy of Sciences, (1983), pp. 543–547
work page 1983
-
[22]
Y. Nesterov , Lectures on Convex Optimization , Springer Optimization and Its Ap- plications, Springer International Publishing, 2018
work page 2018
-
[23]
J. Rasch and A. Chambolle , Inexact first-order primal–dual algorithms, Computa- tional Optimization and Applications, 76 (2020), pp. 381–430
work page 2020
-
[24]
R. T. Rockafellar , Monotone operators and the proximal point algorithm , SIAM Journal on Control and Optimization, 14 (1976), pp. 877–898
work page 1976
-
[25]
R. T. Rockafellar and R. J. B. Wets , Variational Analysis, Grundlehren der mathematischen Wissenschaften, Springer, Berlin, Heidelberg, 1998
work page 1998
-
[26]
A. Sawatzky, C. Brune, T. K ¨osters, F. W ¨ubbeling, and M. Burger , EM-TV methods for inverse problems with poisson noise , in Level Set and PDE Based Recon- struction Methods in Imaging, M. Burger, A. C. Mennucci, S. Osher, and M. Rumpf, eds., Springer International Publishing, Cham, 2013, pp. 71–142
work page 2013
-
[27]
O. Scherzer, M. Grasmair, H. Grossauer, M. Haltmeier, and F. Lenzen , Variational Methods in Imaging , Applied Mathematical Sciences, Springer, New York, NY, 2009
work page 2009
-
[28]
M. Schmidt, N. Roux, and F. Bach , Convergence rates of inexact proximal-gradient methods for convex optimization , in Advances in Neural Information Processing Sys- tems, vol. 24, Curran Associates, Inc., 2011
work page 2011
-
[29]
L. Tang and P. X. Song , Fused lasso approach in regression coefficients cluster- ing learning rarameter heterogeneity in data integration , Journal of Machine Learning Research, 17 (2016), p. 113
work page 2016
- [30]
-
[31]
J. Xu and F. Noo , Convex optimization algorithms in medical image reconstruc- tion in the age of AI , Physics in Medicine and Biology, 67 (2022), pp. 10.1088/1361– 6560/ac3842
-
[32]
W. Yin, S. Osher, D. Goldfarb, and J. Darbon , Bregman iterative algorithms for L1-minimization with applications to compressed sensing, SIAM Journal on Imaging Sciences, 1 (2008), pp. 143–168
work page 2008
-
[33]
Zalinescu, Convex Analysis in General Vector Spaces, World Scientific, River Edge, N.J
C. Zalinescu, Convex Analysis in General Vector Spaces, World Scientific, River Edge, N.J. ; London, 2002
work page 2002
- [34]
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.