An optimal first-order method for smooth and strongly convex composite optimization and its stationary limit
Pith reviewed 2026-05-25 05:43 UTC · model grok-4.3
The pith
Prox-ITEM attains the minimax optimal distance convergence rate among span-based first-order methods for smooth strongly convex composite optimization.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We introduce Prox-ITEM, an optimal proximal gradient method for minimizing f+g, where f is smooth and strongly convex, and g is convex, proper, and lower semicontinuous. In the smooth case g=0, Prox-ITEM reduces to the information-theoretic exact method (ITEM). We prove an exact distance-to-solution bound for Prox-ITEM with the same distance-convergence rate as ITEM, and show that this rate is minimax optimal among span-based first-order methods using the same number of gradient-oracle calls for f and an arbitrary number of proximal-oracle calls for g. We also identify the stationary limit of Prox-ITEM, denoted Prox-TMM, which gives a proximal extension of the triple momentum method (TMM) to
What carries the argument
Prox-ITEM, the proximal extension of ITEM that interleaves gradient steps on f with proximal steps on g while preserving the span-based structure needed for exact distance bounds.
If this is right
- When g equals zero Prox-ITEM reduces exactly to ITEM.
- The distance-convergence rate is minimax optimal inside the span-based class with fixed gradient calls on f.
- Prox-TMM is recovered as the stationary limit and inherits the TMM distance rate in the composite case.
- The method permits an arbitrary number of proximal-oracle calls without increasing the gradient budget on f.
Where Pith is reading between the lines
- The decoupling of gradient and proximal costs may suggest similar constructions for other accelerated schemes in composite settings.
- The stationary-limit construction could be applied to additional momentum methods beyond TMM.
- Exact distance bounds of this type may become useful when comparing methods across different oracle models.
- Practical implementations could exploit the freedom to perform extra proximal steps at low marginal cost.
Load-bearing premise
The optimality statement holds only inside the restricted class of span-based first-order methods.
What would settle it
A single span-based first-order method that attains a strictly smaller distance-to-solution after the same number of gradient calls on f would falsify the claimed minimax optimality.
read the original abstract
We introduce Prox-ITEM, an optimal proximal gradient method for minimizing $f+g$, where $f$ is smooth and strongly convex, and $g$ is convex, proper, and lower semicontinuous. In the smooth case $g=0$, Prox-ITEM reduces to the information-theoretic exact method (ITEM). We prove an exact distance-to-solution bound for Prox-ITEM with the same distance-convergence rate as ITEM, and show that this rate is minimax optimal among span-based first-order methods using the same number of gradient-oracle calls for $f$ and an arbitrary number of proximal-oracle calls for $g$. We also identify the stationary limit of Prox-ITEM, denoted Prox-TMM, which gives a proximal extension of the triple momentum method (TMM) to the composite setting and achieves the corresponding TMM distance-convergence rate.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Prox-ITEM, a proximal gradient method for minimizing f + g with f smooth and strongly convex and g convex, proper, and lsc. When g = 0 it reduces to ITEM. The central claims are an exact distance-to-solution bound for Prox-ITEM that matches ITEM's rate, minimax optimality of this rate among span-based first-order methods (same number of gradient calls to f, arbitrary proximal calls to g), and identification of the stationary limit Prox-TMM as a proximal extension of TMM that attains the corresponding rate.
Significance. If the claims hold, the work supplies a concrete optimal method together with exact (non-asymptotic) distance bounds and a restricted minimax result for the composite setting. These are load-bearing contributions to first-order optimization theory. The stress-test concern that derivations are absent does not land: the full manuscript contains the supporting arguments and verifications for the exact bound and the span-based optimality statement.
minor comments (2)
- [§2] The precise definition of the span-based method class (including the precise span construction and the allowed proximal operations) should be stated once in a dedicated subsection or remark so that the optimality claim can be checked without cross-referencing multiple places.
- [§3] Notation for the distance-to-solution quantity and the exact bound expression could be introduced earlier (e.g., before the statement of the main theorem) to improve readability.
Simulated Author's Rebuttal
We thank the referee for their positive review, accurate summary of the contributions, and recommendation to accept. We are pleased that the referee finds the exact distance bounds, span-based minimax optimality result, and the identification of Prox-TMM to be load-bearing contributions, and confirms that the full manuscript contains the supporting derivations.
Circularity Check
No significant circularity; optimality claim restricted to explicitly defined span-based class under standard assumptions
full rationale
The derivation introduces Prox-ITEM as a proximal extension of ITEM, proves an exact distance-to-solution bound matching ITEM's rate, and establishes minimax optimality only within the class of span-based first-order methods (with the same number of gradient calls for f and arbitrary proximal calls for g). This restriction is stated explicitly in the abstract, and the assumptions (smooth strong convexity of f, convexity of g) are the standard definitions with no hidden self-referential elements. No equations reduce a claimed prediction or bound to a fitted parameter or self-definition by construction, and any references to prior ITEM/TMM results function as external benchmarks rather than load-bearing self-citations that close the argument. The central claims remain independently verifiable against the defined method class.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
J. M. Altschuler and P. A. Parrilo. Acceleration by stepsize hedging: Silver Stepsize Schedule for smooth convex optimization.Mathematical Programming, 213(1–2):1105–1118, 2025. doi:10.1007/s10107-024-02164-2
-
[2]
J. M. Altschuler and P. A. Parrilo. Acceleration by Stepsize Hedging: Multi-Step Descent and the Silver Stepsize Schedule.Journal of the ACM, 72(2):1–38, 2025. doi:10.1145/3708502
-
[3]
A. Auslender and M. Teboulle. Interior gradient and proximal methods for convex and conic optimization.SIAM Journal on Optimization, 16(3):697–725, 2006. doi:10.1137/S10526 23403427823
-
[4]
M. Barré, A. B. Taylor, and F. Bach. Principled analyses and design of first-order methods with inexact proximal operators.Mathematical Programming, 201(1–2):185–230, 2023. doi: 10.1007/s10107-022-01903-7
-
[5]
H. H. Bauschke and P. L. Combettes.Convex Analysis and Monotone Operator Theory in Hilbert Spaces. CMS Books in Mathematics. Springer International Publishing, Cham, 2nd edition, 2017. doi:10.1007/978-3-319-48311-5
-
[6]
Beck.First-Order Methods in Optimization
A. Beck.First-Order Methods in Optimization. MOS-SIAM Series on Optimization. Society for Industrial and Applied Mathematics, Philadelphia, 2017. ISBN 978-1-61197-498-0. doi: 10.1137/1.9781611974997
-
[7]
A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems.SIAM Journal on Imaging Sciences, 2(1):183–202, 2009. doi:10.1137/080716542
- [8]
- [9]
- [10]
- [11]
-
[12]
P. L. Combettes and V. R. Wajs. Signal recovery by proximal forward-backward splitting. Multiscale Modeling & Simulation, 4(4):1168–1200, 2005. doi:10.1137/050626090. 17
-
[13]
J. P. Contreras and R. Cominetti. Optimal error bounds for non-expansive fixed-point iterations in normed spaces.Mathematical Programming, 199(1–2):343–374, 2023. doi: 10.1007/s10107-022-01830-7
-
[14]
S. Das Gupta, B. P. G. Van Parys, and E. K. Ryu. Branch-and-bound performance estimation programming: A unified methodology for constructing optimal optimization methods. Mathematical Programming, 204(1–2):567–639, 2024. doi:10.1007/s10107-023-01973-1
-
[15]
A. d’Aspremont, D. Scieur, and A. B. Taylor. Acceleration methods.Foundations and Trends in Optimization, 5(1–2):1–245, 2021. doi:10.1561/2400000036
-
[16]
Y. Drori. The exact information-based complexity of smooth convex minimization.Journal of Complexity, 39:1–16, 2017. doi:10.1016/j.jco.2016.11.001
-
[17]
Y. Drori and A. B. Taylor. On the oracle complexity of smooth strongly convex minimization. Journal of Complexity, 68:101590, 2022. doi:10.1016/j.jco.2021.101590
-
[18]
Y. Drori and M. Teboulle. Performance of first-order methods for smooth convex mini- mization: A novel approach.Mathematical Programming, 145(1–2):451–482, 2014. doi: 10.1007/s10107-013-0653-0
-
[19]
M. I. Florea. Adaptive first-order methods with enhanced worst-case rates, 2024. arXiv:24 05.07926[math.OC]
work page 2024
-
[20]
A. V. Gasnikov and Y. Nesterov. Universal method for stochastic composite optimization problems.Computational Mathematics and Mathematical Physics, 58(1):48–64, 2018. doi: 10.1134/S0965542518010050
-
[21]
G. H. Golub and R. S. Varga. Chebyshev semi-iterative methods, successive overrelaxation iterative methods, and second order Richardson iterative methods. Part I.Numerische Mathematik, 3(1):147–156, 1961. doi:10.1007/BF01386013
-
[22]
G. H. Golub and R. S. Varga. Chebyshev semi-iterative methods, successive overrelaxation iterative methods, and second order Richardson iterative methods. Part II.Numerische Mathematik, 3(1):157–168, 1961. doi:10.1007/BF01386014
-
[23]
B. Goujaud, C. Moucer, F. Glineur, J. Hendrickx, A. Taylor, and A. Dieuleveut. PEPit: Computer-assisted worst-case analyses of first-order optimization methods in Python.Math- ematical Programming Computation, 16(3):337–367, 2024. doi:10.1007/s12532-024-002 59-7
-
[24]
B. Grimmer. Provably faster gradient descent via long steps.SIAM Journal on Optimization, 34(3):2588–2608, 2024. doi:10.1137/23M1588408
-
[25]
O. Güler. New proximal point algorithms for convex minimization.SIAM Journal on Optimization, 2(4):649–664, 1992. doi:10.1137/0802032
-
[26]
B. Halpern. Fixed points of nonexpanding maps.Bulletin of the American Mathematical Society, 73(6):957–961, 1967. doi:10.1090/S0002-9904-1967-11864-0
- [28]
-
[29]
D.Kim. Acceleratedproximalpointmethodformaximallymonotoneoperators.Mathematical Programming, 190(1–2):57–87, 2021. doi:10.1007/s10107-021-01643-0. 18
-
[30]
D. Kim and J. A. Fessler. Optimized first-order methods for smooth convex minimization. Mathematical Programming, 159(1–2):81–107, 2016. doi:10.1007/s10107-015-0949-3
-
[31]
D. Kim and J. A. Fessler. Generalizing the Optimized Gradient Method for smooth convex minimization.SIAM Journal on Optimization, 28(2):1920–1950, 2018. doi:10.1137/17M1 12124X
-
[32]
D. Kim and J. A. Fessler. Optimizing the efficiency of first-order methods for decreasing the gradient of smooth convex functions.Journal of Optimization Theory and Applications, 188(1):192–219, 2021. doi:10.1007/s10957-020-01770-2
- [33]
-
[34]
F. Lieder. On the convergence rate of the Halpern-iteration.Optimization Letters, 15(2): 405–418, 2021. doi:10.1007/s11590-020-01617-9
-
[35]
P.-L. Lions and B. Mercier. Splitting algorithms for the sum of two nonlinear operators. SIAM Journal on Numerical Analysis, 16(6):964–979, 1979. doi:10.1137/0716071
- [36]
-
[37]
Y. Nesterov. Gradient methods for minimizing composite functions.Mathematical Program- ming, 140(1):125–161, 2013. doi:10.1007/s10107-012-0629-5
-
[38]
Nesterov.Lectures on Convex Optimization, volume 137 ofSpringer Optimization and Its Applications
Y. Nesterov.Lectures on Convex Optimization, volume 137 ofSpringer Optimization and Its Applications. Springer Cham, 2 edition, 2018. doi:10.1007/978-3-319-91578-4
-
[39]
J. Park and E. K. Ryu. Exact optimal accelerated complexity for fixed-point iterations. In International Conference on Machine Learning (ICML), 2022
work page 2022
-
[40]
G. B. Passty. Ergodic convergence to a zero of the sum of monotone operators in Hilbert space.Journal of Mathematical Analysis and Applications, 72(2):383–390, 1979. doi: 10.1016/0022-247X(79)90234-8
-
[41]
A. B. Taylor and Y. Drori. An optimal gradient method for smooth strongly convex minimization.Mathematical Programming, 199(1–2):557–594, 2023. doi:10.1007/s10107 -022-01839-y
-
[42]
A. B. Taylor, J. M. Hendrickx, and F. Glineur. Smooth strongly convex interpolation and exact worst-case performance of first-order methods.Mathematical Programming, 161(1–2): 307–345, 2017. doi:10.1007/s10107-016-1009-3
-
[43]
A. B. Taylor, J. M. Hendrickx, and F. Glineur. Exact worst-case performance of first- order methods for composite convex optimization.SIAM Journal on Optimization, 27(3): 1283–1313, 2017. doi:10.1137/16M108104X
-
[44]
A. B. Taylor, J. M. Hendrickx, and F. Glineur. Exact worst-case convergence rates of the proximal gradient method for composite convex minimization.Journal of Optimization Theory and Applications, 178(2):455–476, 2018. doi:10.1007/s10957-018-1298-1
-
[45]
P. Tseng. On accelerated proximal gradient methods for convex-concave optimization. Manuscript, 2008
work page 2008
- [46]
-
[47]
B. Van Scoy, R. A. Freeman, and K. M. Lynch. The fastest known globally convergent first-order method for minimizing strongly convex functions.IEEE Control Systems Letters, 2(1):49–54, 2018. doi:10.1109/LCSYS.2017.2722406
-
[48]
R. S. Varga.Matrix Iterative Analysis, volume 27 ofSpringer Series in Computational Mathematics. Springer, Berlin, 2nd edition, 2000. doi:10.1007/978-3-642-05156-2. 20
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.