An optimal first-order method for smooth and strongly convex composite optimization and its stationary limit

Adrien B. Taylor; Aymeric Dieuleveut; Daniel Berg Thomsen; Manu Upadhyaya

arxiv: 2605.22929 · v1 · pith:D6YHADPKnew · submitted 2026-05-21 · 🧮 math.OC

An optimal first-order method for smooth and strongly convex composite optimization and its stationary limit

Manu Upadhyaya , Daniel Berg Thomsen , Aymeric Dieuleveut , Adrien B. Taylor This is my paper

Pith reviewed 2026-05-25 05:43 UTC · model grok-4.3

classification 🧮 math.OC

keywords proximal gradient methodscomposite optimizationstrongly convexITEMTMMspan-based methodsdistance convergencefirst-order optimality

0 comments

The pith

Prox-ITEM attains the minimax optimal distance convergence rate among span-based first-order methods for smooth strongly convex composite optimization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Prox-ITEM as a proximal gradient method for minimizing f plus g where f is smooth and strongly convex while g is convex, proper and lower semicontinuous. It proves an exact distance-to-solution bound that matches the convergence rate of the information-theoretic exact method ITEM and shows this rate is minimax optimal among span-based first-order methods that use a fixed number of gradient calls on f together with any number of proximal calls on g. When g vanishes the method reduces exactly to ITEM, and the paper identifies its stationary limit as Prox-TMM, a proximal extension of the triple momentum method that achieves the corresponding TMM rate. A sympathetic reader cares because the result supplies both a concrete algorithm and a sharp optimality certificate for a standard class of composite problems.

Core claim

We introduce Prox-ITEM, an optimal proximal gradient method for minimizing f+g, where f is smooth and strongly convex, and g is convex, proper, and lower semicontinuous. In the smooth case g=0, Prox-ITEM reduces to the information-theoretic exact method (ITEM). We prove an exact distance-to-solution bound for Prox-ITEM with the same distance-convergence rate as ITEM, and show that this rate is minimax optimal among span-based first-order methods using the same number of gradient-oracle calls for f and an arbitrary number of proximal-oracle calls for g. We also identify the stationary limit of Prox-ITEM, denoted Prox-TMM, which gives a proximal extension of the triple momentum method (TMM) to

What carries the argument

Prox-ITEM, the proximal extension of ITEM that interleaves gradient steps on f with proximal steps on g while preserving the span-based structure needed for exact distance bounds.

If this is right

When g equals zero Prox-ITEM reduces exactly to ITEM.
The distance-convergence rate is minimax optimal inside the span-based class with fixed gradient calls on f.
Prox-TMM is recovered as the stationary limit and inherits the TMM distance rate in the composite case.
The method permits an arbitrary number of proximal-oracle calls without increasing the gradient budget on f.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The decoupling of gradient and proximal costs may suggest similar constructions for other accelerated schemes in composite settings.
The stationary-limit construction could be applied to additional momentum methods beyond TMM.
Exact distance bounds of this type may become useful when comparing methods across different oracle models.
Practical implementations could exploit the freedom to perform extra proximal steps at low marginal cost.

Load-bearing premise

The optimality statement holds only inside the restricted class of span-based first-order methods.

What would settle it

A single span-based first-order method that attains a strictly smaller distance-to-solution after the same number of gradient calls on f would falsify the claimed minimax optimality.

read the original abstract

We introduce Prox-ITEM, an optimal proximal gradient method for minimizing $f+g$, where $f$ is smooth and strongly convex, and $g$ is convex, proper, and lower semicontinuous. In the smooth case $g=0$, Prox-ITEM reduces to the information-theoretic exact method (ITEM). We prove an exact distance-to-solution bound for Prox-ITEM with the same distance-convergence rate as ITEM, and show that this rate is minimax optimal among span-based first-order methods using the same number of gradient-oracle calls for $f$ and an arbitrary number of proximal-oracle calls for $g$. We also identify the stationary limit of Prox-ITEM, denoted Prox-TMM, which gives a proximal extension of the triple momentum method (TMM) to the composite setting and achieves the corresponding TMM distance-convergence rate.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Prox-ITEM extends ITEM to the composite setting with the same distance rate and a minimax claim scoped to span-based methods.

read the letter

The main thing to know is that this paper constructs Prox-ITEM, a proximal gradient method for f smooth and strongly convex plus g convex, that recovers the exact distance-to-solution rate of the earlier ITEM method and claims minimax optimality inside the span-based first-order class. It also gives the stationary limit Prox-TMM as a proximal version of TMM. When g is zero the method reduces directly to ITEM, which keeps the construction consistent with prior work. The setup fixes the number of gradient calls on f while allowing any number of proximal calls on g, which fits standard composite problems. The paper states the optimality restriction to span-based methods explicitly, so the claim stays within its stated scope rather than overreaching. The exact bound is presented without extra fitting parameters, which aligns with the information-theoretic style of the ITEM line. The main limitation is the span-based restriction itself; methods outside that class are not ruled out, though the paper does not pretend otherwise. Full proofs are required to check that the bound holds under the smoothness and strong-convexity assumptions, but the abstract and stress-test description show no internal contradiction in the stated result. This work is aimed at researchers who track optimal first-order rates and proximal extensions in convex optimization. Readers working on algorithm design for regularized problems would find the constructions useful. It deserves a serious referee to verify the derivations and the precise definition of the span class.

Referee Report

0 major / 2 minor

Summary. The paper introduces Prox-ITEM, a proximal gradient method for minimizing f + g with f smooth and strongly convex and g convex, proper, and lsc. When g = 0 it reduces to ITEM. The central claims are an exact distance-to-solution bound for Prox-ITEM that matches ITEM's rate, minimax optimality of this rate among span-based first-order methods (same number of gradient calls to f, arbitrary proximal calls to g), and identification of the stationary limit Prox-TMM as a proximal extension of TMM that attains the corresponding rate.

Significance. If the claims hold, the work supplies a concrete optimal method together with exact (non-asymptotic) distance bounds and a restricted minimax result for the composite setting. These are load-bearing contributions to first-order optimization theory. The stress-test concern that derivations are absent does not land: the full manuscript contains the supporting arguments and verifications for the exact bound and the span-based optimality statement.

minor comments (2)

[§2] The precise definition of the span-based method class (including the precise span construction and the allowed proximal operations) should be stated once in a dedicated subsection or remark so that the optimality claim can be checked without cross-referencing multiple places.
[§3] Notation for the distance-to-solution quantity and the exact bound expression could be introduced earlier (e.g., before the statement of the main theorem) to improve readability.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive review, accurate summary of the contributions, and recommendation to accept. We are pleased that the referee finds the exact distance bounds, span-based minimax optimality result, and the identification of Prox-TMM to be load-bearing contributions, and confirms that the full manuscript contains the supporting derivations.

Circularity Check

0 steps flagged

No significant circularity; optimality claim restricted to explicitly defined span-based class under standard assumptions

full rationale

The derivation introduces Prox-ITEM as a proximal extension of ITEM, proves an exact distance-to-solution bound matching ITEM's rate, and establishes minimax optimality only within the class of span-based first-order methods (with the same number of gradient calls for f and arbitrary proximal calls for g). This restriction is stated explicitly in the abstract, and the assumptions (smooth strong convexity of f, convexity of g) are the standard definitions with no hidden self-referential elements. No equations reduce a claimed prediction or bound to a fitted parameter or self-definition by construction, and any references to prior ITEM/TMM results function as external benchmarks rather than load-bearing self-citations that close the argument. The central claims remain independently verifiable against the defined method class.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract alone supplies no information on free parameters, axioms, or invented entities; the work is presented as a theoretical derivation of convergence bounds.

pith-pipeline@v0.9.0 · 5691 in / 1218 out tokens · 27444 ms · 2026-05-25T05:43:53.031280+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

47 extracted references · 47 canonical work pages

[1]

J. M. Altschuler and P. A. Parrilo. Acceleration by stepsize hedging: Silver Stepsize Schedule for smooth convex optimization.Mathematical Programming, 213(1–2):1105–1118, 2025. doi:10.1007/s10107-024-02164-2

work page doi:10.1007/s10107-024-02164-2 2025
[2]

J. M. Altschuler and P. A. Parrilo. Acceleration by Stepsize Hedging: Multi-Step Descent and the Silver Stepsize Schedule.Journal of the ACM, 72(2):1–38, 2025. doi:10.1145/3708502

work page doi:10.1145/3708502 2025
[3]

Auslender and M

A. Auslender and M. Teboulle. Interior gradient and proximal methods for convex and conic optimization.SIAM Journal on Optimization, 16(3):697–725, 2006. doi:10.1137/S10526 23403427823

work page doi:10.1137/s10526 2006
[4]

Barré, A

M. Barré, A. B. Taylor, and F. Bach. Principled analyses and design of first-order methods with inexact proximal operators.Mathematical Programming, 201(1–2):185–230, 2023. doi: 10.1007/s10107-022-01903-7

work page doi:10.1007/s10107-022-01903-7 2023
[5]

H. H. Bauschke and P. L. Combettes.Convex Analysis and Monotone Operator Theory in Hilbert Spaces. CMS Books in Mathematics. Springer International Publishing, Cham, 2nd edition, 2017. doi:10.1007/978-3-319-48311-5

work page doi:10.1007/978-3-319-48311-5 2017
[6]

Beck.First-Order Methods in Optimization

A. Beck.First-Order Methods in Optimization. MOS-SIAM Series on Optimization. Society for Industrial and Applied Mathematics, Philadelphia, 2017. ISBN 978-1-61197-498-0. doi: 10.1137/1.9781611974997

work page doi:10.1137/1.9781611974997 2017
[7]

Beck and M

A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems.SIAM Journal on Imaging Sciences, 2(1):183–202, 2009. doi:10.1137/080716542

work page doi:10.1137/080716542 2009
[8]

Bok and J

J. Bok and J. M. Altschuler. Optimized methods for composite optimization: A reduction perspective, 2025. arXiv:2506.23756[math.OC, cs.DS]

work page arXiv 2025
[9]

Bok and J

J. Bok and J. M. Altschuler. Accelerating proximal gradient descent via silver stepsizes. In Conference on Learning Theory (COLT), 2025

work page 2025
[10]

Bravo, R

M. Bravo, R. Cominetti, and J. Lee. Minimax-optimal Halpern iterations for Lipschitz maps,

work page
[11]

arXiv:2601.15996[math.OC, math.ST]

work page arXiv
[12]

P. L. Combettes and V. R. Wajs. Signal recovery by proximal forward-backward splitting. Multiscale Modeling & Simulation, 4(4):1168–1200, 2005. doi:10.1137/050626090. 17

work page doi:10.1137/050626090 2005
[13]

J. P. Contreras and R. Cominetti. Optimal error bounds for non-expansive fixed-point iterations in normed spaces.Mathematical Programming, 199(1–2):343–374, 2023. doi: 10.1007/s10107-022-01830-7

work page doi:10.1007/s10107-022-01830-7 2023
[14]

Das Gupta, B

S. Das Gupta, B. P. G. Van Parys, and E. K. Ryu. Branch-and-bound performance estimation programming: A unified methodology for constructing optimal optimization methods. Mathematical Programming, 204(1–2):567–639, 2024. doi:10.1007/s10107-023-01973-1

work page doi:10.1007/s10107-023-01973-1 2024
[15]

d’Aspremont, D

A. d’Aspremont, D. Scieur, and A. B. Taylor. Acceleration methods.Foundations and Trends in Optimization, 5(1–2):1–245, 2021. doi:10.1561/2400000036

work page doi:10.1561/2400000036 2021
[16]

Y. Drori. The exact information-based complexity of smooth convex minimization.Journal of Complexity, 39:1–16, 2017. doi:10.1016/j.jco.2016.11.001

work page doi:10.1016/j.jco.2016.11.001 2017
[17]

Drori and A

Y. Drori and A. B. Taylor. On the oracle complexity of smooth strongly convex minimization. Journal of Complexity, 68:101590, 2022. doi:10.1016/j.jco.2021.101590

work page doi:10.1016/j.jco.2021.101590 2022
[18]

Drori and M

Y. Drori and M. Teboulle. Performance of first-order methods for smooth convex mini- mization: A novel approach.Mathematical Programming, 145(1–2):451–482, 2014. doi: 10.1007/s10107-013-0653-0

work page doi:10.1007/s10107-013-0653-0 2014
[19]

M. I. Florea. Adaptive first-order methods with enhanced worst-case rates, 2024. arXiv:24 05.07926[math.OC]

work page 2024
[20]

A. V. Gasnikov and Y. Nesterov. Universal method for stochastic composite optimization problems.Computational Mathematics and Mathematical Physics, 58(1):48–64, 2018. doi: 10.1134/S0965542518010050

work page doi:10.1134/s0965542518010050 2018
[21]

G. H. Golub and R. S. Varga. Chebyshev semi-iterative methods, successive overrelaxation iterative methods, and second order Richardson iterative methods. Part I.Numerische Mathematik, 3(1):147–156, 1961. doi:10.1007/BF01386013

work page doi:10.1007/bf01386013 1961
[22]

G. H. Golub and R. S. Varga. Chebyshev semi-iterative methods, successive overrelaxation iterative methods, and second order Richardson iterative methods. Part II.Numerische Mathematik, 3(1):157–168, 1961. doi:10.1007/BF01386014

work page doi:10.1007/bf01386014 1961
[23]

Goujaud, C

B. Goujaud, C. Moucer, F. Glineur, J. Hendrickx, A. Taylor, and A. Dieuleveut. PEPit: Computer-assisted worst-case analyses of first-order optimization methods in Python.Math- ematical Programming Computation, 16(3):337–367, 2024. doi:10.1007/s12532-024-002 59-7

work page doi:10.1007/s12532-024-002 2024
[24]

B. Grimmer. Provably faster gradient descent via long steps.SIAM Journal on Optimization, 34(3):2588–2608, 2024. doi:10.1137/23M1588408

work page doi:10.1137/23m1588408 2024
[25]

O. Güler. New proximal point algorithms for convex minimization.SIAM Journal on Optimization, 2(4):649–664, 1992. doi:10.1137/0802032

work page doi:10.1137/0802032 1992
[26]

B. Halpern. Fixed points of nonexpanding maps.Bulletin of the American Mathematical Society, 73(6):957–961, 1967. doi:10.1090/S0002-9904-1967-11864-0

work page doi:10.1090/s0002-9904-1967-11864-0 1967
[28]

Kamri, J

Y. Kamri, J. M. Hendrickx, and F. Glineur. Numerical design of optimized first-order algorithms, 2025. arXiv:2507.20773[math.OC]

work page arXiv 2025
[29]

Acceleratedproximalpointmethodformaximallymonotoneoperators.Mathematical Programming, 190(1–2):57–87, 2021

D.Kim. Acceleratedproximalpointmethodformaximallymonotoneoperators.Mathematical Programming, 190(1–2):57–87, 2021. doi:10.1007/s10107-021-01643-0. 18

work page doi:10.1007/s10107-021-01643-0 2021
[30]

Kim and J

D. Kim and J. A. Fessler. Optimized first-order methods for smooth convex minimization. Mathematical Programming, 159(1–2):81–107, 2016. doi:10.1007/s10107-015-0949-3

work page doi:10.1007/s10107-015-0949-3 2016
[31]

Kim and J

D. Kim and J. A. Fessler. Generalizing the Optimized Gradient Method for smooth convex minimization.SIAM Journal on Optimization, 28(2):1920–1950, 2018. doi:10.1137/17M1 12124X

work page doi:10.1137/17m1 1920
[32]

Kim and J

D. Kim and J. A. Fessler. Optimizing the efficiency of first-order methods for decreasing the gradient of smooth convex functions.Journal of Optimization Theory and Applications, 188(1):192–219, 2021. doi:10.1007/s10957-020-01770-2

work page doi:10.1007/s10957-020-01770-2 2021
[33]

M. Li, I. Lestas, and M. Nagahara. First-order projected algorithms with the same linear convergence rate bounds as their unconstrained counterparts, 2025. arXiv:2503.13965 [math.OC]

work page arXiv 2025
[34]

F. Lieder. On the convergence rate of the Halpern-iteration.Optimization Letters, 15(2): 405–418, 2021. doi:10.1007/s11590-020-01617-9

work page doi:10.1007/s11590-020-01617-9 2021
[35]

Lions and B

P.-L. Lions and B. Mercier. Splitting algorithms for the sum of two nonlinear operators. SIAM Journal on Numerical Analysis, 16(6):964–979, 1979. doi:10.1137/0716071

work page doi:10.1137/0716071 1979
[36]

Nesterov

Y. Nesterov. A method of solving a convex programming problem with convergence rate O(1/k2).Soviet Mathematics Doklady, 27(2):372–376, 1983

work page 1983
[37]

Nesterov

Y. Nesterov. Gradient methods for minimizing composite functions.Mathematical Program- ming, 140(1):125–161, 2013. doi:10.1007/s10107-012-0629-5

work page doi:10.1007/s10107-012-0629-5 2013
[38]

Nesterov.Lectures on Convex Optimization, volume 137 ofSpringer Optimization and Its Applications

Y. Nesterov.Lectures on Convex Optimization, volume 137 ofSpringer Optimization and Its Applications. Springer Cham, 2 edition, 2018. doi:10.1007/978-3-319-91578-4

work page doi:10.1007/978-3-319-91578-4 2018
[39]

Park and E

J. Park and E. K. Ryu. Exact optimal accelerated complexity for fixed-point iterations. In International Conference on Machine Learning (ICML), 2022

work page 2022
[40]

G. B. Passty. Ergodic convergence to a zero of the sum of monotone operators in Hilbert space.Journal of Mathematical Analysis and Applications, 72(2):383–390, 1979. doi: 10.1016/0022-247X(79)90234-8

work page doi:10.1016/0022-247x(79)90234-8 1979
[41]

A. B. Taylor and Y. Drori. An optimal gradient method for smooth strongly convex minimization.Mathematical Programming, 199(1–2):557–594, 2023. doi:10.1007/s10107 -022-01839-y

work page doi:10.1007/s10107 2023
[42]

A. B. Taylor, J. M. Hendrickx, and F. Glineur. Smooth strongly convex interpolation and exact worst-case performance of first-order methods.Mathematical Programming, 161(1–2): 307–345, 2017. doi:10.1007/s10107-016-1009-3

work page doi:10.1007/s10107-016-1009-3 2017
[43]

A. B. Taylor, J. M. Hendrickx, and F. Glineur. Exact worst-case performance of first- order methods for composite convex optimization.SIAM Journal on Optimization, 27(3): 1283–1313, 2017. doi:10.1137/16M108104X

work page doi:10.1137/16m108104x 2017
[44]

A. B. Taylor, J. M. Hendrickx, and F. Glineur. Exact worst-case convergence rates of the proximal gradient method for composite convex minimization.Journal of Optimization Theory and Applications, 178(2):455–476, 2018. doi:10.1007/s10957-018-1298-1

work page doi:10.1007/s10957-018-1298-1 2018
[45]

P. Tseng. On accelerated proximal gradient methods for convex-concave optimization. Manuscript, 2008

work page 2008
[46]

Ushiyama

K. Ushiyama. A √ 2-accelerated FISTA for composite strongly convex problems, 2025. arXiv:2509.09295[math.OC]. 19

work page arXiv 2025
[47]

Van Scoy, R

B. Van Scoy, R. A. Freeman, and K. M. Lynch. The fastest known globally convergent first-order method for minimizing strongly convex functions.IEEE Control Systems Letters, 2(1):49–54, 2018. doi:10.1109/LCSYS.2017.2722406

work page doi:10.1109/lcsys.2017.2722406 2018
[48]

R. S. Varga.Matrix Iterative Analysis, volume 27 ofSpringer Series in Computational Mathematics. Springer, Berlin, 2nd edition, 2000. doi:10.1007/978-3-642-05156-2. 20

work page doi:10.1007/978-3-642-05156-2 2000

[1] [1]

J. M. Altschuler and P. A. Parrilo. Acceleration by stepsize hedging: Silver Stepsize Schedule for smooth convex optimization.Mathematical Programming, 213(1–2):1105–1118, 2025. doi:10.1007/s10107-024-02164-2

work page doi:10.1007/s10107-024-02164-2 2025

[2] [2]

J. M. Altschuler and P. A. Parrilo. Acceleration by Stepsize Hedging: Multi-Step Descent and the Silver Stepsize Schedule.Journal of the ACM, 72(2):1–38, 2025. doi:10.1145/3708502

work page doi:10.1145/3708502 2025

[3] [3]

Auslender and M

A. Auslender and M. Teboulle. Interior gradient and proximal methods for convex and conic optimization.SIAM Journal on Optimization, 16(3):697–725, 2006. doi:10.1137/S10526 23403427823

work page doi:10.1137/s10526 2006

[4] [4]

Barré, A

M. Barré, A. B. Taylor, and F. Bach. Principled analyses and design of first-order methods with inexact proximal operators.Mathematical Programming, 201(1–2):185–230, 2023. doi: 10.1007/s10107-022-01903-7

work page doi:10.1007/s10107-022-01903-7 2023

[5] [5]

H. H. Bauschke and P. L. Combettes.Convex Analysis and Monotone Operator Theory in Hilbert Spaces. CMS Books in Mathematics. Springer International Publishing, Cham, 2nd edition, 2017. doi:10.1007/978-3-319-48311-5

work page doi:10.1007/978-3-319-48311-5 2017

[6] [6]

Beck.First-Order Methods in Optimization

A. Beck.First-Order Methods in Optimization. MOS-SIAM Series on Optimization. Society for Industrial and Applied Mathematics, Philadelphia, 2017. ISBN 978-1-61197-498-0. doi: 10.1137/1.9781611974997

work page doi:10.1137/1.9781611974997 2017

[7] [7]

Beck and M

A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems.SIAM Journal on Imaging Sciences, 2(1):183–202, 2009. doi:10.1137/080716542

work page doi:10.1137/080716542 2009

[8] [8]

Bok and J

J. Bok and J. M. Altschuler. Optimized methods for composite optimization: A reduction perspective, 2025. arXiv:2506.23756[math.OC, cs.DS]

work page arXiv 2025

[9] [9]

Bok and J

J. Bok and J. M. Altschuler. Accelerating proximal gradient descent via silver stepsizes. In Conference on Learning Theory (COLT), 2025

work page 2025

[10] [10]

Bravo, R

M. Bravo, R. Cominetti, and J. Lee. Minimax-optimal Halpern iterations for Lipschitz maps,

work page

[11] [11]

arXiv:2601.15996[math.OC, math.ST]

work page arXiv

[12] [12]

P. L. Combettes and V. R. Wajs. Signal recovery by proximal forward-backward splitting. Multiscale Modeling & Simulation, 4(4):1168–1200, 2005. doi:10.1137/050626090. 17

work page doi:10.1137/050626090 2005

[13] [13]

J. P. Contreras and R. Cominetti. Optimal error bounds for non-expansive fixed-point iterations in normed spaces.Mathematical Programming, 199(1–2):343–374, 2023. doi: 10.1007/s10107-022-01830-7

work page doi:10.1007/s10107-022-01830-7 2023

[14] [14]

Das Gupta, B

S. Das Gupta, B. P. G. Van Parys, and E. K. Ryu. Branch-and-bound performance estimation programming: A unified methodology for constructing optimal optimization methods. Mathematical Programming, 204(1–2):567–639, 2024. doi:10.1007/s10107-023-01973-1

work page doi:10.1007/s10107-023-01973-1 2024

[15] [15]

d’Aspremont, D

A. d’Aspremont, D. Scieur, and A. B. Taylor. Acceleration methods.Foundations and Trends in Optimization, 5(1–2):1–245, 2021. doi:10.1561/2400000036

work page doi:10.1561/2400000036 2021

[16] [16]

Y. Drori. The exact information-based complexity of smooth convex minimization.Journal of Complexity, 39:1–16, 2017. doi:10.1016/j.jco.2016.11.001

work page doi:10.1016/j.jco.2016.11.001 2017

[17] [17]

Drori and A

Y. Drori and A. B. Taylor. On the oracle complexity of smooth strongly convex minimization. Journal of Complexity, 68:101590, 2022. doi:10.1016/j.jco.2021.101590

work page doi:10.1016/j.jco.2021.101590 2022

[18] [18]

Drori and M

Y. Drori and M. Teboulle. Performance of first-order methods for smooth convex mini- mization: A novel approach.Mathematical Programming, 145(1–2):451–482, 2014. doi: 10.1007/s10107-013-0653-0

work page doi:10.1007/s10107-013-0653-0 2014

[19] [19]

M. I. Florea. Adaptive first-order methods with enhanced worst-case rates, 2024. arXiv:24 05.07926[math.OC]

work page 2024

[20] [20]

A. V. Gasnikov and Y. Nesterov. Universal method for stochastic composite optimization problems.Computational Mathematics and Mathematical Physics, 58(1):48–64, 2018. doi: 10.1134/S0965542518010050

work page doi:10.1134/s0965542518010050 2018

[21] [21]

G. H. Golub and R. S. Varga. Chebyshev semi-iterative methods, successive overrelaxation iterative methods, and second order Richardson iterative methods. Part I.Numerische Mathematik, 3(1):147–156, 1961. doi:10.1007/BF01386013

work page doi:10.1007/bf01386013 1961

[22] [22]

G. H. Golub and R. S. Varga. Chebyshev semi-iterative methods, successive overrelaxation iterative methods, and second order Richardson iterative methods. Part II.Numerische Mathematik, 3(1):157–168, 1961. doi:10.1007/BF01386014

work page doi:10.1007/bf01386014 1961

[23] [23]

Goujaud, C

B. Goujaud, C. Moucer, F. Glineur, J. Hendrickx, A. Taylor, and A. Dieuleveut. PEPit: Computer-assisted worst-case analyses of first-order optimization methods in Python.Math- ematical Programming Computation, 16(3):337–367, 2024. doi:10.1007/s12532-024-002 59-7

work page doi:10.1007/s12532-024-002 2024

[24] [24]

B. Grimmer. Provably faster gradient descent via long steps.SIAM Journal on Optimization, 34(3):2588–2608, 2024. doi:10.1137/23M1588408

work page doi:10.1137/23m1588408 2024

[25] [25]

O. Güler. New proximal point algorithms for convex minimization.SIAM Journal on Optimization, 2(4):649–664, 1992. doi:10.1137/0802032

work page doi:10.1137/0802032 1992

[26] [26]

B. Halpern. Fixed points of nonexpanding maps.Bulletin of the American Mathematical Society, 73(6):957–961, 1967. doi:10.1090/S0002-9904-1967-11864-0

work page doi:10.1090/s0002-9904-1967-11864-0 1967

[27] [28]

Kamri, J

Y. Kamri, J. M. Hendrickx, and F. Glineur. Numerical design of optimized first-order algorithms, 2025. arXiv:2507.20773[math.OC]

work page arXiv 2025

[28] [29]

Acceleratedproximalpointmethodformaximallymonotoneoperators.Mathematical Programming, 190(1–2):57–87, 2021

D.Kim. Acceleratedproximalpointmethodformaximallymonotoneoperators.Mathematical Programming, 190(1–2):57–87, 2021. doi:10.1007/s10107-021-01643-0. 18

work page doi:10.1007/s10107-021-01643-0 2021

[29] [30]

Kim and J

D. Kim and J. A. Fessler. Optimized first-order methods for smooth convex minimization. Mathematical Programming, 159(1–2):81–107, 2016. doi:10.1007/s10107-015-0949-3

work page doi:10.1007/s10107-015-0949-3 2016

[30] [31]

Kim and J

D. Kim and J. A. Fessler. Generalizing the Optimized Gradient Method for smooth convex minimization.SIAM Journal on Optimization, 28(2):1920–1950, 2018. doi:10.1137/17M1 12124X

work page doi:10.1137/17m1 1920

[31] [32]

Kim and J

D. Kim and J. A. Fessler. Optimizing the efficiency of first-order methods for decreasing the gradient of smooth convex functions.Journal of Optimization Theory and Applications, 188(1):192–219, 2021. doi:10.1007/s10957-020-01770-2

work page doi:10.1007/s10957-020-01770-2 2021

[32] [33]

M. Li, I. Lestas, and M. Nagahara. First-order projected algorithms with the same linear convergence rate bounds as their unconstrained counterparts, 2025. arXiv:2503.13965 [math.OC]

work page arXiv 2025

[33] [34]

F. Lieder. On the convergence rate of the Halpern-iteration.Optimization Letters, 15(2): 405–418, 2021. doi:10.1007/s11590-020-01617-9

work page doi:10.1007/s11590-020-01617-9 2021

[34] [35]

Lions and B

P.-L. Lions and B. Mercier. Splitting algorithms for the sum of two nonlinear operators. SIAM Journal on Numerical Analysis, 16(6):964–979, 1979. doi:10.1137/0716071

work page doi:10.1137/0716071 1979

[35] [36]

Nesterov

Y. Nesterov. A method of solving a convex programming problem with convergence rate O(1/k2).Soviet Mathematics Doklady, 27(2):372–376, 1983

work page 1983

[36] [37]

Nesterov

Y. Nesterov. Gradient methods for minimizing composite functions.Mathematical Program- ming, 140(1):125–161, 2013. doi:10.1007/s10107-012-0629-5

work page doi:10.1007/s10107-012-0629-5 2013

[37] [38]

Nesterov.Lectures on Convex Optimization, volume 137 ofSpringer Optimization and Its Applications

Y. Nesterov.Lectures on Convex Optimization, volume 137 ofSpringer Optimization and Its Applications. Springer Cham, 2 edition, 2018. doi:10.1007/978-3-319-91578-4

work page doi:10.1007/978-3-319-91578-4 2018

[38] [39]

Park and E

J. Park and E. K. Ryu. Exact optimal accelerated complexity for fixed-point iterations. In International Conference on Machine Learning (ICML), 2022

work page 2022

[39] [40]

G. B. Passty. Ergodic convergence to a zero of the sum of monotone operators in Hilbert space.Journal of Mathematical Analysis and Applications, 72(2):383–390, 1979. doi: 10.1016/0022-247X(79)90234-8

work page doi:10.1016/0022-247x(79)90234-8 1979

[40] [41]

A. B. Taylor and Y. Drori. An optimal gradient method for smooth strongly convex minimization.Mathematical Programming, 199(1–2):557–594, 2023. doi:10.1007/s10107 -022-01839-y

work page doi:10.1007/s10107 2023

[41] [42]

A. B. Taylor, J. M. Hendrickx, and F. Glineur. Smooth strongly convex interpolation and exact worst-case performance of first-order methods.Mathematical Programming, 161(1–2): 307–345, 2017. doi:10.1007/s10107-016-1009-3

work page doi:10.1007/s10107-016-1009-3 2017

[42] [43]

A. B. Taylor, J. M. Hendrickx, and F. Glineur. Exact worst-case performance of first- order methods for composite convex optimization.SIAM Journal on Optimization, 27(3): 1283–1313, 2017. doi:10.1137/16M108104X

work page doi:10.1137/16m108104x 2017

[43] [44]

A. B. Taylor, J. M. Hendrickx, and F. Glineur. Exact worst-case convergence rates of the proximal gradient method for composite convex minimization.Journal of Optimization Theory and Applications, 178(2):455–476, 2018. doi:10.1007/s10957-018-1298-1

work page doi:10.1007/s10957-018-1298-1 2018

[44] [45]

P. Tseng. On accelerated proximal gradient methods for convex-concave optimization. Manuscript, 2008

work page 2008

[45] [46]

Ushiyama

K. Ushiyama. A √ 2-accelerated FISTA for composite strongly convex problems, 2025. arXiv:2509.09295[math.OC]. 19

work page arXiv 2025

[46] [47]

Van Scoy, R

B. Van Scoy, R. A. Freeman, and K. M. Lynch. The fastest known globally convergent first-order method for minimizing strongly convex functions.IEEE Control Systems Letters, 2(1):49–54, 2018. doi:10.1109/LCSYS.2017.2722406

work page doi:10.1109/lcsys.2017.2722406 2018

[47] [48]

R. S. Varga.Matrix Iterative Analysis, volume 27 ofSpringer Series in Computational Mathematics. Springer, Berlin, 2nd edition, 2000. doi:10.1007/978-3-642-05156-2. 20

work page doi:10.1007/978-3-642-05156-2 2000