Beyond Minimax Optimality: A Subgame Perfect Gradient Method

Alex L. Wang; Benjamin Grimmer; Kevin Shu

arxiv: 2412.06731 · v4 · submitted 2024-12-09 · 🧮 math.OC

Beyond Minimax Optimality: A Subgame Perfect Gradient Method

Benjamin Grimmer , Kevin Shu , Alex L. Wang This is my paper

Pith reviewed 2026-05-23 07:32 UTC · model grok-4.3

classification 🧮 math.OC

keywords convex optimizationfirst-order methodsdynamic optimalitysubgame perfect equilibriumoptimized gradient methodconvergence ratesworst-case analysis

0 comments

The pith

The Subgame Perfect Gradient Method is dynamically optimal at every iteration given the first-order information observed so far.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops the Subgame Perfect Gradient Method as a refinement of the Optimized Gradient Method that incorporates the full history of gradients and function values at each step. It establishes that this method is dynamically optimal by showing it forms a subgame perfect equilibrium, so that no other first-order method can guarantee a strictly better convergence rate on every smooth convex function consistent with the data revealed up to that point. This moves past the classic minimax setting, where the worst-case rate is fixed in advance regardless of what the algorithm learns during execution. A sympathetic reader would care because the result suggests first-order methods can deliver stronger, information-dependent guarantees rather than always preparing for the single hardest function.

Core claim

SPGM refines OGM to make use of the full history of first-order information and is dynamically optimal in the sense that in each iteration, no other algorithm can offer a strictly better convergence rate on all functions which agree with the observed first-order information up to that iteration, with this notion formalized using the game-theoretic concept of a subgame perfect equilibrium.

What carries the argument

The Subgame Perfect Gradient Method (SPGM), a first-order iteration that at each step solves for the update guaranteeing the best possible remaining convergence rate against all functions consistent with the observed data.

If this is right

At every iteration SPGM's convergence bound is the best possible among all first-order methods given the history.
SPGM therefore improves on OGM once any nontrivial first-order information has been collected.
Numerical experiments confirm that SPGM strongly outperforms OGM on standard test problems.
The same dynamic-optimality argument applies to any problem class where first-order information can be revealed incrementally.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same equilibrium construction could be used to derive optimal methods for problems where only stochastic or inexact gradients are available.
It may be possible to replace the fixed minimax analysis of many existing methods with a sequence of subgame equilibria that tighten as data arrives.
The approach suggests designing new methods by recursively solving the equilibrium problem rather than a single static optimization over step coefficients.

Load-bearing premise

Modeling the choice of first-order methods as a game in which dynamic optimality is captured exactly by subgame perfect equilibrium identifies when one method is strictly better than others given the same information.

What would settle it

A concrete first-order method together with a smooth convex function that matches all observed first-order information up to iteration k but from that point achieves a strictly better convergence rate than SPGM on the remaining possible functions.

Figures

Figures reproduced from arXiv: 2412.06731 by Alex L. Wang, Benjamin Grimmer, Kevin Shu.

**Figure 1.** Figure 1: Left: The first five iterates of OGM on f(x) = Lx2/2 with x0 = 1. OGM produces x4 ≈ 0.304. Right: The first three iterates of SPGM on f(x) = Lx2/2 with x0 = 1. After seeing the history H = {(x0, f0, g0),(x1, f1, g1)}, SPGM determines that 0 is a minimizer for any L-smooth convex function agreeing with H. In effect, the history H completely determines the function f on the interval [x0, x1]. fhard so that f… view at source ↗

**Figure 2.** Figure 2: Two representative plots of convergence of [PITH_FULL_IMAGE:figures/full_fig_p022_2.png] view at source ↗

**Figure 3.** Figure 3: Two representative plots of 1/τn,N as a function of n as SPGM-10 and SPGM are run on Least Squares Regression problems of dimension d = 512, m = 2048. OGM displays its constant guarantee, 1/τ0,N , providing the baseline, non-adaptive guarantee. Left instance is of the form (18) with N = 300 and right instance is of the form (21) with N = 30. 6.2 A sample of randomly generated problem instances We consider … view at source ↗

**Figure 4.** Figure 4: Performance comparison over 42 randomly generated instances for the problems (18)–(23). From left to right, the target accuracy ranges as 10−3 , 10−6 , 10−9 . First, we consider (regularized) least squares regression problems defined in the following four ways: min x∈Rd f(x) = 1 m ∥Ax − b∥ 2 2 , (18) min x∈Rd f(x) = 1 m ∥Ax − b∥ 2 2 + 1 2 ∥x∥ 2 2 , (19) min x∈Rd f(x) = 1 m ∥Ax − b∥ 2 2 + h100(∥x∥2), (20) m… view at source ↗

**Figure 5.** Figure 5: Performance comparison over twelve instances derived from LIBSVM data for the regression [PITH_FULL_IMAGE:figures/full_fig_p024_5.png] view at source ↗

read the original abstract

The study of convex optimization has historically been concerned with worst-case convergence rates. The development of the Optimized Gradient Method (OGM), due to \citet{drori2012PerformanceOF,Kim2016optimal}, marked a major milestone in this study, as OGM achieves the optimal worst-case convergence rate among all first-order methods for unconstrained smooth convex optimization. In order to examine the possibility of obtaining stronger convergence guarantees, we will consider algorithms with \emph{dynamic} convergence rates, which may improve as additional first-order information is revealed. Our main contribution is the development of an algorithm, the Subgame Perfect Gradient Method (SPGM), which refines OGM to make use of the full history of first-order information. We show that SPGM is \emph{dynamically optimal}, in the sense that in each iteration, no other algorithm can offer a strictly better convergence rate on all functions which agree with the observed first-order information up to that iteration. We formalize this notion of dynamic optimality using the game-theoretic notion of a subgame perfect equilibrium. We conclude our study with preliminary numerical experiments showing that SPGM strongly outperforms OGM.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

SPGM adds a dynamic optimality notion via subgame perfect equilibrium on information-consistent functions, which is a real departure from static OGM rates, though the full proofs and experiment details need checking.

read the letter

The paper's main move is to replace the usual minimax rate with a dynamic one: at each step, SPGM is optimal among first-order methods on the set of functions that match the gradients seen so far. They formalize this with subgame perfect equilibrium and build SPGM by extending OGM to use the full history rather than just the last iterate or two. That framing is new relative to the OGM papers they cite, which stay inside fixed worst-case bounds. The abstract states the claim cleanly and the preliminary runs show SPGM pulling ahead of OGM on the tested problems. Those are the concrete advances. The central derivation is not visible here, so the key question is whether the equilibrium argument actually rules out every competing first-order method on every consistent function; that step carries the load. The experiments are labeled preliminary, which usually means limited test functions and no exhaustive ablation of the history mechanism. If the proofs close the gap without hidden assumptions on the function class, the result stands; otherwise the dynamic claim reduces to a restatement of OGM with extra bookkeeping. This work is aimed at people already deep in optimal first-order methods who want to think about adaptivity to observed information. It is coherent enough on its own terms to go to referees rather than desk reject, even if the proofs will need careful scrutiny and the numerics will need expansion.

Referee Report

0 major / 1 minor

Summary. The paper introduces the Subgame Perfect Gradient Method (SPGM) as a refinement of the Optimized Gradient Method (OGM) that uses the full history of first-order information. It claims SPGM is dynamically optimal: at each iteration, no other first-order method can guarantee a strictly better convergence rate on all functions consistent with the observed gradients up to that point. This notion is formalized via subgame perfect equilibrium in an appropriate game. The work concludes with preliminary numerical experiments indicating that SPGM outperforms OGM.

Significance. If the dynamic optimality result holds, the paper advances convex optimization beyond static minimax rates by introducing an adaptive, information-dependent optimality concept with a rigorous game-theoretic foundation. The explicit construction of SPGM that realizes subgame-perfect optimality and the preliminary experiments showing practical gains are strengths. This framing could guide development of adaptive first-order methods that improve as more gradient information is revealed.

minor comments (1)

[Abstract] Abstract and § on experiments: the description of the numerical results is labeled 'preliminary'; adding details on the specific test functions, problem dimensions, stopping criteria, and quantitative metrics (e.g., iteration counts or function-value gaps) would make the empirical support more reproducible and proportionate to the theoretical claim.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive assessment of the paper, recognition of the significance of the dynamic optimality framework, and recommendation for minor revision. We are glad that the game-theoretic formulation and the potential for adaptive first-order methods were viewed favorably.

Circularity Check

0 steps flagged

No significant circularity: dynamic optimality is independently defined

full rationale

The paper's core claim defines dynamic optimality via subgame-perfect equilibrium on the class of functions consistent with observed first-order information; this game-theoretic notion is introduced as a new formalization and does not reduce by construction to the OGM rates or any fitted parameters. SPGM is constructed by refining the known OGM (externally cited) to incorporate full history, but the optimality proof is tied directly to the subgame definition rather than renaming or self-referentially fitting prior results. No self-citation load-bearing steps, ansatz smuggling, or uniqueness theorems imported from the authors appear in the derivation chain. The contribution remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entities

The paper rests on standard domain assumptions of smooth convex optimization and introduces a new algorithmic construction; no free parameters or invented physical entities are indicated in the abstract.

axioms (2)

domain assumption The objective function is smooth and convex
Standard assumption required for first-order methods and OGM comparison.
domain assumption First-order information consists of function values and gradients at queried points
Implicit in the description of history-based refinement of OGM.

invented entities (1)

Subgame Perfect Gradient Method (SPGM) no independent evidence
purpose: To realize dynamic optimality using full first-order history
New algorithm introduced to achieve the stated dynamic optimality property.

pith-pipeline@v0.9.0 · 5731 in / 1428 out tokens · 28827 ms · 2026-05-23T07:32:26.443914+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

64 extracted references · 64 canonical work pages

[1]

Altschuler and Pablo A

Jason M. Altschuler and Pablo A. Parrilo. Acceleration by stepsize hedging I: Multi-step descent and the silver stepsize schedule. 2023

work page 2023
[2]

Altschuler and Pablo A

Jason M. Altschuler and Pablo A. Parrilo. Acceleration by stepsize hedging II: Silver stepsize schedule for smooth convex optimization. 2023

work page 2023
[3]

URL https://docs.mosek.com/latest/ juliaapi/index.html

MOSEK ApS.Optimizer API for Julia 10.2.10, 2019. URL https://docs.mosek.com/latest/ juliaapi/index.html

work page 2019
[4]

Taylor, and Francis Bach

Mathieu Barré, Adrien B. Taylor, and Francis Bach. Principled analyses and design of first-order methods with inexact proximal operators.Math. Program., 201(1–2):185–230, December 2022. ISSN 0025-5610

work page 2022
[5]

SIAM, Philadelphia, PA, 2001

Aharon Ben-Tal and Arkadi Nemirovski.Lectures on modern convex optimization: analysis, algorithms, and engineering applications. SIAM, Philadelphia, PA, 2001

work page 2001
[6]

C. G. Broyden. The convergence of a class of double-rank minimization algorithms.Journal of the Institute of Mathematics and Its Applications, 6(1):76–90, 1970

work page 1970
[7]

Méthode générale pour la résolution des systemes d’équations simultanées.Comp

Augustin Cauchy et al. Méthode générale pour la résolution des systemes d’équations simultanées.Comp. Rend. Sci. Paris, 25(1847):536–538, 1847

work page
[8]

LIBSVM: A library for support vector machines.ACM Trans

Chih-Chung Chang and Chih-Jen Lin. LIBSVM: A library for support vector machines.ACM Trans. Intell. Syst. Technol., 2(3), May 2011

work page 2011
[9]

A robust accelerated optimization algorithm for strongly convex functions

Saman Cyrus, Bin Hu, Bryan Van Scoy, and Laurent Lessard. A robust accelerated optimization algorithm for strongly convex functions. In2018 Annual American Control Conference (ACC), pages 1376–1381, 2018

work page 2018
[10]

Optimal convergence rates for the proximal bundle method.SIAM Journal on Optimization, 33(2):424–454, 2023

Mateo Díaz and Benjamin Grimmer. Optimal convergence rates for the proximal bundle method.SIAM Journal on Optimization, 33(2):424–454, 2023

work page 2023
[11]

The exact information-based complexity of smooth convex minimization.J

Yoel Drori. The exact information-based complexity of smooth convex minimization.J. Complex., 39: 1–16, 2017

work page 2017
[12]

Yoel Drori and Adrien B. Taylor. Efficient first-order methods for convex minimization: a constructive approach.Mathematical Programming, 184:183 – 220, 2018

work page 2018
[13]

Yoel Drori and Adrien B. Taylor. On the oracle complexity of smooth strongly convex minimization.J. Complex., 68:101590, 2021

work page 2021
[14]

Performance of first-order methods for smooth convex minimization: a novel approach.Math

Yoel Drori and Marc Teboulle. Performance of first-order methods for smooth convex minimization: a novel approach.Math. Program., 145:451–482, 2012

work page 2012
[15]

An optimal variant of Kelley’s cutting-plane method.Mathematical Programming, 160:321–351, 2016

Yoel Drori and Marc Teboulle. An optimal variant of Kelley’s cutting-plane method.Mathematical Programming, 160:321–351, 2016

work page 2016
[16]

Rate of Convergence of the Bundle Method.J

Yu Du and Andrzej Ruszczyński. Rate of Convergence of the Bundle Method.J. Optim. Theory Appl., 173(3):908–922, June 2017. ISSN 0022-3239. 25

work page 2017
[17]

Acceleration methods.Foundations and Trends®in Optimization, 5(1-2):1–245, 2021

Alexandre d’Aspremont, Damien Scieur, Adrien Taylor, et al. Acceleration methods.Foundations and Trends®in Optimization, 5(1-2):1–245, 2021

work page 2021
[18]

On the acceleration of proximal bundle methods

David Fersztand and Xu Andy Sun. On the acceleration of proximal bundle methods. 2025

work page 2025
[19]

A new approach to variable metric algorithms.The Computer Journal, 13(3):317–322, 1970

Roger Fletcher. A new approach to variable metric algorithms.The Computer Journal, 13(3):317–322, 1970

work page 1970
[20]

Mihai I. Florea. An efficient accelerated gradient method with memory applicable to composite problems. In2021 International Aegean Conference on Electrical Machines and Power Electronics (ACEMP) and 2021 International Conference on Optimization of Electrical and Electronic Equipment (OPTIM), pages 473–480, 2021

work page 2021
[21]

Mihai I. Florea. Exact gradient methods with memory.Optimization Methods and Software, 37(6): 2310–2337, 2022. PMID: 39906221

work page 2022
[22]

Mihai I. Florea. Adaptive first-order methods with enhanced worst-case rates. 2024

work page 2024
[23]

Florea and Yurii E

Mihai I. Florea and Yurii E. Nesterov. An optimal lower bound for smooth convex functions.Foundations of Computational Mathematics, 2025. Published online 25 June 2025

work page 2025
[24]

A family of variable metric updates derived by variational means.Mathematics of Computation, 24(109):23–26, 1970

Donald Goldfarb. A family of variable metric updates derived by variational means.Mathematics of Computation, 24(109):23–26, 1970

work page 1970
[25]

Benjamin Grimmer, Kevin Shu, and Alex L. Wang. Accelerated objective gap and gradient norm convergence for gradient descent via long steps. 2024

work page 2024
[26]

Benjamin Grimmer, Kevin Shu, and Alex L. Wang. Composing optimized stepsize schedules for gradient descent. 2024

work page 2024
[27]

Van Parys, and Ernest Ryu

Shuvomoy Das Gupta, Bart P.G. Van Parys, and Ernest Ryu. Branch-and-bound performance estimation programming: A unified methodology for constructing optimal optimization methods.Math. Program., 2023

work page 2023
[28]

A Redistributed Proximal Bundle Method for Nonconvex Optimization.SIAM Journal on Optimization, 20(5):2442–2473, 2010

Warren Hare and Claudia Sagastizábal. A Redistributed Proximal Bundle Method for Nonconvex Optimization.SIAM Journal on Optimization, 20(5):2442–2473, 2010. ISSN 1052-6234

work page 2010
[29]

Springer Berlin Heidelberg, Berlin, Heidelberg, 1993

Jean-Baptiste Hiriart-Urruty and Claude Lemaréchal.Acceleration of the Cutting-Plane Algorithm: Primal Forms of Bundle Methods, pages 275–330. Springer Berlin Heidelberg, Berlin, Heidelberg, 1993

work page 1993
[30]

Uijeong Jang, Shuvomoy Das Gupta, and Ernest K. Ryu. Computer-assisted design of accelerated composite optimization methods: Optista. 2024

work page 2024
[31]

Accelerated proximal point method for maximally monotone operators.Math

Donghwan Kim. Accelerated proximal point method for maximally monotone operators.Math. Program., 190(1–2):57–87, November 2021. ISSN 0025-5610

work page 2021
[32]

Donghwan Kim and Jeffrey A. Fessler. Optimized first-order methods for smooth convex minimization. Math. Program., 159(1–2):81–107, 2016

work page 2016
[33]

On the convergence analysis of the optimized gradient method.J

Donghwan Kim and Jeffrey A Fessler. On the convergence analysis of the optimized gradient method.J. Optim. Theory Appl., 172(1):187–205, 2017

work page 2017
[34]

Donghwan Kim and Jeffrey A. Fessler. Another look at the fast iterative shrinkage/thresholding algorithm (fista).SIAM Journal on Optimization, 28(1):223–250, 2018

work page 2018
[35]

Donghwan Kim and Jeffrey A. Fessler. Adaptive restart of the optimized gradient method for convex optimization.J. Optim. Theory Appl., 178(1):240–263, July 2018. ISSN 0022-3239

work page 2018
[36]

Donghwan Kim and Jeffrey A. Fessler. Optimizing the efficiency of first-order methods for decreasing the gradient of smooth convex functions.J. Optim. Theory Appl., 188(1):192–219, 2021. 26

work page 2021
[37]

A proof of exact convergence rate of gradient descent

Jungbin Kim. A proof of exact convergence rate of gradient descent. Part II. Performance criterion (f(xn)−f∗)/∥x0−x∗∥2. 2024

work page 2024
[38]

Krzysztof C. Kiwiel. An Aggregate Subgradient Method for Nonsmooth Convex Minimization.Math. Program., 27(3):320–341, October 1983. ISSN 0025-5610

work page 1983
[39]

Bundle-Level Type Methods Uniformly Optimal For Smooth And Nonsmooth Convex Optimization.Mathematical Programming, 149(1):1–45, Feb 2015

Guanghui Lan. Bundle-Level Type Methods Uniformly Optimal For Smooth And Nonsmooth Convex Optimization.Mathematical Programming, 149(1):1–45, Feb 2015. ISSN 1436-4646

work page 2015
[40]

Springer Berlin Heidelberg, Berlin, Heidelberg, 1975

Claude Lemaréchal.An Extension of Davidon Methods to Nondifferentiable Problems, pages 95–109. Springer Berlin Heidelberg, Berlin, Heidelberg, 1975. ISBN 978-3-642-00764-4

work page 1975
[41]

A simple uniformly optimal method without line search for convex optimization: T

Tianjiao Li and Guanghui Lan. A simple uniformly optimal method without line search for convex optimization: T. li, g. lan.Mathematical Programming, pages 1–38, 2025

work page 2025
[42]

Jiaming Liang and Renato D. C. Monteiro. A proximal bundle variant with optimal iteration-complexity for a large range of prox stepsizes.SIAM Journal on Optimization, 31(4):2955–2986, 2021

work page 2021
[43]

On the limited memory BFGS method for large scale optimization

Dong C Liu and Jorge Nocedal. On the limited memory BFGS method for large scale optimization. Math. Program., 45(1):503–528, 1989

work page 1989
[44]

JuMP 1.0: Recent improvements to a modeling language for mathematical optimization.Math

Miles Lubin, Oscar Dowson, Joaquim Dias Garcia, Joey Huchette, Benoît Legat, and Juan Pablo Vielma. JuMP 1.0: Recent improvements to a modeling language for mathematical optimization.Math. Program. Comput., 2023

work page 2023
[45]

Adaptive gradient descent without descent

Yura Malitsky and Konstantin Mishchenko. Adaptive gradient descent without descent. 2019

work page 2019
[46]

Optim: A mathematical optimization package for Julia.J

Patrick Kofod Mogensen and Asbjørn Nilsen Riseth. Optim: A mathematical optimization package for Julia.J. Open Source Softw., 3(24):615, 2018

work page 2018
[47]

A. S. Nemirovsky and D. B. Yudin.Problem complexity and method efficiency in optimization. Wiley- Interscience, New York, 1983

work page 1983
[48]

Information-based complexity of linear operator equations.J

Arkadi S Nemirovsky. Information-based complexity of linear operator equations.J. Complex., 8(2): 153–175, 1992

work page 1992
[49]

Universal gradient methods for convex optimization problems.Mathematical Programming, 152(1):381–404, 2015

Yu Nesterov. Universal gradient methods for convex optimization problems.Mathematical Programming, 152(1):381–404, 2015

work page 2015
[50]

A method for solving the convex programming problem with convergence rateO(1/k2)

Yurii Nesterov. A method for solving the convex programming problem with convergence rateO(1/k2). Proceedings of the USSR Academy of Sciences, 269:543–547, 1983

work page 1983
[51]

Yurii Nesterov and Mihai I. Florea. Gradient methods with memory.Optimization Methods and Software, 37(3):936–953, 2022

work page 2022
[52]

An introduction to game theory.Oxford University Press, 2:672–713, 2004

Martin J Osborne. An introduction to game theory.Oxford University Press, 2:672–713, 2004

work page 2004
[53]

Chanwoo Park, Jisun Park, and Ernest K. Ryu. Factor- √ 2 acceleration of accelerated gradient methods. Applied Mathematics & Optimization, 88:1–38, 2021

work page 2021
[54]

Exact worst-case convergence rates of gradient descent: a complete analysis for all constant stepsizes over nonconvex and convex functions

Teodor Rotaru, François Glineur, and Panagiotis Patrinos. Exact worst-case convergence rates of gradient descent: a complete analysis for all constant stepsizes over nonconvex and convex functions. 2024

work page 2024
[55]

Princeton University Press, Princeton, NJ, USA, 2006

Andrzej Ruszczynski.Nonlinear Optimization. Princeton University Press, Princeton, NJ, USA, 2006. ISBN 0691119155, 9780691119151

work page 2006
[56]

David F. Shanno. Conditioning of quasi-newton methods for function minimization.Mathematics of Computation, 24(111):647–656, 1970. 27

work page 1970
[57]

Stochastic first-order methods: non-asymptotic and computer-aided analyses via potential functions

Adrien Taylor and Francis Bach. Stochastic first-order methods: non-asymptotic and computer-aided analyses via potential functions. In Alina Beygelzimer and Daniel Hsu, editors,Proceedings of the Thirty-Second Conference on Learning Theory, volume 99 ofProceedings of Machine Learning Research, pages 2934–2992, Phoenix, AZ, USA, 25–28 Jun 2019. PMLR

work page 2019
[58]

An optimal gradient method for smooth strongly convex minimization

Adrien Taylor and Yoel Drori. An optimal gradient method for smooth strongly convex minimization. Math. Program., 199(1–2):557–594, jun 2022. ISSN 0025-5610

work page 2022
[59]

Smooth strongly convex interpolation and exact worst-case performance of first-order methods.Math

Adrien Taylor, Julien Hendrickx, and François Glineur. Smooth strongly convex interpolation and exact worst-case performance of first-order methods.Math. Program., 161:307–345, 2017

work page 2017
[60]

Exact worst-case performance of first-order methods for composite convex optimization.SIAM Journal on Optimization, 27(3):1283–1313, 2017

Adrien Taylor, Julien Hendrickx, and François Glineur. Exact worst-case performance of first-order methods for composite convex optimization.SIAM Journal on Optimization, 27(3):1283–1313, 2017

work page 2017
[61]

Freeman, and Kevin M

Bryan Van Scoy, Randy A. Freeman, and Kevin M. Lynch. The fastest known globally convergent first-order method for minimizing strongly convex functions.IEEE Control Systems Letters, 2(1):49–54, 2018

work page 2018
[62]

Springer Berlin Heidelberg, Berlin, Heidelberg, 1975

Philip Wolfe.A Method of Conjugate Subgradients for Minimizing Nondifferentiable Functions, pages 145–173. Springer Berlin Heidelberg, Berlin, Heidelberg, 1975. ISBN 978-3-642-00764-4

work page 1975
[63]

Accelerated gradient descent by concatenation of stepsize schedules

Zehao Zhang and Rujun Jiang. Accelerated gradient descent by concatenation of stepsize schedules. 2024

work page 2024
[64]

Anytime acceleration of gradient descent

Zihan Zhang, Jason D Lee, Simon S Du, and Yuxin Chen. Anytime acceleration of gradient descent. 2024. A Deferred proofs Proof of Lemma 2.We expand f⋆−f+ 0 −L 4∥z1−x⋆∥2 + L 4∥x0−x⋆∥2 =f ⋆−f+ 0 −1 L∥g0∥2 +⟨x0−x⋆,g 0⟩ =f ⋆−f+ 0 − ⟨ g0,x⋆−x+ 0 ⟩ . The expression on the final line isQ⋆,0which is nonnegative by Lemma 1.■ Proof of Lemma 3.By assumption,zis an au...

work page 2024

[1] [1]

Altschuler and Pablo A

Jason M. Altschuler and Pablo A. Parrilo. Acceleration by stepsize hedging I: Multi-step descent and the silver stepsize schedule. 2023

work page 2023

[2] [2]

Altschuler and Pablo A

Jason M. Altschuler and Pablo A. Parrilo. Acceleration by stepsize hedging II: Silver stepsize schedule for smooth convex optimization. 2023

work page 2023

[3] [3]

URL https://docs.mosek.com/latest/ juliaapi/index.html

MOSEK ApS.Optimizer API for Julia 10.2.10, 2019. URL https://docs.mosek.com/latest/ juliaapi/index.html

work page 2019

[4] [4]

Taylor, and Francis Bach

Mathieu Barré, Adrien B. Taylor, and Francis Bach. Principled analyses and design of first-order methods with inexact proximal operators.Math. Program., 201(1–2):185–230, December 2022. ISSN 0025-5610

work page 2022

[5] [5]

SIAM, Philadelphia, PA, 2001

Aharon Ben-Tal and Arkadi Nemirovski.Lectures on modern convex optimization: analysis, algorithms, and engineering applications. SIAM, Philadelphia, PA, 2001

work page 2001

[6] [6]

C. G. Broyden. The convergence of a class of double-rank minimization algorithms.Journal of the Institute of Mathematics and Its Applications, 6(1):76–90, 1970

work page 1970

[7] [7]

Méthode générale pour la résolution des systemes d’équations simultanées.Comp

Augustin Cauchy et al. Méthode générale pour la résolution des systemes d’équations simultanées.Comp. Rend. Sci. Paris, 25(1847):536–538, 1847

work page

[8] [8]

LIBSVM: A library for support vector machines.ACM Trans

Chih-Chung Chang and Chih-Jen Lin. LIBSVM: A library for support vector machines.ACM Trans. Intell. Syst. Technol., 2(3), May 2011

work page 2011

[9] [9]

A robust accelerated optimization algorithm for strongly convex functions

Saman Cyrus, Bin Hu, Bryan Van Scoy, and Laurent Lessard. A robust accelerated optimization algorithm for strongly convex functions. In2018 Annual American Control Conference (ACC), pages 1376–1381, 2018

work page 2018

[10] [10]

Optimal convergence rates for the proximal bundle method.SIAM Journal on Optimization, 33(2):424–454, 2023

Mateo Díaz and Benjamin Grimmer. Optimal convergence rates for the proximal bundle method.SIAM Journal on Optimization, 33(2):424–454, 2023

work page 2023

[11] [11]

The exact information-based complexity of smooth convex minimization.J

Yoel Drori. The exact information-based complexity of smooth convex minimization.J. Complex., 39: 1–16, 2017

work page 2017

[12] [12]

Yoel Drori and Adrien B. Taylor. Efficient first-order methods for convex minimization: a constructive approach.Mathematical Programming, 184:183 – 220, 2018

work page 2018

[13] [13]

Yoel Drori and Adrien B. Taylor. On the oracle complexity of smooth strongly convex minimization.J. Complex., 68:101590, 2021

work page 2021

[14] [14]

Performance of first-order methods for smooth convex minimization: a novel approach.Math

Yoel Drori and Marc Teboulle. Performance of first-order methods for smooth convex minimization: a novel approach.Math. Program., 145:451–482, 2012

work page 2012

[15] [15]

An optimal variant of Kelley’s cutting-plane method.Mathematical Programming, 160:321–351, 2016

Yoel Drori and Marc Teboulle. An optimal variant of Kelley’s cutting-plane method.Mathematical Programming, 160:321–351, 2016

work page 2016

[16] [16]

Rate of Convergence of the Bundle Method.J

Yu Du and Andrzej Ruszczyński. Rate of Convergence of the Bundle Method.J. Optim. Theory Appl., 173(3):908–922, June 2017. ISSN 0022-3239. 25

work page 2017

[17] [17]

Acceleration methods.Foundations and Trends®in Optimization, 5(1-2):1–245, 2021

Alexandre d’Aspremont, Damien Scieur, Adrien Taylor, et al. Acceleration methods.Foundations and Trends®in Optimization, 5(1-2):1–245, 2021

work page 2021

[18] [18]

On the acceleration of proximal bundle methods

David Fersztand and Xu Andy Sun. On the acceleration of proximal bundle methods. 2025

work page 2025

[19] [19]

A new approach to variable metric algorithms.The Computer Journal, 13(3):317–322, 1970

Roger Fletcher. A new approach to variable metric algorithms.The Computer Journal, 13(3):317–322, 1970

work page 1970

[20] [20]

Mihai I. Florea. An efficient accelerated gradient method with memory applicable to composite problems. In2021 International Aegean Conference on Electrical Machines and Power Electronics (ACEMP) and 2021 International Conference on Optimization of Electrical and Electronic Equipment (OPTIM), pages 473–480, 2021

work page 2021

[21] [21]

Mihai I. Florea. Exact gradient methods with memory.Optimization Methods and Software, 37(6): 2310–2337, 2022. PMID: 39906221

work page 2022

[22] [22]

Mihai I. Florea. Adaptive first-order methods with enhanced worst-case rates. 2024

work page 2024

[23] [23]

Florea and Yurii E

Mihai I. Florea and Yurii E. Nesterov. An optimal lower bound for smooth convex functions.Foundations of Computational Mathematics, 2025. Published online 25 June 2025

work page 2025

[24] [24]

A family of variable metric updates derived by variational means.Mathematics of Computation, 24(109):23–26, 1970

Donald Goldfarb. A family of variable metric updates derived by variational means.Mathematics of Computation, 24(109):23–26, 1970

work page 1970

[25] [25]

Benjamin Grimmer, Kevin Shu, and Alex L. Wang. Accelerated objective gap and gradient norm convergence for gradient descent via long steps. 2024

work page 2024

[26] [26]

Benjamin Grimmer, Kevin Shu, and Alex L. Wang. Composing optimized stepsize schedules for gradient descent. 2024

work page 2024

[27] [27]

Van Parys, and Ernest Ryu

Shuvomoy Das Gupta, Bart P.G. Van Parys, and Ernest Ryu. Branch-and-bound performance estimation programming: A unified methodology for constructing optimal optimization methods.Math. Program., 2023

work page 2023

[28] [28]

A Redistributed Proximal Bundle Method for Nonconvex Optimization.SIAM Journal on Optimization, 20(5):2442–2473, 2010

Warren Hare and Claudia Sagastizábal. A Redistributed Proximal Bundle Method for Nonconvex Optimization.SIAM Journal on Optimization, 20(5):2442–2473, 2010. ISSN 1052-6234

work page 2010

[29] [29]

Springer Berlin Heidelberg, Berlin, Heidelberg, 1993

Jean-Baptiste Hiriart-Urruty and Claude Lemaréchal.Acceleration of the Cutting-Plane Algorithm: Primal Forms of Bundle Methods, pages 275–330. Springer Berlin Heidelberg, Berlin, Heidelberg, 1993

work page 1993

[30] [30]

Uijeong Jang, Shuvomoy Das Gupta, and Ernest K. Ryu. Computer-assisted design of accelerated composite optimization methods: Optista. 2024

work page 2024

[31] [31]

Accelerated proximal point method for maximally monotone operators.Math

Donghwan Kim. Accelerated proximal point method for maximally monotone operators.Math. Program., 190(1–2):57–87, November 2021. ISSN 0025-5610

work page 2021

[32] [32]

Donghwan Kim and Jeffrey A. Fessler. Optimized first-order methods for smooth convex minimization. Math. Program., 159(1–2):81–107, 2016

work page 2016

[33] [33]

On the convergence analysis of the optimized gradient method.J

Donghwan Kim and Jeffrey A Fessler. On the convergence analysis of the optimized gradient method.J. Optim. Theory Appl., 172(1):187–205, 2017

work page 2017

[34] [34]

Donghwan Kim and Jeffrey A. Fessler. Another look at the fast iterative shrinkage/thresholding algorithm (fista).SIAM Journal on Optimization, 28(1):223–250, 2018

work page 2018

[35] [35]

Donghwan Kim and Jeffrey A. Fessler. Adaptive restart of the optimized gradient method for convex optimization.J. Optim. Theory Appl., 178(1):240–263, July 2018. ISSN 0022-3239

work page 2018

[36] [36]

Donghwan Kim and Jeffrey A. Fessler. Optimizing the efficiency of first-order methods for decreasing the gradient of smooth convex functions.J. Optim. Theory Appl., 188(1):192–219, 2021. 26

work page 2021

[37] [37]

A proof of exact convergence rate of gradient descent

Jungbin Kim. A proof of exact convergence rate of gradient descent. Part II. Performance criterion (f(xn)−f∗)/∥x0−x∗∥2. 2024

work page 2024

[38] [38]

Krzysztof C. Kiwiel. An Aggregate Subgradient Method for Nonsmooth Convex Minimization.Math. Program., 27(3):320–341, October 1983. ISSN 0025-5610

work page 1983

[39] [39]

Bundle-Level Type Methods Uniformly Optimal For Smooth And Nonsmooth Convex Optimization.Mathematical Programming, 149(1):1–45, Feb 2015

Guanghui Lan. Bundle-Level Type Methods Uniformly Optimal For Smooth And Nonsmooth Convex Optimization.Mathematical Programming, 149(1):1–45, Feb 2015. ISSN 1436-4646

work page 2015

[40] [40]

Springer Berlin Heidelberg, Berlin, Heidelberg, 1975

Claude Lemaréchal.An Extension of Davidon Methods to Nondifferentiable Problems, pages 95–109. Springer Berlin Heidelberg, Berlin, Heidelberg, 1975. ISBN 978-3-642-00764-4

work page 1975

[41] [41]

A simple uniformly optimal method without line search for convex optimization: T

Tianjiao Li and Guanghui Lan. A simple uniformly optimal method without line search for convex optimization: T. li, g. lan.Mathematical Programming, pages 1–38, 2025

work page 2025

[42] [42]

Jiaming Liang and Renato D. C. Monteiro. A proximal bundle variant with optimal iteration-complexity for a large range of prox stepsizes.SIAM Journal on Optimization, 31(4):2955–2986, 2021

work page 2021

[43] [43]

On the limited memory BFGS method for large scale optimization

Dong C Liu and Jorge Nocedal. On the limited memory BFGS method for large scale optimization. Math. Program., 45(1):503–528, 1989

work page 1989

[44] [44]

JuMP 1.0: Recent improvements to a modeling language for mathematical optimization.Math

Miles Lubin, Oscar Dowson, Joaquim Dias Garcia, Joey Huchette, Benoît Legat, and Juan Pablo Vielma. JuMP 1.0: Recent improvements to a modeling language for mathematical optimization.Math. Program. Comput., 2023

work page 2023

[45] [45]

Adaptive gradient descent without descent

Yura Malitsky and Konstantin Mishchenko. Adaptive gradient descent without descent. 2019

work page 2019

[46] [46]

Optim: A mathematical optimization package for Julia.J

Patrick Kofod Mogensen and Asbjørn Nilsen Riseth. Optim: A mathematical optimization package for Julia.J. Open Source Softw., 3(24):615, 2018

work page 2018

[47] [47]

A. S. Nemirovsky and D. B. Yudin.Problem complexity and method efficiency in optimization. Wiley- Interscience, New York, 1983

work page 1983

[48] [48]

Information-based complexity of linear operator equations.J

Arkadi S Nemirovsky. Information-based complexity of linear operator equations.J. Complex., 8(2): 153–175, 1992

work page 1992

[49] [49]

Universal gradient methods for convex optimization problems.Mathematical Programming, 152(1):381–404, 2015

Yu Nesterov. Universal gradient methods for convex optimization problems.Mathematical Programming, 152(1):381–404, 2015

work page 2015

[50] [50]

A method for solving the convex programming problem with convergence rateO(1/k2)

Yurii Nesterov. A method for solving the convex programming problem with convergence rateO(1/k2). Proceedings of the USSR Academy of Sciences, 269:543–547, 1983

work page 1983

[51] [51]

Yurii Nesterov and Mihai I. Florea. Gradient methods with memory.Optimization Methods and Software, 37(3):936–953, 2022

work page 2022

[52] [52]

An introduction to game theory.Oxford University Press, 2:672–713, 2004

Martin J Osborne. An introduction to game theory.Oxford University Press, 2:672–713, 2004

work page 2004

[53] [53]

Chanwoo Park, Jisun Park, and Ernest K. Ryu. Factor- √ 2 acceleration of accelerated gradient methods. Applied Mathematics & Optimization, 88:1–38, 2021

work page 2021

[54] [54]

Exact worst-case convergence rates of gradient descent: a complete analysis for all constant stepsizes over nonconvex and convex functions

Teodor Rotaru, François Glineur, and Panagiotis Patrinos. Exact worst-case convergence rates of gradient descent: a complete analysis for all constant stepsizes over nonconvex and convex functions. 2024

work page 2024

[55] [55]

Princeton University Press, Princeton, NJ, USA, 2006

Andrzej Ruszczynski.Nonlinear Optimization. Princeton University Press, Princeton, NJ, USA, 2006. ISBN 0691119155, 9780691119151

work page 2006

[56] [56]

David F. Shanno. Conditioning of quasi-newton methods for function minimization.Mathematics of Computation, 24(111):647–656, 1970. 27

work page 1970

[57] [57]

Stochastic first-order methods: non-asymptotic and computer-aided analyses via potential functions

Adrien Taylor and Francis Bach. Stochastic first-order methods: non-asymptotic and computer-aided analyses via potential functions. In Alina Beygelzimer and Daniel Hsu, editors,Proceedings of the Thirty-Second Conference on Learning Theory, volume 99 ofProceedings of Machine Learning Research, pages 2934–2992, Phoenix, AZ, USA, 25–28 Jun 2019. PMLR

work page 2019

[58] [58]

An optimal gradient method for smooth strongly convex minimization

Adrien Taylor and Yoel Drori. An optimal gradient method for smooth strongly convex minimization. Math. Program., 199(1–2):557–594, jun 2022. ISSN 0025-5610

work page 2022

[59] [59]

Smooth strongly convex interpolation and exact worst-case performance of first-order methods.Math

Adrien Taylor, Julien Hendrickx, and François Glineur. Smooth strongly convex interpolation and exact worst-case performance of first-order methods.Math. Program., 161:307–345, 2017

work page 2017

[60] [60]

Exact worst-case performance of first-order methods for composite convex optimization.SIAM Journal on Optimization, 27(3):1283–1313, 2017

Adrien Taylor, Julien Hendrickx, and François Glineur. Exact worst-case performance of first-order methods for composite convex optimization.SIAM Journal on Optimization, 27(3):1283–1313, 2017

work page 2017

[61] [61]

Freeman, and Kevin M

Bryan Van Scoy, Randy A. Freeman, and Kevin M. Lynch. The fastest known globally convergent first-order method for minimizing strongly convex functions.IEEE Control Systems Letters, 2(1):49–54, 2018

work page 2018

[62] [62]

Springer Berlin Heidelberg, Berlin, Heidelberg, 1975

Philip Wolfe.A Method of Conjugate Subgradients for Minimizing Nondifferentiable Functions, pages 145–173. Springer Berlin Heidelberg, Berlin, Heidelberg, 1975. ISBN 978-3-642-00764-4

work page 1975

[63] [63]

Accelerated gradient descent by concatenation of stepsize schedules

Zehao Zhang and Rujun Jiang. Accelerated gradient descent by concatenation of stepsize schedules. 2024

work page 2024

[64] [64]

Anytime acceleration of gradient descent

Zihan Zhang, Jason D Lee, Simon S Du, and Yuxin Chen. Anytime acceleration of gradient descent. 2024. A Deferred proofs Proof of Lemma 2.We expand f⋆−f+ 0 −L 4∥z1−x⋆∥2 + L 4∥x0−x⋆∥2 =f ⋆−f+ 0 −1 L∥g0∥2 +⟨x0−x⋆,g 0⟩ =f ⋆−f+ 0 − ⟨ g0,x⋆−x+ 0 ⟩ . The expression on the final line isQ⋆,0which is nonnegative by Lemma 1.■ Proof of Lemma 3.By assumption,zis an au...

work page 2024