Beyond Minimax Optimality: A Subgame Perfect Gradient Method
Pith reviewed 2026-05-23 07:32 UTC · model grok-4.3
The pith
The Subgame Perfect Gradient Method is dynamically optimal at every iteration given the first-order information observed so far.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SPGM refines OGM to make use of the full history of first-order information and is dynamically optimal in the sense that in each iteration, no other algorithm can offer a strictly better convergence rate on all functions which agree with the observed first-order information up to that iteration, with this notion formalized using the game-theoretic concept of a subgame perfect equilibrium.
What carries the argument
The Subgame Perfect Gradient Method (SPGM), a first-order iteration that at each step solves for the update guaranteeing the best possible remaining convergence rate against all functions consistent with the observed data.
If this is right
- At every iteration SPGM's convergence bound is the best possible among all first-order methods given the history.
- SPGM therefore improves on OGM once any nontrivial first-order information has been collected.
- Numerical experiments confirm that SPGM strongly outperforms OGM on standard test problems.
- The same dynamic-optimality argument applies to any problem class where first-order information can be revealed incrementally.
Where Pith is reading between the lines
- The same equilibrium construction could be used to derive optimal methods for problems where only stochastic or inexact gradients are available.
- It may be possible to replace the fixed minimax analysis of many existing methods with a sequence of subgame equilibria that tighten as data arrives.
- The approach suggests designing new methods by recursively solving the equilibrium problem rather than a single static optimization over step coefficients.
Load-bearing premise
Modeling the choice of first-order methods as a game in which dynamic optimality is captured exactly by subgame perfect equilibrium identifies when one method is strictly better than others given the same information.
What would settle it
A concrete first-order method together with a smooth convex function that matches all observed first-order information up to iteration k but from that point achieves a strictly better convergence rate than SPGM on the remaining possible functions.
Figures
read the original abstract
The study of convex optimization has historically been concerned with worst-case convergence rates. The development of the Optimized Gradient Method (OGM), due to \citet{drori2012PerformanceOF,Kim2016optimal}, marked a major milestone in this study, as OGM achieves the optimal worst-case convergence rate among all first-order methods for unconstrained smooth convex optimization. In order to examine the possibility of obtaining stronger convergence guarantees, we will consider algorithms with \emph{dynamic} convergence rates, which may improve as additional first-order information is revealed. Our main contribution is the development of an algorithm, the Subgame Perfect Gradient Method (SPGM), which refines OGM to make use of the full history of first-order information. We show that SPGM is \emph{dynamically optimal}, in the sense that in each iteration, no other algorithm can offer a strictly better convergence rate on all functions which agree with the observed first-order information up to that iteration. We formalize this notion of dynamic optimality using the game-theoretic notion of a subgame perfect equilibrium. We conclude our study with preliminary numerical experiments showing that SPGM strongly outperforms OGM.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces the Subgame Perfect Gradient Method (SPGM) as a refinement of the Optimized Gradient Method (OGM) that uses the full history of first-order information. It claims SPGM is dynamically optimal: at each iteration, no other first-order method can guarantee a strictly better convergence rate on all functions consistent with the observed gradients up to that point. This notion is formalized via subgame perfect equilibrium in an appropriate game. The work concludes with preliminary numerical experiments indicating that SPGM outperforms OGM.
Significance. If the dynamic optimality result holds, the paper advances convex optimization beyond static minimax rates by introducing an adaptive, information-dependent optimality concept with a rigorous game-theoretic foundation. The explicit construction of SPGM that realizes subgame-perfect optimality and the preliminary experiments showing practical gains are strengths. This framing could guide development of adaptive first-order methods that improve as more gradient information is revealed.
minor comments (1)
- [Abstract] Abstract and § on experiments: the description of the numerical results is labeled 'preliminary'; adding details on the specific test functions, problem dimensions, stopping criteria, and quantitative metrics (e.g., iteration counts or function-value gaps) would make the empirical support more reproducible and proportionate to the theoretical claim.
Simulated Author's Rebuttal
We thank the referee for their positive assessment of the paper, recognition of the significance of the dynamic optimality framework, and recommendation for minor revision. We are glad that the game-theoretic formulation and the potential for adaptive first-order methods were viewed favorably.
Circularity Check
No significant circularity: dynamic optimality is independently defined
full rationale
The paper's core claim defines dynamic optimality via subgame-perfect equilibrium on the class of functions consistent with observed first-order information; this game-theoretic notion is introduced as a new formalization and does not reduce by construction to the OGM rates or any fitted parameters. SPGM is constructed by refining the known OGM (externally cited) to incorporate full history, but the optimality proof is tied directly to the subgame definition rather than renaming or self-referentially fitting prior results. No self-citation load-bearing steps, ansatz smuggling, or uniqueness theorems imported from the authors appear in the derivation chain. The contribution remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption The objective function is smooth and convex
- domain assumption First-order information consists of function values and gradients at queried points
invented entities (1)
-
Subgame Perfect Gradient Method (SPGM)
no independent evidence
Reference graph
Works this paper leans on
-
[1]
Jason M. Altschuler and Pablo A. Parrilo. Acceleration by stepsize hedging I: Multi-step descent and the silver stepsize schedule. 2023
work page 2023
-
[2]
Jason M. Altschuler and Pablo A. Parrilo. Acceleration by stepsize hedging II: Silver stepsize schedule for smooth convex optimization. 2023
work page 2023
-
[3]
URL https://docs.mosek.com/latest/ juliaapi/index.html
MOSEK ApS.Optimizer API for Julia 10.2.10, 2019. URL https://docs.mosek.com/latest/ juliaapi/index.html
work page 2019
-
[4]
Mathieu Barré, Adrien B. Taylor, and Francis Bach. Principled analyses and design of first-order methods with inexact proximal operators.Math. Program., 201(1–2):185–230, December 2022. ISSN 0025-5610
work page 2022
-
[5]
Aharon Ben-Tal and Arkadi Nemirovski.Lectures on modern convex optimization: analysis, algorithms, and engineering applications. SIAM, Philadelphia, PA, 2001
work page 2001
-
[6]
C. G. Broyden. The convergence of a class of double-rank minimization algorithms.Journal of the Institute of Mathematics and Its Applications, 6(1):76–90, 1970
work page 1970
-
[7]
Méthode générale pour la résolution des systemes d’équations simultanées.Comp
Augustin Cauchy et al. Méthode générale pour la résolution des systemes d’équations simultanées.Comp. Rend. Sci. Paris, 25(1847):536–538, 1847
-
[8]
LIBSVM: A library for support vector machines.ACM Trans
Chih-Chung Chang and Chih-Jen Lin. LIBSVM: A library for support vector machines.ACM Trans. Intell. Syst. Technol., 2(3), May 2011
work page 2011
-
[9]
A robust accelerated optimization algorithm for strongly convex functions
Saman Cyrus, Bin Hu, Bryan Van Scoy, and Laurent Lessard. A robust accelerated optimization algorithm for strongly convex functions. In2018 Annual American Control Conference (ACC), pages 1376–1381, 2018
work page 2018
-
[10]
Mateo Díaz and Benjamin Grimmer. Optimal convergence rates for the proximal bundle method.SIAM Journal on Optimization, 33(2):424–454, 2023
work page 2023
-
[11]
The exact information-based complexity of smooth convex minimization.J
Yoel Drori. The exact information-based complexity of smooth convex minimization.J. Complex., 39: 1–16, 2017
work page 2017
-
[12]
Yoel Drori and Adrien B. Taylor. Efficient first-order methods for convex minimization: a constructive approach.Mathematical Programming, 184:183 – 220, 2018
work page 2018
-
[13]
Yoel Drori and Adrien B. Taylor. On the oracle complexity of smooth strongly convex minimization.J. Complex., 68:101590, 2021
work page 2021
-
[14]
Performance of first-order methods for smooth convex minimization: a novel approach.Math
Yoel Drori and Marc Teboulle. Performance of first-order methods for smooth convex minimization: a novel approach.Math. Program., 145:451–482, 2012
work page 2012
-
[15]
An optimal variant of Kelley’s cutting-plane method.Mathematical Programming, 160:321–351, 2016
Yoel Drori and Marc Teboulle. An optimal variant of Kelley’s cutting-plane method.Mathematical Programming, 160:321–351, 2016
work page 2016
-
[16]
Rate of Convergence of the Bundle Method.J
Yu Du and Andrzej Ruszczyński. Rate of Convergence of the Bundle Method.J. Optim. Theory Appl., 173(3):908–922, June 2017. ISSN 0022-3239. 25
work page 2017
-
[17]
Acceleration methods.Foundations and Trends®in Optimization, 5(1-2):1–245, 2021
Alexandre d’Aspremont, Damien Scieur, Adrien Taylor, et al. Acceleration methods.Foundations and Trends®in Optimization, 5(1-2):1–245, 2021
work page 2021
-
[18]
On the acceleration of proximal bundle methods
David Fersztand and Xu Andy Sun. On the acceleration of proximal bundle methods. 2025
work page 2025
-
[19]
A new approach to variable metric algorithms.The Computer Journal, 13(3):317–322, 1970
Roger Fletcher. A new approach to variable metric algorithms.The Computer Journal, 13(3):317–322, 1970
work page 1970
-
[20]
Mihai I. Florea. An efficient accelerated gradient method with memory applicable to composite problems. In2021 International Aegean Conference on Electrical Machines and Power Electronics (ACEMP) and 2021 International Conference on Optimization of Electrical and Electronic Equipment (OPTIM), pages 473–480, 2021
work page 2021
-
[21]
Mihai I. Florea. Exact gradient methods with memory.Optimization Methods and Software, 37(6): 2310–2337, 2022. PMID: 39906221
work page 2022
-
[22]
Mihai I. Florea. Adaptive first-order methods with enhanced worst-case rates. 2024
work page 2024
-
[23]
Mihai I. Florea and Yurii E. Nesterov. An optimal lower bound for smooth convex functions.Foundations of Computational Mathematics, 2025. Published online 25 June 2025
work page 2025
-
[24]
Donald Goldfarb. A family of variable metric updates derived by variational means.Mathematics of Computation, 24(109):23–26, 1970
work page 1970
-
[25]
Benjamin Grimmer, Kevin Shu, and Alex L. Wang. Accelerated objective gap and gradient norm convergence for gradient descent via long steps. 2024
work page 2024
-
[26]
Benjamin Grimmer, Kevin Shu, and Alex L. Wang. Composing optimized stepsize schedules for gradient descent. 2024
work page 2024
-
[27]
Shuvomoy Das Gupta, Bart P.G. Van Parys, and Ernest Ryu. Branch-and-bound performance estimation programming: A unified methodology for constructing optimal optimization methods.Math. Program., 2023
work page 2023
-
[28]
Warren Hare and Claudia Sagastizábal. A Redistributed Proximal Bundle Method for Nonconvex Optimization.SIAM Journal on Optimization, 20(5):2442–2473, 2010. ISSN 1052-6234
work page 2010
-
[29]
Springer Berlin Heidelberg, Berlin, Heidelberg, 1993
Jean-Baptiste Hiriart-Urruty and Claude Lemaréchal.Acceleration of the Cutting-Plane Algorithm: Primal Forms of Bundle Methods, pages 275–330. Springer Berlin Heidelberg, Berlin, Heidelberg, 1993
work page 1993
-
[30]
Uijeong Jang, Shuvomoy Das Gupta, and Ernest K. Ryu. Computer-assisted design of accelerated composite optimization methods: Optista. 2024
work page 2024
-
[31]
Accelerated proximal point method for maximally monotone operators.Math
Donghwan Kim. Accelerated proximal point method for maximally monotone operators.Math. Program., 190(1–2):57–87, November 2021. ISSN 0025-5610
work page 2021
-
[32]
Donghwan Kim and Jeffrey A. Fessler. Optimized first-order methods for smooth convex minimization. Math. Program., 159(1–2):81–107, 2016
work page 2016
-
[33]
On the convergence analysis of the optimized gradient method.J
Donghwan Kim and Jeffrey A Fessler. On the convergence analysis of the optimized gradient method.J. Optim. Theory Appl., 172(1):187–205, 2017
work page 2017
-
[34]
Donghwan Kim and Jeffrey A. Fessler. Another look at the fast iterative shrinkage/thresholding algorithm (fista).SIAM Journal on Optimization, 28(1):223–250, 2018
work page 2018
-
[35]
Donghwan Kim and Jeffrey A. Fessler. Adaptive restart of the optimized gradient method for convex optimization.J. Optim. Theory Appl., 178(1):240–263, July 2018. ISSN 0022-3239
work page 2018
-
[36]
Donghwan Kim and Jeffrey A. Fessler. Optimizing the efficiency of first-order methods for decreasing the gradient of smooth convex functions.J. Optim. Theory Appl., 188(1):192–219, 2021. 26
work page 2021
-
[37]
A proof of exact convergence rate of gradient descent
Jungbin Kim. A proof of exact convergence rate of gradient descent. Part II. Performance criterion (f(xn)−f∗)/∥x0−x∗∥2. 2024
work page 2024
-
[38]
Krzysztof C. Kiwiel. An Aggregate Subgradient Method for Nonsmooth Convex Minimization.Math. Program., 27(3):320–341, October 1983. ISSN 0025-5610
work page 1983
-
[39]
Guanghui Lan. Bundle-Level Type Methods Uniformly Optimal For Smooth And Nonsmooth Convex Optimization.Mathematical Programming, 149(1):1–45, Feb 2015. ISSN 1436-4646
work page 2015
-
[40]
Springer Berlin Heidelberg, Berlin, Heidelberg, 1975
Claude Lemaréchal.An Extension of Davidon Methods to Nondifferentiable Problems, pages 95–109. Springer Berlin Heidelberg, Berlin, Heidelberg, 1975. ISBN 978-3-642-00764-4
work page 1975
-
[41]
A simple uniformly optimal method without line search for convex optimization: T
Tianjiao Li and Guanghui Lan. A simple uniformly optimal method without line search for convex optimization: T. li, g. lan.Mathematical Programming, pages 1–38, 2025
work page 2025
-
[42]
Jiaming Liang and Renato D. C. Monteiro. A proximal bundle variant with optimal iteration-complexity for a large range of prox stepsizes.SIAM Journal on Optimization, 31(4):2955–2986, 2021
work page 2021
-
[43]
On the limited memory BFGS method for large scale optimization
Dong C Liu and Jorge Nocedal. On the limited memory BFGS method for large scale optimization. Math. Program., 45(1):503–528, 1989
work page 1989
-
[44]
JuMP 1.0: Recent improvements to a modeling language for mathematical optimization.Math
Miles Lubin, Oscar Dowson, Joaquim Dias Garcia, Joey Huchette, Benoît Legat, and Juan Pablo Vielma. JuMP 1.0: Recent improvements to a modeling language for mathematical optimization.Math. Program. Comput., 2023
work page 2023
-
[45]
Adaptive gradient descent without descent
Yura Malitsky and Konstantin Mishchenko. Adaptive gradient descent without descent. 2019
work page 2019
-
[46]
Optim: A mathematical optimization package for Julia.J
Patrick Kofod Mogensen and Asbjørn Nilsen Riseth. Optim: A mathematical optimization package for Julia.J. Open Source Softw., 3(24):615, 2018
work page 2018
-
[47]
A. S. Nemirovsky and D. B. Yudin.Problem complexity and method efficiency in optimization. Wiley- Interscience, New York, 1983
work page 1983
-
[48]
Information-based complexity of linear operator equations.J
Arkadi S Nemirovsky. Information-based complexity of linear operator equations.J. Complex., 8(2): 153–175, 1992
work page 1992
-
[49]
Yu Nesterov. Universal gradient methods for convex optimization problems.Mathematical Programming, 152(1):381–404, 2015
work page 2015
-
[50]
A method for solving the convex programming problem with convergence rateO(1/k2)
Yurii Nesterov. A method for solving the convex programming problem with convergence rateO(1/k2). Proceedings of the USSR Academy of Sciences, 269:543–547, 1983
work page 1983
-
[51]
Yurii Nesterov and Mihai I. Florea. Gradient methods with memory.Optimization Methods and Software, 37(3):936–953, 2022
work page 2022
-
[52]
An introduction to game theory.Oxford University Press, 2:672–713, 2004
Martin J Osborne. An introduction to game theory.Oxford University Press, 2:672–713, 2004
work page 2004
-
[53]
Chanwoo Park, Jisun Park, and Ernest K. Ryu. Factor- √ 2 acceleration of accelerated gradient methods. Applied Mathematics & Optimization, 88:1–38, 2021
work page 2021
-
[54]
Teodor Rotaru, François Glineur, and Panagiotis Patrinos. Exact worst-case convergence rates of gradient descent: a complete analysis for all constant stepsizes over nonconvex and convex functions. 2024
work page 2024
-
[55]
Princeton University Press, Princeton, NJ, USA, 2006
Andrzej Ruszczynski.Nonlinear Optimization. Princeton University Press, Princeton, NJ, USA, 2006. ISBN 0691119155, 9780691119151
work page 2006
-
[56]
David F. Shanno. Conditioning of quasi-newton methods for function minimization.Mathematics of Computation, 24(111):647–656, 1970. 27
work page 1970
-
[57]
Stochastic first-order methods: non-asymptotic and computer-aided analyses via potential functions
Adrien Taylor and Francis Bach. Stochastic first-order methods: non-asymptotic and computer-aided analyses via potential functions. In Alina Beygelzimer and Daniel Hsu, editors,Proceedings of the Thirty-Second Conference on Learning Theory, volume 99 ofProceedings of Machine Learning Research, pages 2934–2992, Phoenix, AZ, USA, 25–28 Jun 2019. PMLR
work page 2019
-
[58]
An optimal gradient method for smooth strongly convex minimization
Adrien Taylor and Yoel Drori. An optimal gradient method for smooth strongly convex minimization. Math. Program., 199(1–2):557–594, jun 2022. ISSN 0025-5610
work page 2022
-
[59]
Smooth strongly convex interpolation and exact worst-case performance of first-order methods.Math
Adrien Taylor, Julien Hendrickx, and François Glineur. Smooth strongly convex interpolation and exact worst-case performance of first-order methods.Math. Program., 161:307–345, 2017
work page 2017
-
[60]
Adrien Taylor, Julien Hendrickx, and François Glineur. Exact worst-case performance of first-order methods for composite convex optimization.SIAM Journal on Optimization, 27(3):1283–1313, 2017
work page 2017
-
[61]
Bryan Van Scoy, Randy A. Freeman, and Kevin M. Lynch. The fastest known globally convergent first-order method for minimizing strongly convex functions.IEEE Control Systems Letters, 2(1):49–54, 2018
work page 2018
-
[62]
Springer Berlin Heidelberg, Berlin, Heidelberg, 1975
Philip Wolfe.A Method of Conjugate Subgradients for Minimizing Nondifferentiable Functions, pages 145–173. Springer Berlin Heidelberg, Berlin, Heidelberg, 1975. ISBN 978-3-642-00764-4
work page 1975
-
[63]
Accelerated gradient descent by concatenation of stepsize schedules
Zehao Zhang and Rujun Jiang. Accelerated gradient descent by concatenation of stepsize schedules. 2024
work page 2024
-
[64]
Anytime acceleration of gradient descent
Zihan Zhang, Jason D Lee, Simon S Du, and Yuxin Chen. Anytime acceleration of gradient descent. 2024. A Deferred proofs Proof of Lemma 2.We expand f⋆−f+ 0 −L 4∥z1−x⋆∥2 + L 4∥x0−x⋆∥2 =f ⋆−f+ 0 −1 L∥g0∥2 +⟨x0−x⋆,g 0⟩ =f ⋆−f+ 0 − ⟨ g0,x⋆−x+ 0 ⟩ . The expression on the final line isQ⋆,0which is nonnegative by Lemma 1.■ Proof of Lemma 3.By assumption,zis an au...
work page 2024
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.