Stepsize Hedging: an Alternative Mechanism for Accelerating Gradient Descent

Jason M. Altschuler; Pablo A. Parrilo

arxiv: 2605.31386 · v2 · pith:57LEOC54new · submitted 2026-05-29 · 🧮 math.OC · cs.DS

Stepsize Hedging: an Alternative Mechanism for Accelerating Gradient Descent

Jason M. Altschuler , Pablo A. Parrilo This is my paper

Pith reviewed 2026-06-28 21:41 UTC · model grok-4.3

classification 🧮 math.OC cs.DS

keywords gradient descentstepsize selectionaccelerationoptimizationhedging

0 comments

The pith

Gradient descent accelerates when stepsizes are selected via hedging.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper asks whether gradient descent can be sped up solely by better stepsize choices and answers yes through the mechanism of stepsize hedging. It presents this as an expository introduction to the phenomenon rather than a new proof or algorithm. A reader would care because the claim suggests acceleration is available without momentum, preconditioning, or other common modifications. The article focuses on making the idea accessible and on establishing hedging as a distinct route to faster convergence.

Core claim

Stepsize hedging provides an alternative mechanism that accelerates gradient descent by choosing stepsizes according to a hedging strategy rather than fixed or standard adaptive rules.

What carries the argument

Stepsize hedging: a dynamic rule for selecting stepsizes that produces acceleration in gradient descent.

If this is right

Gradient descent reaches a given accuracy in fewer iterations when stepsizes follow the hedging rule.
Acceleration can arise from stepsize policy alone without changing the underlying update direction.
Standard analyses that treat stepsize choice as secondary may miss this source of speedup.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If hedging works, algorithm design could shift emphasis toward stepsize policies instead of direction modifications.
The approach might unify with adaptive methods used in machine learning by treating hedging as a meta-rule.

Load-bearing premise

That hedging produces genuine acceleration not already achieved by existing stepsize selection methods.

What would settle it

A direct numerical comparison on a convex quadratic where the best fixed stepsize matches or beats the hedged sequence in convergence rate.

Figures

Figures reproduced from arXiv: 2605.31386 by Jason M. Altschuler, Pablo A. Parrilo.

**Figure 2.** Figure 2: Illustration of Theorem 3: time-varying stepsizes that cycle between 2 stepsizes provably speed up GD for (non-quadratic) convex optimization. Here consider minimizing functions that are 1-strongly convex and 4-smooth. Red: the standard prescription of constant stepsizes αt = 2/5 in each iteration t leads to a convergence rate of 0.6 T after T iterations. Blue: better is cycling between αt = 1/3 and 1/2 in… view at source ↗

**Figure 3.** Figure 3: Schematic of how the silver stepsize schedule is recursively constructed in the strongly convex [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗

**Figure 4.** Figure 4: The silver stepsize schedule we proposed for convex optimization [ [PITH_FULL_IMAGE:figures/full_fig_p006_4.png] view at source ↗

read the original abstract

Can gradient descent be accelerated by just choosing better stepsizes? Surprisingly, the answer is yes. This short expository article provides an accessible introduction to this phenomenon of stepsize hedging.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This is a short expository note on stepsize hedging, not a source of new results or proofs.

read the letter

The main thing to know is that this paper is explicitly framed as a short expository article introducing the idea of stepsize hedging rather than advancing any new theorems, derivations, or experiments. The central point is that gradient descent can be sped up through particular stepsize choices, presented as a known phenomenon.

What the paper does reasonably well is keep the explanation straightforward and accessible. It avoids dense notation and focuses on the basic claim that better stepsizes alone can produce acceleration, which might help readers who are new to the topic or looking for a quick overview.

The soft spots are straightforward given the stated purpose. There are no new technical arguments to assess, so questions about how hedging differs from existing adaptive stepsize rules or line-search methods remain open. Without equations, proofs, or data, it is difficult to judge the mechanism's distinctness or practical edge. The work draws from prior literature, as the authors indicate, which is fine for an introduction but means it adds little beyond synthesis.

This note is mainly for students or practitioners in optimization who want an entry-level pointer to the concept before turning to the original papers. It could serve as a teaching aid or a reminder of the idea, but it is not self-contained enough for deeper study.

I would not bring it to a reading group, would not cite it, and would not send it for peer review. It fits better as an arXiv note than a journal submission.

Referee Report

0 major / 1 minor

Summary. The manuscript is a short expository article claiming that gradient descent can be accelerated solely through appropriate stepsize choices via the phenomenon of 'stepsize hedging,' and it aims to provide an accessible introduction to this idea without presenting new derivations, theorems, or experiments.

Significance. If the exposition accurately reflects existing literature on stepsize selection, the paper could offer a useful pedagogical entry point for viewing stepsize choice as an acceleration mechanism. Its significance is limited to clarity of presentation rather than novel technical contributions, as the central claim is presented as a known phenomenon drawn from prior work.

minor comments (1)

The manuscript is explicitly positioned as expository; consider adding a brief section or paragraph citing specific prior references on stepsize hedging to help readers locate the underlying technical literature.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive review and recommendation to accept the manuscript. The report correctly identifies the paper as a short expository article on stepsize hedging with no new derivations, theorems, or experiments.

Circularity Check

0 steps flagged

Expository introduction with no derivation chain

full rationale

This manuscript is explicitly framed as a short expository article that introduces the existing phenomenon of stepsize hedging rather than advancing any new derivation, theorem, or fitted prediction. The abstract and skeptic summary confirm the central claim is presented as a known observation about gradient descent acceleration via stepsize choice, with no internal equations, parameters, or self-citations that reduce a result to its own inputs by construction. No load-bearing steps exist that match the enumerated circularity patterns, rendering the work self-contained against external benchmarks as an accessible survey of prior literature.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no free parameters, axioms, or invented entities are identifiable from the provided text.

pith-pipeline@v0.9.1-grok · 5542 in / 950 out tokens · 22923 ms · 2026-06-28T21:41:25.909822+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

53 extracted references · 10 canonical work pages · 1 internal anchor

[1]

Abbaszadehpeivasti H, Zamani M (2025) On the convergence rate of the Douglas-Rachford splitting algorithm.Preprint at arXiv:2509.06676

work page arXiv 2025
[2]

Master’s thesis, Mas- sachusetts Institute of Technology

Altschuler JM (2018)Greed, Hedging, and Acceleration in Convex Optimization. Master’s thesis, Mas- sachusetts Institute of Technology

2018
[3]

Altschuler JM, Parrilo PA (2025) Acceleration by Stepsize Hedging: Multi-Step Descent and the Silver Stepsize Schedule.Journal of the ACM72(2):1–38

2025
[4]

Altschuler JM, Parrilo PA (2025) Acceleration by stepsize hedging: Silver Stepsize Schedule for smooth convex optimization.Mathematical Programming213(1–2):1105–1118

2025
[5]

Altschuler JM, Parrilo PA (2026) Acceleration by random stepsizes: Hedging, equalization, and the arcsine stepsize schedule.Foundations of Computational Mathematics,to appear

2026
[6]

Axiotis K, Sviridenko M (2023) Gradient descent converges linearly for logistic regression on separable data.International Conference on Machine Learning, 1302–1319

2023
[7]

Preprint at arXiv:2511.21917

Bai L, Zeng Y, Zhou B (2025) Generalization of silver stepsize schedule to stochastic optimization. Preprint at arXiv:2511.21917

work page arXiv 2025
[8]

Bertsekas DP (1999)Nonlinear programming(Athena Scientific)

1999
[9]

Bok J, Altschuler JM (2025) Accelerating proximal gradient descent via silver stepsizes.Conference on Learning Theory, 421–453

2025
[10]

Mathematical Programming,to appear

Bok J, Altschuler JM (2026) Optimized methods for composite optimization: a reduction perspective. Mathematical Programming,to appear

2026
[11]

Boyd S, Vandenberghe L (2004)Convex optimization(Cambridge University Press)

2004
[12]

Bubeck S (2015) Convex optimization: Algorithms and complexity .Foundations and Trends in Machine Learning8(3-4):231–357

2015
[13]

ICS News May 2026 Page 7

Cai Y, Wu J, Mei S, Lindsey M, Bartlett PL (2024) Large stepsize gradient descent for non-homogeneous two-layer networks: Margin improvement and fast optimization.Advances in Neural Information Pro- cessing Systems. ICS News May 2026 Page 7

2024
[14]

Crawshaw M, Liu M (2026) Tight bounds for logistic regression with large stepsize gradient descent in low dimension.Preprint at arXiv:2602.12471

work page arXiv 2026
[15]

Crawshaw M, Woodworth B, Liu M (2025) Constant stepsize local GD for logistic regression: Acceler- ation by instability .Preprint at arXiv:2506.13974

work page arXiv 2025
[16]

Master’s thesis, Université Catholique de Louvain

Daccache A (2019)Performance estimation of the gradient method with fixed arbitrary step sizes. Master’s thesis, Université Catholique de Louvain

2019
[17]

Das Gupta S, Van Parys BP, Ryu EK (2024) Branch-and-bound performance estimation programming: a unified methodology for constructing optimal optimization methods.Mathematical Programming567— -639

2024
[18]

Drori Y, Teboulle M (2014) Performance of first-order methods for smooth convex minimization: a novel approach.Mathematical Programming145(1–2):451–482

2014
[19]

d’Aspremont A, Scieur D, Taylor A (2021) Acceleration methods.Foundations and Trends® in Optimiza- tion5(1-2):1–245

2021
[20]

Master’s thesis, Université Catholique de Louvain

Eloi D (2022)Worst-case functions for the gradient method with fixed variable step sizes. Master’s thesis, Université Catholique de Louvain

2022
[21]

Grimmer B (2024) Provably faster gradient descent via long steps.SIAM Journal on Optimization 34(3):2588–2608

2024
[22]

Grimmer B, Shu K, Wang AL (2023) Accelerated gradient descent via long steps.Preprint at arXiv:2309.09961

work page arXiv 2023
[23]

Grimmer B, Shu K, Wang AL (2025) Accelerated objective gap and gradient norm convergence for gradient descent via long steps.INFORMS Journal on Optimization7(2):156–169

2025
[24]

Mathematics of Operations Research,to appear

Grimmer B, Shu K, Wang AL (2026) Composing optimized stepsize schedules for gradient descent. Mathematics of Operations Research,to appear

2026
[25]

Hazan E (2016) Introduction to online convex optimization.Foundations and Trends in Optimization 2(3-4):157–325

2016
[26]

Kornowski G, Shamir O (2024) Open problem: Anytime convergence rate of gradient descent.Confer- ence on Learning Theory, volume 247

2024
[27]

Luenberger DG, Ye Y (1984)Linear and nonlinear programming(Springer)

1984
[28]

Meng SY, Goujaud B, Orvieto A, De Sa C (2025) Gradient descent on logistic regression: Do large step-sizes work with data on the sphere?Preprint at arXiv:2507.11228

work page arXiv 2025
[29]

Meng SY, Orvieto A, Cao DY, De Sa C (2024) Gradient descent on logistic regression with non-separable data and large step sizes.Preprint at arXiv:2406.05033

work page arXiv 2024
[30]

Nemirovskii A, Yudin DB (1983)Problem complexity and method efficiency in optimization(Wiley)

1983
[31]

Nesterov Y (2013)Introductory lectures on convex optimization: A basic course, volume 87 (Springer Science & Business Media)

2013
[32]

Dokl.27(2):372–376

Nesterov YE (1983) A method of solving a convex programming problem with convergence rate O(1/k2).Soviet Math. Dokl.27(2):372–376

1983
[33]

Park J, Roy A, Siegel JW, Bhattacharya A (2025) Acceleration via silver step-size on Riemannian mani- folds with applications to Wasserstein space.Advances in Neural Information Processing Systems38

2025
[34]

Polyak BT (1964) Some methods of speeding up the convergence of iteration methods.USSR Compu- tational Mathematics and Mathematical Physics4(5):1–17

1964
[35]

Polyak BT (1987)Introduction to optimization(Optimization Software, Inc.)

1987
[36]

Rivlin TJ (1981)An introduction to the approximation of functions(Courier Corporation)

1981
[37]

ICS News May 2026 Page 8

Sambharya R, Bok J, Matni N, Pappas G (2025) Learning acceleration algorithms for fast parametric convex optimization with certified robustness.Preprint at arXiv:2507.16264. ICS News May 2026 Page 8

work page arXiv 2025
[38]

Sambharya R, Stellato B (2025) Data-driven performance guarantees for classical and learned optimiz- ers.Journal of Machine Learning Research26(171):1–49

2025
[39]

Sambharya R, Stellato B (2026) Learning algorithm hyperparameters for fast parametric convex opti- mization.SIAM Journal on Mathematics of Data Science,to appear

2026
[40]

Preprint at arXiv:2511.03052

Shugart H, Altschuler JM (2025) Min-max optimization is strictly easier than variational inequalities. Preprint at arXiv:2511.03052

work page arXiv 2025
[41]

Shugart H, Altschuler JM (2025) Negative stepsizes make gradient-descent-ascent converge.Preprint at arXiv:2505.01423

work page internal anchor Pith review arXiv 2025
[42]

Smith LN (2017) Cyclical learning rates for training neural networks.IEEE Winter Conference on Appli- cations of Computer Vision, 464–472 (IEEE)

2017
[43]

Smith LN, Topin N (2019) Super-convergence: Very fast training of neural networks using large learn- ing rates.Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications, volume 11006, 369–386 (SPIE)

2019
[44]

Taylor AB, Hendrickx JM, Glineur F (2017) Smooth strongly convex interpolation and exact worst-case performance of first-order methods.Mathematical Programming161(1-2):307–345

2017
[45]

Master’s thesis, Université Catholique de Louvain

Vernimmen P (2024)Tight convergence analysis of exact and inexact gradient methods with constant and silver schedules. Master’s thesis, Université Catholique de Louvain

2024
[46]

Vernimmen P, Glineur F (2025) Empirical and computer-aided robustness analysis of long-step and ac- celerated methods in smooth convex optimization.International Conference on Modelling, Computation and Optimization in Information Systems and Management Sciences, 354–366 (Springer)

2025
[47]

Wang B, Ma S, Yang J, Zhou D (2026) Relaxed proximal point algorithm: Tight complexity bounds and acceleration without momentum.INFORMS Journal on Optimization8(2):141–162

2026
[48]

Wu J, Bartlett PL, Telgarsky M, Yu B (2024) Large stepsize gradient descent for logistic loss: Non- monotonicity of the loss improves optimization efficiency .Proceedings of the Thirty Seventh Conference on Learning Theory, Proceedings of Machine Learning Research (PMLR)

2024
[49]

Wu J, Marion P, Bartlett P (2025) Large stepsizes accelerate gradient descent for regularized logistic regression.Advances in Neural Information Processing Systems38:104485–104525

2025
[50]

Journal of Mathematics and Physics32(1-4):243–255

Young D (1953) On Richardson’s method for solving linear systems with positive definite matrices. Journal of Mathematics and Physics32(1-4):243–255

1953
[51]

Zhang R, Wu J, Bartlett PL (2025) Gradient descent converges arbitrarily fast for logistic regression via large and adaptive stepsizes.International Conference on Machine Learning, volume 267, 76361–76384

2025
[52]

Zhang Z, Jiang R (2026) Accelerated gradient descent by concatenation of stepsize schedules.SIAM Journal on Optimization,to appear

2026
[53]

ICS News May 2026 Page 9

Zhang Z, Lee J, Du S, Chen Y (2025) Anytime acceleration of gradient descent.Conference on Learning Theory, volume 291, 5991–6013. ICS News May 2026 Page 9

2025

[1] [1]

Abbaszadehpeivasti H, Zamani M (2025) On the convergence rate of the Douglas-Rachford splitting algorithm.Preprint at arXiv:2509.06676

work page arXiv 2025

[2] [2]

Master’s thesis, Mas- sachusetts Institute of Technology

Altschuler JM (2018)Greed, Hedging, and Acceleration in Convex Optimization. Master’s thesis, Mas- sachusetts Institute of Technology

2018

[3] [3]

Altschuler JM, Parrilo PA (2025) Acceleration by Stepsize Hedging: Multi-Step Descent and the Silver Stepsize Schedule.Journal of the ACM72(2):1–38

2025

[4] [4]

Altschuler JM, Parrilo PA (2025) Acceleration by stepsize hedging: Silver Stepsize Schedule for smooth convex optimization.Mathematical Programming213(1–2):1105–1118

2025

[5] [5]

Altschuler JM, Parrilo PA (2026) Acceleration by random stepsizes: Hedging, equalization, and the arcsine stepsize schedule.Foundations of Computational Mathematics,to appear

2026

[6] [6]

Axiotis K, Sviridenko M (2023) Gradient descent converges linearly for logistic regression on separable data.International Conference on Machine Learning, 1302–1319

2023

[7] [7]

Preprint at arXiv:2511.21917

Bai L, Zeng Y, Zhou B (2025) Generalization of silver stepsize schedule to stochastic optimization. Preprint at arXiv:2511.21917

work page arXiv 2025

[8] [8]

Bertsekas DP (1999)Nonlinear programming(Athena Scientific)

1999

[9] [9]

Bok J, Altschuler JM (2025) Accelerating proximal gradient descent via silver stepsizes.Conference on Learning Theory, 421–453

2025

[10] [10]

Mathematical Programming,to appear

Bok J, Altschuler JM (2026) Optimized methods for composite optimization: a reduction perspective. Mathematical Programming,to appear

2026

[11] [11]

Boyd S, Vandenberghe L (2004)Convex optimization(Cambridge University Press)

2004

[12] [12]

Bubeck S (2015) Convex optimization: Algorithms and complexity .Foundations and Trends in Machine Learning8(3-4):231–357

2015

[13] [13]

ICS News May 2026 Page 7

Cai Y, Wu J, Mei S, Lindsey M, Bartlett PL (2024) Large stepsize gradient descent for non-homogeneous two-layer networks: Margin improvement and fast optimization.Advances in Neural Information Pro- cessing Systems. ICS News May 2026 Page 7

2024

[14] [14]

Crawshaw M, Liu M (2026) Tight bounds for logistic regression with large stepsize gradient descent in low dimension.Preprint at arXiv:2602.12471

work page arXiv 2026

[15] [15]

Crawshaw M, Woodworth B, Liu M (2025) Constant stepsize local GD for logistic regression: Acceler- ation by instability .Preprint at arXiv:2506.13974

work page arXiv 2025

[16] [16]

Master’s thesis, Université Catholique de Louvain

Daccache A (2019)Performance estimation of the gradient method with fixed arbitrary step sizes. Master’s thesis, Université Catholique de Louvain

2019

[17] [17]

Das Gupta S, Van Parys BP, Ryu EK (2024) Branch-and-bound performance estimation programming: a unified methodology for constructing optimal optimization methods.Mathematical Programming567— -639

2024

[18] [18]

Drori Y, Teboulle M (2014) Performance of first-order methods for smooth convex minimization: a novel approach.Mathematical Programming145(1–2):451–482

2014

[19] [19]

d’Aspremont A, Scieur D, Taylor A (2021) Acceleration methods.Foundations and Trends® in Optimiza- tion5(1-2):1–245

2021

[20] [20]

Master’s thesis, Université Catholique de Louvain

Eloi D (2022)Worst-case functions for the gradient method with fixed variable step sizes. Master’s thesis, Université Catholique de Louvain

2022

[21] [21]

Grimmer B (2024) Provably faster gradient descent via long steps.SIAM Journal on Optimization 34(3):2588–2608

2024

[22] [22]

Grimmer B, Shu K, Wang AL (2023) Accelerated gradient descent via long steps.Preprint at arXiv:2309.09961

work page arXiv 2023

[23] [23]

Grimmer B, Shu K, Wang AL (2025) Accelerated objective gap and gradient norm convergence for gradient descent via long steps.INFORMS Journal on Optimization7(2):156–169

2025

[24] [24]

Mathematics of Operations Research,to appear

Grimmer B, Shu K, Wang AL (2026) Composing optimized stepsize schedules for gradient descent. Mathematics of Operations Research,to appear

2026

[25] [25]

Hazan E (2016) Introduction to online convex optimization.Foundations and Trends in Optimization 2(3-4):157–325

2016

[26] [26]

Kornowski G, Shamir O (2024) Open problem: Anytime convergence rate of gradient descent.Confer- ence on Learning Theory, volume 247

2024

[27] [27]

Luenberger DG, Ye Y (1984)Linear and nonlinear programming(Springer)

1984

[28] [28]

Meng SY, Goujaud B, Orvieto A, De Sa C (2025) Gradient descent on logistic regression: Do large step-sizes work with data on the sphere?Preprint at arXiv:2507.11228

work page arXiv 2025

[29] [29]

Meng SY, Orvieto A, Cao DY, De Sa C (2024) Gradient descent on logistic regression with non-separable data and large step sizes.Preprint at arXiv:2406.05033

work page arXiv 2024

[30] [30]

Nemirovskii A, Yudin DB (1983)Problem complexity and method efficiency in optimization(Wiley)

1983

[31] [31]

Nesterov Y (2013)Introductory lectures on convex optimization: A basic course, volume 87 (Springer Science & Business Media)

2013

[32] [32]

Dokl.27(2):372–376

Nesterov YE (1983) A method of solving a convex programming problem with convergence rate O(1/k2).Soviet Math. Dokl.27(2):372–376

1983

[33] [33]

Park J, Roy A, Siegel JW, Bhattacharya A (2025) Acceleration via silver step-size on Riemannian mani- folds with applications to Wasserstein space.Advances in Neural Information Processing Systems38

2025

[34] [34]

Polyak BT (1964) Some methods of speeding up the convergence of iteration methods.USSR Compu- tational Mathematics and Mathematical Physics4(5):1–17

1964

[35] [35]

Polyak BT (1987)Introduction to optimization(Optimization Software, Inc.)

1987

[36] [36]

Rivlin TJ (1981)An introduction to the approximation of functions(Courier Corporation)

1981

[37] [37]

ICS News May 2026 Page 8

Sambharya R, Bok J, Matni N, Pappas G (2025) Learning acceleration algorithms for fast parametric convex optimization with certified robustness.Preprint at arXiv:2507.16264. ICS News May 2026 Page 8

work page arXiv 2025

[38] [38]

Sambharya R, Stellato B (2025) Data-driven performance guarantees for classical and learned optimiz- ers.Journal of Machine Learning Research26(171):1–49

2025

[39] [39]

Sambharya R, Stellato B (2026) Learning algorithm hyperparameters for fast parametric convex opti- mization.SIAM Journal on Mathematics of Data Science,to appear

2026

[40] [40]

Preprint at arXiv:2511.03052

Shugart H, Altschuler JM (2025) Min-max optimization is strictly easier than variational inequalities. Preprint at arXiv:2511.03052

work page arXiv 2025

[41] [41]

Shugart H, Altschuler JM (2025) Negative stepsizes make gradient-descent-ascent converge.Preprint at arXiv:2505.01423

work page internal anchor Pith review arXiv 2025

[42] [42]

Smith LN (2017) Cyclical learning rates for training neural networks.IEEE Winter Conference on Appli- cations of Computer Vision, 464–472 (IEEE)

2017

[43] [43]

Smith LN, Topin N (2019) Super-convergence: Very fast training of neural networks using large learn- ing rates.Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications, volume 11006, 369–386 (SPIE)

2019

[44] [44]

Taylor AB, Hendrickx JM, Glineur F (2017) Smooth strongly convex interpolation and exact worst-case performance of first-order methods.Mathematical Programming161(1-2):307–345

2017

[45] [45]

Master’s thesis, Université Catholique de Louvain

Vernimmen P (2024)Tight convergence analysis of exact and inexact gradient methods with constant and silver schedules. Master’s thesis, Université Catholique de Louvain

2024

[46] [46]

Vernimmen P, Glineur F (2025) Empirical and computer-aided robustness analysis of long-step and ac- celerated methods in smooth convex optimization.International Conference on Modelling, Computation and Optimization in Information Systems and Management Sciences, 354–366 (Springer)

2025

[47] [47]

Wang B, Ma S, Yang J, Zhou D (2026) Relaxed proximal point algorithm: Tight complexity bounds and acceleration without momentum.INFORMS Journal on Optimization8(2):141–162

2026

[48] [48]

Wu J, Bartlett PL, Telgarsky M, Yu B (2024) Large stepsize gradient descent for logistic loss: Non- monotonicity of the loss improves optimization efficiency .Proceedings of the Thirty Seventh Conference on Learning Theory, Proceedings of Machine Learning Research (PMLR)

2024

[49] [49]

Wu J, Marion P, Bartlett P (2025) Large stepsizes accelerate gradient descent for regularized logistic regression.Advances in Neural Information Processing Systems38:104485–104525

2025

[50] [50]

Journal of Mathematics and Physics32(1-4):243–255

Young D (1953) On Richardson’s method for solving linear systems with positive definite matrices. Journal of Mathematics and Physics32(1-4):243–255

1953

[51] [51]

Zhang R, Wu J, Bartlett PL (2025) Gradient descent converges arbitrarily fast for logistic regression via large and adaptive stepsizes.International Conference on Machine Learning, volume 267, 76361–76384

2025

[52] [52]

Zhang Z, Jiang R (2026) Accelerated gradient descent by concatenation of stepsize schedules.SIAM Journal on Optimization,to appear

2026

[53] [53]

ICS News May 2026 Page 9

Zhang Z, Lee J, Du S, Chen Y (2025) Anytime acceleration of gradient descent.Conference on Learning Theory, volume 291, 5991–6013. ICS News May 2026 Page 9

2025