Robust Accelerated Adaptive Search: High-Probability Complexity Bounds under Bounded-Moment Stochastic Oracles

Congying Han; Shichen Liao; Shunzhi Zhang; Tiande Guo

arxiv: 2604.15526 · v1 · submitted 2026-04-16 · 🧮 math.OC

Robust Accelerated Adaptive Search: High-Probability Complexity Bounds under Bounded-Moment Stochastic Oracles

Shunzhi Zhang , Shichen Liao , Congying Han , Tiande Guo This is my paper

Pith reviewed 2026-05-10 10:14 UTC · model grok-4.3

classification 🧮 math.OC

keywords convex optimizationstochastic oraclesadaptive searchmomentum methodshigh-probability boundsheavy-tailed noisecomplexity analysisrobust algorithms

0 comments

The pith

A tunable-momentum adaptive search method yields high-probability complexity bounds for convex optimization despite biased and heavy-tailed stochastic oracles.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper addresses unconstrained smooth convex optimization under stochastic first- and zeroth-order oracles that satisfy only finite-moment bounds, naturally allowing persistent bias and heavy-tailed noise. Integrating momentum into adaptive step search creates a structural difficulty because momentum carries oracle errors forward and can erode the stabilizing benefits of local search. RAAS is introduced as a robust accelerated adaptive search method that employs tunable momentum intervention to manage this issue. A general high-probability framework for adaptive search methods is developed and then specialized to RAAS, producing stopping-time complexity bounds that guarantee reaching an attainable precision neighborhood for both strongly convex and general convex cases. The analysis also identifies how parameters balance early acceleration against later stability and motivates a switching heuristic that performs well in experiments.

Core claim

By incorporating a tunable momentum intervention into adaptive step search, the RAAS algorithm achieves high-probability stopping-time complexity bounds for reaching the attainable precision neighborhood in smooth convex optimization, under first- and zeroth-order stochastic oracles subject only to finite-moment conditions that admit persistent bias and heavy-tailed noise.

What carries the argument

Tunable momentum intervention in the adaptive search procedure, which adjusts the weight on historical directions to limit the forward propagation of stochastic oracle errors while preserving acceleration.

If this is right

High-probability stopping-time bounds hold for oracles with only finite moments, including those with bias and heavy tails.
Algorithmic parameters trade off early-stage acceleration against late-stage stability.
A simple switching heuristic that adjusts momentum levels according to the observed trade-off performs well empirically.
Separate high-probability analyses are given for the strongly convex and general convex settings.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The general high-probability framework may extend to other adaptive first-order methods facing similar noise conditions.
Momentum tuning strategies could be tested in non-convex or composite optimization problems where error propagation is likewise a concern.
The results highlight the value of parameter-switching rules for practical reliability when noise statistics are unknown in advance.
High-probability guarantees of this form could improve the design of optimization routines used in settings with unreliable gradient estimates.

Load-bearing premise

The momentum parameter can be chosen to sufficiently suppress the propagation of oracle errors so that it does not undermine the stabilizing effect of local adaptive search.

What would settle it

A simulation on a smooth convex problem driven by heavy-tailed stochastic oracle noise (such as Student-t with small degrees of freedom) in which the observed number of iterations to reach a target precision exceeds the derived high-probability bound.

Figures

Figures reproduced from arXiv: 2604.15526 by Congying Han, Shichen Liao, Shunzhi Zhang, Tiande Guo.

**Figure 2.** Figure 2: Strongly convex logistic regression across different noise settings. [PITH_FULL_IMAGE:figures/full_fig_p026_2.png] view at source ↗

**Figure 3.** Figure 3: Geometric illustration of the momentum step [PITH_FULL_IMAGE:figures/full_fig_p048_3.png] view at source ↗

read the original abstract

We study unconstrained smooth convex optimization under stochastic first- and zeroth-order oracles subject only to finite-moment bounds, naturally admitting persistent bias and heavy-tailed noise. In this hostile environment, integrating momentum into \emph{adaptive step search} to secure acceleration poses an inherent structural challenge, because momentum propagates oracle errors across iterations, inevitably undermining the stabilizing effect of local search. To address this difficulty, we propose \texttt{RAAS}, a robust accelerated adaptive search method with tunable momentum intervention. Theoretically, we develop a general high-probability framework for adaptive search methods under stochastic oracle feedback, and instantiate it through the strongly convex and general convex analyses of \texttt{RAAS}. This yields high-probability stopping-time complexity bounds for reaching the attainable precision neighborhood. The resulting guarantees also clarify how the algorithmic parameters trade off early-stage acceleration against late-stage stability, and motivate a simple switching heuristic that performs well empirically.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

RAAS adds tunable momentum intervention to adaptive search and supplies a general high-probability framework for finite-moment oracles that yields stopping-time bounds.

read the letter

The main point is that the paper gives a high-probability framework for adaptive search methods under stochastic oracles with only finite moments, and it uses that framework to analyze RAAS, their accelerated method that intervenes on momentum to limit error propagation. The framework produces explicit stopping-time bounds for reaching a precision neighborhood in both the strongly convex and general convex cases. They also show how the parameters trade off early acceleration against later stability, which leads to a simple switching heuristic. That combination addresses a concrete difficulty: momentum normally spreads oracle errors and undercuts the stabilizing effect of local adaptive steps, and their intervention is meant to restore control while keeping the acceleration benefit. The analysis is set up to allow persistent bias and heavy tails, which matches noise that appears in many practical stochastic oracles. The derivations for the high-probability bounds and the way they handle accumulated momentum errors are the parts that carry the weight; if those steps are tight, the bounds are a useful addition because most existing high-probability results either require stronger noise assumptions or do not treat accelerated adaptive search. One soft spot is that the empirical section only sketches the switching heuristic, so it is not yet clear how much it improves over standard choices or how sensitive performance is to the switch point. The constants in the bounds also look loose, as is common in these analyses, but that does not undermine the structural contribution. This paper is for researchers working on stochastic convex optimization who need guarantees that survive realistic noise models rather than variance bounds. A reader focused on high-probability complexity for adaptive or accelerated methods will find the framework and the explicit trade-off discussion worth reading. It deserves peer review because the technical problem is well-posed, the approach is coherent, and the results are grounded enough to be checked and potentially built on.

Referee Report

0 major / 0 minor

Summary. The paper studies unconstrained smooth convex optimization under stochastic first- and zeroth-order oracles subject only to finite-moment bounds (admitting persistent bias and heavy-tailed noise). It proposes RAAS, a robust accelerated adaptive search method with tunable momentum intervention to prevent error propagation from undermining local search stability. A general high-probability framework for adaptive search methods under stochastic oracle feedback is developed and instantiated via strongly convex and general convex analyses of RAAS, yielding high-probability stopping-time complexity bounds for reaching the attainable precision neighborhood. The analysis clarifies algorithmic parameter trade-offs between early acceleration and late stability and motivates an empirical switching heuristic.

Significance. If the derivations hold, the work would be significant for stochastic optimization: it delivers high-probability (rather than expectation) complexity bounds under weaker oracle assumptions than standard bounded-variance or sub-Gaussian models, directly addressing realistic heavy-tailed and biased noise. The general framework for adaptive search and the explicit treatment of momentum intervention as a tunable stabilizer are potentially reusable for other methods. The stopping-time focus and switching heuristic also add practical value.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive review, recognition of the significance of high-probability bounds under bounded-moment oracles, and recommendation to accept the manuscript.

Circularity Check

0 steps flagged

No significant circularity detected in derivation chain

full rationale

The paper develops a general high-probability framework for adaptive search under stochastic oracles with only finite-moment bounds, then instantiates the framework for the RAAS algorithm via tunable momentum intervention. The derivation chain consists of standard probabilistic analysis steps (stopping-time bounds, error propagation control) that start from the stated oracle assumptions and produce complexity guarantees without reducing to fitted parameters, self-definitions, or load-bearing self-citations. No equations equate a claimed prediction to an input by construction, and the central claims remain independent of any prior author work invoked as an external fact. The analysis is self-contained against the weak oracle model.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit equations or proof sections, so no free parameters, axioms, or invented entities can be extracted; the work appears to rest on standard convex smoothness and the finite-moment oracle assumption.

pith-pipeline@v0.9.0 · 5463 in / 1231 out tokens · 35373 ms · 2026-05-10T10:14:21.152323+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

52 extracted references · 52 canonical work pages

[1]

N. S. Aybat, A. Fallah, M. G¨ urb¨ uzbalaban, and A. Ozdaglar. Robust accelerated gradient methods for smooth strongly convex functions.SIAM Journal on Optimization, 30(1):717–751, 2020

work page 2020
[2]

A fast iterative shrinkage-thresholding algorithm for linear inverse problems.SIAM journal on imaging sciences, 2(1):183–202, 2009

Amir Beck and Marc Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems.SIAM journal on imaging sciences, 2(1):183–202, 2009

work page 2009
[3]

A stochastic cubic regu- larisation method with inexact function evaluations and random derivatives for finite sum minimisation

Stefania Bellavia, Gianmarco Gurioli, Benedetta Morini, and Philippe L Toint. A stochastic cubic regu- larisation method with inexact function evaluations and random derivatives for finite sum minimisation. InProceedings of the 37th International Conference on Machine Learning, pages 745–754. PMLR, 2020

work page 2020
[4]

Adaptive regularization for nonconvex optimization using inexact function values and randomly perturbed derivatives.Journal of Complexity, 68:101591, 2022

Stefania Bellavia, Gianmarco Gurioli, Benedetta Morini, and Philippe L Toint. Adaptive regularization for nonconvex optimization using inexact function values and randomly perturbed derivatives.Journal of Complexity, 68:101591, 2022

work page 2022
[5]

Cao, and Katya Scheinberg

Albert Berahas, L. Cao, and Katya Scheinberg. Global convergence rate analysis of a generic line search algorithm with noise.SIAM Journal on Optimization, 31:1489–1518, 06 2021

work page 2021
[6]

Berahas, Liyuan Cao, Krzysztof Choromanski, and Katya Scheinberg

Albert S. Berahas, Liyuan Cao, Krzysztof Choromanski, and Katya Scheinberg. A theoretical and empirical comparison of gradient approximations in derivative-free optimization.Found. Comput. Math., 22(2):507–560, April 2022. 27

work page 2022
[7]

Berahas, Miaolan Xie, and Baoyu Zhou

Albert S. Berahas, Miaolan Xie, and Baoyu Zhou. A sequential quadratic programming method with high-probability complexity bounds for nonlinear equality-constrained stochastic optimization.SIAM Journal on Optimization, 35(1):240–269, 2025

work page 2025
[8]

On biased compression for distributed learning.Journal of Machine Learning Research, 24(276):1–50, 2023

Aleksei Beznosikov, Samuel Horv´ ath, Peter Richt´ arik, and Mher Safaryan. On biased compression for distributed learning.Journal of Machine Learning Research, 24(276):1–50, 2023

work page 2023
[9]

Convergence rate analysis of a stochastic trust-region method via supermartingales.INFORMS Journal on Optimization, 1(2):92–119, 2019

Jose Blanchet, Coralia Cartis, Matt Menickelly, and Katya Scheinberg. Convergence rate analysis of a stochastic trust-region method via supermartingales.INFORMS Journal on Optimization, 1(2):92–119, 2019

work page 2019
[10]

Backtracking strategies for accelerated descent methods with smooth composite objectives.SIAM Journal on Optimization, 29(3):1772–1798, 2019

Luca Calatroni and Antonin Chambolle. Backtracking strategies for accelerated descent methods with smooth composite objectives.SIAM Journal on Optimization, 29(3):1772–1798, 2019

work page 2019
[11]

Berahas, and Katya Scheinberg

Liyuan Cao, Albert S. Berahas, and Katya Scheinberg. First- and second-order high probability com- plexity bounds for trust-region methods with noisy oracles.Mathematical Programming, 207(1):55–106, 2024

work page 2024
[12]

Cartis and K Scheinberg

C. Cartis and K Scheinberg. Global convergence rate analysis of unconstrained optimization methods based on probabilistic models.Mathematical Programming, 169(3):337–375, 2018

work page 2018
[13]

Convergence analysis of accelerated stochastic gradient descent under the growth condition.Mathematics of Operations Research, 49(4):2492–2526, 2023

You-Lin Chen, Sen Na, and Mladen Kolar. Convergence analysis of accelerated stochastic gradient descent under the growth condition.Mathematics of Operations Research, 49(4):2492–2526, 2023

work page 2023
[14]

Smooth optimization with approximate gradient.SIAM Journal on Optimiza- tion, 19(3):1171–1183, 2008

Alexandre d’Aspremont. Smooth optimization with approximate gradient.SIAM Journal on Optimiza- tion, 19(3):1171–1183, 2008

work page 2008
[15]

Intermediate gradient methods for smooth convex problems with inexact oracle, 2013

Olivier Devolder, Fran¸ cois Glineur, and Yurii Nesterov. Intermediate gradient methods for smooth convex problems with inexact oracle, 2013. CORE Discussion Paper 2013/17; Optimization Online version posted 2014

work page 2013
[16]

First-order methods of smooth convex opti- mization with inexact oracle.Mathematical Programming, 146:37–75, 2014

Olivier Devolder, Fran¸ cois Glineur, and Yurii Nesterov. First-order methods of smooth convex opti- mization with inexact oracle.Mathematical Programming, 146:37–75, 2014

work page 2014
[17]

Fallah, K

A. Fallah, K. Yuan, M. G¨ urb¨ uzbalaban, and A. Ozdaglar. Robust distributed accelerated stochastic gradient methods for multi-agent networks.Journal of Machine Learning Research, 23(220):1–96, 2022

work page 2022
[18]

Deviation inequalities for martingales with applications

Xiequan Fan, Ion Grama, and Quansheng Liu. Deviation inequalities for martingales with applications. Journal of Mathematical Analysis and Applications, 448(1):538–566, 2017

work page 2017
[19]

High probability complexity bounds of trust-region stochastic sequential quadratic programming with heavy-tailed noise, 2025

Yuchen Fang, Javad Lavaei, and Sen Na. High probability complexity bounds of trust-region stochastic sequential quadratic programming with heavy-tailed noise, 2025

work page 2025
[20]

Stochastic optimization with heavy- tailed noise via accelerated gradient clipping.Advances in Neural Information Processing Systems, 33:15042–15053, 2020

Eduard Gorbunov, Marina Danilova, and Alexander Gasnikov. Stochastic optimization with heavy- tailed noise via accelerated gradient clipping.Advances in Neural Information Processing Systems, 33:15042–15053, 2020

work page 2020
[21]

Siegel, and Stephan Wojtowytsch

Kanan Gupta, Jonathan W. Siegel, and Stephan Wojtowytsch. Nesterov acceleration despite very noisy gradients, 2024

work page 2024
[22]

Robustly stable accelerated momentum methods with a near-optimal l2 gain and h∞ performance, 2023

Mert Gurbuzbalaban. Robustly stable accelerated momentum methods with a near-optimal l2 gain and h∞ performance, 2023

work page 2023
[23]

Accelerated gradient methods with biased gradient estimates: Risk sensitivity, high-probability guarantees, and large deviation bounds, 2025

Mert G¨ urb¨ uzbalaban, Yasa Syed, and Necdet Serhat Aybat. Accelerated gradient methods with biased gradient estimates: Risk sensitivity, high-probability guarantees, and large deviation bounds, 2025

work page 2025
[24]

High probability complexity bounds for adaptive step search based on stochastic oracles.SIAM Journal on Optimization, 34(3):2411–2439, 2024

Billy Jin, Katya Scheinberg, and Miaolan Xie. High probability complexity bounds for adaptive step search based on stochastic oracles.SIAM Journal on Optimization, 34(3):2411–2439, 2024. 28

work page 2024
[25]

Sample complexity analysis for adaptive optimization algorithms with stochastic oracles.Mathematical Programming, pages 1–29, 2024

Billy Jin, Katya Scheinberg, and Miaolan Xie. Sample complexity analysis for adaptive optimization algorithms with stochastic oracles.Mathematical Programming, pages 1–29, 2024

work page 2024
[26]

Error feedback fixes signsgd and other gradient compression schemes

Sai Praneeth Karimireddy, Quentin Rebjock, Sebastian Stich, and Martin Jaggi. Error feedback fixes signsgd and other gradient compression schemes. In Kamalika Chaudhuri and Ruslan Salakhutdinov, editors,Proceedings of the 36th International Conference on Machine Learning (ICML), volume 97 of Proceedings of Machine Learning Research, pages 3252–3261. PMLR, 2019

work page 2019
[27]

Intermediate gradient methods with relative inexactness.Journal of Optimization Theory and Applications, 207(3):62, 2025

Nikita Kornilov, Mohammad Alkousa, Eduard Gorbunov, Fedor Stonyakin, Pavel Dvurechensky, and Alexander Gasnikov. Intermediate gradient methods with relative inexactness.Journal of Optimization Theory and Applications, 207(3):62, 2025

work page 2025
[28]

Estimate sequences for stochastic composite optimization: variance reduction, acceleration, and robustness to noise.J

Andrei Kulunchakov and Julien Mairal. Estimate sequences for stochastic composite optimization: variance reduction, acceleration, and robustness to noise.J. Mach. Learn. Res., 21(1), January 2020

work page 2020
[29]

undefinedlker Birbil, Mert G¨ urb¨ uzbalaban, and Sinan Yildirim

Nurdan Kuru, S ¸. undefinedlker Birbil, Mert G¨ urb¨ uzbalaban, and Sinan Yildirim. Differentially private accelerated optimization algorithms.SIAM J. on Optimization, 32(2):795–821, January 2022

work page 2022
[30]

Shanbhag

Jinlong Lei and Uday V. Shanbhag. Variance-reduced accelerated first-order methods: Central limit theorems and confidence statements.Mathematics of Operations Research, 50(2):1364–1397, 2024

work page 2024
[31]

Nonasymptotic analysis of accelerated methods with inexact oracle under absolute error bound

Yin Liu and Sam Davanloo Tajbakhsh. Nonasymptotic analysis of accelerated methods with inexact oracle under absolute error bound. 2025

work page 2025
[32]

Accelerated first-order methods for convex optimization with locally lipschitz continuous gradient.SIAM Journal on Optimization, 33(3):2275–2310, 2023

Zhaosong Lu and Sanyou Mei. Accelerated first-order methods for convex optimization with locally lipschitz continuous gradient.SIAM Journal on Optimization, 33(3):2275–2310, 2023

work page 2023
[33]

Jovanovi´ c

Hesameddin Mohammadi, Meisam Razaviyayn, and Mihailo R. Jovanovi´ c. Tradeoffs between conver- gence rate and noise amplification for momentum-based accelerated optimization algorithms.IEEE Transactions on Automatic Control, 70(2):889–904, 2025

work page 2025
[34]

Gradient methods for minimizing composite functions.Mathematical programming, 140(1):125–161, 2013

Yu Nesterov. Gradient methods for minimizing composite functions.Mathematical programming, 140(1):125–161, 2013

work page 2013
[35]

A method for solving the convex programming problem with convergence rateo(1/k 2)

Yurii Nesterov. A method for solving the convex programming problem with convergence rateo(1/k 2). Proceedings of the USSR Academy of Sciences, 269:543–547, 1983

work page 1983
[36]

Stochastic ista/fista adaptive step search algorithms for convex composite optimization.arXiv preprint arXiv:2402.15646, 2024

Lam M Nguyen, Katya Scheinberg, and Trang H Tran. Stochastic ista/fista adaptive step search algorithms for convex composite optimization.arXiv preprint arXiv:2402.15646, 2024

work page arXiv 2024
[37]

Nguyen and Trang H

Lam M. Nguyen and Trang H. Tran. On the convergence to a global solution of shuffling-type gradient algorithms. InProceedings of the 37th Conference on Neural Information Processing Systems (NeurIPS), 2023

work page 2023
[38]

Improved convergence in high prob- ability of clipped gradient methods with heavy tailed noise

Ta Duy Nguyen, Thien H Nguyen, Alina Ene, and Huy Nguyen. Improved convergence in high prob- ability of clipped gradient methods with heavy tailed noise. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors,Advances in Neural Information Processing Systems, volume 36, pages 24191–24222. Curran Associates, Inc., 2023

work page 2023
[39]

A stochastic line search method with expected complexity analysis.SIAM Journal on Optimization, 30(1):349–376, 2020

Courtney Paquette and Katya Scheinberg. A stochastic line search method with expected complexity analysis.SIAM Journal on Optimization, 30(1):349–376, 2020

work page 2020
[40]

Moment inequalities for sums of dependent random variables under projective condi- tions.Journal of Theoretical Probability, 22(1):146–163, 2009

Emmanuel Rio. Moment inequalities for sums of dependent random variables under projective condi- tions.Journal of Theoretical Probability, 22(1):146–163, 2009

work page 2009
[41]

Fast first-order methods for composite convex opti- mization with backtracking.Foundations of Computational Mathematics, 14:389–417, 2014

Katya Scheinberg, Donald Goldfarb, and Xi Bai. Fast first-order methods for composite convex opti- mization with backtracking.Foundations of Computational Mathematics, 14:389–417, 2014. 29

work page 2014
[42]

Stochastic adaptive regularization method with cubics: A high probability complexity bound

Katya Scheinberg and Miaolan Xie. Stochastic adaptive regularization method with cubics: A high probability complexity bound. InProceedings of the 2023 Winter Simulation Conference (WSC), pages 3520–3531. IEEE, 2023

work page 2023
[43]

Stochastic adaptive optimization with unreliable inputs: A unified framework for high-probability complexity analysis, 2025

Katya Scheinberg and Miaolan Xie. Stochastic adaptive optimization with unreliable inputs: A unified framework for high-probability complexity analysis, 2025

work page 2025
[44]

Fast convergence of stochastic gradient descent under a strong growth condition

Mark Schmidt and Nicolas Le Roux. Fast convergence of stochastic gradient descent under a strong growth condition. Technical report, University of British Columbia, 2013

work page 2013
[45]

Convergence rates of inexact proximal-gradient methods for convex optimization

Mark Schmidt, Nicolas Le Roux, and Francis Bach. Convergence rates of inexact proximal-gradient methods for convex optimization. InProceedings of the 25th International Conference on Neural Infor- mation Processing Systems, NIPS’11, page 1458–1466, Red Hook, NY, USA, 2011. Curran Associates Inc

work page 2011
[46]

Understanding the acceleration phenomenon via high-resolution differential equations.Mathematical Programming, 195(1):79–148, 2022

Bin Shi, Simon S Du, Michael I Jordan, and Weijie J Su. Understanding the acceleration phenomenon via high-resolution differential equations.Mathematical Programming, 195(1):79–148, 2022

work page 2022
[47]

A tail-index analysis of stochastic gradient noise in deep neural networks

Umut Simsekli, Levent Sagun, and Mert Gurbuzbalaban. A tail-index analysis of stochastic gradient noise in deep neural networks. InProceedings of the 36th International Conference on Machine Learning, volume 97 ofProceedings of Machine Learning Research, pages 5827–5837. PMLR, 2019

work page 2019
[48]

A differential equation for modeling Nesterov’s ac- celerated gradient method: theory and insights.The Journal of Machine Learning Research, 17(1):5312– 5354, 2016

Weijie Su, Stephen Boyd, and Emmanuel J Cand` es. A differential equation for modeling Nesterov’s ac- celerated gradient method: theory and insights.The Journal of Machine Learning Research, 17(1):5312– 5354, 2016

work page 2016
[49]

Tran, Katya Scheinberg, and Lam M

Trang H. Tran, Katya Scheinberg, and Lam M. Nguyen. Nesterov accelerated shuffling gradient method for convex optimization. InProceedings of the 39th International Conference on Machine Learning (ICML), volume 162 ofProceedings of Machine Learning Research, pages 21703–21732. PMLR, July 2022

work page 2022
[50]

On accelerated proximal gradient methods for convex-concave optimization.SIAM journal on optimization, 2008

Paul Tseng. On accelerated proximal gradient methods for convex-concave optimization.SIAM journal on optimization, 2008

work page 2008
[51]

Inequalities for therth absolute moment of a sum of random variables, 1≤r≤2.The Annals of Mathematical Statistics, 36(1):299–303, 1965

Bengt von Bahr and Carl-Gustav Esseen. Inequalities for therth absolute moment of a sum of random variables, 1≤r≤2.The Annals of Mathematical Statistics, 36(1):299–303, 1965

work page 1965
[52]

Zhang, N

X. Zhang, N. S. Aybat, and M. G¨ urb¨ uzbalaban. Robust accelerated primal-dual methods for computing saddle points.SIAM Journal on Optimization, 34(1):1097–1130, 2024. 30 A Proofs for Section 3 A.1 Proofs for Section 3.1 Proof of Theorem 1.Consider fixedtsuch thatt≥ ¯Tand t≥ H −1 ¯γ,ˆp εϕ −ε 0 Φ0 .(46) We aim to show that the stopping condition is met (i...

work page 2024

[1] [1]

N. S. Aybat, A. Fallah, M. G¨ urb¨ uzbalaban, and A. Ozdaglar. Robust accelerated gradient methods for smooth strongly convex functions.SIAM Journal on Optimization, 30(1):717–751, 2020

work page 2020

[2] [2]

A fast iterative shrinkage-thresholding algorithm for linear inverse problems.SIAM journal on imaging sciences, 2(1):183–202, 2009

Amir Beck and Marc Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems.SIAM journal on imaging sciences, 2(1):183–202, 2009

work page 2009

[3] [3]

A stochastic cubic regu- larisation method with inexact function evaluations and random derivatives for finite sum minimisation

Stefania Bellavia, Gianmarco Gurioli, Benedetta Morini, and Philippe L Toint. A stochastic cubic regu- larisation method with inexact function evaluations and random derivatives for finite sum minimisation. InProceedings of the 37th International Conference on Machine Learning, pages 745–754. PMLR, 2020

work page 2020

[4] [4]

Adaptive regularization for nonconvex optimization using inexact function values and randomly perturbed derivatives.Journal of Complexity, 68:101591, 2022

Stefania Bellavia, Gianmarco Gurioli, Benedetta Morini, and Philippe L Toint. Adaptive regularization for nonconvex optimization using inexact function values and randomly perturbed derivatives.Journal of Complexity, 68:101591, 2022

work page 2022

[5] [5]

Cao, and Katya Scheinberg

Albert Berahas, L. Cao, and Katya Scheinberg. Global convergence rate analysis of a generic line search algorithm with noise.SIAM Journal on Optimization, 31:1489–1518, 06 2021

work page 2021

[6] [6]

Berahas, Liyuan Cao, Krzysztof Choromanski, and Katya Scheinberg

Albert S. Berahas, Liyuan Cao, Krzysztof Choromanski, and Katya Scheinberg. A theoretical and empirical comparison of gradient approximations in derivative-free optimization.Found. Comput. Math., 22(2):507–560, April 2022. 27

work page 2022

[7] [7]

Berahas, Miaolan Xie, and Baoyu Zhou

Albert S. Berahas, Miaolan Xie, and Baoyu Zhou. A sequential quadratic programming method with high-probability complexity bounds for nonlinear equality-constrained stochastic optimization.SIAM Journal on Optimization, 35(1):240–269, 2025

work page 2025

[8] [8]

On biased compression for distributed learning.Journal of Machine Learning Research, 24(276):1–50, 2023

Aleksei Beznosikov, Samuel Horv´ ath, Peter Richt´ arik, and Mher Safaryan. On biased compression for distributed learning.Journal of Machine Learning Research, 24(276):1–50, 2023

work page 2023

[9] [9]

Convergence rate analysis of a stochastic trust-region method via supermartingales.INFORMS Journal on Optimization, 1(2):92–119, 2019

Jose Blanchet, Coralia Cartis, Matt Menickelly, and Katya Scheinberg. Convergence rate analysis of a stochastic trust-region method via supermartingales.INFORMS Journal on Optimization, 1(2):92–119, 2019

work page 2019

[10] [10]

Backtracking strategies for accelerated descent methods with smooth composite objectives.SIAM Journal on Optimization, 29(3):1772–1798, 2019

Luca Calatroni and Antonin Chambolle. Backtracking strategies for accelerated descent methods with smooth composite objectives.SIAM Journal on Optimization, 29(3):1772–1798, 2019

work page 2019

[11] [11]

Berahas, and Katya Scheinberg

Liyuan Cao, Albert S. Berahas, and Katya Scheinberg. First- and second-order high probability com- plexity bounds for trust-region methods with noisy oracles.Mathematical Programming, 207(1):55–106, 2024

work page 2024

[12] [12]

Cartis and K Scheinberg

C. Cartis and K Scheinberg. Global convergence rate analysis of unconstrained optimization methods based on probabilistic models.Mathematical Programming, 169(3):337–375, 2018

work page 2018

[13] [13]

Convergence analysis of accelerated stochastic gradient descent under the growth condition.Mathematics of Operations Research, 49(4):2492–2526, 2023

You-Lin Chen, Sen Na, and Mladen Kolar. Convergence analysis of accelerated stochastic gradient descent under the growth condition.Mathematics of Operations Research, 49(4):2492–2526, 2023

work page 2023

[14] [14]

Smooth optimization with approximate gradient.SIAM Journal on Optimiza- tion, 19(3):1171–1183, 2008

Alexandre d’Aspremont. Smooth optimization with approximate gradient.SIAM Journal on Optimiza- tion, 19(3):1171–1183, 2008

work page 2008

[15] [15]

Intermediate gradient methods for smooth convex problems with inexact oracle, 2013

Olivier Devolder, Fran¸ cois Glineur, and Yurii Nesterov. Intermediate gradient methods for smooth convex problems with inexact oracle, 2013. CORE Discussion Paper 2013/17; Optimization Online version posted 2014

work page 2013

[16] [16]

First-order methods of smooth convex opti- mization with inexact oracle.Mathematical Programming, 146:37–75, 2014

Olivier Devolder, Fran¸ cois Glineur, and Yurii Nesterov. First-order methods of smooth convex opti- mization with inexact oracle.Mathematical Programming, 146:37–75, 2014

work page 2014

[17] [17]

Fallah, K

A. Fallah, K. Yuan, M. G¨ urb¨ uzbalaban, and A. Ozdaglar. Robust distributed accelerated stochastic gradient methods for multi-agent networks.Journal of Machine Learning Research, 23(220):1–96, 2022

work page 2022

[18] [18]

Deviation inequalities for martingales with applications

Xiequan Fan, Ion Grama, and Quansheng Liu. Deviation inequalities for martingales with applications. Journal of Mathematical Analysis and Applications, 448(1):538–566, 2017

work page 2017

[19] [19]

High probability complexity bounds of trust-region stochastic sequential quadratic programming with heavy-tailed noise, 2025

Yuchen Fang, Javad Lavaei, and Sen Na. High probability complexity bounds of trust-region stochastic sequential quadratic programming with heavy-tailed noise, 2025

work page 2025

[20] [20]

Stochastic optimization with heavy- tailed noise via accelerated gradient clipping.Advances in Neural Information Processing Systems, 33:15042–15053, 2020

Eduard Gorbunov, Marina Danilova, and Alexander Gasnikov. Stochastic optimization with heavy- tailed noise via accelerated gradient clipping.Advances in Neural Information Processing Systems, 33:15042–15053, 2020

work page 2020

[21] [21]

Siegel, and Stephan Wojtowytsch

Kanan Gupta, Jonathan W. Siegel, and Stephan Wojtowytsch. Nesterov acceleration despite very noisy gradients, 2024

work page 2024

[22] [22]

Robustly stable accelerated momentum methods with a near-optimal l2 gain and h∞ performance, 2023

Mert Gurbuzbalaban. Robustly stable accelerated momentum methods with a near-optimal l2 gain and h∞ performance, 2023

work page 2023

[23] [23]

Accelerated gradient methods with biased gradient estimates: Risk sensitivity, high-probability guarantees, and large deviation bounds, 2025

Mert G¨ urb¨ uzbalaban, Yasa Syed, and Necdet Serhat Aybat. Accelerated gradient methods with biased gradient estimates: Risk sensitivity, high-probability guarantees, and large deviation bounds, 2025

work page 2025

[24] [24]

High probability complexity bounds for adaptive step search based on stochastic oracles.SIAM Journal on Optimization, 34(3):2411–2439, 2024

Billy Jin, Katya Scheinberg, and Miaolan Xie. High probability complexity bounds for adaptive step search based on stochastic oracles.SIAM Journal on Optimization, 34(3):2411–2439, 2024. 28

work page 2024

[25] [25]

Sample complexity analysis for adaptive optimization algorithms with stochastic oracles.Mathematical Programming, pages 1–29, 2024

Billy Jin, Katya Scheinberg, and Miaolan Xie. Sample complexity analysis for adaptive optimization algorithms with stochastic oracles.Mathematical Programming, pages 1–29, 2024

work page 2024

[26] [26]

Error feedback fixes signsgd and other gradient compression schemes

Sai Praneeth Karimireddy, Quentin Rebjock, Sebastian Stich, and Martin Jaggi. Error feedback fixes signsgd and other gradient compression schemes. In Kamalika Chaudhuri and Ruslan Salakhutdinov, editors,Proceedings of the 36th International Conference on Machine Learning (ICML), volume 97 of Proceedings of Machine Learning Research, pages 3252–3261. PMLR, 2019

work page 2019

[27] [27]

Intermediate gradient methods with relative inexactness.Journal of Optimization Theory and Applications, 207(3):62, 2025

Nikita Kornilov, Mohammad Alkousa, Eduard Gorbunov, Fedor Stonyakin, Pavel Dvurechensky, and Alexander Gasnikov. Intermediate gradient methods with relative inexactness.Journal of Optimization Theory and Applications, 207(3):62, 2025

work page 2025

[28] [28]

Estimate sequences for stochastic composite optimization: variance reduction, acceleration, and robustness to noise.J

Andrei Kulunchakov and Julien Mairal. Estimate sequences for stochastic composite optimization: variance reduction, acceleration, and robustness to noise.J. Mach. Learn. Res., 21(1), January 2020

work page 2020

[29] [29]

undefinedlker Birbil, Mert G¨ urb¨ uzbalaban, and Sinan Yildirim

Nurdan Kuru, S ¸. undefinedlker Birbil, Mert G¨ urb¨ uzbalaban, and Sinan Yildirim. Differentially private accelerated optimization algorithms.SIAM J. on Optimization, 32(2):795–821, January 2022

work page 2022

[30] [30]

Shanbhag

Jinlong Lei and Uday V. Shanbhag. Variance-reduced accelerated first-order methods: Central limit theorems and confidence statements.Mathematics of Operations Research, 50(2):1364–1397, 2024

work page 2024

[31] [31]

Nonasymptotic analysis of accelerated methods with inexact oracle under absolute error bound

Yin Liu and Sam Davanloo Tajbakhsh. Nonasymptotic analysis of accelerated methods with inexact oracle under absolute error bound. 2025

work page 2025

[32] [32]

Accelerated first-order methods for convex optimization with locally lipschitz continuous gradient.SIAM Journal on Optimization, 33(3):2275–2310, 2023

Zhaosong Lu and Sanyou Mei. Accelerated first-order methods for convex optimization with locally lipschitz continuous gradient.SIAM Journal on Optimization, 33(3):2275–2310, 2023

work page 2023

[33] [33]

Jovanovi´ c

Hesameddin Mohammadi, Meisam Razaviyayn, and Mihailo R. Jovanovi´ c. Tradeoffs between conver- gence rate and noise amplification for momentum-based accelerated optimization algorithms.IEEE Transactions on Automatic Control, 70(2):889–904, 2025

work page 2025

[34] [34]

Gradient methods for minimizing composite functions.Mathematical programming, 140(1):125–161, 2013

Yu Nesterov. Gradient methods for minimizing composite functions.Mathematical programming, 140(1):125–161, 2013

work page 2013

[35] [35]

A method for solving the convex programming problem with convergence rateo(1/k 2)

Yurii Nesterov. A method for solving the convex programming problem with convergence rateo(1/k 2). Proceedings of the USSR Academy of Sciences, 269:543–547, 1983

work page 1983

[36] [36]

Stochastic ista/fista adaptive step search algorithms for convex composite optimization.arXiv preprint arXiv:2402.15646, 2024

Lam M Nguyen, Katya Scheinberg, and Trang H Tran. Stochastic ista/fista adaptive step search algorithms for convex composite optimization.arXiv preprint arXiv:2402.15646, 2024

work page arXiv 2024

[37] [37]

Nguyen and Trang H

Lam M. Nguyen and Trang H. Tran. On the convergence to a global solution of shuffling-type gradient algorithms. InProceedings of the 37th Conference on Neural Information Processing Systems (NeurIPS), 2023

work page 2023

[38] [38]

Improved convergence in high prob- ability of clipped gradient methods with heavy tailed noise

Ta Duy Nguyen, Thien H Nguyen, Alina Ene, and Huy Nguyen. Improved convergence in high prob- ability of clipped gradient methods with heavy tailed noise. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors,Advances in Neural Information Processing Systems, volume 36, pages 24191–24222. Curran Associates, Inc., 2023

work page 2023

[39] [39]

A stochastic line search method with expected complexity analysis.SIAM Journal on Optimization, 30(1):349–376, 2020

Courtney Paquette and Katya Scheinberg. A stochastic line search method with expected complexity analysis.SIAM Journal on Optimization, 30(1):349–376, 2020

work page 2020

[40] [40]

Moment inequalities for sums of dependent random variables under projective condi- tions.Journal of Theoretical Probability, 22(1):146–163, 2009

Emmanuel Rio. Moment inequalities for sums of dependent random variables under projective condi- tions.Journal of Theoretical Probability, 22(1):146–163, 2009

work page 2009

[41] [41]

Fast first-order methods for composite convex opti- mization with backtracking.Foundations of Computational Mathematics, 14:389–417, 2014

Katya Scheinberg, Donald Goldfarb, and Xi Bai. Fast first-order methods for composite convex opti- mization with backtracking.Foundations of Computational Mathematics, 14:389–417, 2014. 29

work page 2014

[42] [42]

Stochastic adaptive regularization method with cubics: A high probability complexity bound

Katya Scheinberg and Miaolan Xie. Stochastic adaptive regularization method with cubics: A high probability complexity bound. InProceedings of the 2023 Winter Simulation Conference (WSC), pages 3520–3531. IEEE, 2023

work page 2023

[43] [43]

Stochastic adaptive optimization with unreliable inputs: A unified framework for high-probability complexity analysis, 2025

Katya Scheinberg and Miaolan Xie. Stochastic adaptive optimization with unreliable inputs: A unified framework for high-probability complexity analysis, 2025

work page 2025

[44] [44]

Fast convergence of stochastic gradient descent under a strong growth condition

Mark Schmidt and Nicolas Le Roux. Fast convergence of stochastic gradient descent under a strong growth condition. Technical report, University of British Columbia, 2013

work page 2013

[45] [45]

Convergence rates of inexact proximal-gradient methods for convex optimization

Mark Schmidt, Nicolas Le Roux, and Francis Bach. Convergence rates of inexact proximal-gradient methods for convex optimization. InProceedings of the 25th International Conference on Neural Infor- mation Processing Systems, NIPS’11, page 1458–1466, Red Hook, NY, USA, 2011. Curran Associates Inc

work page 2011

[46] [46]

Understanding the acceleration phenomenon via high-resolution differential equations.Mathematical Programming, 195(1):79–148, 2022

Bin Shi, Simon S Du, Michael I Jordan, and Weijie J Su. Understanding the acceleration phenomenon via high-resolution differential equations.Mathematical Programming, 195(1):79–148, 2022

work page 2022

[47] [47]

A tail-index analysis of stochastic gradient noise in deep neural networks

Umut Simsekli, Levent Sagun, and Mert Gurbuzbalaban. A tail-index analysis of stochastic gradient noise in deep neural networks. InProceedings of the 36th International Conference on Machine Learning, volume 97 ofProceedings of Machine Learning Research, pages 5827–5837. PMLR, 2019

work page 2019

[48] [48]

A differential equation for modeling Nesterov’s ac- celerated gradient method: theory and insights.The Journal of Machine Learning Research, 17(1):5312– 5354, 2016

Weijie Su, Stephen Boyd, and Emmanuel J Cand` es. A differential equation for modeling Nesterov’s ac- celerated gradient method: theory and insights.The Journal of Machine Learning Research, 17(1):5312– 5354, 2016

work page 2016

[49] [49]

Tran, Katya Scheinberg, and Lam M

Trang H. Tran, Katya Scheinberg, and Lam M. Nguyen. Nesterov accelerated shuffling gradient method for convex optimization. InProceedings of the 39th International Conference on Machine Learning (ICML), volume 162 ofProceedings of Machine Learning Research, pages 21703–21732. PMLR, July 2022

work page 2022

[50] [50]

On accelerated proximal gradient methods for convex-concave optimization.SIAM journal on optimization, 2008

Paul Tseng. On accelerated proximal gradient methods for convex-concave optimization.SIAM journal on optimization, 2008

work page 2008

[51] [51]

Inequalities for therth absolute moment of a sum of random variables, 1≤r≤2.The Annals of Mathematical Statistics, 36(1):299–303, 1965

Bengt von Bahr and Carl-Gustav Esseen. Inequalities for therth absolute moment of a sum of random variables, 1≤r≤2.The Annals of Mathematical Statistics, 36(1):299–303, 1965

work page 1965

[52] [52]

Zhang, N

X. Zhang, N. S. Aybat, and M. G¨ urb¨ uzbalaban. Robust accelerated primal-dual methods for computing saddle points.SIAM Journal on Optimization, 34(1):1097–1130, 2024. 30 A Proofs for Section 3 A.1 Proofs for Section 3.1 Proof of Theorem 1.Consider fixedtsuch thatt≥ ¯Tand t≥ H −1 ¯γ,ˆp εϕ −ε 0 Φ0 .(46) We aim to show that the stopping condition is met (i...

work page 2024