Sharp First-Order Lower Bounds for Higher-Order Smooth Nonconvex Optimization

Dongruo Zhou

arxiv: 2606.05438 · v1 · pith:MDEYEIDRnew · submitted 2026-06-03 · 💻 cs.LG · math.OC

Sharp First-Order Lower Bounds for Higher-Order Smooth Nonconvex Optimization

Dongruo Zhou This is my paper

Pith reviewed 2026-06-28 06:47 UTC · model grok-4.3

classification 💻 cs.LG math.OC

keywords nonconvex optimizationfirst-order methodslower boundshigher-order smoothnessstationary pointsoracle complexityHessian Lipschitzthird-order smoothness

0 comments

The pith

First-order methods require Ω(ε^{-7/4}) iterations to reach ε-stationary points in Hessian-Lipschitz nonconvex problems and Ω(ε^{-5/3}) under third-order smoothness.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proves that known accelerated upper bounds for deterministic first-order methods in higher-order smooth nonconvex optimization are tight. It constructs dimension-free hard instances that force any algorithm using only gradient oracles to take the matching number of steps. The result holds for every finite smoothness order and closes the gap left open by prior upper-bound analyses. A sympathetic reader sees that higher-order smoothness assumptions cannot improve first-order rates beyond these exponents in the worst case.

Core claim

There exist families of higher-order smooth nonconvex functions on which any deterministic first-order algorithm requires Ω(ε^{-7/4}) oracle calls to produce an ε-stationary point when the Hessian is Lipschitz continuous, and Ω(ε^{-5/3}) calls when the third derivative is Lipschitz. The lower bounds are dimension-free and match the best known upper bounds exactly.

What carries the argument

The block-chain mechanism, which chains multiple blocks to enforce blockwise sequential revelation of oracle information while preserving the higher-order smoothness of a scalar hard instance.

If this is right

The ε^{-7/4} upper bound is optimal for finding stationary points under Lipschitz Hessians.
The ε^{-5/3} upper bound is optimal under Lipschitz third derivatives.
Matching lower bounds extend to any finite order of smoothness.
No dimension dependence appears in the lower bounds.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same block-chain idea may yield lower bounds for stochastic or randomized first-order methods.
It would be natural to test whether the construction adapts to constrained or manifold optimization.
The gap between first-order and higher-order oracle complexities remains wide even under strong smoothness.

Load-bearing premise

The smoothness structure of a single hard scalar function can be replicated across chained blocks without allowing the algorithm earlier access to later blocks.

What would settle it

An explicit deterministic first-order algorithm that returns an ε-stationary point after o(ε^{-7/4}) gradient evaluations on every Hessian-Lipschitz function would falsify the lower bound.

read the original abstract

We study the deterministic first-order oracle complexity of finding \(\epsilon\)-stationary points in smooth nonconvex optimization when the objective satisfies higher-order smoothness assumptions. While the classical \(\epsilon^{-2}\) rate is optimal under only Lipschitz gradients, higher-order smoothness leads to accelerated first-order upper bounds, most notably the \(\epsilon^{-7/4}\) rate under Lipschitz Hessians and the \(\epsilon^{-5/3}\) rate under Lipschitz third derivatives. The matching lower bounds, however, have remained open. We resolve this gap by proving a new dimension-free first-order lower bound for higher-order smooth nonconvex functions, valid for every finite smoothness order. In particular, our construction gives a matching \(\Omega(\epsilon^{-7/4})\) lower bound in the Hessian-Lipschitz case and a matching \(\Omega(\epsilon^{-5/3})\) lower bound in the third-order-smooth regime. The hard instance is based on a \emph{block-chain} mechanism that enforces blockwise oracle revelation while preserving the smoothness structure needed for the scalar hard instance. The lower-bound construction was discovered with the assistance of ChatGPT 5.5 Pro and subsequently verified by the authors.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper closes the open gap with matching lower bounds via a block-chain construction, but the key technical claim needs checking on whether smoothness constants stay independent of chain length.

read the letter

The main point is that this work supplies the previously missing matching lower bounds: Ω(ε^{-7/4}) under Hessian-Lipschitz and Ω(ε^{-5/3}) under third-order smoothness for deterministic first-order methods.

The new piece is the block-chain mechanism. It chains scalar hard instances so that first-order oracles must be queried block by block, while the composite function stays inside the claimed smoothness class. This produces dimension-free lower bounds that match the known upper bounds.

The paper states the rates cleanly and positions the result against the existing upper-bound literature. The authors note they verified the construction after initial discovery help from an AI model, which at least signals they checked the algebra themselves.

The soft spot sits where the stress-test note points: the chaining step must keep the highest-order Lipschitz constant from scaling with the number of blocks. If it grows, the rescaling argument that produces the exact exponent would break. The abstract asserts preservation of the structure, but that is the precise property the proof must establish. Until the details are examined, the bound strength remains conditional on that step.

This is for readers who follow nonconvex complexity results and want the full picture of what first-order methods can and cannot achieve. It is worth sending to peer review so the construction can be vetted in full.

Referee Report

1 major / 1 minor

Summary. The paper claims to resolve open gaps in first-order oracle complexity for nonconvex optimization by constructing dimension-free hard instances that achieve matching lower bounds of Ω(ε^{-7/4}) under Hessian-Lipschitz smoothness and Ω(ε^{-5/3}) under third-order smoothness. The central technical device is a block-chain mechanism that enforces sequential blockwise first-order oracle access while preserving the higher-order smoothness structure of a scalar hard instance, allowing the lower bounds to carry over to the composite function.

Significance. If the construction is valid, the result is significant: it supplies the first matching lower bounds for these accelerated regimes, confirming optimality of known upper bounds and extending to arbitrary finite smoothness order in a dimension-free manner. The explicit construction (rather than parameter fitting or self-referential equations) is a methodological strength.

major comments (1)

[Block-chain mechanism] Block-chain mechanism (abstract and main construction): the claim that the mechanism 'preserves the smoothness structure' is load-bearing for the matching rates. If the effective Lipschitz constant of the highest-order derivative grows with the number of blocks B (or with dimension), then the rescaling required to reach stationarity tolerance ε would change the iteration lower bound, breaking the asserted Ω(ε^{-7/4}) and Ω(ε^{-5/3}) results. A explicit verification that all relevant Lipschitz constants remain independent of B is required.

minor comments (1)

[Abstract] The disclosure that the construction was discovered with ChatGPT assistance is already present in the abstract; if retained, it should be moved to the acknowledgments or a dedicated footnote for standard presentation.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their careful reading and for identifying the load-bearing aspect of the block-chain construction. We address the major comment below and will revise the manuscript to make the required verification explicit.

read point-by-point responses

Referee: [Block-chain mechanism] Block-chain mechanism (abstract and main construction): the claim that the mechanism 'preserves the smoothness structure' is load-bearing for the matching rates. If the effective Lipschitz constant of the highest-order derivative grows with the number of blocks B (or with dimension), then the rescaling required to reach stationarity tolerance ε would change the iteration lower bound, breaking the asserted Ω(ε^{-7/4}) and Ω(ε^{-5/3}) results. A explicit verification that all relevant Lipschitz constants remain independent of B is required.

Authors: We agree that an explicit verification is necessary for the claim to be fully rigorous. The block-chain construction couples blocks sequentially so that cross-block interactions are limited to adjacent blocks only; non-adjacent mixed partial derivatives vanish identically by the chain structure. Consequently, the Lipschitz constant of the k-th derivative of the composite function is bounded by a quantity that depends solely on the parameters of the scalar hard instance and is independent of both B and the ambient dimension. In the revised manuscript we will add a dedicated lemma (with proof in the appendix) that explicitly bounds all relevant higher-order Lipschitz constants and confirms they remain O(1) with respect to B. This preserves the original rescaling argument and the asserted lower-bound exponents. revision: yes

Circularity Check

0 steps flagged

No circularity: explicit construction supplies independent lower-bound evidence

full rationale

The paper derives its Ω(ε^{-7/4}) and Ω(ε^{-5/3}) lower bounds from an explicit new block-chain hard instance whose first-order oracle complexity is analyzed directly. No parameter is fitted to data and then relabeled a prediction, no self-citation supplies a load-bearing uniqueness or smoothness-preservation theorem, and the claimed preservation of higher-order Lipschitz constants is part of the construction definition rather than a reduction to prior results by the same authors. The derivation chain therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entities

Review based only on abstract; no explicit free parameters, axioms, or invented entities beyond the high-level block-chain mechanism are described.

invented entities (1)

block-chain mechanism no independent evidence
purpose: Enforces blockwise oracle revelation while preserving smoothness structure
Described in abstract as the basis for the hard instance construction

pith-pipeline@v0.9.1-grok · 5731 in / 976 out tokens · 33215 ms · 2026-06-28T06:47:46.274268+00:00 · methodology

discussion (0)

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

On the Limits of Biased Derivative Information for Nonconvex Stochastic Optimization
math.OC 2026-06 unverdicted novelty 7.0

Provides tight lower bounds on oracle complexity for biased stochastic nonconvex optimization and matching algorithms using trust regions and higher-order variance reduction.
A Restart-Free Accelerated Algorithm for Non-Convex Minimization: Continuous and Discrete Analysis
math.OC 2026-06 unverdicted novelty 6.0

Two restart-free accelerated first-order methods for nonconvex functions with Lipschitz gradients and Hessians achieve O(ε^{-7/4}) complexity by discretizing a new ODE model, with adaptive Lipschitz estimation in one variant.

Reference graph

Works this paper leans on

38 extracted references · 2 linked inside Pith · cited by 2 Pith papers

[1]

Balancing gradient and hessian queries in non-convex optimization

Adil, D.,Bullins, B.,Sidford, A.andZhang, C.(2025). Balancing gradient and hessian queries in non-convex optimization. InAdvances in Neural Information Processing Systems

2025
[2]

Finding approximate local minima faster than gradient descent

Agarwal, N.,Allen-Zhu, Z.,Bullins, B.,Hazan, E.andMa, T.(2017). Finding approximate local minima faster than gradient descent. InProceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing

2017
[3]

NEON2: Finding local minima via first-order oracles

Allen-Zhu, Z.andLi, Y.(2018). NEON2: Finding local minima via first-order oracles. In Advances in Neural Information Processing Systems, vol. 31

2018
[4]

C.,Foster, D

Arjevani, Y.,Carmon, Y.,Duchi, J. C.,Foster, D. J.,Sekhari, A.andSridharan, K. (2020). Second-order information in non-convex stochastic optimization: Power and limitations. InProceedings of Thirty Third Conference on Learning Theory, vol. 125 ofProceedings of Machine Learning Research. PMLR

2020
[5]

C.,Foster, D

Arjevani, Y.,Carmon, Y.,Duchi, J. C.,Foster, D. J.,Srebro, N.andWoodworth, B.(2023). Lower bounds for non-convex stochastic optimization.Mathematical Programming 199165–214

2023
[6]

G.,Gardenghi, J

Birgin, E. G.,Gardenghi, J. L.,Mart ´ınez, J. M.,Santos, S. A.andToint, P. L. (2017). Worst-case evaluation complexity for unconstrained nonlinear optimization using high-order regularized models.Mathematical Programming163359–368

2017
[7]

convex until proven guilty

Carmon, Y.,Duchi, J. C.,Hinder, O.andSidford, A.(2017). “convex until proven guilty”: Dimension-free acceleration of gradient descent on non-convex functions. InProceedings of the 34th International Conference on Machine Learning, vol. 70 ofProceedings of Machine Learning Research. PMLR

2017
[8]

C.,Hinder, O.andSidford, A.(2018)

Carmon, Y.,Duchi, J. C.,Hinder, O.andSidford, A.(2018). Accelerated methods for nonconvex optimization.SIAM Journal on Optimization281751–1772

2018
[9]

C.,Hinder, O.andSidford, A.(2020)

Carmon, Y.,Duchi, J. C.,Hinder, O.andSidford, A.(2020). Lower bounds for finding stationary points i.Mathematical Programming18471–120. 21

2020
[10]

C.,Hinder, O.andSidford, A.(2021)

Carmon, Y.,Duchi, J. C.,Hinder, O.andSidford, A.(2021). Lower bounds for finding stationary points ii: First-order methods.Mathematical Programming185315–355

2021
[11]

Cartis, C.,Gould, N. I. M.andToint, P. L.(2010). On the complexity of steepest descent, newton’s and regularized newton’s methods for nonconvex unconstrained optimization problems.SIAM Journal on Optimization202833–2852

2010
[12]

Cartis, C.,Gould, N. I. M.andToint, P. L.(2012). Complexity bounds for second-order optimality in unconstrained optimization.Journal of Complexity2893–108

2012
[13]

Cartis, C.,Gould, N. I. M.andToint, P. L.(2017). Worst-case evaluation complexity and optimality of second-order methods for nonconvex smooth optimization.arXiv preprint arXiv:1709.07180

arXiv 2017
[14]

R.,Gould, N

Conn, A. R.,Gould, N. I. M.andToint, P. L.(2000).Trust Region Methods. MPS-SIAM Series on Optimization, SIAM

2000
[15]

Optimal stochastic non-smooth non-convex optimization through online-to-non-convex conversion

Cutkosky, A.,Mehta, H.andOrabona, F.(2023). Optimal stochastic non-smooth non-convex optimization through online-to-non-convex conversion. InProceedings of the 40th International Conference on Machine Learning, vol. 202 ofProceedings of Machine Learning Research. PMLR

2023
[16]

N.(2022)

Emmenegger, N.,Kyng, R.andZehmakan, A. N.(2022). On the oracle complexity of higher-order smooth non-convex finite-sum optimization. InProceedings of the 25th International Conference on Artificial Intelligence and Statistics, vol. 151 ofProceedings of Machine Learning Research. PMLR

2022
[17]

J.,Lin, Z.andZhang, T.(2018)

Fang, C.,Li, C. J.,Lin, Z.andZhang, T.(2018). SPIDER: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. InAdvances in Neural Information Processing Systems, vol. 31

2018
[18]

Sharp analysis for nonconvex SGD escaping from saddle points

Fang, C.,Lin, Z.andZhang, T.(2019). Sharp analysis for nonconvex SGD escaping from saddle points. InProceedings of the 32nd Conference on Learning Theory, vol. 99 ofProceedings of Machine Learning Research. PMLR

2019
[19]

Escaping from saddle points – online stochastic gradient for tensor decomposition

Ge, R.,Huang, F.,Jin, C.andYuan, Y.(2015). Escaping from saddle points – online stochastic gradient for tensor decomposition. InProceedings of the 28th Conference on Learning Theory, vol. 40 ofProceedings of Machine Learning Research. PMLR

2015
[20]

Improved complexity for smooth nonconvex optimization: A two-level online learning approach with quasi-newton methods

Jiang, R.,Mokhtari, A.andPatitucci, F.(2025). Improved complexity for smooth nonconvex optimization: A two-level online learning approach with quasi-newton methods. In Proceedings of the 57th Annual ACM Symposium on Theory of Computing

2025
[21]

M.andJordan, M

Jin, C.,Ge, R.,Netrapalli, P.,Kakade, S. M.andJordan, M. I.(2017). How to escape saddle points efficiently. InProceedings of the 34th International Conference on Machine Learning, vol. 70 ofProceedings of Machine Learning Research. PMLR

2017
[22]

M.andJordan, M

Jin, C.,Netrapalli, P.,Ge, R.,Kakade, S. M.andJordan, M. I.(2019). On nonconvex optimization for machine learning: Gradients, stochasticity, and saddle points.arXiv preprint arXiv:1902.04811. 22

arXiv 2019
[23]

I.(2018)

Jin, C.,Netrapalli, P.andJordan, M. I.(2018). Accelerated gradient descent escapes saddle points faster than gradient descent. InProceedings of the 31st Conference on Learning Theory, vol. 75 ofProceedings of Machine Learning Research. PMLR

2018
[24]

Restarted nonconvex accelerated gradient descent: No more polylogarithmic factor in theO(ϵ −7/4) complexity.Journal of Machine Learning Research24 1–37

Li, H.andLin, Z.(2023). Restarted nonconvex accelerated gradient descent: No more polylogarithmic factor in theO(ϵ −7/4) complexity.Journal of Machine Learning Research24 1–37

2023
[25]

SSRGD: Simple stochastic recursive gradient descent for escaping saddle points

Li, Z.(2019). SSRGD: Simple stochastic recursive gradient descent for escaping saddle points. InAdvances in Neural Information Processing Systems, vol. 32

2019
[26]

Adaptive negative curvature descent with applications in non-convex optimization

Liu, M.,Li, Z.,Wang, X.,Yi, J.andYang, T.(2018). Adaptive negative curvature descent with applications in non-convex optimization. InAdvances in Neural Information Processing Systems, vol. 31

2018
[27]

Parameter-free accelerated gradient descent for nonconvex minimization.SIAM Journal on Optimization342093–2120

Marumo, N.andTakeda, A.(2024). Parameter-free accelerated gradient descent for nonconvex minimization.SIAM Journal on Optimization342093–2120

2024
[28]

T.(2006)

Nesterov, Y.andPolyak, B. T.(2006). Cubic regularization of newton method and its global performance.Mathematical Programming108177–205

2006
[29]

I.(2018)

Tripuraneni, N.,Stern, M.,Jin, C.,Regier, J.andJordan, M. I.(2018). Stochastic cubic regularization for fast nonconvex optimization. InAdvances in Neural Information Processing Systems, vol. 31

2018
[30]

NEON+: Accelerated gradient methods for extracting negative curvature for non-convex optimization.arXiv preprint arXiv:1712.01033

Xu, Y.,Jin, R.andYang, T.(2017). NEON+: Accelerated gradient methods for extracting negative curvature for non-convex optimization.arXiv preprint arXiv:1712.01033

Pith/arXiv arXiv 2017
[31]

First-order stochastic algorithms for escaping from saddle points in almost linear time

Xu, Y.,Jin, R.andYang, T.(2018). First-order stochastic algorithms for escaping from saddle points in almost linear time. InAdvances in Neural Information Processing Systems, vol. 31

2018
[32]

Third-order smoothness helps: Faster stochastic optimization algorithms for finding local minima

Yu, Y.,Xu, P.andGu, Q.(2018). Third-order smoothness helps: Faster stochastic optimization algorithms for finding local minima. InAdvances in Neural Information Processing Systems, vol. 31

2018
[33]

Escape saddle points by a simple gradient-descent based algorithm

Zhang, C.andLi, T.(2021). Escape saddle points by a simple gradient-descent based algorithm. InAdvances in Neural Information Processing Systems, vol. 34

2021
[34]

Stochastic recursive variance-reduced cubic regularization methods

Zhou, D.andGu, Q.(2020). Stochastic recursive variance-reduced cubic regularization methods. InProceedings of the 23rd International Conference on Artificial Intelligence and Statistics, vol. 108 ofProceedings of Machine Learning Research. PMLR

2020
[35]

Dimension-free complexity bounds for high-order nonconvex finite-sum optimization

Zhou, D.andGu, Q.(2022). Dimension-free complexity bounds for high-order nonconvex finite-sum optimization. InProceedings of the 39th International Conference on Machine Learning, vol. 162 ofProceedings of Machine Learning Research. PMLR

2022
[36]

Finding local minima via stochastic nested variance reduction.arXiv preprint arXiv:1806.08782

Zhou, D.,Xu, P.andGu, Q.(2018). Finding local minima via stochastic nested variance reduction.arXiv preprint arXiv:1806.08782. 23

Pith/arXiv arXiv 2018
[37]

Stochastic variance-reduced cubic regularized newton methods

Zhou, D.,Xu, P.andGu, Q.(2018). Stochastic variance-reduced cubic regularized newton methods. InProceedings of the 35th International Conference on Machine Learning, vol. 80 of Proceedings of Machine Learning Research. PMLR

2018
[38]

Stochastic nested variance reduction for nonconvex optimization.Journal of Machine Learning Research211–63

Zhou, D.,Xu, P.andGu, Q.(2020). Stochastic nested variance reduction for nonconvex optimization.Journal of Machine Learning Research211–63. 24

2020

[1] [1]

Balancing gradient and hessian queries in non-convex optimization

Adil, D.,Bullins, B.,Sidford, A.andZhang, C.(2025). Balancing gradient and hessian queries in non-convex optimization. InAdvances in Neural Information Processing Systems

2025

[2] [2]

Finding approximate local minima faster than gradient descent

Agarwal, N.,Allen-Zhu, Z.,Bullins, B.,Hazan, E.andMa, T.(2017). Finding approximate local minima faster than gradient descent. InProceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing

2017

[3] [3]

NEON2: Finding local minima via first-order oracles

Allen-Zhu, Z.andLi, Y.(2018). NEON2: Finding local minima via first-order oracles. In Advances in Neural Information Processing Systems, vol. 31

2018

[4] [4]

C.,Foster, D

Arjevani, Y.,Carmon, Y.,Duchi, J. C.,Foster, D. J.,Sekhari, A.andSridharan, K. (2020). Second-order information in non-convex stochastic optimization: Power and limitations. InProceedings of Thirty Third Conference on Learning Theory, vol. 125 ofProceedings of Machine Learning Research. PMLR

2020

[5] [5]

C.,Foster, D

Arjevani, Y.,Carmon, Y.,Duchi, J. C.,Foster, D. J.,Srebro, N.andWoodworth, B.(2023). Lower bounds for non-convex stochastic optimization.Mathematical Programming 199165–214

2023

[6] [6]

G.,Gardenghi, J

Birgin, E. G.,Gardenghi, J. L.,Mart ´ınez, J. M.,Santos, S. A.andToint, P. L. (2017). Worst-case evaluation complexity for unconstrained nonlinear optimization using high-order regularized models.Mathematical Programming163359–368

2017

[7] [7]

convex until proven guilty

Carmon, Y.,Duchi, J. C.,Hinder, O.andSidford, A.(2017). “convex until proven guilty”: Dimension-free acceleration of gradient descent on non-convex functions. InProceedings of the 34th International Conference on Machine Learning, vol. 70 ofProceedings of Machine Learning Research. PMLR

2017

[8] [8]

C.,Hinder, O.andSidford, A.(2018)

Carmon, Y.,Duchi, J. C.,Hinder, O.andSidford, A.(2018). Accelerated methods for nonconvex optimization.SIAM Journal on Optimization281751–1772

2018

[9] [9]

C.,Hinder, O.andSidford, A.(2020)

Carmon, Y.,Duchi, J. C.,Hinder, O.andSidford, A.(2020). Lower bounds for finding stationary points i.Mathematical Programming18471–120. 21

2020

[10] [10]

C.,Hinder, O.andSidford, A.(2021)

Carmon, Y.,Duchi, J. C.,Hinder, O.andSidford, A.(2021). Lower bounds for finding stationary points ii: First-order methods.Mathematical Programming185315–355

2021

[11] [11]

Cartis, C.,Gould, N. I. M.andToint, P. L.(2010). On the complexity of steepest descent, newton’s and regularized newton’s methods for nonconvex unconstrained optimization problems.SIAM Journal on Optimization202833–2852

2010

[12] [12]

Cartis, C.,Gould, N. I. M.andToint, P. L.(2012). Complexity bounds for second-order optimality in unconstrained optimization.Journal of Complexity2893–108

2012

[13] [13]

Cartis, C.,Gould, N. I. M.andToint, P. L.(2017). Worst-case evaluation complexity and optimality of second-order methods for nonconvex smooth optimization.arXiv preprint arXiv:1709.07180

arXiv 2017

[14] [14]

R.,Gould, N

Conn, A. R.,Gould, N. I. M.andToint, P. L.(2000).Trust Region Methods. MPS-SIAM Series on Optimization, SIAM

2000

[15] [15]

Optimal stochastic non-smooth non-convex optimization through online-to-non-convex conversion

Cutkosky, A.,Mehta, H.andOrabona, F.(2023). Optimal stochastic non-smooth non-convex optimization through online-to-non-convex conversion. InProceedings of the 40th International Conference on Machine Learning, vol. 202 ofProceedings of Machine Learning Research. PMLR

2023

[16] [16]

N.(2022)

Emmenegger, N.,Kyng, R.andZehmakan, A. N.(2022). On the oracle complexity of higher-order smooth non-convex finite-sum optimization. InProceedings of the 25th International Conference on Artificial Intelligence and Statistics, vol. 151 ofProceedings of Machine Learning Research. PMLR

2022

[17] [17]

J.,Lin, Z.andZhang, T.(2018)

Fang, C.,Li, C. J.,Lin, Z.andZhang, T.(2018). SPIDER: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. InAdvances in Neural Information Processing Systems, vol. 31

2018

[18] [18]

Sharp analysis for nonconvex SGD escaping from saddle points

Fang, C.,Lin, Z.andZhang, T.(2019). Sharp analysis for nonconvex SGD escaping from saddle points. InProceedings of the 32nd Conference on Learning Theory, vol. 99 ofProceedings of Machine Learning Research. PMLR

2019

[19] [19]

Escaping from saddle points – online stochastic gradient for tensor decomposition

Ge, R.,Huang, F.,Jin, C.andYuan, Y.(2015). Escaping from saddle points – online stochastic gradient for tensor decomposition. InProceedings of the 28th Conference on Learning Theory, vol. 40 ofProceedings of Machine Learning Research. PMLR

2015

[20] [20]

Improved complexity for smooth nonconvex optimization: A two-level online learning approach with quasi-newton methods

Jiang, R.,Mokhtari, A.andPatitucci, F.(2025). Improved complexity for smooth nonconvex optimization: A two-level online learning approach with quasi-newton methods. In Proceedings of the 57th Annual ACM Symposium on Theory of Computing

2025

[21] [21]

M.andJordan, M

Jin, C.,Ge, R.,Netrapalli, P.,Kakade, S. M.andJordan, M. I.(2017). How to escape saddle points efficiently. InProceedings of the 34th International Conference on Machine Learning, vol. 70 ofProceedings of Machine Learning Research. PMLR

2017

[22] [22]

M.andJordan, M

Jin, C.,Netrapalli, P.,Ge, R.,Kakade, S. M.andJordan, M. I.(2019). On nonconvex optimization for machine learning: Gradients, stochasticity, and saddle points.arXiv preprint arXiv:1902.04811. 22

arXiv 2019

[23] [23]

I.(2018)

Jin, C.,Netrapalli, P.andJordan, M. I.(2018). Accelerated gradient descent escapes saddle points faster than gradient descent. InProceedings of the 31st Conference on Learning Theory, vol. 75 ofProceedings of Machine Learning Research. PMLR

2018

[24] [24]

Restarted nonconvex accelerated gradient descent: No more polylogarithmic factor in theO(ϵ −7/4) complexity.Journal of Machine Learning Research24 1–37

Li, H.andLin, Z.(2023). Restarted nonconvex accelerated gradient descent: No more polylogarithmic factor in theO(ϵ −7/4) complexity.Journal of Machine Learning Research24 1–37

2023

[25] [25]

SSRGD: Simple stochastic recursive gradient descent for escaping saddle points

Li, Z.(2019). SSRGD: Simple stochastic recursive gradient descent for escaping saddle points. InAdvances in Neural Information Processing Systems, vol. 32

2019

[26] [26]

Adaptive negative curvature descent with applications in non-convex optimization

Liu, M.,Li, Z.,Wang, X.,Yi, J.andYang, T.(2018). Adaptive negative curvature descent with applications in non-convex optimization. InAdvances in Neural Information Processing Systems, vol. 31

2018

[27] [27]

Parameter-free accelerated gradient descent for nonconvex minimization.SIAM Journal on Optimization342093–2120

Marumo, N.andTakeda, A.(2024). Parameter-free accelerated gradient descent for nonconvex minimization.SIAM Journal on Optimization342093–2120

2024

[28] [28]

T.(2006)

Nesterov, Y.andPolyak, B. T.(2006). Cubic regularization of newton method and its global performance.Mathematical Programming108177–205

2006

[29] [29]

I.(2018)

Tripuraneni, N.,Stern, M.,Jin, C.,Regier, J.andJordan, M. I.(2018). Stochastic cubic regularization for fast nonconvex optimization. InAdvances in Neural Information Processing Systems, vol. 31

2018

[30] [30]

NEON+: Accelerated gradient methods for extracting negative curvature for non-convex optimization.arXiv preprint arXiv:1712.01033

Xu, Y.,Jin, R.andYang, T.(2017). NEON+: Accelerated gradient methods for extracting negative curvature for non-convex optimization.arXiv preprint arXiv:1712.01033

Pith/arXiv arXiv 2017

[31] [31]

First-order stochastic algorithms for escaping from saddle points in almost linear time

Xu, Y.,Jin, R.andYang, T.(2018). First-order stochastic algorithms for escaping from saddle points in almost linear time. InAdvances in Neural Information Processing Systems, vol. 31

2018

[32] [32]

Third-order smoothness helps: Faster stochastic optimization algorithms for finding local minima

Yu, Y.,Xu, P.andGu, Q.(2018). Third-order smoothness helps: Faster stochastic optimization algorithms for finding local minima. InAdvances in Neural Information Processing Systems, vol. 31

2018

[33] [33]

Escape saddle points by a simple gradient-descent based algorithm

Zhang, C.andLi, T.(2021). Escape saddle points by a simple gradient-descent based algorithm. InAdvances in Neural Information Processing Systems, vol. 34

2021

[34] [34]

Stochastic recursive variance-reduced cubic regularization methods

Zhou, D.andGu, Q.(2020). Stochastic recursive variance-reduced cubic regularization methods. InProceedings of the 23rd International Conference on Artificial Intelligence and Statistics, vol. 108 ofProceedings of Machine Learning Research. PMLR

2020

[35] [35]

Dimension-free complexity bounds for high-order nonconvex finite-sum optimization

Zhou, D.andGu, Q.(2022). Dimension-free complexity bounds for high-order nonconvex finite-sum optimization. InProceedings of the 39th International Conference on Machine Learning, vol. 162 ofProceedings of Machine Learning Research. PMLR

2022

[36] [36]

Finding local minima via stochastic nested variance reduction.arXiv preprint arXiv:1806.08782

Zhou, D.,Xu, P.andGu, Q.(2018). Finding local minima via stochastic nested variance reduction.arXiv preprint arXiv:1806.08782. 23

Pith/arXiv arXiv 2018

[37] [37]

Stochastic variance-reduced cubic regularized newton methods

Zhou, D.,Xu, P.andGu, Q.(2018). Stochastic variance-reduced cubic regularized newton methods. InProceedings of the 35th International Conference on Machine Learning, vol. 80 of Proceedings of Machine Learning Research. PMLR

2018

[38] [38]

Stochastic nested variance reduction for nonconvex optimization.Journal of Machine Learning Research211–63

Zhou, D.,Xu, P.andGu, Q.(2020). Stochastic nested variance reduction for nonconvex optimization.Journal of Machine Learning Research211–63. 24

2020