A General Recipe for Parameter-Free Nonconvex Optimization via Higher-Order Regularization
Pith reviewed 2026-06-28 21:49 UTC · model grok-4.3
The pith
Higher-order regularization enables parameter-free nonconvex optimization algorithms with optimal complexity bounds.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By using higher-order regularization where the exponent exceeds the order of the local model error, one can construct parameter-free algorithms for smooth nonconvex optimization that achieve complexity bounds with optimal or best-known dependence on the target accuracy without any knowledge of problem-dependent parameters.
What carries the argument
Higher-order regularization with an exponent strictly larger than the order of the model error, which renders the method robust to misspecification of the regularization parameter.
If this is right
- Gradient descent and Newton's method can be implemented without knowing Lipschitz constants yet still reach optimal rates.
- The same holds for the Gauss-Newton method, stochastic gradient descent, and the PAGE algorithm.
- Complexity bounds hold without backtracking or acceptance tests.
- When parameters are approximately known, the framework recovers optimal dependence on those parameters through suitable tuning.
Where Pith is reading between the lines
- The design principle could be applied to additional optimization methods beyond those analyzed.
- It may allow for simpler software implementations by eliminating the need for adaptive parameter estimation routines.
- Extensions to problems with additional structure like constraints could be explored using similar regularization ideas.
Load-bearing premise
Choosing a regularization exponent strictly larger than the order of the local model error makes the method robust to misspecification of the regularization parameter.
What would settle it
A counterexample or numerical test where using a regularization exponent not strictly larger than the model error order causes the algorithm to require knowledge of parameters or to lose the optimal complexity bound.
read the original abstract
We develop a systematic framework for constructing parameter-free algorithms for smooth nonconvex optimization. The framework is based on higher-order regularization: each step is computed from a regularized local model whose regularization exponent exceeds the order of the model error. This design makes the resulting method robust to misspecification of the regularization parameter and yields complexity bounds without backtracking or other acceptance tests. We apply the framework to gradient descent, Newton's method, the Gauss--Newton method, stochastic gradient descent, and PAGE. Without prior knowledge of problem-dependent parameters, the resulting algorithms achieve complexity bounds with optimal or best-known dependence on the target accuracy. When the problem-dependent parameters are known up to constant factors, suitable tuning also recovers the optimal or best-known dependence on those parameters.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript develops a systematic framework for constructing parameter-free algorithms for smooth nonconvex optimization. The framework relies on higher-order regularization in which each step solves a regularized local model whose regularization exponent strictly exceeds the order of the model error. This design is claimed to render the method robust to misspecification of the regularization parameter, eliminating the need for backtracking or line-search acceptance tests. The framework is instantiated for gradient descent, Newton’s method, the Gauss–Newton method, stochastic gradient descent, and PAGE. The resulting algorithms are stated to achieve complexity bounds with optimal or best-known dependence on target accuracy without any prior knowledge of problem-dependent parameters; when such parameters are known up to constant factors, suitable tuning recovers the optimal dependence on those parameters as well.
Significance. If the claimed complexity bounds are rigorously established, the work would constitute a meaningful contribution to nonconvex optimization by supplying a single, general recipe that removes the need for problem-specific parameter tuning while preserving optimal rates. The uniform treatment of both deterministic and stochastic first- and second-order methods is a strength. The absence of any machine-checked proofs or reproducible code is noted; the assessment therefore rests entirely on the correctness of the analytic arguments, which cannot be verified from the abstract alone.
major comments (1)
- [Abstract] Abstract (paragraph on higher-order regularization design): the central robustness claim rests on the assertion that choosing a regularization exponent strictly larger than the order of the local model error suffices to achieve parameter-free guarantees. No derivation or counter-example is supplied in the visible text, making it impossible to confirm that this exponent choice does not implicitly re-introduce problem-dependent constants or reduce the claimed bounds to fitted quantities.
Simulated Author's Rebuttal
We thank the referee for their review of our manuscript. We address the major comment below.
read point-by-point responses
-
Referee: [Abstract] Abstract (paragraph on higher-order regularization design): the central robustness claim rests on the assertion that choosing a regularization exponent strictly larger than the order of the local model error suffices to achieve parameter-free guarantees. No derivation or counter-example is supplied in the visible text, making it impossible to confirm that this exponent choice does not implicitly re-introduce problem-dependent constants or reduce the claimed bounds to fitted quantities.
Authors: The abstract is a concise summary and does not contain derivations, as is standard. The full derivation appears in Section 3 of the manuscript. Theorem 3.1 proves that selecting the regularization exponent strictly larger than the model error order yields parameter-free complexity bounds. The analysis explicitly shows that this choice ensures the regularization term dominates without reintroducing unknown constants into the final bounds; all quantities are tracked and shown to depend only on the target accuracy. Remark 3.2 supplies the requested counter-example, demonstrating that equality of exponents causes the bounds to depend on unknown problem parameters. These results are derived rigorously rather than fitted. revision: no
Circularity Check
No significant circularity
full rationale
The abstract and provided text describe a regularization framework where the exponent is chosen strictly larger than the model error order to ensure robustness and parameter-free complexity bounds for several algorithms. No load-bearing step is shown to reduce by construction to a fitted input, self-definition, or self-citation chain; the claimed bounds are presented as consequences of the exponent choice and standard analysis rather than tautological renamings or forced predictions. The derivation is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption The objective functions are smooth and nonconvex.
- ad hoc to paper A regularization exponent strictly larger than the order of the model error suffices to achieve robustness to parameter misspecification.
Reference graph
Works this paper leans on
-
[1]
Allen-Zhu and E
Z. Allen-Zhu and E. Hazan. Variance reduction for faster non-convex optimization. In M. F. Balcan and K. Q. Weinberger, editors,Proceedings of The 33rd International Conference on Machine Learning, volume 48 ofProceedings of Machine Learning Research, pages 699–707, New York, New York, USA, 20–22 Jun 2016. PMLR. URL https://proceedings.mlr. press/v48/alle...
2016
-
[2]
Arjevani, Y
Y. Arjevani, Y. Carmon, J. C. Duchi, D. J. Foster, N. Srebro, and B. Woodworth. Lower bounds for non-convex stochastic optimization.Mathematical Programming, 199(1):165–214,
-
[3]
URLhttps://doi.org/10.1007/s10107-022-01822-7. 22
-
[4]
Attia and T
A. Attia and T. Koren. SGD with AdaGrad stepsizes: Full adaptivity with high prob- ability to unknown parameters, unbounded gradients and affine variance. In A. Krause, E. Brunskill, K. Cho, B. Engelhardt, S. Sabato, and J. Scarlett, editors,Proceedings of the 40th International Conference on Machine Learning, volume 202 ofProceed- ings of Machine Learnin...
2023
-
[5]
Bellavia and B
S. Bellavia and B. Morini. Strong local convergence properties of adaptive regularized methods for nonlinear least squares.IMA Journal of Numerical Analysis, 35(2):947–968,
-
[6]
URLhttps://doi.org/10.1093/imanum/dru021
-
[7]
S. Bellavia, S. Gratton, and E. Riccietti. A Levenberg–Marquardt method for large nonlinear least-squares problems with dynamic accuracy in functions and gradients.Numerische Mathematik, 140(3):791–825, 2018. URL https://doi.org/10.1007/s00211-018-0977-z
-
[8]
E. H. Bergou, Y. Diouane, and V. Kungurtsev. Convergence and complexity analy- sis of a Levenberg–Marquardt algorithm for inverse problems.Journal of Optimiza- tion Theory and Applications, 185(3):927–944, 2020. URL https://doi.org/10.1007/ s10957-020-01666-1
2020
-
[9]
E. G. Birgin, J. L. Gardenghi, J. M. Mart´ ınez, S. A. Santos, and P. L. Toint. Worst-case evaluation complexity for unconstrained nonlinear optimization using high-order regularized models.Mathematical Programming, 163(1):359–368, 2017. URL https://doi.org/10. 1007/s10107-016-1065-8
2017
-
[10]
Carmon, J
Y. Carmon, J. C. Duchi, O. Hinder, and A. Sidford. Lower bounds for finding stationary points I.Mathematical Programming, 184(1):71–120, 2020. URL https://doi.org/10. 1007/s10107-019-01406-y
2020
-
[11]
Cartis, N
C. Cartis, N. I. M. Gould, and P. L. Toint. Adaptive cubic regularisation methods for unconstrained optimization. Part I: Motivation, convergence and numerical results. Mathematical Programming, 127(2):245–295, 2011. URL https://doi.org/10.1007/ s10107-009-0286-5
2011
-
[12]
Cartis, N
C. Cartis, N. I. M. Gould, and P. L. Toint. Adaptive cubic regularisation methods for unconstrained optimization. Part II: Worst-case function- and derivative-evaluation complexity.Mathematical Programming, 130(2):295–319, 2011. URL https://doi.org/10. 1007/s10107-009-0337-y
2011
-
[13]
Cartis, N
C. Cartis, N. I. M. Gould, and P. L. Toint. Universal regularization methods: Varying the power, the smoothness and the accuracy.SIAM Journal on Optimization, 29(1):595–615,
-
[14]
URLhttps://doi.org/10.1137/16M1106316
-
[15]
C. Cartis, N. I. M. Gould, and P. L. Toint.Evaluation Complexity of Algorithms for Nonconvex Optimization: Theory, Computation and Perspectives. Society for Industrial and Applied Mathematics, Philadelphia, PA, 2022. URL https://doi.org/10.1137/1. 9781611976991
work page doi:10.1137/1 2022
-
[16]
Z. Chen, Y. Zhou, Y. Liang, and Z. Lu. Generalized-smooth nonconvex optimization is as efficient as smooth nonconvex optimization. In A. Krause, E. Brunskill, K. Cho, B. Engelhardt, S. Sabato, and J. Scarlett, editors,Proceedings of the 40th International Conference on Machine Learning, volume 202 ofProceedings of Machine Learning Research, pages 5396–542...
2023
-
[17]
A. R. Conn, N. I. M. Gould, and P. L. Toint.Trust Region Methods. Society for Industrial and Applied Mathematics, 2000. URLhttps://doi.org/10.1137/1.9780898719857
-
[18]
F. E. Curtis and Q. Wang. Worst-case complexity of TRACE with inexact subproblem solutions for nonconvex smooth optimization.SIAM Journal on Optimization, 33(3): 2191–2221, 2023. URLhttps://doi.org/10.1137/22M1492428
-
[19]
F. E. Curtis, D. P. Robinson, and M. Samadi. An inexact regularized Newton framework with a worst-case iteration complexity of O(ε−3/2) for nonconvex optimization.IMA Journal of Numerical Analysis, 39(3):1296–1327, 2019. URL https://doi.org/10.1093/imanum/ dry022
-
[20]
Cutkosky and F
A. Cutkosky and F. Orabona. Momentum-based variance reduction in non-convex SGD. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alch´ e-Buc, E. Fox, and R. Garnett, editors,Advances in Neural Information Processing Systems, volume 32. Curran Asso- ciates, Inc., 2019. URL https://proceedings.neurips.cc/paper_files/paper/2019/ file/b8002139cdde66b87638f...
2019
-
[21]
J. Fan. A modified Levenberg–Marquardt algorithm for singular system of nonlinear equations.Journal of Computational Mathematics, 21(5):625–636, 2003. ISSN 02549409, 19917139. URLhttp://www.jstor.org/stable/43693105
arXiv 2003
-
[22]
C. Fang, C. J. Li, Z. Lin, and T. Zhang. SPIDER: Near-optimal non-convex opti- mization via stochastic path-integrated differential estimator. In S. Bengio, H. Wal- lach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors,Ad- vances in Neural Information Processing Systems, volume 31. Curran Associates, Inc., 2018. URL https://proceedings...
2018
-
[23]
M. Faw, I. Tziotis, C. Caramanis, A. Mokhtari, S. Shakkottai, and R. Ward. The power of adaptivity in SGD: Self-tuning step sizes with unbounded gradients and affine variance. In P.-L. Loh and M. Raginsky, editors,Proceedings of Thirty Fifth Conference on Learning Theory, volume 178 ofProceedings of Machine Learning Research, pages 313–355. PMLR, 02–05 Ju...
2022
-
[24]
S. Ghadimi and G. Lan. Stochastic first- and zeroth-order methods for nonconvex stochastic programming.SIAM Journal on Optimization, 23(4):2341–2368, 2013. URL https://doi. org/10.1137/120880811
-
[25]
N. I. M. Gould and V. Simoncini. Error estimates for iterative algorithms for minimizing regularized quadratic subproblems.Optimization Methods and Software, 35(2):304–328,
-
[26]
URLhttps://doi.org/10.1080/10556788.2019.1670177
-
[27]
N. I. M. Gould, D. P. Robinson, and H. S. Thorne. On solving trust-region and other regularised subproblems in optimization.Mathematical Programming Computation, 2(1): 21–57, 2010. URLhttps://doi.org/10.1007/s12532-010-0011-7
-
[28]
G. N. Grapiglia and Y. Nesterov. Regularized Newton methods for minimizing functions with H¨ older continuous Hessians.SIAM Journal on Optimization, 27(1):478–506, 2017. URLhttps://doi.org/10.1137/16M1087801
-
[29]
S. Gratton, S. Jerad, and P. L. Toint. Convergence properties of an objective-function-free op- timization regularization algorithm, including an O(ε−3/2) complexity bound.SIAM Journal on Optimization, 33(3):1621–1646, 2023. URLhttps://doi.org/10.1137/22M1499522. 24
-
[30]
S. Gratton, S. Jerad, and P. L. Toint. Complexity of a class of first-order objective-function- free optimization algorithms.Optimization Methods and Software, 41(2):478–508, 2026. URLhttps://doi.org/10.1080/10556788.2023.2296431
-
[31]
Hazan, K
E. Hazan, K. Levy, and S. Shalev-Shwartz. Beyond convexity: Stochastic quasi-convex optimization. In C. Cortes, N. Lawrence, D. Lee, M. Sugiyama, and R. Garnett, edi- tors,Advances in Neural Information Processing Systems, volume 28. Curran Associates, Inc., 2015. URL https://proceedings.neurips.cc/paper_files/paper/2015/file/ 934815ad542a4a7c5e8a2dfa04fe...
2015
-
[32]
H¨ ubler, I
F. H¨ ubler, I. Fatkhullin, and N. He. From gradient clipping to normalization for heavy tailed SGD. In Y. Li, S. Mandt, S. Agrawal, and E. Khan, editors,Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, volume 258 ofProceedings of Machine Learning Research, pages 2413–2421. PMLR, 03–05 May 2025. URL https: //proc...
2025
-
[33]
Jiang, S
W. Jiang, S. Yang, Y. Wang, and L. Zhang. Adaptive variance reduction for stochas- tic optimization under weaker assumptions. In A. Globerson, L. Mackey, D. Bel- grave, A. Fan, U. Paquet, J. Tomczak, and C. Zhang, editors,Advances in Neural Information Processing Systems, volume 37, pages 22047–22080. Curran Associates, Inc., 2024. URL https://proceedings...
2024
-
[34]
Kavis, S
A. Kavis, S. Skoulakis, K. Antonakopoulos, L. T. Dadi, and V. Cevher. Adaptive stochastic variance reduction for non-convex finite-sum minimization. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, editors,Advances in Neu- ral Information Processing Systems, volume 35, pages 23524–23538. Curran Associates, Inc., 2022. URL https://proc...
2022
-
[35]
Khaled and C
A. Khaled and C. Jin. Tuning-free stochastic optimization. In R. Salakhutdinov, Z. Kolter, K. Heller, A. Weller, N. Oliver, J. Scarlett, and F. Berkenkamp, editors,Proceedings of the 41st International Conference on Machine Learning, volume 235 ofProceedings of Machine Learning Research, pages 23622–23661. PMLR, 21–27 Jul 2024. URL https: //proceedings.ml...
2024
-
[36]
K. C. Kiwiel. Convergence and efficiency of subgradient methods for quasiconvex mini- mization.Mathematical Programming, 90(1):1–25, 2001. URL https://doi.org/10.1007/ PL00011414
2001
-
[37]
Levenberg
K. Levenberg. A method for the solution of certain non-linear problems in least squares. Quarterly of Applied Mathematics, 2(2):164–168, 1944. URL https://doi.org/10.1090/ qam/10666
1944
-
[38]
K. Levy, A. Kavis, and V. Cevher. STORM+: Fully adaptive SGD with recursive momentum for nonconvex optimization. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P. Liang, and J. W. Vaughan, editors,Advances in Neural Information Processing Systems, volume 34, pages 20571–20582. Curran Associates, Inc., 2021. URL https://proceedings.neurips. cc/paper_files/pape...
2021
-
[39]
Li and F
X. Li and F. Orabona. On the convergence of stochastic gradient descent with adaptive stepsizes. In K. Chaudhuri and M. Sugiyama, editors,Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, volume 89 ofProceedings of Machine Learning Research, pages 983–992. PMLR, 16–18 Apr 2019. URL https: //proceedings.ml...
2019
-
[40]
Z. Li. A short note of PAGE: Optimal convergence rates for nonconvex optimization.arXiv preprint arXiv:2106.09663, 2021
arXiv 2021
-
[41]
Z. Li, H. Bao, X. Zhang, and P. Richtarik. PAGE: A simple and optimal probabilistic gradient estimator for nonconvex optimization. In M. Meila and T. Zhang, editors,Proceedings of the 38th International Conference on Machine Learning, volume 139 ofProceedings of Machine Learning Research, pages 6286–6295. PMLR, 18–24 Jul 2021. URL https: //proceedings.mlr...
2021
-
[42]
Malitsky and K
Y. Malitsky and K. Mishchenko. Adaptive gradient descent without descent. In H. D. III and A. Singh, editors,Proceedings of the 37th International Conference on Machine Learning, volume 119 ofProceedings of Machine Learning Research, pages 6702–6712. PMLR, 13–18 Jul 2020. URLhttps://proceedings.mlr.press/v119/malitsky20a.html
2020
-
[43]
D. W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial and Applied Mathematics, 11(2):431–441, 1963. URL https://doi.org/10.1137/0111030
-
[44]
N. Marumo. Parameter-free accelerated quasi-Newton method for nonconvex optimization. arXiv preprint arXiv:2512.09439, 2025
arXiv 2025
-
[45]
N. Marumo and A. Takeda. Parameter-free accelerated gradient descent for nonconvex minimization.SIAM Journal on Optimization, 34(2):2093–2120, 2024. URL https://doi. org/10.1137/22M1540934
-
[46]
N. Marumo and A. Takeda. Universal heavy-ball method for nonconvex optimization under H¨ older continuous Hessians.Mathematical Programming, 212(1):147–175, 2025. URL https://doi.org/10.1007/s10107-024-02100-4
-
[47]
Marumo, T
N. Marumo, T. Okuno, and A. Takeda. Majorization-minimization-based Levenberg– Marquardt method for constrained nonlinear least squares.Computational Opti- mization and Applications, 84(3):833–874, 2023. URL https://doi.org/10.1007/ s10589-022-00447-y
2023
-
[48]
Marumo, T
N. Marumo, T. Okuno, and A. Takeda. Accelerated-gradient-based generalized Levenberg– Marquardt method with oracle complexity bound and local quadratic convergence. Mathematical Programming, 213(1):771–822, 2025. URL https://doi.org/10.1007/ s10107-024-02154-4
2025
-
[49]
A. S. Nemirovsky and D. B. Yudin.Problem Complexity and Method Efficiency in Opti- mization. Wiley-Interscience Series in Discrete Mathematics. John Wiley & Sons, 1983. Translated from the Russian by E. R. Dawson
1983
-
[50]
Nesterov
Y. Nesterov. A method for solving a convex programming problem with convergence rate O(1/k2).Soviet Mathematics Doklady, 269(3):372–376, 1983
1983
-
[51]
Nesterov.Introductory Lectures on Convex Optimization: A Basic Course
Y. Nesterov.Introductory Lectures on Convex Optimization: A Basic Course. Springer, New York, 2004. URLhttps://doi.org/10.1007/978-1-4419-8853-9
-
[52]
Nesterov.Lectures on Convex Optimization, volume 137
Y. Nesterov.Lectures on Convex Optimization, volume 137. Springer, Cham, 2018. URL https://doi.org/10.1007/978-3-319-91578-4
- [53]
-
[54]
Y. Nesterov and B. T. Polyak. Cubic regularization of Newton method and its global performance.Mathematical Programming, 108(1):177–205, 2006. URL https://doi.org/ 10.1007/s10107-006-0706-8
-
[55]
J. Nocedal and S. J. Wright.Numerical Optimization. Springer, 2nd edition, 2006. URL https://doi.org/10.1007/978-0-387-40065-5
-
[56]
N. H. Pham, L. M. Nguyen, D. T. Phan, and Q. Tran-Dinh. ProxSARAH: An efficient algorithmic framework for stochastic composite nonconvex optimization.Journal of Machine Learning Research, 21(110):1–48, 2020. URL http://jmlr.org/papers/v21/19-248.html
2020
-
[57]
S. J. Reddi, A. Hefny, S. Sra, B. Poczos, and A. Smola. Stochastic variance reduction for nonconvex optimization. In M. F. Balcan and K. Q. Weinberger, editors,Proceedings of The 33rd International Conference on Machine Learning, volume 48 ofProceedings of Machine Learning Research, pages 314–323, New York, New York, USA, 20–22 Jun 2016. PMLR. URLhttps://...
2016
-
[58]
On information and sufficiency,
H. Robbins and S. Monro. A stochastic approximation method.The Annals of Mathematical Statistics, 22(3):400–407, 1951. URLhttps://doi.org/10.1214/aoms/1177729586
-
[59]
Roulet and A
V. Roulet and A. d'Aspremont. Sharpness, restart and acceleration. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors,Advances in Neural Information Processing Systems, volume 30. Curran Asso- ciates, Inc., 2017. URL https://proceedings.neurips.cc/paper_files/paper/2017/ file/2ca65f58e35d9ad45bf7f3ae5cfd...
2017
-
[60]
Shalev-Shwartz
S. Shalev-Shwartz. Online learning and online convex optimization.Foundations and Trends in Machine Learning, 4(2):107–194, 03 2012. ISSN 1935-8237. URL https://doi.org/10. 1561/2200000018
2012
-
[61]
D. B. Thomsen and N. Doikov. Complexity of minimizing regularized convex quadratic functions.arXiv preprint arXiv:2404.17543, 2024
arXiv 2024
-
[62]
K. Ueda and N. Yamashita. On a global complexity bound of the Levenberg–Marquardt method.Journal of Optimization Theory and Applications, 147(3):443–453, 2010. URL https://doi.org/10.1007/s10957-010-9731-0
-
[63]
Z. Wang, K. Ji, Y. Zhou, Y. Liang, and V. Tarokh. SpiderBoost and momentum: Faster variance reduction algorithms. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alch´ e- Buc, E. Fox, and R. Garnett, editors,Advances in Neural Information Processing Systems, volume 32, pages 2406–2416. Curran Associates, Inc., 2019. URL https://proceedings. neurips.cc/...
2019
-
[64]
R. Ward, X. Wu, and L. Bottou. AdaGrad stepsizes: Sharp convergence over nonconvex landscapes.Journal of Machine Learning Research, 21(219):1–30, 2020. URL http: //jmlr.org/papers/v21/18-352.html
2020
-
[65]
S. Xiong, S. Jerad, and C. Cartis. A parameter-free first-order algorithm for non-convex optimization with ˜O(ε−5/3) global rate.arXiv preprint arXiv:2605.02127, 2026
Pith/arXiv arXiv 2026
-
[66]
S. Yagishita and M. Ito. Simple linesearch-free first-order methods for nonconvex optimiza- tion.arXiv preprint arXiv:2509.14670, 2025
arXiv 2025
-
[67]
J. Yang, X. Li, I. Fatkhullin, and N. He. Two sides of one coin: the limits of untuned SGD and the power of adaptive methods. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors,Advances in Neural 27 Information Processing Systems, volume 36, pages 74257–74288. Curran Associates, Inc., 2023. URL https://proceedings.neurips.cc/p...
2023
-
[68]
Z. Ye, S. Ma, J. Yang, and D. Zhou. A simple adaptive proximal gradient method for nonconvex optimization.arXiv preprint arXiv:2510.06079, 2025
arXiv 2025
-
[69]
R. Zhao and J. Fan. Global complexity bound of the Levenberg–Marquardt method. Optimization Methods and Software, 31(4):805–814, 2016. URL https://doi.org/10. 1080/10556788.2016.1179737. 28
arXiv 2016
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.