A survey of trust-region radius update mechanisms. Part I: First-order analysis
Pith reviewed 2026-06-30 05:04 UTC · model grok-4.3
The pith
Three structural conditions on trust-region radius updates define weak and strong admissibility that guarantee first-order convergence and optimal complexity.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By isolating a lower bound relative to the gradient norm, a contraction on unsuccessful iterations, and a controlled expansion on successful iterations, the authors define weakly admissible mechanisms that guarantee lim ||∇f(x_k)|| = 0 and strongly admissible ones that achieve O(ε^{-2}) complexity for first-order stationarity, with the strong version extending to linearly growing Hessians. They verify these properties for fixed-factor, step-driven, retrospective, criticality-anchored, and gradient-scaled mechanisms, while extending prior frameworks to additional scaling factors with decoupled step acceptance.
What carries the argument
Weak admissibility (gradient-norm lower bound plus contraction on unsuccessful iterations) and strong admissibility (gradient-norm lower bound plus controlled expansion on successful iterations) for radius update rules.
If this is right
- Weak admissibility implies that the gradient norm converges to zero under uniformly bounded model Hessians.
- Strong admissibility yields the optimal worst-case complexity O(ε^{-2}) for first-order stationarity.
- Strong admissibility extends the convergence result to the case of linearly growing model Hessians.
- The five listed mechanism families satisfy the relevant admissibility conditions.
- Retrospective updates converge even when model Hessians grow linearly.
Where Pith is reading between the lines
- The separation between weak and strong conditions indicates that convergence can be obtained with milder expansion rules than those needed for optimal complexity.
- The classification may help construct new radius update rules that inherit the theoretical guarantees without case-by-case analysis.
- Extending the framework to linearly growing Hessians suggests that similar radius conditions could be useful for problems where curvature information increases mildly over iterations.
Load-bearing premise
Model Hessians remain uniformly bounded or grow at most linearly with the iteration counter.
What would settle it
A radius update rule that meets the three structural conditions yet produces a sequence where the gradient norm stays above some fixed positive value for infinitely many iterations, or requires more than order ε^{-2} iterations to reach first-order stationarity.
Figures
read the original abstract
We isolate three structural conditions on trust-region radius update rules for smooth unconstrained nonlinear optimisation, and study the class of mechanisms they define. The conditions act on the radius directly: a lower bound relative to the gradient norm, a contraction on unsuccessful iterations, and a controlled expansion on successful ones. A mechanism is \emph{weakly admissible} if it satisfies the first two conditions, and \emph{strongly admissible} if it satisfies the lower bound together with the controlled-expansion condition. Under uniformly bounded model Hessians, weak admissibility yields $\lim_{k\to\infty}\|\nabla f(x_k)\|=0$, and strong admissibility yields the optimal worst-case complexity $O(\epsilon^{-2})$ for first-order stationarity. Strong admissibility extends the convergence guarantee to linearly growing model Hessians. We verify admissibility for five mechanism classes: fixed-factor, step-driven, retrospective, criticality-anchored, and gradient-scaled. Along the way, we prove convergence of the retrospective update under linearly growing model Hessians and revisit the framework of Curtis and Scheinberg (2020), and Wang and Yuan (2022): we extend it to three distinct scaling factors with decoupled step acceptance (covering $\eta = 0$), and specialise its stochastic version to the deterministic gradient-scaled
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper isolates three structural conditions on trust-region radius update rules for smooth unconstrained nonlinear optimisation: a lower bound on the radius relative to the gradient norm, contraction on unsuccessful iterations, and controlled expansion on successful ones. Weak admissibility (satisfying the lower bound and contraction) yields lim_{k→∞}‖∇f(x_k)‖=0 under uniformly bounded model Hessians; strong admissibility (lower bound plus controlled expansion) yields the optimal O(ε^{-2}) worst-case complexity for first-order stationarity. Strong admissibility extends the convergence guarantee to linearly growing model Hessians. The paper verifies admissibility for five mechanism classes (fixed-factor, step-driven, retrospective, criticality-anchored, gradient-scaled), proves convergence of the retrospective update under linear Hessian growth, and extends the Curtis-Scheinberg (2020) and Wang-Yuan (2022) frameworks to three scaling factors with decoupled step acceptance (including η=0) while specialising the stochastic version to the deterministic gradient-scaled case.
Significance. If the stated results hold, the manuscript supplies a clean structural characterisation of radius-update rules that isolates the minimal requirements for global convergence and optimal first-order complexity. Credit is due for the explicit verification across five distinct classes, the extension of convergence to linearly growing Hessians, and the generalisation of prior frameworks to decoupled acceptance and deterministic gradient scaling. These contributions clarify the design space for trust-region methods and may facilitate both theoretical comparisons and practical implementations.
minor comments (3)
- [Abstract / Introduction] The abstract refers to 'Part I: First-order analysis' but the manuscript provides no explicit statement of the intended scope or open questions for Part II; adding a short forward-looking paragraph would improve context.
- [Section 4] The five mechanism classes are introduced and verified separately; a single comparative table summarising which structural conditions each class satisfies (with references to the relevant lemmas) would enhance readability.
- [Section 2] Notation for the radius-update parameters (e.g., the contraction factor γ, expansion factor α) is introduced inline; collecting the parameter ranges and their roles in a preliminary notation table would reduce repetition across the admissibility proofs.
Simulated Author's Rebuttal
We thank the referee for their detailed and positive summary of our manuscript, as well as for the recognition of its contributions to the structural characterisation of trust-region radius updates, the verification across mechanism classes, and the extensions of prior frameworks. We appreciate the recommendation for minor revision. No major comments were provided in the report, so we have no points requiring response or revision at this time.
Circularity Check
No significant circularity; derivations are self-contained
full rationale
The paper isolates three explicit structural conditions on radius updates, defines weak/strong admissibility directly from them, and derives lim ||∇f||=0 and O(ε^{-2}) complexity from those conditions plus standard bounded/linear-growth Hessian assumptions. Verification for the five mechanism classes consists of direct substitution into the conditions. Cited prior frameworks (Curtis-Scheinberg 2020, Wang-Yuan 2022) are by non-overlapping authors and are extended rather than used as load-bearing self-justification. No equation reduces to a fitted parameter renamed as prediction, no self-definition of the target result, and no uniqueness theorem imported from the authors' own prior work.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption The objective function f is continuously differentiable (smooth).
- domain assumption Model Hessians are uniformly bounded or grow linearly with iteration index.
Reference graph
Works this paper leans on
-
[1]
A. S. Bandeira, K. Scheinberg, and L. N. Vicente. Convergence of Trust-Region Methods Based on Probabilistic Models.SIAM Journal on Optimization, 24(3):1238–1264, 2014. doi: 10.1137/130915984
-
[2]
F. Bastin, V. Malmedy, M. Mouffe, P. L. Toint, and D. Tomanos. A retrospective trust-region method for un- constrained optimization.Mathematical Programming, 123(2):395–418, 2010. doi: 10.1007/s10107-008-0258-1
-
[3]
E. G. Birgin, J. L. Gardenghi, J. M. Mart´ ınez, S. A. Santos, and P. L. Toint. Worst-case evaluation complexity for unconstrained nonlinear optimization using high-order regularized models.Mathematical Programming, 163 (1):359–368, 2017. doi: 10.1007/s10107-016-1065-8
-
[4]
J. Blanchet, C. Cartis, M. Menickelly, and K. Scheinberg. Convergence Rate Analysis of a Stochastic Trust Region Method via Submartingales.INFORMS Journal on Optimization, 1(2):92–119, 2019. doi: 10.1287/ ijoo.2019.0016
arXiv 2019
-
[5]
R. Bollapragada, R. Byrd, and J. Nocedal. Adaptive sampling strategies for stochastic optimization.SIAM Journal on Optimization, 28(4):3312–3343, 2018. doi: 10.1137/17m1154679
-
[6]
R. Bollapragada, R. H. Byrd, and J. Nocedal. Exact and inexact subsampled Newton methods for optimization. IMA Journal of Numerical Analysis, 39(2):545–578, 2018. doi: 10.1093/imanum/dry009. 26 (i)AM.2:∥H k∥ ≤κ umh (ii)Condition C.2: For someη dec ∈(0 ; 1), ∆k+1 ≤γ 2∆k ifρ k < η dec (i)AM.3:∥H k∥ ≤M k withP k 1 Mk = +∞ (ii)Condition C.3: For someη inc ∈(0...
-
[7]
doi:10.1137/16M1080173 , eprint =
L. Bottou, F. E. Curtis, and J. Nocedal. Optimization methods for large-scale Machine Learning.SIAM Review, 60(2):223–311, 2018. doi: 10.1137/16m1080173
-
[8]
R. H. Byrd, G. M. Chin, J. Nocedal, and Y. Wu. Sample size selection in optimization methods for Machine Learning.Mathematical Programming, 134(1):127–155, 2012. doi: 10.1007/s10107-012-0572-5
-
[9]
R. G. Carter. On the global convergence of trust region algorithms using inexact gradient information.SIAM Journal on Numerical Analysis, 28:251–265, 1991. doi: 10.1137/0728014
-
[10]
Cartis and K
C. Cartis and K. Scheinberg. Global convergence rate analysis of unconstrained optimization methods based on probabilistic models.Mathematical Programming, 169(2):337–375, 2018
2018
-
[11]
C. Cartis, N. I. M. Gould, and P. L. Toint. On the complexity of steepest descent, Newton’s and regularized Newton’s methods for nonconvex unconstrained optimization problems.SIAM Journal on Optimization, 20 (6):2833–2852, 2010. doi: 10.1137/090774100
-
[12]
Cartis, N
C. Cartis, N. I. M. Gould, and P. L. Toint. Adaptive cubic regularisation methods for unconstrained opti- mization. Part I: motivation, convergence and numerical results.Mathematical Programming, 127(2):245–295,
-
[13]
doi: 10.1007/s10107-009-0286-5
-
[14]
Cartis, N
C. Cartis, N. I. M. Gould, and P. L. Toint. On the Evaluation Complexity of Composite Function Minimization with Applications to Nonconvex Nonlinear Programming.SIAM Journal on Optimization, 21(4):1721–1739,
-
[15]
doi: 10.1137/11082381X. 27
-
[16]
C. Cartis, N. I. M. Gould, and P. L. Toint. Complexity bounds for second-order optimality in unconstrained optimization.Journal of Complexity, 28(1):93–108, 2012. doi: 10.1016/j.jco.2011.06.001
-
[17]
Cartis, N
C. Cartis, N. I. M. Gould, and P. L. Toint.Evaluation Complexity of Algorithms for Nonconvex Optimization: Theory, Computation and Perspectives. Society for Industrial and Applied Mathematics, Philadelphia, PA,
-
[18]
doi: 10.1137/1.9781611976991
-
[19]
R. Chen, M. Menickelly, and K. Scheinberg. Stochastic optimization using a trust-region method and random models.Mathematical Programming, 169(2):447–487, 2018. doi: 10.1007/s10107-017-1141-8
-
[20]
A. R. Conn, N. I. M. Gould, and P. L. Toint.Trust-Region Methods. MOS-SIAM Series on Optimization. SIAM, Philadelphia, PA, USA, 2000. doi: 10.1137/1.9780898719857
-
[21]
A. R. Conn, K. Scheinberg, and L. N. Vicente. Global Convergence of General Derivative-Free Trust-Region Algorithms to First- and Second-Order Critical Points.SIAM Journal on Optimization, 20(1):387–415, 2009. doi: 10.1137/060673424
-
[22]
A. R. Conn, K. Scheinberg, and L. N. Vicente.Introduction to Derivative-Free Optimization. Society for Industrial and Applied Mathematics, USA, 2009
2009
-
[23]
F. E. Curtis and K. Scheinberg. Adaptive Stochastic Optimization: A Framework for Analyzing Stochastic Optimization Algorithms.IEEE Signal Processing Magazine, 37(5):32–42, 2020. doi: 10.1109/MSP.2020. 3003539
-
[24]
F. E. Curtis, D. P. Robinson, and M. Samadi. A Trust Region Algorithm with a Worst-Case Iteration Complexity ofO(ϵ −3/2) for Nonconvex Optimization.Mathematical Programming, 162(1–2):1–32, 2017. doi: 10.1007/s10107-016-1026-2
-
[25]
F. E. Curtis, Z. Lubberts, and D. P. Robinson. Concise complexity analyses for trust region methods.Opti- mization Letters, 12(8):1713–1724, 2018. doi: 10.1007/s11590-018-1286-2
-
[26]
F. E. Curtis, D. P. Robinson, C. W. Royer, and S. J. Wright. Trust-Region Newton-CG with Strong Second- Order Complexity Guarantees for Nonconvex Optimization.SIAM Journal on Optimization, 31(1):518–544,
-
[27]
doi: 10.1137/19M130563X
-
[28]
I. Ekeland. On the variational principle.Journal of Mathematical Analysis and Applications, 47(2):324–353,
-
[29]
doi: 10.1016/0022-247X(74)90025-0
-
[30]
J. Fan. Convergence Rate of The Trust Region Method for Nonlinear Equations Under Local Error Bound Con- dition.Computational Optimization and Applications, 34(2):215–227, 2006. doi: 10.1007/s10589-005-3078-8
-
[31]
J. Fan and N. Lu. On the modified trust region algorithm for nonlinear equations.Optimization Methods and Software, 30(3):478–491, 2015. doi: 10.1080/10556788.2014.932943. Publisher: Taylor & Francis
-
[32]
J. Fan and J. Pan. An improved trust region algorithm for nonlinear equations.Computational Optimization and Applications, 48(1):59–70, 2011. doi: 10.1007/s10589-009-9236-7
-
[33]
J. Fan, J. Pan, and S. Hongyan. A retrospective trust region algorithm with trust region converging to zero. Journal of Computational Mathematics, 34(4):421–436, 2016. doi: 10.4208/jcm.1601-m2015-0399
-
[34]
M. P. Friedlander and M. Schmidt. Hybrid deterministic-stochastic methods for data fitting.SIAM Journal on Scientific Computing, 34(3):A1380–A1405, 2012. doi: 10.1137/110830629
-
[35]
R. Garmanjani, D. J´ udice, and L. N. Vicente. Trust-Region Methods Without Using Derivatives: Worst Case Complexity and the NonSmooth Case.SIAM Journal on Optimization, 26(4):1987–2011, 2016. doi: 10.1137/151005683
-
[36]
N. I. M. Gould, D. Orban, A. Sartenaer, and P. L. Toint. Sensitivity of trust-region algorithms to their parameters.4OR, 3(3):227–241, 2005. doi: 10.1007/s10288-005-0065-y
-
[37]
G. N. Grapiglia, J. Yuan, and Y.-x. Yuan. On the convergence and worst-case complexity of trust-region and regularization methods for unconstrained optimization.Mathematical Programming, 152(1):491–520, 2015. doi: 10.1007/s10107-014-0794-9. 28
-
[38]
G. N. Grapiglia, J. Yuan, and Y.-x. Yuan. Nonlinear Stepsize Control Algorithms: Complexity Bounds for First- and Second-Order Optimality.Journal of Optimization Theory and Applications, 171(3):980–997, 2016. doi: 10.1007/s10957-016-1007-x
-
[39]
S. Gratton, A. Sartenaer, and P. L. Toint. Recursive Trust-Region Methods for Multiscale Nonlinear Opti- mization.SIAM Journal on Optimization, 19(1):414–444, 2008. doi: 10.1137/050623012
-
[40]
S. Gratton, C. Royer, L. N. Vicente, and Z. Zhang. Complexity and global rates of trust-region methods based on probabilistic models.IMA Journal of Numerical Analysis, 38(3):1579–1597, 2017. doi: 10.1093/imanum/ drx043. eprint: https://academic.oup.com/imajna/article-pdf/38/3/1579/25170882/drx043.pdf
-
[41]
L. Hei. A self-adaptive trust region algorithm.Journal of Computational Mathematics, 21(2):229–236, 2003. Publisher: Institute of Computational Mathematics and Scientific/Engineering Computing
2003
-
[42]
B. Jin, K. Scheinberg, and M. Xie. High probability complexity bounds for adaptive step search based on stochastic oracles.SIAM Journal on Optimization, 34(3):2411–2439, 2024. doi: 10.1137/22M1512764
-
[43]
B. Jin, K. Scheinberg, and M. Xie. Sample complexity analysis for adaptive optimization algorithms with stochastic oracles.Mathematical Programming, 209(1):651–679, 2025. doi: 10.1007/s10107-024-02078-z
-
[44]
Levenberg
K. Levenberg. A method for the solution of certain non-linear problems in least squares.Quarterly of Applied Mathematics, 2(2):164–168, 1944. Publisher: Brown University
1944
-
[45]
D. W. Marquardt. An Algorithm for Least-Squares Estimation of Nonlinear Parameters.Journal of the Society for Industrial and Applied Mathematics, 11(2):431–441, 1963. doi: 10.1137/0111030
-
[46]
J. J. Mor´ e. Recent Developments in Algorithms and Software for Trust Region Methods. In A. Bachem, B. Korte, and M. Gr¨ otschel, editors,Mathematical Programming The State of the Art: Bonn 1982, pages 258–287. Springer Berlin Heidelberg, Berlin, Heidelberg, 1983. doi: 10.1007/978-3-642-68874-4 11
-
[47]
J. J. Mor´ e and D. C. Sorensen. Computing a Trust Region Step.SIAM Journal on Scientific and Statistical Computing, 4(3):553–572, 1983. doi: 10.1137/0904038
-
[48]
Nesterov.Introductory Lectures on Convex Optimization: A Basic Course
Y. Nesterov.Introductory Lectures on Convex Optimization: A Basic Course. Springer Publishing Company, Incorporated, 1st edition, 2014. ISBN 1461346916
2014
-
[49]
Nesterov.Lectures on Convex Optimization
Y. Nesterov.Lectures on Convex Optimization. Springer Optimization and Its Applications. Springer Cham, 2nd edition, 2018. ISBN 978-3-319-91577-7. doi: https://doi.org/10.1007/978-3-319-91578-4
-
[50]
Y. Nesterov and B. Polyak. Cubic regularization of Newton method and its global performance.Mathematical Programming, 108(1):177–205, 2006. doi: 10.1007/s10107-006-0706-8
-
[51]
J. Nocedal and S. J. Wright.Numerical Optimization. Springer Series in Operations Research and Financial En- gineering. Springer New York, NY, New York, NY, USA, second edition, 2006. doi: 10.1007/978-0-387-40065-5
-
[52]
C. Paquette and K. Scheinberg. A stochastic Line-Search method with expected complexity analysis.SIAM Journal on Optimization, 30(1):349–376, 2020. doi: 10.1137/18M1216250
-
[53]
M. Powell. UOBYQA: unconstrained optimization by quadratic approximation.Mathematical Programming, 92(3):555–582, 2002. doi: 10.1007/s101070100290
-
[54]
M. J. D. Powell. A New Algorithm for Unconstrained Optimization. In J. Rosen, O. Mangasarian, and K. Ritter, editors,Nonlinear Programming, pages 31–65. Academic Press, Jan. 1970. doi: 10.1016/B978-0-12-597050-1. 50006-3
-
[55]
M. J. D. Powell. Convergence properties of a class of minization algorithms. In O. Mangasarian, R. Meyer, and S. Robinson, editors,Nonlinear Programming 2, pages 1–27. Academic Press, Jan. 1975. doi: 10.1016/ B978-0-12-468650-2.50005-5
1975
-
[56]
M. J. D. Powell. On the global convergence of trust region algorithms for unconstrained minimization.Math- ematical Programming, 29(3):297–303, 1984. doi: 10.1007/BF02591998
-
[57]
Robbins and S
H. Robbins and S. Monro. A stochastic approximation method.The Annals of Mathematical Statistics, 22(3): 400–407, 1951. 29
1951
-
[58]
T. Steihaug. The Conjugate Gradient Method and Trust Regions in Large Scale Optimization.SIAM Journal on Numerical Analysis, 20(3):626–637, 1983. doi: 10.1137/0720042
-
[59]
P. L. Toint. On the Superlinear Convergence of an Algorithm for Solving a Sparse Minimization Problem. SIAM Journal on Numerical Analysis, 16(6):1036–1045, 1979. doi: 10.1137/0716076
-
[60]
P. L. Toint. Global convergence of a a of trust-region methods for nonconvex minimization in hilbert space. IMA Journal of Numerical Analysis, 8(2):231–252, 1988. doi: 10.1093/imanum/8.2.231
-
[61]
P. L. Toint. Nonlinear stepsize control, trust regions and regularizations for unconstrained optimization. Optimization Methods and Software, 28(1):82–95, 2013. doi: 10.1080/10556788.2011.610458. Publisher: Taylor & Francis
-
[62]
X. Wang and Y. Yuan. Stochastic trust-region methods with trust-region radius depending on probabilistic models.Journal of Computational Mathematics, 40(2):294–334, 2022. doi: 10.4208/jcm.2012-m2020-0144
-
[63]
P. Xu, F. Roosta, and M. W. Mahoney. Newton-type methods for non-convex optimization under inexact Hessian information.Mathematical Programming, 184(1):35–70, 2020
2020
-
[64]
Z. Yao, P. Xu, F. Roosta, and M. W. Mahoney. Inexact Nonconvex Newton-Type Methods.INFORMS Journal on Optimization, 3(2):154–182, 2021
2021
-
[65]
Y. Yuan. On the truncated conjugate gradient method.Mathematical Programming, 87(3):561–573, 2000. doi: 10.1007/s101070050012
-
[66]
Y.-x. Yuan. Conditions for convergence of trust region algorithms for nonsmooth optimization.Mathematical Programming, 31(2):220–228, 1985. doi: 10.1007/BF02591750
-
[67]
Y.-x. Yuan. An example of non-convergence of trust region algorithms. In Y.-x. Yuan, editor,Advances in Nonlinear Programming: Proceedings of the 96 International Conference on Nonlinear Programming, pages 205–215. Springer US, Boston, MA, 1998. doi: 10.1007/978-1-4613-3335-7\ 9
-
[68]
Y.-x. Yuan. A review of trust region algorithms for optimization. In J. M. Ball and J. C. R. Hunt, editors, ICIAM99: Proceedings of the Fourth International Congress on Industrial & Applied Mathematics Edinburgh, page 0. Oxford University Press, Dec. 2000. doi: 10.1093/oso/9780198505143.003.0023
-
[69]
Y.-x. Yuan. Recent advances in trust region algorithms.Mathematical Programming, 151(1):249–281, 2015
2015
-
[70]
Yuan and J
Y.-x. Yuan and J. Fan. A new trust region algorithm with trust region radius converging to zero. InProceedings of the 5th International Conference on Optimization: Techniques and Applications, pages 786–794, 2001. A Example of a R-function As an illustration inspired from the example plot given in [36], one may defineR η1(t) to interpolate smoothly across...
2001
-
[71]
Consider such an iterationm≥0 of the subsequence such that∥g(x m)∥ ≥ε
This implies that lim sup k→∞ ∥gk∥ ≥εfor someε >0, and there exists a subsequence{x kj }such that ∥g(xkj)∥ ≥ε >0 for allj. Consider such an iterationm≥0 of the subsequence such that∥g(x m)∥ ≥ε. Whenxis sufficiently close tox m, then∥x−x m∥ ≤ ∥g(xm)∥ 2L .Then, from AF.2, we have ∥g(x)∥ ≥ ∥g(x m)∥ − ∥g(x m)−g(x)∥ ≥ 1 2 ∥g(xm)∥ ≥ ε 2 >0.(B.1) As lim inf k→∞ ...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.