INTHOP: A Second-Order Globally Convergent Method for Nonconvex Optimization
Pith reviewed 2026-05-18 04:03 UTC · model grok-4.3
The pith
INTHOP reuses a positive definite Hessian approximation within a bounded interval to guarantee descent directions and global convergence while inverting the matrix only at selected iterations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
INTHOP computes a positive definite approximation to the Hessian at certain iterations and reuses it for subsequent steps provided the current point remains inside a chosen interval around the previous point; the difference between this approximation and the exact Hessian is proven to lie within a controlled bound, which guarantees that the resulting search direction is a descent direction, and the overall algorithm is shown to converge globally to a stationary point.
What carries the argument
Interval-bounded positive definite Hessian approximation that is reused while iterates remain inside the interval, with the approximation error bounded to preserve descent.
If this is right
- Full Hessian evaluation and inversion occur only when iterates exit the current interval.
- Global convergence holds across variants that differ in interval-size updating or minimum-eigenvalue computation.
- The method solves more test problems with fewer function and gradient evaluations than steepest descent or quasi-Newton approaches.
- Total O(n^3) work is substantially lower than Newton because expensive operations are avoided on most iterations.
Where Pith is reading between the lines
- If interval-updating rules keep iterates inside the interval for long stretches on typical problems, the approach could scale to larger dimensions than standard Newton methods.
- The same bounded-reuse idea could be combined with trust-region or cubic-regularization frameworks to further control step quality.
- Problems with rapid movement across the domain would expose how frequently the method triggers full Hessian recomputations in practice.
Load-bearing premise
That the iterates remain inside the chosen interval for enough consecutive steps to offset the cost of occasional full Hessian evaluations.
What would settle it
A nonconvex test problem on which the method either diverges or requires full Hessian recomputation and inversion at nearly every iteration for all interval-updating variants.
Figures
read the original abstract
Second-order Newton-type algorithms that leverage the exact Hessian or its approximation are central to solve nonlinear optimization problems. However, their applications in solving large-scale nonconvex problems are hindered by three primary challenges: (1) the high computational cost associated with Hessian evaluations, (2) its inversion, and (3) ensuring descent direction at points where the Hessian becomes indefinite. We propose INTHOP, an interval Hessian-based optimization algorithm for nonconvex problems to deal with these primary challenges. The proposed search direction is based on approximating the original Hessian matrix by a positive definite matrix. The novelty of the proposed method is that the proposed search direction is guaranteed to be descent and requires approximation of Hessian and its inversion only at specific iterations. We prove that the difference between the calculated approximate and exact Hessian is bounded within an interval. Accordingly, the approximate Hessian matrix is reused if the iterates are in that chosen interval while computing the gradients at each iteration. We develop various algorithm variants based on the interval size updating methods and minimum eigenvalue computation methods. We also prove the global convergence of the proposed algorithm. Further, we apply the algorithm to an extensive set of test problems and compare its performance with the existing methods such as steepest descent, quasi-Newton, and Newton method. We show empirically that the proposed method solves more problems in fewer function and gradient evaluations than steepest descent and the quasi-Newton method. While in the comparison to the Newton method, we illustrate that for nonconvex optimization problems, we require substantially less $O(n^3)$ operations.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes INTHOP, an interval Hessian-based optimization algorithm for nonconvex problems. The search direction approximates the Hessian by a positive definite matrix, with the approximation and its inversion performed only at specific iterations when the current iterate exits a dynamically maintained interval around the last evaluation point; otherwise the same approximate Hessian is reused while gradients are evaluated at each step. The manuscript claims to prove that the difference between the approximate and exact Hessian remains bounded within the interval, that the resulting direction is always a descent direction, and that the algorithm converges globally. Various algorithm variants are developed based on different interval-size updating rules and minimum-eigenvalue computation methods. Empirical comparisons on test problems show that INTHOP solves more problems with fewer function and gradient evaluations than steepest descent and quasi-Newton methods, while requiring substantially fewer O(n³) operations than the full Newton method for nonconvex problems.
Significance. If the bounded-error and global-convergence claims hold, the approach offers a principled way to amortize the dominant cubic cost of second-order methods by exploiting local constancy of the Hessian within intervals, which could be practically relevant for large-scale nonconvex optimization. The explicit construction of positive-definite approximations that guarantee descent and the development of multiple interval-updating variants are constructive contributions. The empirical results on an extensive test set provide concrete evidence of reduced evaluation counts relative to first-order and quasi-Newton baselines.
major comments (2)
- [algorithm description and variants] Description of algorithm variants and interval-size updating: The central efficiency claim—that the method requires substantially fewer O(n³) operations than the Newton method—rests on the assumption that iterates remain inside the chosen interval for enough consecutive steps to offset the cost of occasional full Hessian evaluations and inversions. No quantitative bound on exit frequency is derived, nor is there an analysis showing that the updating rule prevents rapid exits in regions of rapidly varying curvature typical of nonconvex problems. This assumption is load-bearing for the practical advantage asserted in the abstract and the numerical comparisons.
- [global convergence proof] Proof of global convergence: The convergence argument is described as following from a standard line-search once descent is guaranteed by positive-definiteness of the reused approximate Hessian. However, the validity of reusing the same approximation across multiple steps depends on the interval remaining valid; if the bounded-difference property holds only conditionally on the iterate staying inside the interval, the proof sketch should explicitly address how the line-search and interval-exit logic interact to preserve the descent property at every iteration.
minor comments (3)
- [algorithm presentation] The manuscript would benefit from a single pseudocode listing that clearly distinguishes the full-Hessian step from the reuse step and shows how the interval is updated after each exit.
- [numerical results] In the numerical experiments section, explicit details on the selection of test problems, any exclusion criteria, and the precise definition of “solves more problems” (e.g., success tolerance and maximum iteration limit) should be provided for reproducibility.
- [empirical comparisons] A brief comparison table reporting average number of Hessian evaluations (or O(n³) operations) across methods, rather than only qualitative statements, would strengthen the efficiency claims.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our manuscript. We respond point by point to the major comments below.
read point-by-point responses
-
Referee: Description of algorithm variants and interval-size updating: The central efficiency claim—that the method requires substantially fewer O(n³) operations than the Newton method—rests on the assumption that iterates remain inside the chosen interval for enough consecutive steps to offset the cost of occasional full Hessian evaluations and inversions. No quantitative bound on exit frequency is derived, nor is there an analysis showing that the updating rule prevents rapid exits in regions of rapidly varying curvature typical of nonconvex problems. This assumption is load-bearing for the practical advantage asserted in the abstract and the numerical comparisons.
Authors: We agree that a quantitative bound on exit frequency would provide stronger support for the efficiency claims. The manuscript proves that the approximation error is bounded whenever the iterate remains inside the interval and that recomputation occurs on exit, preserving positive-definiteness and descent at every step; global convergence follows from this per-iteration guarantee rather than from any frequency bound. Practical performance is supported by the reported numerical comparisons showing fewer O(n³) operations than Newton on the test set. We will add a clarifying discussion of the interval-updating variants and their adaptation to local curvature, while noting that a worst-case exit-frequency analysis lies outside the current scope. revision: partial
-
Referee: Proof of global convergence: The convergence argument is described as following from a standard line-search once descent is guaranteed by positive-definiteness of the reused approximate Hessian. However, the validity of reusing the same approximation across multiple steps depends on the interval remaining valid; if the bounded-difference property holds only conditionally on the iterate staying inside the interval, the proof sketch should explicitly address how the line-search and interval-exit logic interact to preserve the descent property at every iteration.
Authors: The proof already relies on the fact that exit from the interval immediately triggers a new positive-definite approximation whose error is again bounded, restoring the descent property before the line search is performed. To make this interaction fully explicit, we will revise the convergence section to state the logical sequence at each iteration: (i) test whether the current point lies inside the maintained interval, (ii) recompute and invert the approximate Hessian if an exit has occurred, (iii) confirm the resulting direction is a descent direction, and (iv) execute the line search with the current approximation before updating the interval. revision: yes
Circularity Check
No significant circularity detected
full rationale
The paper constructs INTHOP by defining an interval around a Hessian evaluation point, proving a bound on the difference between the exact Hessian and the positive-definite approximation when iterates remain inside that interval, and reusing the approximation (with only gradient evaluations) inside the interval. Global convergence follows from a standard line-search argument that uses the guaranteed descent property of the positive-definite direction. These steps rely on external regularity assumptions (e.g., continuity or Lipschitz properties of the Hessian) rather than any self-referential definition, fitted parameter renamed as prediction, or load-bearing self-citation. The interval-size updating rules are presented as explicit algorithmic variants whose correctness is independent of the convergence claim itself.
Axiom & Free-Parameter Ledger
free parameters (2)
- interval size
- minimum eigenvalue computation method
axioms (1)
- domain assumption Standard assumptions for global convergence proofs in nonconvex optimization (e.g., continuous differentiability, bounded level sets or Lipschitz gradients).
Reference graph
Works this paper leans on
-
[1]
Adjiman, C. S., Androulakis, I. P., and Floudas, C. A. (1998a). A global optimization method, αbb, for general twice-differentiable constrained nlps—ii. implementation and computational results. Computers & chemical engineering, 22(9):1159–1179
-
[2]
Adjiman, C. S., Dallwig, S., Floudas, C. A., and Neumaier, A. (1998b). A global optimization method,αbb, for general twice-differentiable constrained nlps—i. theoretical advances.Computers & Chemical Engineering, 22(9):1137–1158
-
[3]
Adjiman, C. S. and Floudas, C. A. (1996). Rigorous convex underestimators for general twice- differentiable problems.Journal of Global Optimization, 9:23–40
work page 1996
-
[4]
Adeterministicglobaloptimizationalgorithmforproblems with nonlinear dynamics
Adjiman, C.S.andPapamichail, I.(2004). Adeterministicglobaloptimizationalgorithmforproblems with nonlinear dynamics. In Floudas, C. A. and Pardalos, P., editors,Frontiers in Global Optimization, pages 1–23, Boston, MA. Springer US
work page 2004
-
[5]
Al-Khayyal, F. A. and Falk, J. E. (1983). Jointly constrained biconvex programming.Mathematics of Operations Research, 8(2):273–286
work page 1983
-
[6]
Allman, A., Tang, W., andDaoutidis, P.(2019). Decode: acommunity-basedalgorithmforgenerating high-quality decompositions of optimization problems.Optimization and Engineering, 20(4):1067– 1084
work page 2019
-
[7]
Androulakis, I. P., Maranas, C. D., and Floudas, C. A. (1995).αbb: A global optimization method for general constrained nonconvex problems.Journal of Global Optimization, 7:337–363
work page 1995
-
[8]
Bajaj, I. and Hasan, M. F. (2019). Unipopt: Univariate projection-based optimization without derivatives.Computers & Chemical Engineering, 127:71–87
work page 2019
-
[9]
Bajaj, I. and Hasan, M. F. (2020a). Deterministic global derivative-free optimization of black-box problems with bounded hessian.Optimization Letters, 14(4):1011–1026
-
[10]
Bajaj, I. and Hasan, M. F. (2020b). Global dynamic optimization using edge-concave underestima- tor.Journal of Global Optimization, 77(3):487–512
-
[11]
Bajaj, I., Iyer, S. S., and Hasan, M. F. (2018). A trust region-based two phase algorithm for constrained black-box and grey-box optimization with infeasible initial point.Computers & Chemical Engineering, 116:306–321
work page 2018
-
[12]
Bauer, C., Frink, A., and Kreckel, R. (2002). Introduction to the ginac framework for symbolic computation within the c++ programming language.Journal of Symbolic Computation, 33(1):1–12
work page 2002
-
[13]
(2015).Convex optimization algorithms
Bertsekas, D. (2015).Convex optimization algorithms. Athena Scientific
work page 2015
-
[14]
Bollapragada, R., Byrd, R., and Nocedal, J. (2016). Exact and inexact subsampled newton methods for optimization
work page 2016
-
[15]
Boyd, S. and Vandenberghe, L. (2004).Convex optimization. Cambridge university press
work page 2004
-
[16]
Broyden, C. G. (1970). The convergence of a class of double-rank minimization algorithms 2. the new algorithm.Ima Journal of Applied Mathematics, 6:222–231
work page 1970
-
[17]
Cartis, C., Gould, N. I., and Toint, P. L. (2011). Adaptive cubic regularisation methods for uncon- strained optimization. part i: motivation, convergence and numerical results.Mathematical Program- ming, 127(2):245–295
work page 2011
-
[18]
Chiang, N., Petra, C. G., and Zavala, V. M. (2014). Structured nonconvex optimization of large- scale energy systems using pips-nlp. In2014 Power Systems Computation Conference, pages 1–7. IEEE
work page 2014
-
[19]
Choromanska, A., Henaff, M., Mathieu, M., Arous, G. B., and LeCun, Y. (2015). The loss surfaces of multilayer networks. InArtificial intelligence and statistics, pages 192–204. PMLR. 22
work page 2015
-
[20]
Cole, D. L., Shin, S., and Zavala, V. M. (2022). A julia framework for graph-structured nonlinear optimization.Industrial & Engineering Chemistry Research, 61(26):9366–9380
work page 2022
-
[21]
Deif, A. (1991). The interval eigenvalue problem.ZAMM-Journal of Applied Mathematics and Mechanics/Zeitschrift für Angewandte Mathematik und Mechanik, 71(1):61–64
work page 1991
-
[22]
Doikov, N., Jaggi, M., et al. (2023). Second-order optimization with lazy hessians. InInternational Conference on Machine Learning, pages 8138–8161. PMLR
work page 2023
-
[23]
Erdogdu, M. A. and Montanari, A. (2015). Convergence rates of sub-sampled newton methods. Advances in Neural Information Processing Systems, 28
work page 2015
-
[24]
Fletcher, R. (1970). A new approach to variable metric algorithms.Comput. J., 13:317–322
work page 1970
-
[25]
Gebreslassie, B. H., Yao, Y., and You, F. (2012). Design under uncertainty of hydrocarbon biore- finery supply chains: multiobjective stochastic programming models, decomposition algorithm, and a comparison between cvar and downside risk.AIChE Journal, 58(7):2155–2179
work page 2012
-
[26]
Ghadimi, S. and Lan, G. (2013). Accelerated gradient methods for nonconvex nonlinear and stochas- tic programming.Mathematical Programming, 156
work page 2013
-
[27]
Goldfarb, D. (1970). A family of variable-metric methods derived by variational means.Mathematics of Computation, 24:23–26
work page 1970
-
[28]
Goualard, F. (2005). Gaol: Not just another interval library.University of Nantes, France
work page 2005
-
[29]
Gratton, S., Jerad, S., and Toint, P. L. (2025). Yet another fast variant of newton’s method for nonconvex optimization.IMA Journal of Numerical Analysis, 45(2):971–1008
work page 2025
-
[30]
Guennebaud, G., Jacob, B., et al. (2010). Eigen v3. http://eigen.tuxfamily.org
work page 2010
-
[31]
Hasan, M. F. (2018). An edge-concave underestimator for the global optimization of twice- differentiable nonconvex problems.Journal of Global Optimization, 71(4):735–752
work page 2018
-
[32]
Kang, J., Cao, Y., Word, D. P., and Laird, C. D. (2014). An interior-point method for efficient solu- tion of block-structured nlp problems using an implicit schur-complement decomposition.Computers & Chemical Engineering, 71:563–573
work page 2014
-
[33]
Global linear convergence of Newton's method without strong-convexity or Lipschitz gradients
Karimireddy, S. P., Stich, S. U., and Jaggi, M. (2018). Global linear convergence of newton’s method without strong-convexity or lipschitz gradients.CoRR, abs/1806.00413
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[34]
Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization.arXiv preprint arXiv:1412.6980
work page internal anchor Pith review Pith/arXiv arXiv 2014
-
[35]
Li, C. and Grossmann, I. E. (2019). A generalized benders decomposition-based branch and cut algorithm for two-stage stochastic programs with nonconvex constraints and mixed-binary first and second stage variables.Journal of Global Optimization, 75(2):247–272
work page 2019
-
[36]
Liu, Y., Gao, Y., and Yin, W. (2020). An improved analysis of stochastic gradient descent with momentum.Advances in Neural Information Processing Systems, 33:18261–18271
work page 2020
-
[37]
Maranas, C. D. and Floudas, C. A. (1995). Finding all solutions of nonlinearly constrained systems of equations.Journal of Global Optimization, 7:143–182
work page 1995
-
[38]
McCormick, G. P. (1976). Computability of global solutions to factorable nonconvex programs: Part i—convex underestimating problems.Mathematical programming, 10(1):147–175
work page 1976
-
[39]
McMahan, H. B. and Streeter, M. (2010). Adaptive bound optimization for online convex optimiza- tion.arXiv preprint arXiv:1002.4908
work page internal anchor Pith review Pith/arXiv arXiv 2010
-
[40]
Meyer, C. A. and Floudas, C. A. (2004a). Trilinear monomials with mixed sign domains: Facets of the convex and concave envelopes.Journal of Global Optimization, 29:125–155
-
[41]
Meyer, C. A. and Floudas, C. A. (2004b). Trilinear monomials with positive or negative domains: Facets of the convex and concave envelopes. InFrontiers in global optimization, pages 327–352. Springer. 23
-
[42]
Meyer, C. A. and Floudas, C. A. (2005). Convex envelopes for edge-concave functions.Mathematical programming, 103:207–224
work page 2005
-
[43]
Mitrai, I. and Daoutidis, P. (2021). Efficient solution of enterprise-wide optimization problems using nested stochastic blockmodeling.Industrial & Engineering Chemistry Research, 60(40):14476–14494
work page 2021
-
[44]
Mitsos, A., Chachuat, B., and Barton, P. I. (2009). Mccormick-based relaxations of algorithms. SIAM Journal on Optimization, 20(2):573–601
work page 2009
-
[45]
Mitsos, A., Lemonidis, P., and Barton, P. I. (2008). Global solution of bilevel programs with a nonconvex inner program.Journal of Global Optimization, 42:475–513
work page 2008
- [46]
-
[47]
Moore, R. E., Kearfott, R. B., and Cloud, M. J. (2009).Introduction to interval analysis. SIAM
work page 2009
-
[48]
Moré, J. J. and Wild, S. M. (2009). Benchmarking derivative-free optimization algorithms.SIAM Journal on Optimization, 20(1):172–191
work page 2009
-
[49]
Mori, T. and Kokame, H. (1994). Eigenvalue bounds for a certain class of interval matrices.IEICE transactions on fundamentals of electronics, communications and computer sciences, 77(10):1707– 1709
work page 1994
-
[50]
Nesterov, Y. (1983). A method for solving the convex programming problem with convergence rate o(1/k2)
work page 1983
-
[51]
Nesterov, Y. et al. (2018).Lectures on convex optimization, volume 137. Springer
work page 2018
-
[52]
Nesterov, Y. and Polyak, B. T. (2006). Cubic regularization of newton method and its global performance.Mathematical programming, 108(1):177–205
work page 2006
- [53]
-
[54]
Pacaud, F., Schanen, M., Shin, S., Maldonado, D. A., and Anitescu, M. (2024). Parallel interior- point solver for block-structured nonlinear programs on simd/gpu architectures.Optimization Methods and Software, 39(4):874–897
work page 2024
-
[55]
Pacaud, F. and Shin, S. (2024). Gpu-accelerated dynamic nonlinear optimization with examodels and madnlp. In2024 IEEE 63rd Conference on Decision and Control (CDC), pages 5963–5968. IEEE
work page 2024
-
[56]
Puranik, Y. and Sahinidis, N. (2017). Bounds tightening based on optimality conditions for non- convex box-constrained optimization.Journal of Global Optimization, 67
work page 2017
-
[57]
Qian, N. (1999). On the momentum term in gradient descent learning algorithms.Neural networks, 12(1):145–151
work page 1999
-
[58]
Robbins, H. and Monro, S. (1951). A stochastic approximation method.The annals of mathematical statistics, pages 400–407
work page 1951
-
[59]
Rohn, J. (1998). Bounds on eigenvalues of interval matrices.ZAMM-Zeitschrift fur Angewandte Mathematik und Mechanik, 78(3):S1049
work page 1998
-
[60]
Roosta-Khorasani, F. and Mahoney, M. W. (2016a). Sub-sampled newton methods i: Globally convergent algorithms
-
[61]
Roosta-Khorasani, F. and Mahoney, M. W. (2016b). Sub-sampled newton methods ii: Local con- vergence rates
-
[62]
Roosta-Khorasani, F. and Mahoney, M. W. (2019). Sub-sampled newton methods.Mathematical Programming, 174:293–326
work page 2019
-
[63]
Ryoo, H. S. and Sahinidis, N. V. (2001). Analysis of bounds for multilinear functions.Journal of Global Optimization, 19(4):403–424. 24
work page 2001
-
[64]
Scott, J. K., Stuber, M. D., and Barton, P. I. (2011). Generalized mccormick relaxations.Journal of Global Optimization, 51(4):569–606
work page 2011
-
[65]
Shanno, D. F. (1970). Conditioning of quasi-newton methods for function minimization.Mathe- matics of Computation, 24:647–656
work page 1970
-
[66]
(2004).On the existence of polyhedral convex envelopes
Tardella, F. (2004).On the existence of polyhedral convex envelopes. Springer
work page 2004
-
[67]
Tawarmalani, M. and Sahinidis, N. V. (2001). Semidefinite relaxations of fractional programs via novel convexification techniques.Journal of Global Optimization, 20:133–154
work page 2001
-
[68]
Tawarmalani, M. and Sahinidis, N. V. (2002). Convex extensions and envelopes of lower semi- continuous functions.Mathematical Programming, 93(2):247–263
work page 2002
-
[69]
Yoshio, N. and Biegler, L. T. (2021). A nested schur decomposition approach for multiperiod optimization of chemical processes.Computers & Chemical Engineering, 155:107509
work page 2021
-
[70]
Zeiler, M. D. (2012). Adadelta: An adaptive learning rate method. 25 Appendix A. Proofs This appendix provides proofs of Theorem 1 and Lemma 1 stated in Section 3.1 and 3.4.2, respectively. We prove the following results stated in Theorem 1: •Function value bound: |L(xt)−f(x k)| ≤ Lf 2 √nδ+ |λmin| 8 nδ2 •Gradient bound: ∥∇L(xt)− ∇f(x k)∥ ≤ Lg 2 √nδ+ |λmin...
work page 2012
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.