pith. sign in

arxiv: 2510.22342 · v4 · submitted 2025-10-25 · 🧮 math.OC

INTHOP: A Second-Order Globally Convergent Method for Nonconvex Optimization

Pith reviewed 2026-05-18 04:03 UTC · model grok-4.3

classification 🧮 math.OC
keywords nonconvex optimizationHessian approximationglobal convergencesecond-order methodsdescent directioninterval methodsNewton-type algorithms
0
0 comments X

The pith

INTHOP reuses a positive definite Hessian approximation within a bounded interval to guarantee descent directions and global convergence while inverting the matrix only at selected iterations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents INTHOP, an algorithm that approximates the Hessian by a positive definite matrix and reuses that approximation across iterations as long as the current point stays inside a controlled interval around the prior point. The authors prove that the approximation error remains bounded inside the interval, which ensures the search direction is always a descent direction even when the true Hessian is indefinite. They establish global convergence for the overall method and introduce variants that differ in how the interval size is updated or how the minimum eigenvalue is computed. Numerical comparisons on test problems show that INTHOP requires fewer function and gradient evaluations than steepest descent or quasi-Newton methods and substantially fewer O(n cubed) operations than the full Newton method on nonconvex instances. This combination of reduced cost and convergence guarantees addresses the main barriers to using second-order information on large-scale nonconvex problems.

Core claim

INTHOP computes a positive definite approximation to the Hessian at certain iterations and reuses it for subsequent steps provided the current point remains inside a chosen interval around the previous point; the difference between this approximation and the exact Hessian is proven to lie within a controlled bound, which guarantees that the resulting search direction is a descent direction, and the overall algorithm is shown to converge globally to a stationary point.

What carries the argument

Interval-bounded positive definite Hessian approximation that is reused while iterates remain inside the interval, with the approximation error bounded to preserve descent.

If this is right

  • Full Hessian evaluation and inversion occur only when iterates exit the current interval.
  • Global convergence holds across variants that differ in interval-size updating or minimum-eigenvalue computation.
  • The method solves more test problems with fewer function and gradient evaluations than steepest descent or quasi-Newton approaches.
  • Total O(n^3) work is substantially lower than Newton because expensive operations are avoided on most iterations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If interval-updating rules keep iterates inside the interval for long stretches on typical problems, the approach could scale to larger dimensions than standard Newton methods.
  • The same bounded-reuse idea could be combined with trust-region or cubic-regularization frameworks to further control step quality.
  • Problems with rapid movement across the domain would expose how frequently the method triggers full Hessian recomputations in practice.

Load-bearing premise

That the iterates remain inside the chosen interval for enough consecutive steps to offset the cost of occasional full Hessian evaluations.

What would settle it

A nonconvex test problem on which the method either diverges or requires full Hessian recomputation and inversion at nearly every iteration for all interval-updating variants.

Figures

Figures reproduced from arXiv: 2510.22342 by Ashutosh Sharma, Gauransh Dingwani, Ishan Bajaj, Krishan Kumar, Nikhil Gupta, Vaishnavi Gupta.

Figure 1
Figure 1. Figure 1: Graphical illustration of f(x), mN k (x) and mIH k (x) Computing α exactly over an interval using Eq. 3.8 is equivalent to solving a nonconvex optimization problem to global optimality, which could be more difficult than solving Eq. 1.1. Therefore, an efficient and scalable alternative is needed to obtain a reliable lower bound on the minimum eigenvalue over the interval. In the following subsection, we in… view at source ↗
Figure 2
Figure 2. Figure 2: Graphical illustration of INTHOP with fixed interval size (INTHOP:F) and Algorithm [PITH_FULL_IMAGE:figures/full_fig_p008_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Paths taken by steepest descent, quasi-Newton, and INTHOP:F on illustrative example 1. [PITH_FULL_IMAGE:figures/full_fig_p013_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: (A) Comparison of paths take by INTHOP:F and NEWTON:CAMI on illustrative example 2. (B) Function [PITH_FULL_IMAGE:figures/full_fig_p014_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Fraction of problems solved by INTHOP:F variants with (A) gradient evaluations, (B) Hessian evaluations, (C) [PITH_FULL_IMAGE:figures/full_fig_p015_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Data profile corresponding to fixed and adaptive [PITH_FULL_IMAGE:figures/full_fig_p016_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Variation of function value and δ with iterations for (A) 10_s295 and (B) 2_gold. Overall, these results highlight the robustness of the adaptive interval strategies, particularly when used with the EM and MK α estimation methods. It consistently outperforms the fixed-interval ap￾proaches across all the performance metrics. 5.3. Comparison with Competing Algorithms Next, we compare our proposed algorithm w… view at source ↗
Figure 8
Figure 8. Figure 8: Fraction of problems solved by INTHOP with EM:A1 and MK:A1, steepest descent, quasi-Newton BFGS, and [PITH_FULL_IMAGE:figures/full_fig_p018_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Fraction of problems solved by INTHOP variants and Newton-CAMI on problems requiring Hessian [PITH_FULL_IMAGE:figures/full_fig_p019_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Function value with number of (A) function, (B) Hessian evaluations, and (C) CPU time. [PITH_FULL_IMAGE:figures/full_fig_p019_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: Function value with (A) function, (B) Hessian evaluations, and (C) CPU time. [PITH_FULL_IMAGE:figures/full_fig_p020_11.png] view at source ↗
read the original abstract

Second-order Newton-type algorithms that leverage the exact Hessian or its approximation are central to solve nonlinear optimization problems. However, their applications in solving large-scale nonconvex problems are hindered by three primary challenges: (1) the high computational cost associated with Hessian evaluations, (2) its inversion, and (3) ensuring descent direction at points where the Hessian becomes indefinite. We propose INTHOP, an interval Hessian-based optimization algorithm for nonconvex problems to deal with these primary challenges. The proposed search direction is based on approximating the original Hessian matrix by a positive definite matrix. The novelty of the proposed method is that the proposed search direction is guaranteed to be descent and requires approximation of Hessian and its inversion only at specific iterations. We prove that the difference between the calculated approximate and exact Hessian is bounded within an interval. Accordingly, the approximate Hessian matrix is reused if the iterates are in that chosen interval while computing the gradients at each iteration. We develop various algorithm variants based on the interval size updating methods and minimum eigenvalue computation methods. We also prove the global convergence of the proposed algorithm. Further, we apply the algorithm to an extensive set of test problems and compare its performance with the existing methods such as steepest descent, quasi-Newton, and Newton method. We show empirically that the proposed method solves more problems in fewer function and gradient evaluations than steepest descent and the quasi-Newton method. While in the comparison to the Newton method, we illustrate that for nonconvex optimization problems, we require substantially less $O(n^3)$ operations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The paper proposes INTHOP, an interval Hessian-based optimization algorithm for nonconvex problems. The search direction approximates the Hessian by a positive definite matrix, with the approximation and its inversion performed only at specific iterations when the current iterate exits a dynamically maintained interval around the last evaluation point; otherwise the same approximate Hessian is reused while gradients are evaluated at each step. The manuscript claims to prove that the difference between the approximate and exact Hessian remains bounded within the interval, that the resulting direction is always a descent direction, and that the algorithm converges globally. Various algorithm variants are developed based on different interval-size updating rules and minimum-eigenvalue computation methods. Empirical comparisons on test problems show that INTHOP solves more problems with fewer function and gradient evaluations than steepest descent and quasi-Newton methods, while requiring substantially fewer O(n³) operations than the full Newton method for nonconvex problems.

Significance. If the bounded-error and global-convergence claims hold, the approach offers a principled way to amortize the dominant cubic cost of second-order methods by exploiting local constancy of the Hessian within intervals, which could be practically relevant for large-scale nonconvex optimization. The explicit construction of positive-definite approximations that guarantee descent and the development of multiple interval-updating variants are constructive contributions. The empirical results on an extensive test set provide concrete evidence of reduced evaluation counts relative to first-order and quasi-Newton baselines.

major comments (2)
  1. [algorithm description and variants] Description of algorithm variants and interval-size updating: The central efficiency claim—that the method requires substantially fewer O(n³) operations than the Newton method—rests on the assumption that iterates remain inside the chosen interval for enough consecutive steps to offset the cost of occasional full Hessian evaluations and inversions. No quantitative bound on exit frequency is derived, nor is there an analysis showing that the updating rule prevents rapid exits in regions of rapidly varying curvature typical of nonconvex problems. This assumption is load-bearing for the practical advantage asserted in the abstract and the numerical comparisons.
  2. [global convergence proof] Proof of global convergence: The convergence argument is described as following from a standard line-search once descent is guaranteed by positive-definiteness of the reused approximate Hessian. However, the validity of reusing the same approximation across multiple steps depends on the interval remaining valid; if the bounded-difference property holds only conditionally on the iterate staying inside the interval, the proof sketch should explicitly address how the line-search and interval-exit logic interact to preserve the descent property at every iteration.
minor comments (3)
  1. [algorithm presentation] The manuscript would benefit from a single pseudocode listing that clearly distinguishes the full-Hessian step from the reuse step and shows how the interval is updated after each exit.
  2. [numerical results] In the numerical experiments section, explicit details on the selection of test problems, any exclusion criteria, and the precise definition of “solves more problems” (e.g., success tolerance and maximum iteration limit) should be provided for reproducibility.
  3. [empirical comparisons] A brief comparison table reporting average number of Hessian evaluations (or O(n³) operations) across methods, rather than only qualitative statements, would strengthen the efficiency claims.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We respond point by point to the major comments below.

read point-by-point responses
  1. Referee: Description of algorithm variants and interval-size updating: The central efficiency claim—that the method requires substantially fewer O(n³) operations than the Newton method—rests on the assumption that iterates remain inside the chosen interval for enough consecutive steps to offset the cost of occasional full Hessian evaluations and inversions. No quantitative bound on exit frequency is derived, nor is there an analysis showing that the updating rule prevents rapid exits in regions of rapidly varying curvature typical of nonconvex problems. This assumption is load-bearing for the practical advantage asserted in the abstract and the numerical comparisons.

    Authors: We agree that a quantitative bound on exit frequency would provide stronger support for the efficiency claims. The manuscript proves that the approximation error is bounded whenever the iterate remains inside the interval and that recomputation occurs on exit, preserving positive-definiteness and descent at every step; global convergence follows from this per-iteration guarantee rather than from any frequency bound. Practical performance is supported by the reported numerical comparisons showing fewer O(n³) operations than Newton on the test set. We will add a clarifying discussion of the interval-updating variants and their adaptation to local curvature, while noting that a worst-case exit-frequency analysis lies outside the current scope. revision: partial

  2. Referee: Proof of global convergence: The convergence argument is described as following from a standard line-search once descent is guaranteed by positive-definiteness of the reused approximate Hessian. However, the validity of reusing the same approximation across multiple steps depends on the interval remaining valid; if the bounded-difference property holds only conditionally on the iterate staying inside the interval, the proof sketch should explicitly address how the line-search and interval-exit logic interact to preserve the descent property at every iteration.

    Authors: The proof already relies on the fact that exit from the interval immediately triggers a new positive-definite approximation whose error is again bounded, restoring the descent property before the line search is performed. To make this interaction fully explicit, we will revise the convergence section to state the logical sequence at each iteration: (i) test whether the current point lies inside the maintained interval, (ii) recompute and invert the approximate Hessian if an exit has occurred, (iii) confirm the resulting direction is a descent direction, and (iv) execute the line search with the current approximation before updating the interval. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper constructs INTHOP by defining an interval around a Hessian evaluation point, proving a bound on the difference between the exact Hessian and the positive-definite approximation when iterates remain inside that interval, and reusing the approximation (with only gradient evaluations) inside the interval. Global convergence follows from a standard line-search argument that uses the guaranteed descent property of the positive-definite direction. These steps rely on external regularity assumptions (e.g., continuity or Lipschitz properties of the Hessian) rather than any self-referential definition, fitted parameter renamed as prediction, or load-bearing self-citation. The interval-size updating rules are presented as explicit algorithmic variants whose correctness is independent of the convergence claim itself.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axioms · 0 invented entities

The central claims rest on standard nonconvex optimization assumptions plus new parameters for interval size and eigenvalue computation that are chosen by the algorithm variants.

free parameters (2)
  • interval size
    Determines how often the approximate Hessian can be reused; chosen via the interval-size updating methods described for the algorithm variants.
  • minimum eigenvalue computation method
    Used to ensure the approximate matrix remains positive definite; different methods are tested as part of the algorithm variants.
axioms (1)
  • domain assumption Standard assumptions for global convergence proofs in nonconvex optimization (e.g., continuous differentiability, bounded level sets or Lipschitz gradients).
    Invoked to prove global convergence of the proposed algorithm.

pith-pipeline@v0.9.0 · 5828 in / 1345 out tokens · 32095 ms · 2026-05-18T04:03:42.285877+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

70 extracted references · 70 canonical work pages · 3 internal anchors

  1. [1]

    S., Androulakis, I

    Adjiman, C. S., Androulakis, I. P., and Floudas, C. A. (1998a). A global optimization method, αbb, for general twice-differentiable constrained nlps—ii. implementation and computational results. Computers & chemical engineering, 22(9):1159–1179

  2. [2]

    S., Dallwig, S., Floudas, C

    Adjiman, C. S., Dallwig, S., Floudas, C. A., and Neumaier, A. (1998b). A global optimization method,αbb, for general twice-differentiable constrained nlps—i. theoretical advances.Computers & Chemical Engineering, 22(9):1137–1158

  3. [3]

    Adjiman, C. S. and Floudas, C. A. (1996). Rigorous convex underestimators for general twice- differentiable problems.Journal of Global Optimization, 9:23–40

  4. [4]

    Adeterministicglobaloptimizationalgorithmforproblems with nonlinear dynamics

    Adjiman, C.S.andPapamichail, I.(2004). Adeterministicglobaloptimizationalgorithmforproblems with nonlinear dynamics. In Floudas, C. A. and Pardalos, P., editors,Frontiers in Global Optimization, pages 1–23, Boston, MA. Springer US

  5. [5]

    Al-Khayyal, F. A. and Falk, J. E. (1983). Jointly constrained biconvex programming.Mathematics of Operations Research, 8(2):273–286

  6. [6]

    Decode: acommunity-basedalgorithmforgenerating high-quality decompositions of optimization problems.Optimization and Engineering, 20(4):1067– 1084

    Allman, A., Tang, W., andDaoutidis, P.(2019). Decode: acommunity-basedalgorithmforgenerating high-quality decompositions of optimization problems.Optimization and Engineering, 20(4):1067– 1084

  7. [7]

    P., Maranas, C

    Androulakis, I. P., Maranas, C. D., and Floudas, C. A. (1995).αbb: A global optimization method for general constrained nonconvex problems.Journal of Global Optimization, 7:337–363

  8. [8]

    and Hasan, M

    Bajaj, I. and Hasan, M. F. (2019). Unipopt: Univariate projection-based optimization without derivatives.Computers & Chemical Engineering, 127:71–87

  9. [9]

    and Hasan, M

    Bajaj, I. and Hasan, M. F. (2020a). Deterministic global derivative-free optimization of black-box problems with bounded hessian.Optimization Letters, 14(4):1011–1026

  10. [10]

    and Hasan, M

    Bajaj, I. and Hasan, M. F. (2020b). Global dynamic optimization using edge-concave underestima- tor.Journal of Global Optimization, 77(3):487–512

  11. [11]

    S., and Hasan, M

    Bajaj, I., Iyer, S. S., and Hasan, M. F. (2018). A trust region-based two phase algorithm for constrained black-box and grey-box optimization with infeasible initial point.Computers & Chemical Engineering, 116:306–321

  12. [12]

    Bauer, C., Frink, A., and Kreckel, R. (2002). Introduction to the ginac framework for symbolic computation within the c++ programming language.Journal of Symbolic Computation, 33(1):1–12

  13. [13]

    (2015).Convex optimization algorithms

    Bertsekas, D. (2015).Convex optimization algorithms. Athena Scientific

  14. [14]

    Bollapragada, R., Byrd, R., and Nocedal, J. (2016). Exact and inexact subsampled newton methods for optimization

  15. [15]

    and Vandenberghe, L

    Boyd, S. and Vandenberghe, L. (2004).Convex optimization. Cambridge university press

  16. [16]

    Broyden, C. G. (1970). The convergence of a class of double-rank minimization algorithms 2. the new algorithm.Ima Journal of Applied Mathematics, 6:222–231

  17. [17]

    I., and Toint, P

    Cartis, C., Gould, N. I., and Toint, P. L. (2011). Adaptive cubic regularisation methods for uncon- strained optimization. part i: motivation, convergence and numerical results.Mathematical Program- ming, 127(2):245–295

  18. [18]

    G., and Zavala, V

    Chiang, N., Petra, C. G., and Zavala, V. M. (2014). Structured nonconvex optimization of large- scale energy systems using pips-nlp. In2014 Power Systems Computation Conference, pages 1–7. IEEE

  19. [19]

    B., and LeCun, Y

    Choromanska, A., Henaff, M., Mathieu, M., Arous, G. B., and LeCun, Y. (2015). The loss surfaces of multilayer networks. InArtificial intelligence and statistics, pages 192–204. PMLR. 22

  20. [20]

    L., Shin, S., and Zavala, V

    Cole, D. L., Shin, S., and Zavala, V. M. (2022). A julia framework for graph-structured nonlinear optimization.Industrial & Engineering Chemistry Research, 61(26):9366–9380

  21. [21]

    Deif, A. (1991). The interval eigenvalue problem.ZAMM-Journal of Applied Mathematics and Mechanics/Zeitschrift für Angewandte Mathematik und Mechanik, 71(1):61–64

  22. [22]

    Doikov, N., Jaggi, M., et al. (2023). Second-order optimization with lazy hessians. InInternational Conference on Machine Learning, pages 8138–8161. PMLR

  23. [23]

    Erdogdu, M. A. and Montanari, A. (2015). Convergence rates of sub-sampled newton methods. Advances in Neural Information Processing Systems, 28

  24. [24]

    Fletcher, R. (1970). A new approach to variable metric algorithms.Comput. J., 13:317–322

  25. [25]

    H., Yao, Y., and You, F

    Gebreslassie, B. H., Yao, Y., and You, F. (2012). Design under uncertainty of hydrocarbon biore- finery supply chains: multiobjective stochastic programming models, decomposition algorithm, and a comparison between cvar and downside risk.AIChE Journal, 58(7):2155–2179

  26. [26]

    and Lan, G

    Ghadimi, S. and Lan, G. (2013). Accelerated gradient methods for nonconvex nonlinear and stochas- tic programming.Mathematical Programming, 156

  27. [27]

    Goldfarb, D. (1970). A family of variable-metric methods derived by variational means.Mathematics of Computation, 24:23–26

  28. [28]

    Goualard, F. (2005). Gaol: Not just another interval library.University of Nantes, France

  29. [29]

    Gratton, S., Jerad, S., and Toint, P. L. (2025). Yet another fast variant of newton’s method for nonconvex optimization.IMA Journal of Numerical Analysis, 45(2):971–1008

  30. [30]

    Guennebaud, G., Jacob, B., et al. (2010). Eigen v3. http://eigen.tuxfamily.org

  31. [31]

    Hasan, M. F. (2018). An edge-concave underestimator for the global optimization of twice- differentiable nonconvex problems.Journal of Global Optimization, 71(4):735–752

  32. [32]

    P., and Laird, C

    Kang, J., Cao, Y., Word, D. P., and Laird, C. D. (2014). An interior-point method for efficient solu- tion of block-structured nlp problems using an implicit schur-complement decomposition.Computers & Chemical Engineering, 71:563–573

  33. [33]

    Global linear convergence of Newton's method without strong-convexity or Lipschitz gradients

    Karimireddy, S. P., Stich, S. U., and Jaggi, M. (2018). Global linear convergence of newton’s method without strong-convexity or lipschitz gradients.CoRR, abs/1806.00413

  34. [34]

    Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization.arXiv preprint arXiv:1412.6980

  35. [35]

    and Grossmann, I

    Li, C. and Grossmann, I. E. (2019). A generalized benders decomposition-based branch and cut algorithm for two-stage stochastic programs with nonconvex constraints and mixed-binary first and second stage variables.Journal of Global Optimization, 75(2):247–272

  36. [36]

    Liu, Y., Gao, Y., and Yin, W. (2020). An improved analysis of stochastic gradient descent with momentum.Advances in Neural Information Processing Systems, 33:18261–18271

  37. [37]

    Maranas, C. D. and Floudas, C. A. (1995). Finding all solutions of nonlinearly constrained systems of equations.Journal of Global Optimization, 7:143–182

  38. [38]

    McCormick, G. P. (1976). Computability of global solutions to factorable nonconvex programs: Part i—convex underestimating problems.Mathematical programming, 10(1):147–175

  39. [39]

    McMahan, H. B. and Streeter, M. (2010). Adaptive bound optimization for online convex optimiza- tion.arXiv preprint arXiv:1002.4908

  40. [40]

    Meyer, C. A. and Floudas, C. A. (2004a). Trilinear monomials with mixed sign domains: Facets of the convex and concave envelopes.Journal of Global Optimization, 29:125–155

  41. [41]

    Meyer, C. A. and Floudas, C. A. (2004b). Trilinear monomials with positive or negative domains: Facets of the convex and concave envelopes. InFrontiers in global optimization, pages 327–352. Springer. 23

  42. [42]

    Meyer, C. A. and Floudas, C. A. (2005). Convex envelopes for edge-concave functions.Mathematical programming, 103:207–224

  43. [43]

    and Daoutidis, P

    Mitrai, I. and Daoutidis, P. (2021). Efficient solution of enterprise-wide optimization problems using nested stochastic blockmodeling.Industrial & Engineering Chemistry Research, 60(40):14476–14494

  44. [44]

    Mitsos, A., Chachuat, B., and Barton, P. I. (2009). Mccormick-based relaxations of algorithms. SIAM Journal on Optimization, 20(2):573–601

  45. [45]

    Mitsos, A., Lemonidis, P., and Barton, P. I. (2008). Global solution of bilevel programs with a nonconvex inner program.Journal of Global Optimization, 42:475–513

  46. [46]

    Montoison, A., Pacaud, F., Saunders, M., Shin, S., and Orban, D. (2025). Madncl: a gpu implementation of algorithm ncl for large-scale, degenerate nonlinear programs.arXiv preprint arXiv:2510.05885

  47. [47]

    E., Kearfott, R

    Moore, R. E., Kearfott, R. B., and Cloud, M. J. (2009).Introduction to interval analysis. SIAM

  48. [48]

    Moré, J. J. and Wild, S. M. (2009). Benchmarking derivative-free optimization algorithms.SIAM Journal on Optimization, 20(1):172–191

  49. [49]

    and Kokame, H

    Mori, T. and Kokame, H. (1994). Eigenvalue bounds for a certain class of interval matrices.IEICE transactions on fundamentals of electronics, communications and computer sciences, 77(10):1707– 1709

  50. [50]

    Nesterov, Y. (1983). A method for solving the convex programming problem with convergence rate o(1/k2)

  51. [51]

    Nesterov, Y. et al. (2018).Lectures on convex optimization, volume 137. Springer

  52. [52]

    and Polyak, B

    Nesterov, Y. and Polyak, B. T. (2006). Cubic regularization of newton method and its global performance.Mathematical programming, 108(1):177–205

  53. [53]

    and Wright, S

    Nocedal, J. and Wright, S. J. (1999).Numerical optimization. Springer

  54. [54]

    A., and Anitescu, M

    Pacaud, F., Schanen, M., Shin, S., Maldonado, D. A., and Anitescu, M. (2024). Parallel interior- point solver for block-structured nonlinear programs on simd/gpu architectures.Optimization Methods and Software, 39(4):874–897

  55. [55]

    and Shin, S

    Pacaud, F. and Shin, S. (2024). Gpu-accelerated dynamic nonlinear optimization with examodels and madnlp. In2024 IEEE 63rd Conference on Decision and Control (CDC), pages 5963–5968. IEEE

  56. [56]

    and Sahinidis, N

    Puranik, Y. and Sahinidis, N. (2017). Bounds tightening based on optimality conditions for non- convex box-constrained optimization.Journal of Global Optimization, 67

  57. [57]

    Qian, N. (1999). On the momentum term in gradient descent learning algorithms.Neural networks, 12(1):145–151

  58. [58]

    and Monro, S

    Robbins, H. and Monro, S. (1951). A stochastic approximation method.The annals of mathematical statistics, pages 400–407

  59. [59]

    Rohn, J. (1998). Bounds on eigenvalues of interval matrices.ZAMM-Zeitschrift fur Angewandte Mathematik und Mechanik, 78(3):S1049

  60. [60]

    and Mahoney, M

    Roosta-Khorasani, F. and Mahoney, M. W. (2016a). Sub-sampled newton methods i: Globally convergent algorithms

  61. [61]

    and Mahoney, M

    Roosta-Khorasani, F. and Mahoney, M. W. (2016b). Sub-sampled newton methods ii: Local con- vergence rates

  62. [62]

    and Mahoney, M

    Roosta-Khorasani, F. and Mahoney, M. W. (2019). Sub-sampled newton methods.Mathematical Programming, 174:293–326

  63. [63]

    Ryoo, H. S. and Sahinidis, N. V. (2001). Analysis of bounds for multilinear functions.Journal of Global Optimization, 19(4):403–424. 24

  64. [64]

    K., Stuber, M

    Scott, J. K., Stuber, M. D., and Barton, P. I. (2011). Generalized mccormick relaxations.Journal of Global Optimization, 51(4):569–606

  65. [65]

    Shanno, D. F. (1970). Conditioning of quasi-newton methods for function minimization.Mathe- matics of Computation, 24:647–656

  66. [66]

    (2004).On the existence of polyhedral convex envelopes

    Tardella, F. (2004).On the existence of polyhedral convex envelopes. Springer

  67. [67]

    and Sahinidis, N

    Tawarmalani, M. and Sahinidis, N. V. (2001). Semidefinite relaxations of fractional programs via novel convexification techniques.Journal of Global Optimization, 20:133–154

  68. [68]

    and Sahinidis, N

    Tawarmalani, M. and Sahinidis, N. V. (2002). Convex extensions and envelopes of lower semi- continuous functions.Mathematical Programming, 93(2):247–263

  69. [69]

    and Biegler, L

    Yoshio, N. and Biegler, L. T. (2021). A nested schur decomposition approach for multiperiod optimization of chemical processes.Computers & Chemical Engineering, 155:107509

  70. [70]

    Zeiler, M. D. (2012). Adadelta: An adaptive learning rate method. 25 Appendix A. Proofs This appendix provides proofs of Theorem 1 and Lemma 1 stated in Section 3.1 and 3.4.2, respectively. We prove the following results stated in Theorem 1: •Function value bound: |L(xt)−f(x k)| ≤ Lf 2 √nδ+ |λmin| 8 nδ2 •Gradient bound: ∥∇L(xt)− ∇f(x k)∥ ≤ Lg 2 √nδ+ |λmin...