INTHOP: A Second-Order Globally Convergent Method for Nonconvex Optimization

Ashutosh Sharma; Gauransh Dingwani; Ishan Bajaj; Krishan Kumar; Nikhil Gupta; Vaishnavi Gupta

arxiv: 2510.22342 · v4 · submitted 2025-10-25 · 🧮 math.OC

INTHOP: A Second-Order Globally Convergent Method for Nonconvex Optimization

Krishan Kumar , Ashutosh Sharma , Gauransh Dingwani , Nikhil Gupta , Vaishnavi Gupta , Ishan Bajaj This is my paper

Pith reviewed 2026-05-18 04:03 UTC · model grok-4.3

classification 🧮 math.OC

keywords nonconvex optimizationHessian approximationglobal convergencesecond-order methodsdescent directioninterval methodsNewton-type algorithms

0 comments

The pith

INTHOP reuses a positive definite Hessian approximation within a bounded interval to guarantee descent directions and global convergence while inverting the matrix only at selected iterations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents INTHOP, an algorithm that approximates the Hessian by a positive definite matrix and reuses that approximation across iterations as long as the current point stays inside a controlled interval around the prior point. The authors prove that the approximation error remains bounded inside the interval, which ensures the search direction is always a descent direction even when the true Hessian is indefinite. They establish global convergence for the overall method and introduce variants that differ in how the interval size is updated or how the minimum eigenvalue is computed. Numerical comparisons on test problems show that INTHOP requires fewer function and gradient evaluations than steepest descent or quasi-Newton methods and substantially fewer O(n cubed) operations than the full Newton method on nonconvex instances. This combination of reduced cost and convergence guarantees addresses the main barriers to using second-order information on large-scale nonconvex problems.

Core claim

INTHOP computes a positive definite approximation to the Hessian at certain iterations and reuses it for subsequent steps provided the current point remains inside a chosen interval around the previous point; the difference between this approximation and the exact Hessian is proven to lie within a controlled bound, which guarantees that the resulting search direction is a descent direction, and the overall algorithm is shown to converge globally to a stationary point.

What carries the argument

Interval-bounded positive definite Hessian approximation that is reused while iterates remain inside the interval, with the approximation error bounded to preserve descent.

If this is right

Full Hessian evaluation and inversion occur only when iterates exit the current interval.
Global convergence holds across variants that differ in interval-size updating or minimum-eigenvalue computation.
The method solves more test problems with fewer function and gradient evaluations than steepest descent or quasi-Newton approaches.
Total O(n^3) work is substantially lower than Newton because expensive operations are avoided on most iterations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If interval-updating rules keep iterates inside the interval for long stretches on typical problems, the approach could scale to larger dimensions than standard Newton methods.
The same bounded-reuse idea could be combined with trust-region or cubic-regularization frameworks to further control step quality.
Problems with rapid movement across the domain would expose how frequently the method triggers full Hessian recomputations in practice.

Load-bearing premise

That the iterates remain inside the chosen interval for enough consecutive steps to offset the cost of occasional full Hessian evaluations.

What would settle it

A nonconvex test problem on which the method either diverges or requires full Hessian recomputation and inversion at nearly every iteration for all interval-updating variants.

Figures

Figures reproduced from arXiv: 2510.22342 by Ashutosh Sharma, Gauransh Dingwani, Ishan Bajaj, Krishan Kumar, Nikhil Gupta, Vaishnavi Gupta.

**Figure 1.** Figure 1: Graphical illustration of f(x), mN k (x) and mIH k (x) Computing α exactly over an interval using Eq. 3.8 is equivalent to solving a nonconvex optimization problem to global optimality, which could be more difficult than solving Eq. 1.1. Therefore, an efficient and scalable alternative is needed to obtain a reliable lower bound on the minimum eigenvalue over the interval. In the following subsection, we in… view at source ↗

**Figure 2.** Figure 2: Graphical illustration of INTHOP with fixed interval size (INTHOP:F) and Algorithm [PITH_FULL_IMAGE:figures/full_fig_p008_2.png] view at source ↗

**Figure 3.** Figure 3: Paths taken by steepest descent, quasi-Newton, and INTHOP:F on illustrative example 1. [PITH_FULL_IMAGE:figures/full_fig_p013_3.png] view at source ↗

**Figure 4.** Figure 4: (A) Comparison of paths take by INTHOP:F and NEWTON:CAMI on illustrative example 2. (B) Function [PITH_FULL_IMAGE:figures/full_fig_p014_4.png] view at source ↗

**Figure 5.** Figure 5: Fraction of problems solved by INTHOP:F variants with (A) gradient evaluations, (B) Hessian evaluations, (C) [PITH_FULL_IMAGE:figures/full_fig_p015_5.png] view at source ↗

**Figure 6.** Figure 6: Data profile corresponding to fixed and adaptive [PITH_FULL_IMAGE:figures/full_fig_p016_6.png] view at source ↗

**Figure 7.** Figure 7: Variation of function value and δ with iterations for (A) 10_s295 and (B) 2_gold. Overall, these results highlight the robustness of the adaptive interval strategies, particularly when used with the EM and MK α estimation methods. It consistently outperforms the fixed-interval approaches across all the performance metrics. 5.3. Comparison with Competing Algorithms Next, we compare our proposed algorithm w… view at source ↗

**Figure 8.** Figure 8: Fraction of problems solved by INTHOP with EM:A1 and MK:A1, steepest descent, quasi-Newton BFGS, and [PITH_FULL_IMAGE:figures/full_fig_p018_8.png] view at source ↗

**Figure 9.** Figure 9: Fraction of problems solved by INTHOP variants and Newton-CAMI on problems requiring Hessian [PITH_FULL_IMAGE:figures/full_fig_p019_9.png] view at source ↗

**Figure 10.** Figure 10: Function value with number of (A) function, (B) Hessian evaluations, and (C) CPU time. [PITH_FULL_IMAGE:figures/full_fig_p019_10.png] view at source ↗

**Figure 11.** Figure 11: Function value with (A) function, (B) Hessian evaluations, and (C) CPU time. [PITH_FULL_IMAGE:figures/full_fig_p020_11.png] view at source ↗

read the original abstract

Second-order Newton-type algorithms that leverage the exact Hessian or its approximation are central to solve nonlinear optimization problems. However, their applications in solving large-scale nonconvex problems are hindered by three primary challenges: (1) the high computational cost associated with Hessian evaluations, (2) its inversion, and (3) ensuring descent direction at points where the Hessian becomes indefinite. We propose INTHOP, an interval Hessian-based optimization algorithm for nonconvex problems to deal with these primary challenges. The proposed search direction is based on approximating the original Hessian matrix by a positive definite matrix. The novelty of the proposed method is that the proposed search direction is guaranteed to be descent and requires approximation of Hessian and its inversion only at specific iterations. We prove that the difference between the calculated approximate and exact Hessian is bounded within an interval. Accordingly, the approximate Hessian matrix is reused if the iterates are in that chosen interval while computing the gradients at each iteration. We develop various algorithm variants based on the interval size updating methods and minimum eigenvalue computation methods. We also prove the global convergence of the proposed algorithm. Further, we apply the algorithm to an extensive set of test problems and compare its performance with the existing methods such as steepest descent, quasi-Newton, and Newton method. We show empirically that the proposed method solves more problems in fewer function and gradient evaluations than steepest descent and the quasi-Newton method. While in the comparison to the Newton method, we illustrate that for nonconvex optimization problems, we require substantially less $O(n^3)$ operations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

INTHOP reuses a positive definite Hessian approximation inside intervals to cut recomputation costs while claiming global convergence, but the practical savings rest on unquantified reuse frequency.

read the letter

The main point is that INTHOP approximates the Hessian with a positive definite version and reuses it as long as the current point stays inside an interval around the last full evaluation. This setup is supposed to ensure the search direction is always a descent direction and to prove global convergence for nonconvex problems. What is new here is the interval mechanism for deciding when to recompute the Hessian and its inverse. The paper shows that the difference between the approximate and exact Hessian is bounded if the iterate is in the interval, and they develop variants based on different ways to update the interval size and compute the minimum eigenvalue. The work does well by addressing the three challenges mentioned: cost of Hessian eval, inversion, and handling indefiniteness. The empirical results indicate it solves more problems with fewer evaluations than steepest descent or quasi-Newton, and it avoids most of the cubic work of full Newton. The soft spot is in the efficiency argument. The savings depend on the iterates not leaving the interval too often. In nonconvex settings where curvature varies a lot, this might not hold, and there is no analysis given on average exit rates or how the updating rule prevents frequent exits. That makes the claimed reduction in O(n^3) operations harder to assess without more details. This paper is for optimization researchers who want second-order methods that scale better to larger nonconvex problems. A reader interested in algorithm design for machine learning or design optimization would find the variants and convergence proof worth looking at. It deserves a serious referee because the core mechanism is original and the performance claims are specific enough to review properly. I recommend putting it through peer review.

Referee Report

2 major / 3 minor

Summary. The paper proposes INTHOP, an interval Hessian-based optimization algorithm for nonconvex problems. The search direction approximates the Hessian by a positive definite matrix, with the approximation and its inversion performed only at specific iterations when the current iterate exits a dynamically maintained interval around the last evaluation point; otherwise the same approximate Hessian is reused while gradients are evaluated at each step. The manuscript claims to prove that the difference between the approximate and exact Hessian remains bounded within the interval, that the resulting direction is always a descent direction, and that the algorithm converges globally. Various algorithm variants are developed based on different interval-size updating rules and minimum-eigenvalue computation methods. Empirical comparisons on test problems show that INTHOP solves more problems with fewer function and gradient evaluations than steepest descent and quasi-Newton methods, while requiring substantially fewer O(n³) operations than the full Newton method for nonconvex problems.

Significance. If the bounded-error and global-convergence claims hold, the approach offers a principled way to amortize the dominant cubic cost of second-order methods by exploiting local constancy of the Hessian within intervals, which could be practically relevant for large-scale nonconvex optimization. The explicit construction of positive-definite approximations that guarantee descent and the development of multiple interval-updating variants are constructive contributions. The empirical results on an extensive test set provide concrete evidence of reduced evaluation counts relative to first-order and quasi-Newton baselines.

major comments (2)

[algorithm description and variants] Description of algorithm variants and interval-size updating: The central efficiency claim—that the method requires substantially fewer O(n³) operations than the Newton method—rests on the assumption that iterates remain inside the chosen interval for enough consecutive steps to offset the cost of occasional full Hessian evaluations and inversions. No quantitative bound on exit frequency is derived, nor is there an analysis showing that the updating rule prevents rapid exits in regions of rapidly varying curvature typical of nonconvex problems. This assumption is load-bearing for the practical advantage asserted in the abstract and the numerical comparisons.
[global convergence proof] Proof of global convergence: The convergence argument is described as following from a standard line-search once descent is guaranteed by positive-definiteness of the reused approximate Hessian. However, the validity of reusing the same approximation across multiple steps depends on the interval remaining valid; if the bounded-difference property holds only conditionally on the iterate staying inside the interval, the proof sketch should explicitly address how the line-search and interval-exit logic interact to preserve the descent property at every iteration.

minor comments (3)

[algorithm presentation] The manuscript would benefit from a single pseudocode listing that clearly distinguishes the full-Hessian step from the reuse step and shows how the interval is updated after each exit.
[numerical results] In the numerical experiments section, explicit details on the selection of test problems, any exclusion criteria, and the precise definition of “solves more problems” (e.g., success tolerance and maximum iteration limit) should be provided for reproducibility.
[empirical comparisons] A brief comparison table reporting average number of Hessian evaluations (or O(n³) operations) across methods, rather than only qualitative statements, would strengthen the efficiency claims.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We respond point by point to the major comments below.

read point-by-point responses

Referee: Description of algorithm variants and interval-size updating: The central efficiency claim—that the method requires substantially fewer O(n³) operations than the Newton method—rests on the assumption that iterates remain inside the chosen interval for enough consecutive steps to offset the cost of occasional full Hessian evaluations and inversions. No quantitative bound on exit frequency is derived, nor is there an analysis showing that the updating rule prevents rapid exits in regions of rapidly varying curvature typical of nonconvex problems. This assumption is load-bearing for the practical advantage asserted in the abstract and the numerical comparisons.

Authors: We agree that a quantitative bound on exit frequency would provide stronger support for the efficiency claims. The manuscript proves that the approximation error is bounded whenever the iterate remains inside the interval and that recomputation occurs on exit, preserving positive-definiteness and descent at every step; global convergence follows from this per-iteration guarantee rather than from any frequency bound. Practical performance is supported by the reported numerical comparisons showing fewer O(n³) operations than Newton on the test set. We will add a clarifying discussion of the interval-updating variants and their adaptation to local curvature, while noting that a worst-case exit-frequency analysis lies outside the current scope. revision: partial
Referee: Proof of global convergence: The convergence argument is described as following from a standard line-search once descent is guaranteed by positive-definiteness of the reused approximate Hessian. However, the validity of reusing the same approximation across multiple steps depends on the interval remaining valid; if the bounded-difference property holds only conditionally on the iterate staying inside the interval, the proof sketch should explicitly address how the line-search and interval-exit logic interact to preserve the descent property at every iteration.

Authors: The proof already relies on the fact that exit from the interval immediately triggers a new positive-definite approximation whose error is again bounded, restoring the descent property before the line search is performed. To make this interaction fully explicit, we will revise the convergence section to state the logical sequence at each iteration: (i) test whether the current point lies inside the maintained interval, (ii) recompute and invert the approximate Hessian if an exit has occurred, (iii) confirm the resulting direction is a descent direction, and (iv) execute the line search with the current approximation before updating the interval. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper constructs INTHOP by defining an interval around a Hessian evaluation point, proving a bound on the difference between the exact Hessian and the positive-definite approximation when iterates remain inside that interval, and reusing the approximation (with only gradient evaluations) inside the interval. Global convergence follows from a standard line-search argument that uses the guaranteed descent property of the positive-definite direction. These steps rely on external regularity assumptions (e.g., continuity or Lipschitz properties of the Hessian) rather than any self-referential definition, fitted parameter renamed as prediction, or load-bearing self-citation. The interval-size updating rules are presented as explicit algorithmic variants whose correctness is independent of the convergence claim itself.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axioms · 0 invented entities

The central claims rest on standard nonconvex optimization assumptions plus new parameters for interval size and eigenvalue computation that are chosen by the algorithm variants.

free parameters (2)

interval size
Determines how often the approximate Hessian can be reused; chosen via the interval-size updating methods described for the algorithm variants.
minimum eigenvalue computation method
Used to ensure the approximate matrix remains positive definite; different methods are tested as part of the algorithm variants.

axioms (1)

domain assumption Standard assumptions for global convergence proofs in nonconvex optimization (e.g., continuous differentiability, bounded level sets or Lipschitz gradients).
Invoked to prove global convergence of the proposed algorithm.

pith-pipeline@v0.9.0 · 5828 in / 1345 out tokens · 32095 ms · 2026-05-18T04:03:42.285877+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

70 extracted references · 70 canonical work pages · 3 internal anchors

[1]

S., Androulakis, I

Adjiman, C. S., Androulakis, I. P., and Floudas, C. A. (1998a). A global optimization method, αbb, for general twice-differentiable constrained nlps—ii. implementation and computational results. Computers & chemical engineering, 22(9):1159–1179

work page
[2]

S., Dallwig, S., Floudas, C

Adjiman, C. S., Dallwig, S., Floudas, C. A., and Neumaier, A. (1998b). A global optimization method,αbb, for general twice-differentiable constrained nlps—i. theoretical advances.Computers & Chemical Engineering, 22(9):1137–1158

work page
[3]

Adjiman, C. S. and Floudas, C. A. (1996). Rigorous convex underestimators for general twice- differentiable problems.Journal of Global Optimization, 9:23–40

work page 1996
[4]

Adeterministicglobaloptimizationalgorithmforproblems with nonlinear dynamics

Adjiman, C.S.andPapamichail, I.(2004). Adeterministicglobaloptimizationalgorithmforproblems with nonlinear dynamics. In Floudas, C. A. and Pardalos, P., editors,Frontiers in Global Optimization, pages 1–23, Boston, MA. Springer US

work page 2004
[5]

Al-Khayyal, F. A. and Falk, J. E. (1983). Jointly constrained biconvex programming.Mathematics of Operations Research, 8(2):273–286

work page 1983
[6]

Decode: acommunity-basedalgorithmforgenerating high-quality decompositions of optimization problems.Optimization and Engineering, 20(4):1067– 1084

Allman, A., Tang, W., andDaoutidis, P.(2019). Decode: acommunity-basedalgorithmforgenerating high-quality decompositions of optimization problems.Optimization and Engineering, 20(4):1067– 1084

work page 2019
[7]

P., Maranas, C

Androulakis, I. P., Maranas, C. D., and Floudas, C. A. (1995).αbb: A global optimization method for general constrained nonconvex problems.Journal of Global Optimization, 7:337–363

work page 1995
[8]

and Hasan, M

Bajaj, I. and Hasan, M. F. (2019). Unipopt: Univariate projection-based optimization without derivatives.Computers & Chemical Engineering, 127:71–87

work page 2019
[9]

and Hasan, M

Bajaj, I. and Hasan, M. F. (2020a). Deterministic global derivative-free optimization of black-box problems with bounded hessian.Optimization Letters, 14(4):1011–1026

work page
[10]

and Hasan, M

Bajaj, I. and Hasan, M. F. (2020b). Global dynamic optimization using edge-concave underestima- tor.Journal of Global Optimization, 77(3):487–512

work page
[11]

S., and Hasan, M

Bajaj, I., Iyer, S. S., and Hasan, M. F. (2018). A trust region-based two phase algorithm for constrained black-box and grey-box optimization with infeasible initial point.Computers & Chemical Engineering, 116:306–321

work page 2018
[12]

Bauer, C., Frink, A., and Kreckel, R. (2002). Introduction to the ginac framework for symbolic computation within the c++ programming language.Journal of Symbolic Computation, 33(1):1–12

work page 2002
[13]

(2015).Convex optimization algorithms

Bertsekas, D. (2015).Convex optimization algorithms. Athena Scientific

work page 2015
[14]

Bollapragada, R., Byrd, R., and Nocedal, J. (2016). Exact and inexact subsampled newton methods for optimization

work page 2016
[15]

and Vandenberghe, L

Boyd, S. and Vandenberghe, L. (2004).Convex optimization. Cambridge university press

work page 2004
[16]

Broyden, C. G. (1970). The convergence of a class of double-rank minimization algorithms 2. the new algorithm.Ima Journal of Applied Mathematics, 6:222–231

work page 1970
[17]

I., and Toint, P

Cartis, C., Gould, N. I., and Toint, P. L. (2011). Adaptive cubic regularisation methods for uncon- strained optimization. part i: motivation, convergence and numerical results.Mathematical Program- ming, 127(2):245–295

work page 2011
[18]

G., and Zavala, V

Chiang, N., Petra, C. G., and Zavala, V. M. (2014). Structured nonconvex optimization of large- scale energy systems using pips-nlp. In2014 Power Systems Computation Conference, pages 1–7. IEEE

work page 2014
[19]

B., and LeCun, Y

Choromanska, A., Henaff, M., Mathieu, M., Arous, G. B., and LeCun, Y. (2015). The loss surfaces of multilayer networks. InArtificial intelligence and statistics, pages 192–204. PMLR. 22

work page 2015
[20]

L., Shin, S., and Zavala, V

Cole, D. L., Shin, S., and Zavala, V. M. (2022). A julia framework for graph-structured nonlinear optimization.Industrial & Engineering Chemistry Research, 61(26):9366–9380

work page 2022
[21]

Deif, A. (1991). The interval eigenvalue problem.ZAMM-Journal of Applied Mathematics and Mechanics/Zeitschrift für Angewandte Mathematik und Mechanik, 71(1):61–64

work page 1991
[22]

Doikov, N., Jaggi, M., et al. (2023). Second-order optimization with lazy hessians. InInternational Conference on Machine Learning, pages 8138–8161. PMLR

work page 2023
[23]

Erdogdu, M. A. and Montanari, A. (2015). Convergence rates of sub-sampled newton methods. Advances in Neural Information Processing Systems, 28

work page 2015
[24]

Fletcher, R. (1970). A new approach to variable metric algorithms.Comput. J., 13:317–322

work page 1970
[25]

H., Yao, Y., and You, F

Gebreslassie, B. H., Yao, Y., and You, F. (2012). Design under uncertainty of hydrocarbon biore- finery supply chains: multiobjective stochastic programming models, decomposition algorithm, and a comparison between cvar and downside risk.AIChE Journal, 58(7):2155–2179

work page 2012
[26]

and Lan, G

Ghadimi, S. and Lan, G. (2013). Accelerated gradient methods for nonconvex nonlinear and stochas- tic programming.Mathematical Programming, 156

work page 2013
[27]

Goldfarb, D. (1970). A family of variable-metric methods derived by variational means.Mathematics of Computation, 24:23–26

work page 1970
[28]

Goualard, F. (2005). Gaol: Not just another interval library.University of Nantes, France

work page 2005
[29]

Gratton, S., Jerad, S., and Toint, P. L. (2025). Yet another fast variant of newton’s method for nonconvex optimization.IMA Journal of Numerical Analysis, 45(2):971–1008

work page 2025
[30]

Guennebaud, G., Jacob, B., et al. (2010). Eigen v3. http://eigen.tuxfamily.org

work page 2010
[31]

Hasan, M. F. (2018). An edge-concave underestimator for the global optimization of twice- differentiable nonconvex problems.Journal of Global Optimization, 71(4):735–752

work page 2018
[32]

P., and Laird, C

Kang, J., Cao, Y., Word, D. P., and Laird, C. D. (2014). An interior-point method for efficient solu- tion of block-structured nlp problems using an implicit schur-complement decomposition.Computers & Chemical Engineering, 71:563–573

work page 2014
[33]

Global linear convergence of Newton's method without strong-convexity or Lipschitz gradients

Karimireddy, S. P., Stich, S. U., and Jaggi, M. (2018). Global linear convergence of newton’s method without strong-convexity or lipschitz gradients.CoRR, abs/1806.00413

work page internal anchor Pith review Pith/arXiv arXiv 2018
[34]

Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization.arXiv preprint arXiv:1412.6980

work page internal anchor Pith review Pith/arXiv arXiv 2014
[35]

and Grossmann, I

Li, C. and Grossmann, I. E. (2019). A generalized benders decomposition-based branch and cut algorithm for two-stage stochastic programs with nonconvex constraints and mixed-binary first and second stage variables.Journal of Global Optimization, 75(2):247–272

work page 2019
[36]

Liu, Y., Gao, Y., and Yin, W. (2020). An improved analysis of stochastic gradient descent with momentum.Advances in Neural Information Processing Systems, 33:18261–18271

work page 2020
[37]

Maranas, C. D. and Floudas, C. A. (1995). Finding all solutions of nonlinearly constrained systems of equations.Journal of Global Optimization, 7:143–182

work page 1995
[38]

McCormick, G. P. (1976). Computability of global solutions to factorable nonconvex programs: Part i—convex underestimating problems.Mathematical programming, 10(1):147–175

work page 1976
[39]

McMahan, H. B. and Streeter, M. (2010). Adaptive bound optimization for online convex optimiza- tion.arXiv preprint arXiv:1002.4908

work page internal anchor Pith review Pith/arXiv arXiv 2010
[40]

Meyer, C. A. and Floudas, C. A. (2004a). Trilinear monomials with mixed sign domains: Facets of the convex and concave envelopes.Journal of Global Optimization, 29:125–155

work page
[41]

Meyer, C. A. and Floudas, C. A. (2004b). Trilinear monomials with positive or negative domains: Facets of the convex and concave envelopes. InFrontiers in global optimization, pages 327–352. Springer. 23

work page
[42]

Meyer, C. A. and Floudas, C. A. (2005). Convex envelopes for edge-concave functions.Mathematical programming, 103:207–224

work page 2005
[43]

and Daoutidis, P

Mitrai, I. and Daoutidis, P. (2021). Efficient solution of enterprise-wide optimization problems using nested stochastic blockmodeling.Industrial & Engineering Chemistry Research, 60(40):14476–14494

work page 2021
[44]

Mitsos, A., Chachuat, B., and Barton, P. I. (2009). Mccormick-based relaxations of algorithms. SIAM Journal on Optimization, 20(2):573–601

work page 2009
[45]

Mitsos, A., Lemonidis, P., and Barton, P. I. (2008). Global solution of bilevel programs with a nonconvex inner program.Journal of Global Optimization, 42:475–513

work page 2008
[46]

Montoison, A., Pacaud, F., Saunders, M., Shin, S., and Orban, D. (2025). Madncl: a gpu implementation of algorithm ncl for large-scale, degenerate nonlinear programs.arXiv preprint arXiv:2510.05885

work page arXiv 2025
[47]

E., Kearfott, R

Moore, R. E., Kearfott, R. B., and Cloud, M. J. (2009).Introduction to interval analysis. SIAM

work page 2009
[48]

Moré, J. J. and Wild, S. M. (2009). Benchmarking derivative-free optimization algorithms.SIAM Journal on Optimization, 20(1):172–191

work page 2009
[49]

and Kokame, H

Mori, T. and Kokame, H. (1994). Eigenvalue bounds for a certain class of interval matrices.IEICE transactions on fundamentals of electronics, communications and computer sciences, 77(10):1707– 1709

work page 1994
[50]

Nesterov, Y. (1983). A method for solving the convex programming problem with convergence rate o(1/k2)

work page 1983
[51]

Nesterov, Y. et al. (2018).Lectures on convex optimization, volume 137. Springer

work page 2018
[52]

and Polyak, B

Nesterov, Y. and Polyak, B. T. (2006). Cubic regularization of newton method and its global performance.Mathematical programming, 108(1):177–205

work page 2006
[53]

and Wright, S

Nocedal, J. and Wright, S. J. (1999).Numerical optimization. Springer

work page 1999
[54]

A., and Anitescu, M

Pacaud, F., Schanen, M., Shin, S., Maldonado, D. A., and Anitescu, M. (2024). Parallel interior- point solver for block-structured nonlinear programs on simd/gpu architectures.Optimization Methods and Software, 39(4):874–897

work page 2024
[55]

and Shin, S

Pacaud, F. and Shin, S. (2024). Gpu-accelerated dynamic nonlinear optimization with examodels and madnlp. In2024 IEEE 63rd Conference on Decision and Control (CDC), pages 5963–5968. IEEE

work page 2024
[56]

and Sahinidis, N

Puranik, Y. and Sahinidis, N. (2017). Bounds tightening based on optimality conditions for non- convex box-constrained optimization.Journal of Global Optimization, 67

work page 2017
[57]

Qian, N. (1999). On the momentum term in gradient descent learning algorithms.Neural networks, 12(1):145–151

work page 1999
[58]

and Monro, S

Robbins, H. and Monro, S. (1951). A stochastic approximation method.The annals of mathematical statistics, pages 400–407

work page 1951
[59]

Rohn, J. (1998). Bounds on eigenvalues of interval matrices.ZAMM-Zeitschrift fur Angewandte Mathematik und Mechanik, 78(3):S1049

work page 1998
[60]

and Mahoney, M

Roosta-Khorasani, F. and Mahoney, M. W. (2016a). Sub-sampled newton methods i: Globally convergent algorithms

work page
[61]

and Mahoney, M

Roosta-Khorasani, F. and Mahoney, M. W. (2016b). Sub-sampled newton methods ii: Local con- vergence rates

work page
[62]

and Mahoney, M

Roosta-Khorasani, F. and Mahoney, M. W. (2019). Sub-sampled newton methods.Mathematical Programming, 174:293–326

work page 2019
[63]

Ryoo, H. S. and Sahinidis, N. V. (2001). Analysis of bounds for multilinear functions.Journal of Global Optimization, 19(4):403–424. 24

work page 2001
[64]

K., Stuber, M

Scott, J. K., Stuber, M. D., and Barton, P. I. (2011). Generalized mccormick relaxations.Journal of Global Optimization, 51(4):569–606

work page 2011
[65]

Shanno, D. F. (1970). Conditioning of quasi-newton methods for function minimization.Mathe- matics of Computation, 24:647–656

work page 1970
[66]

(2004).On the existence of polyhedral convex envelopes

Tardella, F. (2004).On the existence of polyhedral convex envelopes. Springer

work page 2004
[67]

and Sahinidis, N

Tawarmalani, M. and Sahinidis, N. V. (2001). Semidefinite relaxations of fractional programs via novel convexification techniques.Journal of Global Optimization, 20:133–154

work page 2001
[68]

and Sahinidis, N

Tawarmalani, M. and Sahinidis, N. V. (2002). Convex extensions and envelopes of lower semi- continuous functions.Mathematical Programming, 93(2):247–263

work page 2002
[69]

and Biegler, L

Yoshio, N. and Biegler, L. T. (2021). A nested schur decomposition approach for multiperiod optimization of chemical processes.Computers & Chemical Engineering, 155:107509

work page 2021
[70]

Zeiler, M. D. (2012). Adadelta: An adaptive learning rate method. 25 Appendix A. Proofs This appendix provides proofs of Theorem 1 and Lemma 1 stated in Section 3.1 and 3.4.2, respectively. We prove the following results stated in Theorem 1: •Function value bound: |L(xt)−f(x k)| ≤ Lf 2 √nδ+ |λmin| 8 nδ2 •Gradient bound: ∥∇L(xt)− ∇f(x k)∥ ≤ Lg 2 √nδ+ |λmin...

work page 2012

[1] [1]

S., Androulakis, I

Adjiman, C. S., Androulakis, I. P., and Floudas, C. A. (1998a). A global optimization method, αbb, for general twice-differentiable constrained nlps—ii. implementation and computational results. Computers & chemical engineering, 22(9):1159–1179

work page

[2] [2]

S., Dallwig, S., Floudas, C

Adjiman, C. S., Dallwig, S., Floudas, C. A., and Neumaier, A. (1998b). A global optimization method,αbb, for general twice-differentiable constrained nlps—i. theoretical advances.Computers & Chemical Engineering, 22(9):1137–1158

work page

[3] [3]

Adjiman, C. S. and Floudas, C. A. (1996). Rigorous convex underestimators for general twice- differentiable problems.Journal of Global Optimization, 9:23–40

work page 1996

[4] [4]

Adeterministicglobaloptimizationalgorithmforproblems with nonlinear dynamics

Adjiman, C.S.andPapamichail, I.(2004). Adeterministicglobaloptimizationalgorithmforproblems with nonlinear dynamics. In Floudas, C. A. and Pardalos, P., editors,Frontiers in Global Optimization, pages 1–23, Boston, MA. Springer US

work page 2004

[5] [5]

Al-Khayyal, F. A. and Falk, J. E. (1983). Jointly constrained biconvex programming.Mathematics of Operations Research, 8(2):273–286

work page 1983

[6] [6]

Decode: acommunity-basedalgorithmforgenerating high-quality decompositions of optimization problems.Optimization and Engineering, 20(4):1067– 1084

Allman, A., Tang, W., andDaoutidis, P.(2019). Decode: acommunity-basedalgorithmforgenerating high-quality decompositions of optimization problems.Optimization and Engineering, 20(4):1067– 1084

work page 2019

[7] [7]

P., Maranas, C

Androulakis, I. P., Maranas, C. D., and Floudas, C. A. (1995).αbb: A global optimization method for general constrained nonconvex problems.Journal of Global Optimization, 7:337–363

work page 1995

[8] [8]

and Hasan, M

Bajaj, I. and Hasan, M. F. (2019). Unipopt: Univariate projection-based optimization without derivatives.Computers & Chemical Engineering, 127:71–87

work page 2019

[9] [9]

and Hasan, M

Bajaj, I. and Hasan, M. F. (2020a). Deterministic global derivative-free optimization of black-box problems with bounded hessian.Optimization Letters, 14(4):1011–1026

work page

[10] [10]

and Hasan, M

Bajaj, I. and Hasan, M. F. (2020b). Global dynamic optimization using edge-concave underestima- tor.Journal of Global Optimization, 77(3):487–512

work page

[11] [11]

S., and Hasan, M

Bajaj, I., Iyer, S. S., and Hasan, M. F. (2018). A trust region-based two phase algorithm for constrained black-box and grey-box optimization with infeasible initial point.Computers & Chemical Engineering, 116:306–321

work page 2018

[12] [12]

Bauer, C., Frink, A., and Kreckel, R. (2002). Introduction to the ginac framework for symbolic computation within the c++ programming language.Journal of Symbolic Computation, 33(1):1–12

work page 2002

[13] [13]

(2015).Convex optimization algorithms

Bertsekas, D. (2015).Convex optimization algorithms. Athena Scientific

work page 2015

[14] [14]

Bollapragada, R., Byrd, R., and Nocedal, J. (2016). Exact and inexact subsampled newton methods for optimization

work page 2016

[15] [15]

and Vandenberghe, L

Boyd, S. and Vandenberghe, L. (2004).Convex optimization. Cambridge university press

work page 2004

[16] [16]

Broyden, C. G. (1970). The convergence of a class of double-rank minimization algorithms 2. the new algorithm.Ima Journal of Applied Mathematics, 6:222–231

work page 1970

[17] [17]

I., and Toint, P

Cartis, C., Gould, N. I., and Toint, P. L. (2011). Adaptive cubic regularisation methods for uncon- strained optimization. part i: motivation, convergence and numerical results.Mathematical Program- ming, 127(2):245–295

work page 2011

[18] [18]

G., and Zavala, V

Chiang, N., Petra, C. G., and Zavala, V. M. (2014). Structured nonconvex optimization of large- scale energy systems using pips-nlp. In2014 Power Systems Computation Conference, pages 1–7. IEEE

work page 2014

[19] [19]

B., and LeCun, Y

Choromanska, A., Henaff, M., Mathieu, M., Arous, G. B., and LeCun, Y. (2015). The loss surfaces of multilayer networks. InArtificial intelligence and statistics, pages 192–204. PMLR. 22

work page 2015

[20] [20]

L., Shin, S., and Zavala, V

Cole, D. L., Shin, S., and Zavala, V. M. (2022). A julia framework for graph-structured nonlinear optimization.Industrial & Engineering Chemistry Research, 61(26):9366–9380

work page 2022

[21] [21]

Deif, A. (1991). The interval eigenvalue problem.ZAMM-Journal of Applied Mathematics and Mechanics/Zeitschrift für Angewandte Mathematik und Mechanik, 71(1):61–64

work page 1991

[22] [22]

Doikov, N., Jaggi, M., et al. (2023). Second-order optimization with lazy hessians. InInternational Conference on Machine Learning, pages 8138–8161. PMLR

work page 2023

[23] [23]

Erdogdu, M. A. and Montanari, A. (2015). Convergence rates of sub-sampled newton methods. Advances in Neural Information Processing Systems, 28

work page 2015

[24] [24]

Fletcher, R. (1970). A new approach to variable metric algorithms.Comput. J., 13:317–322

work page 1970

[25] [25]

H., Yao, Y., and You, F

Gebreslassie, B. H., Yao, Y., and You, F. (2012). Design under uncertainty of hydrocarbon biore- finery supply chains: multiobjective stochastic programming models, decomposition algorithm, and a comparison between cvar and downside risk.AIChE Journal, 58(7):2155–2179

work page 2012

[26] [26]

and Lan, G

Ghadimi, S. and Lan, G. (2013). Accelerated gradient methods for nonconvex nonlinear and stochas- tic programming.Mathematical Programming, 156

work page 2013

[27] [27]

Goldfarb, D. (1970). A family of variable-metric methods derived by variational means.Mathematics of Computation, 24:23–26

work page 1970

[28] [28]

Goualard, F. (2005). Gaol: Not just another interval library.University of Nantes, France

work page 2005

[29] [29]

Gratton, S., Jerad, S., and Toint, P. L. (2025). Yet another fast variant of newton’s method for nonconvex optimization.IMA Journal of Numerical Analysis, 45(2):971–1008

work page 2025

[30] [30]

Guennebaud, G., Jacob, B., et al. (2010). Eigen v3. http://eigen.tuxfamily.org

work page 2010

[31] [31]

Hasan, M. F. (2018). An edge-concave underestimator for the global optimization of twice- differentiable nonconvex problems.Journal of Global Optimization, 71(4):735–752

work page 2018

[32] [32]

P., and Laird, C

Kang, J., Cao, Y., Word, D. P., and Laird, C. D. (2014). An interior-point method for efficient solu- tion of block-structured nlp problems using an implicit schur-complement decomposition.Computers & Chemical Engineering, 71:563–573

work page 2014

[33] [33]

Global linear convergence of Newton's method without strong-convexity or Lipschitz gradients

Karimireddy, S. P., Stich, S. U., and Jaggi, M. (2018). Global linear convergence of newton’s method without strong-convexity or lipschitz gradients.CoRR, abs/1806.00413

work page internal anchor Pith review Pith/arXiv arXiv 2018

[34] [34]

Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization.arXiv preprint arXiv:1412.6980

work page internal anchor Pith review Pith/arXiv arXiv 2014

[35] [35]

and Grossmann, I

Li, C. and Grossmann, I. E. (2019). A generalized benders decomposition-based branch and cut algorithm for two-stage stochastic programs with nonconvex constraints and mixed-binary first and second stage variables.Journal of Global Optimization, 75(2):247–272

work page 2019

[36] [36]

Liu, Y., Gao, Y., and Yin, W. (2020). An improved analysis of stochastic gradient descent with momentum.Advances in Neural Information Processing Systems, 33:18261–18271

work page 2020

[37] [37]

Maranas, C. D. and Floudas, C. A. (1995). Finding all solutions of nonlinearly constrained systems of equations.Journal of Global Optimization, 7:143–182

work page 1995

[38] [38]

McCormick, G. P. (1976). Computability of global solutions to factorable nonconvex programs: Part i—convex underestimating problems.Mathematical programming, 10(1):147–175

work page 1976

[39] [39]

McMahan, H. B. and Streeter, M. (2010). Adaptive bound optimization for online convex optimiza- tion.arXiv preprint arXiv:1002.4908

work page internal anchor Pith review Pith/arXiv arXiv 2010

[40] [40]

Meyer, C. A. and Floudas, C. A. (2004a). Trilinear monomials with mixed sign domains: Facets of the convex and concave envelopes.Journal of Global Optimization, 29:125–155

work page

[41] [41]

Meyer, C. A. and Floudas, C. A. (2004b). Trilinear monomials with positive or negative domains: Facets of the convex and concave envelopes. InFrontiers in global optimization, pages 327–352. Springer. 23

work page

[42] [42]

Meyer, C. A. and Floudas, C. A. (2005). Convex envelopes for edge-concave functions.Mathematical programming, 103:207–224

work page 2005

[43] [43]

and Daoutidis, P

Mitrai, I. and Daoutidis, P. (2021). Efficient solution of enterprise-wide optimization problems using nested stochastic blockmodeling.Industrial & Engineering Chemistry Research, 60(40):14476–14494

work page 2021

[44] [44]

Mitsos, A., Chachuat, B., and Barton, P. I. (2009). Mccormick-based relaxations of algorithms. SIAM Journal on Optimization, 20(2):573–601

work page 2009

[45] [45]

Mitsos, A., Lemonidis, P., and Barton, P. I. (2008). Global solution of bilevel programs with a nonconvex inner program.Journal of Global Optimization, 42:475–513

work page 2008

[46] [46]

Montoison, A., Pacaud, F., Saunders, M., Shin, S., and Orban, D. (2025). Madncl: a gpu implementation of algorithm ncl for large-scale, degenerate nonlinear programs.arXiv preprint arXiv:2510.05885

work page arXiv 2025

[47] [47]

E., Kearfott, R

Moore, R. E., Kearfott, R. B., and Cloud, M. J. (2009).Introduction to interval analysis. SIAM

work page 2009

[48] [48]

Moré, J. J. and Wild, S. M. (2009). Benchmarking derivative-free optimization algorithms.SIAM Journal on Optimization, 20(1):172–191

work page 2009

[49] [49]

and Kokame, H

Mori, T. and Kokame, H. (1994). Eigenvalue bounds for a certain class of interval matrices.IEICE transactions on fundamentals of electronics, communications and computer sciences, 77(10):1707– 1709

work page 1994

[50] [50]

Nesterov, Y. (1983). A method for solving the convex programming problem with convergence rate o(1/k2)

work page 1983

[51] [51]

Nesterov, Y. et al. (2018).Lectures on convex optimization, volume 137. Springer

work page 2018

[52] [52]

and Polyak, B

Nesterov, Y. and Polyak, B. T. (2006). Cubic regularization of newton method and its global performance.Mathematical programming, 108(1):177–205

work page 2006

[53] [53]

and Wright, S

Nocedal, J. and Wright, S. J. (1999).Numerical optimization. Springer

work page 1999

[54] [54]

A., and Anitescu, M

Pacaud, F., Schanen, M., Shin, S., Maldonado, D. A., and Anitescu, M. (2024). Parallel interior- point solver for block-structured nonlinear programs on simd/gpu architectures.Optimization Methods and Software, 39(4):874–897

work page 2024

[55] [55]

and Shin, S

Pacaud, F. and Shin, S. (2024). Gpu-accelerated dynamic nonlinear optimization with examodels and madnlp. In2024 IEEE 63rd Conference on Decision and Control (CDC), pages 5963–5968. IEEE

work page 2024

[56] [56]

and Sahinidis, N

Puranik, Y. and Sahinidis, N. (2017). Bounds tightening based on optimality conditions for non- convex box-constrained optimization.Journal of Global Optimization, 67

work page 2017

[57] [57]

Qian, N. (1999). On the momentum term in gradient descent learning algorithms.Neural networks, 12(1):145–151

work page 1999

[58] [58]

and Monro, S

Robbins, H. and Monro, S. (1951). A stochastic approximation method.The annals of mathematical statistics, pages 400–407

work page 1951

[59] [59]

Rohn, J. (1998). Bounds on eigenvalues of interval matrices.ZAMM-Zeitschrift fur Angewandte Mathematik und Mechanik, 78(3):S1049

work page 1998

[60] [60]

and Mahoney, M

Roosta-Khorasani, F. and Mahoney, M. W. (2016a). Sub-sampled newton methods i: Globally convergent algorithms

work page

[61] [61]

and Mahoney, M

Roosta-Khorasani, F. and Mahoney, M. W. (2016b). Sub-sampled newton methods ii: Local con- vergence rates

work page

[62] [62]

and Mahoney, M

Roosta-Khorasani, F. and Mahoney, M. W. (2019). Sub-sampled newton methods.Mathematical Programming, 174:293–326

work page 2019

[63] [63]

Ryoo, H. S. and Sahinidis, N. V. (2001). Analysis of bounds for multilinear functions.Journal of Global Optimization, 19(4):403–424. 24

work page 2001

[64] [64]

K., Stuber, M

Scott, J. K., Stuber, M. D., and Barton, P. I. (2011). Generalized mccormick relaxations.Journal of Global Optimization, 51(4):569–606

work page 2011

[65] [65]

Shanno, D. F. (1970). Conditioning of quasi-newton methods for function minimization.Mathe- matics of Computation, 24:647–656

work page 1970

[66] [66]

(2004).On the existence of polyhedral convex envelopes

Tardella, F. (2004).On the existence of polyhedral convex envelopes. Springer

work page 2004

[67] [67]

and Sahinidis, N

Tawarmalani, M. and Sahinidis, N. V. (2001). Semidefinite relaxations of fractional programs via novel convexification techniques.Journal of Global Optimization, 20:133–154

work page 2001

[68] [68]

and Sahinidis, N

Tawarmalani, M. and Sahinidis, N. V. (2002). Convex extensions and envelopes of lower semi- continuous functions.Mathematical Programming, 93(2):247–263

work page 2002

[69] [69]

and Biegler, L

Yoshio, N. and Biegler, L. T. (2021). A nested schur decomposition approach for multiperiod optimization of chemical processes.Computers & Chemical Engineering, 155:107509

work page 2021

[70] [70]

Zeiler, M. D. (2012). Adadelta: An adaptive learning rate method. 25 Appendix A. Proofs This appendix provides proofs of Theorem 1 and Lemma 1 stated in Section 3.1 and 3.4.2, respectively. We prove the following results stated in Theorem 1: •Function value bound: |L(xt)−f(x k)| ≤ Lf 2 √nδ+ |λmin| 8 nδ2 •Gradient bound: ∥∇L(xt)− ∇f(x k)∥ ≤ Lg 2 √nδ+ |λmin...

work page 2012