Parallel Sequential Quadratic Programming with Overlapping Graph Decomposition and Exact Augmented Lagrangian

Haoxuan Wang; Mihai Anitescu; Runxin Ni; Sen Na; Sungho Shin

arxiv: 2402.17170 · v3 · pith:2V46QRUQnew · submitted 2024-02-27 · 🧮 math.OC

Parallel Sequential Quadratic Programming with Overlapping Graph Decomposition and Exact Augmented Lagrangian

Runxin Ni , Haoxuan Wang , Sen Na , Sungho Shin , Mihai Anitescu This is my paper

Pith reviewed 2026-05-24 03:54 UTC · model grok-4.3

classification 🧮 math.OC

keywords graph-structured nonlinear programsoverlapping graph decompositionsequential quadratic programmingaugmented Lagrangianglobal convergencelocal linear convergencePDE-constrained optimizationparallel optimization

0 comments

The pith

Overlapping graph decomposition in SQP produces descent directions for the exact augmented Lagrangian, ensuring global convergence of parallel solves for large graph-structured nonlinear programs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a parallel method for solving large-scale graph-structured nonlinear programs by decomposing the graph with overlaps and computing approximate Newton directions within a sequential quadratic programming framework. It proves that these directions are descent directions for an exact augmented Lagrangian merit function, provided the overlaps are large enough, leading to global convergence. The local convergence rate improves exponentially as the overlap size increases. This approach applies to problems like PDE-constrained optimization and dynamic optimization, matching known guarantees for linear graphs. A sympathetic reader would care because it enables scalable, parallel computation for problems that arise in many applications without requiring special structure beyond graph topology.

Core claim

By leveraging overlapping graph decomposition to compute an approximate Newton direction in parallel, the method performs a backtracking line search on the exact augmented Lagrangian. Built on the exponential decay of sensitivity of gsNLPs, the approximate direction is shown to be a descent direction, yielding global convergence with local linear rate that improves exponentially in overlap size, for sufficiently large overlaps, under mild regularity conditions on the graph topology.

What carries the argument

overlapping graph decomposition to compute parallel approximate Newton directions, which remain descent directions for the exact augmented Lagrangian due to exponential sensitivity decay

If this is right

Global convergence holds when overlaps are sufficiently large.
The local linear convergence rate improves exponentially with increasing overlap size.
The guarantees extend to dynamic programs on linear graphs as a special case.
The method applies to PDE-constrained optimization, multi-stage stochastic optimization, and general network optimization.
Standard mild regularity conditions on graph topology suffice along with the sensitivity decay property.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar overlapping techniques might improve convergence in other parallel optimization algorithms for networked systems.
Testing the method on graphs with varying connectivity could reveal minimal overlap thresholds for different topologies.
Extensions to inexact solves or stochastic settings could further enhance scalability for very large problems.
The approach suggests that sensitivity decay properties can be exploited more broadly in decomposition methods for structured optimization.

Load-bearing premise

The exponential decay of sensitivity of the graph-structured nonlinear programs with respect to the graph topology ensures that the approximate Newton direction remains a descent direction for the merit function.

What would settle it

A counterexample gsNLP on a graph where sensitivity does not decay exponentially, for which the method with any finite overlap fails to produce a descent direction or converge globally.

read the original abstract

In this paper, we address the challenge of solving large-scale graph-structured nonlinear programs (gsNLPs) in a scalable manner. GsNLPs are problems in which the objective and constraint functions are associated with nodes on a graph and depend on the variables of adjacent nodes. This graph-structured formulation encompasses various specific instances, such as dynamic optimization, PDE-constrained optimization, multi-stage stochastic optimization, and general network optimization. By leveraging the sequential quadratic programming (SQP) framework, we propose a globally convergent overlapping graph decomposition method to solve large-scale gsNLPs under standard mild regularity conditions on the graph topology. In each iteration, we perform an overlapping graph decomposition to compute an approximate Newton direction in a parallel environment. Then, we select a suitable stepsize and update the primal-dual iterate by performing a backtracking line search on an exact augmented Lagrangian merit function. Built on the exponential decay of sensitivity of gsNLPs, we show that the approximate Newton direction is a descent direction of the augmented Lagrangian, which leads to global convergence with local linear convergence rate. In particular, global convergence is achieved for sufficiently large overlaps, and the local linear convergence rate improves exponentially in terms of the overlap size. Our results match existing state-of-the-art guarantees established for dynamic programs (which simply correspond to linear graphs). We validate the theory on two PDE-constrained problems: a semilinear elliptic problem and a boundary heating problem.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper extends SQP with overlapping graph decompositions to general graphs and proves global convergence plus exponential local rate gains from larger overlaps.

read the letter

The main point is that this paper gives a parallel SQP method for graph-structured nonlinear programs on general graphs. It uses overlapping decompositions to compute approximate Newton steps in parallel and proves global convergence via line search on the exact augmented Lagrangian when the overlap is large enough. The local linear convergence rate improves exponentially with overlap size. This extends prior work that only handled linear graphs from dynamic programming. The new element is showing the exponential dependence for arbitrary graph topologies using the sensitivity decay property. They back this up with theory under mild regularity conditions and test it on a semilinear elliptic PDE and a boundary heating problem. The analysis holds up as long as the exponential decay of sensitivities is accepted, which follows from the graph structure. No circularity or major gaps stand out in the argument. One minor issue is that the paper focuses on the theory and does not explore how overlap size affects runtime or communication in practice, or give guidance on selecting the overlap. The numerical examples are there but limited. This is worth attention for people building solvers for PDE-constrained or network optimization. It would be useful to readers interested in parallel methods with convergence guarantees. I think it should go to peer review.

Referee Report

0 major / 4 minor

Summary. The paper proposes a parallel SQP algorithm for graph-structured nonlinear programs (gsNLPs) that uses overlapping graph decomposition to compute approximate Newton directions in parallel. Global convergence is established via a backtracking line search on an exact augmented Lagrangian merit function, under mild graph-topology regularity conditions; the local linear convergence rate is shown to improve exponentially with overlap size. The results recover known guarantees for dynamic programs on linear graphs and are illustrated on two PDE-constrained test problems.

Significance. If the central convergence argument holds, the work supplies the first general-graph extension of overlap-based parallel SQP with both global convergence and an explicit exponential dependence of the local rate on overlap size. This is a substantive theoretical advance for scalable structured optimization, directly applicable to dynamic, stochastic, and PDE-constrained problems.

minor comments (4)

The abstract and introduction state that global convergence holds 'for sufficiently large overlaps,' but the precise quantitative threshold (in terms of graph diameter or spectral radius) is not stated explicitly; adding a corollary or remark that isolates this threshold would improve readability.
Notation for the overlapping subproblems (e.g., the definition of the local gsNLP on each subgraph) should be introduced once in §2 and used consistently; several later sections re-define the same quantities with slightly different symbols.
Figure 1 (schematic of the overlapping decomposition) would benefit from an explicit legend indicating which nodes belong to the overlap sets; the current caption is insufficient for readers unfamiliar with the graph notation.
The numerical section reports iteration counts and CPU times but does not tabulate the observed linear rates versus overlap size; adding a small table or plot of rate versus overlap would directly support the exponential-improvement claim.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, significance assessment, and recommendation of minor revision. No major comments were provided in the report.

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The derivation relies on the exponential decay of sensitivities in gsNLPs, which the paper states follows directly from the graph structure under mild regularity conditions on topology. This property is invoked to establish that the approximate Newton direction is a descent direction for the exact augmented Lagrangian, enabling the standard SQP line-search convergence argument. No self-definitional steps, fitted inputs renamed as predictions, or load-bearing self-citations appear in the provided chain; the central claims remain independent of the target results once the graph-induced decay is granted. The analysis is internally consistent with classical SQP theory and matches known results for linear graphs without reducing to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on two domain assumptions about the problem class; no free parameters or new postulated entities are introduced in the abstract.

axioms (2)

domain assumption exponential decay of sensitivity of gsNLPs with respect to distant nodes
Invoked to establish that the parallel approximate Newton direction is a descent direction for the augmented Lagrangian merit function.
domain assumption standard mild regularity conditions on the graph topology
Required to guarantee global convergence for sufficiently large overlaps.

pith-pipeline@v0.9.0 · 5795 in / 1489 out tokens · 41855 ms · 2026-05-24T03:54:04.779649+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

30 extracted references · 30 canonical work pages

[1]

Antil, D

H. Antil, D. P. Kouri, and D. Ridzal , Alesqp: An augmented lagrangian equality-constrained sqp method for optimization with general constraints , SIAM Journal on Optimization, 33 (2023), pp. 237–266

work page 2023
[2]

Beccuti, T

A. Beccuti, T. Geyer, and M. Morari , Temporal Lagrangian decomposition of model predictive control for hybrid systems , in 2004 43rd IEEE Conference on Decision and Control (CDC) (IEEE Cat. No.04CH37601), IEEE, 2004

work page 2004
[3]

D. P. Bertsekas, Constrained optimization and Lagrange multiplier methods, no. 4 in Athena scientific optimization and computation series, Athena Scientific, Belmont, Mass., 1996

work page 1996
[4]

L. T. Biegler , Nonlinear Programming, Society for Industrial and Applied Mathematics, Jan. 2010

work page 2010
[5]

L. T. Biegler, O. Ghattas, M. Heinkenschloss, and B. van Bloemen W aanders , Large-Scale PDE-Constrained Optimization: An Introduction , Springer Berlin Heidelberg, 2003

work page 2003
[6]

T. F. Chan and B. F. Smith , Domain decomposition and multigrid algorithms for elliptic problems on unstructured meshes, Contemporary Mathematics, 180 (1994), pp. 175–175

work page 1994
[7]

T. F. Chan and J. Zou , Additive schwarz domain decomposition methods for elliptic problems on unstructured meshes, Numerical Algorithms, 8 (1994), pp. 329–346. 24 RUNXIN NI, SEN NA, SUNGHO SHIN, AND MIHAI ANITESCU

work page 1994
[8]

J. C. Dunn and D. P. Bertsekas , Efficient dynamic programming implementations of newton’s method for unconstrained optimal control problems , Journal of Optimization Theory and Appli- cations, 63 (1989), pp. 23–38

work page 1989
[9]

Dunning, J

I. Dunning, J. Huchette, and M. Lubin , Jump: A modeling language for mathematical optimization , SIAM Review, 59 (2017), pp. 295–320

work page 2017
[10]

Ghadimi, A

E. Ghadimi, A. Teixeira, I. Shames, and M. Johansson , Optimal parameter selection for the al- ternating direction method of multipliers (ADMM): Quadratic problems , IEEE Transactions on Automatic Control, 60 (2015), pp. 644–658

work page 2015
[11]

N. I. M. Gould and P. L. Toint , SQP Methods for Large-Scale Nonlinear Programming , Springer US, 2000, pp. 149–178

work page 2000
[12]

I. Hong, S. Na, M. W. Mahoney, and M. Kolar , Constrained optimization via exact augmented lagrangian and randomized iterative sketching , International Conference on Machine Learning, (2023)

work page 2023
[13]

J. R. Jackson and I. E. Grossmann , Temporal decomposition scheme for nonlinear multisite produc- tion planning and distribution models , Industrial & Engineering Chemistry Research, 42 (2003), pp. 3045–3055

work page 2003
[14]

Kozma, C

A. Kozma, C. Conte, and M. Diehl , Benchmarking large-scale distributed convex quadratic pro- gramming algorithms, Optimization Methods and Software, 30 (2014), pp. 191–214

work page 2014
[15]

Na , Global convergence of online optimization for nonlinear model predictive control , Advances in Neural Information Processing Systems, 34 (2021), pp

S. Na , Global convergence of online optimization for nonlinear model predictive control , Advances in Neural Information Processing Systems, 34 (2021), pp. 12441–12453

work page 2021
[16]

Na and M

S. Na and M. Anitescu , Exponential decay in the sensitivity analysis of nonlinear dynamic program- ming, SIAM Journal on Optimization, 30 (2020), pp. 1527–1554

work page 2020
[17]

Na and M

S. Na and M. Anitescu , Superconvergence of online optimization for model predictive control, IEEE Transactions on Automatic Control, 68 (2023), pp. 1383–1398

work page 2023
[18]

S. Na, M. Anitescu, and M. Kolar , An adaptive stochastic sequential quadratic programming with differentiable exact augmented lagrangians, Mathematical Programming, 199 (2022), pp. 721–791

work page 2022
[19]

S. Na, M. Anitescu, and M. Kolar , A fast temporal decomposition procedure for long-horizon nonlinear dynamic programming, Mathematics of Operations Research, (2023)

work page 2023
[20]

S. Na, S. Shin, M. Anitescu, and V. M. Zavala , On the convergence of overlapping schwarz de- composition for nonlinear optimal control , IEEE Transactions on Automatic Control, 67 (2022), pp. 5996–6011

work page 2022
[21]

Nocedal and S

J. Nocedal and S. J. Wright , Numerical optimization, Springer, 2006

work page 2006
[22]

M. V. F. Pereira and L. M. V. G. Pinto , Multi-stage stochastic optimization applied to energy planning, Mathematical Programming, 52 (1991), pp. 359–375

work page 1991
[23]

S. Shin, M. Anitescu, and V. M. Zavala , Overlapping Schwarz decomposition for constrained qua- dratic programs, in 2020 59th IEEE Conference on Decision and Control (CDC), IEEE, Dec. 2020

work page 2020
[24]

S. Shin, M. Anitescu, and V. M. Zavala , Exponential decay of sensitivity in graph-structured non- linear programs, SIAM Journal on Optimization, 32 (2022), pp. 1156–1183

work page 2022
[25]

S. Shin, Y. Lin, G. Qu, A. Wierman, and M. Anitescu , Near-optimal distributed linear-quadratic regulator for networked systems, SIAM Journal on Control and Optimization, 61 (2023), pp. 1113– 1135

work page 2023
[26]

S. Shin, V. M. Zavala, and M. Anitescu , Decentralized schemes with overlap for solving graph- structured optimization problems , IEEE Transactions on Control of Network Systems, 7 (2020), pp. 1225–1236

work page 2020
[27]

W ¨achter and L

A. W ¨achter and L. T. Biegler , On the implementation of an interior-point filter line-search algo- rithm for large-scale nonlinear programming, Mathematical Programming, 106 (2005), pp. 25–57

work page 2005
[28]

Xu and M

W. Xu and M. Anitescu , Exponentially accurate temporal decomposition for long-horizon linear- quadratic dynamic optimization , SIAM Journal on Optimization, 28 (2018), pp. 2541–2573

work page 2018
[29]

V. M. Zavala , New architectures for hierarchical predictive control , IFAC-PapersOnLine, 49 (2016), pp. 43–48

work page 2016
[30]

Zlotnik, M

A. Zlotnik, M. Chertkov, and S. Backhaus , Optimal control of transient flow in natural gas networks, in 2015 54th IEEE Conference on Decision and Control (CDC), IEEE, Dec. 2015. Government License: The submitted manuscript has been created by UChicago Argonne, LLC, Operator of Argonne National Laboratory (“Argonne”). Argonne, a U.S. Department of Energy ...

work page 2015

[1] [1]

Antil, D

H. Antil, D. P. Kouri, and D. Ridzal , Alesqp: An augmented lagrangian equality-constrained sqp method for optimization with general constraints , SIAM Journal on Optimization, 33 (2023), pp. 237–266

work page 2023

[2] [2]

Beccuti, T

A. Beccuti, T. Geyer, and M. Morari , Temporal Lagrangian decomposition of model predictive control for hybrid systems , in 2004 43rd IEEE Conference on Decision and Control (CDC) (IEEE Cat. No.04CH37601), IEEE, 2004

work page 2004

[3] [3]

D. P. Bertsekas, Constrained optimization and Lagrange multiplier methods, no. 4 in Athena scientific optimization and computation series, Athena Scientific, Belmont, Mass., 1996

work page 1996

[4] [4]

L. T. Biegler , Nonlinear Programming, Society for Industrial and Applied Mathematics, Jan. 2010

work page 2010

[5] [5]

L. T. Biegler, O. Ghattas, M. Heinkenschloss, and B. van Bloemen W aanders , Large-Scale PDE-Constrained Optimization: An Introduction , Springer Berlin Heidelberg, 2003

work page 2003

[6] [6]

T. F. Chan and B. F. Smith , Domain decomposition and multigrid algorithms for elliptic problems on unstructured meshes, Contemporary Mathematics, 180 (1994), pp. 175–175

work page 1994

[7] [7]

T. F. Chan and J. Zou , Additive schwarz domain decomposition methods for elliptic problems on unstructured meshes, Numerical Algorithms, 8 (1994), pp. 329–346. 24 RUNXIN NI, SEN NA, SUNGHO SHIN, AND MIHAI ANITESCU

work page 1994

[8] [8]

J. C. Dunn and D. P. Bertsekas , Efficient dynamic programming implementations of newton’s method for unconstrained optimal control problems , Journal of Optimization Theory and Appli- cations, 63 (1989), pp. 23–38

work page 1989

[9] [9]

Dunning, J

I. Dunning, J. Huchette, and M. Lubin , Jump: A modeling language for mathematical optimization , SIAM Review, 59 (2017), pp. 295–320

work page 2017

[10] [10]

Ghadimi, A

E. Ghadimi, A. Teixeira, I. Shames, and M. Johansson , Optimal parameter selection for the al- ternating direction method of multipliers (ADMM): Quadratic problems , IEEE Transactions on Automatic Control, 60 (2015), pp. 644–658

work page 2015

[11] [11]

N. I. M. Gould and P. L. Toint , SQP Methods for Large-Scale Nonlinear Programming , Springer US, 2000, pp. 149–178

work page 2000

[12] [12]

I. Hong, S. Na, M. W. Mahoney, and M. Kolar , Constrained optimization via exact augmented lagrangian and randomized iterative sketching , International Conference on Machine Learning, (2023)

work page 2023

[13] [13]

J. R. Jackson and I. E. Grossmann , Temporal decomposition scheme for nonlinear multisite produc- tion planning and distribution models , Industrial & Engineering Chemistry Research, 42 (2003), pp. 3045–3055

work page 2003

[14] [14]

Kozma, C

A. Kozma, C. Conte, and M. Diehl , Benchmarking large-scale distributed convex quadratic pro- gramming algorithms, Optimization Methods and Software, 30 (2014), pp. 191–214

work page 2014

[15] [15]

Na , Global convergence of online optimization for nonlinear model predictive control , Advances in Neural Information Processing Systems, 34 (2021), pp

S. Na , Global convergence of online optimization for nonlinear model predictive control , Advances in Neural Information Processing Systems, 34 (2021), pp. 12441–12453

work page 2021

[16] [16]

Na and M

S. Na and M. Anitescu , Exponential decay in the sensitivity analysis of nonlinear dynamic program- ming, SIAM Journal on Optimization, 30 (2020), pp. 1527–1554

work page 2020

[17] [17]

Na and M

S. Na and M. Anitescu , Superconvergence of online optimization for model predictive control, IEEE Transactions on Automatic Control, 68 (2023), pp. 1383–1398

work page 2023

[18] [18]

S. Na, M. Anitescu, and M. Kolar , An adaptive stochastic sequential quadratic programming with differentiable exact augmented lagrangians, Mathematical Programming, 199 (2022), pp. 721–791

work page 2022

[19] [19]

S. Na, M. Anitescu, and M. Kolar , A fast temporal decomposition procedure for long-horizon nonlinear dynamic programming, Mathematics of Operations Research, (2023)

work page 2023

[20] [20]

S. Na, S. Shin, M. Anitescu, and V. M. Zavala , On the convergence of overlapping schwarz de- composition for nonlinear optimal control , IEEE Transactions on Automatic Control, 67 (2022), pp. 5996–6011

work page 2022

[21] [21]

Nocedal and S

J. Nocedal and S. J. Wright , Numerical optimization, Springer, 2006

work page 2006

[22] [22]

M. V. F. Pereira and L. M. V. G. Pinto , Multi-stage stochastic optimization applied to energy planning, Mathematical Programming, 52 (1991), pp. 359–375

work page 1991

[23] [23]

S. Shin, M. Anitescu, and V. M. Zavala , Overlapping Schwarz decomposition for constrained qua- dratic programs, in 2020 59th IEEE Conference on Decision and Control (CDC), IEEE, Dec. 2020

work page 2020

[24] [24]

S. Shin, M. Anitescu, and V. M. Zavala , Exponential decay of sensitivity in graph-structured non- linear programs, SIAM Journal on Optimization, 32 (2022), pp. 1156–1183

work page 2022

[25] [25]

S. Shin, Y. Lin, G. Qu, A. Wierman, and M. Anitescu , Near-optimal distributed linear-quadratic regulator for networked systems, SIAM Journal on Control and Optimization, 61 (2023), pp. 1113– 1135

work page 2023

[26] [26]

S. Shin, V. M. Zavala, and M. Anitescu , Decentralized schemes with overlap for solving graph- structured optimization problems , IEEE Transactions on Control of Network Systems, 7 (2020), pp. 1225–1236

work page 2020

[27] [27]

W ¨achter and L

A. W ¨achter and L. T. Biegler , On the implementation of an interior-point filter line-search algo- rithm for large-scale nonlinear programming, Mathematical Programming, 106 (2005), pp. 25–57

work page 2005

[28] [28]

Xu and M

W. Xu and M. Anitescu , Exponentially accurate temporal decomposition for long-horizon linear- quadratic dynamic optimization , SIAM Journal on Optimization, 28 (2018), pp. 2541–2573

work page 2018

[29] [29]

V. M. Zavala , New architectures for hierarchical predictive control , IFAC-PapersOnLine, 49 (2016), pp. 43–48

work page 2016

[30] [30]

Zlotnik, M

A. Zlotnik, M. Chertkov, and S. Backhaus , Optimal control of transient flow in natural gas networks, in 2015 54th IEEE Conference on Decision and Control (CDC), IEEE, Dec. 2015. Government License: The submitted manuscript has been created by UChicago Argonne, LLC, Operator of Argonne National Laboratory (“Argonne”). Argonne, a U.S. Department of Energy ...

work page 2015