Complexity of an inexact stochastic SQP algorithm for equality constrained optimization

Aoji Tang; Michael J. O'Neill

arxiv: 2604.14351 · v1 · submitted 2026-04-15 · 🧮 math.OC

Complexity of an inexact stochastic SQP algorithm for equality constrained optimization

Michael J. O'Neill , Aoji Tang This is my paper

Pith reviewed 2026-05-10 12:18 UTC · model grok-4.3

classification 🧮 math.OC

keywords stochastic optimizationsequential quadratic programmingworst-case complexityequality constraintsconstraint qualificationsLagrangian gradientByrd-Omojukun decompositioninexact algorithms

0 comments

The pith

An inexact stochastic SQP algorithm achieves the first O(ε_c^{-2}) worst-case complexity for infeasibility without constraint qualifications.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces an inexact two-stepsize stochastic sequential quadratic programming algorithm for nonlinear problems with stochastic objectives and deterministic equality constraints. It uses a decomposition of the search direction to assign separate stepsizes to different components, allowing it to manage inexact stochastic gradient estimates. Under mild assumptions such as bounded variance, the method establishes O(ε_c^{-2}) complexity on the infeasibility measure with no constraint qualification required, which tightens to O(ε_c^{-1}) when the linear independence constraint qualification holds. It further delivers the optimal O(ε_L^{-4}) rate on the Lagrangian gradient in all cases, supplying the first complexity guarantees for the Byrd-Omojukun step decomposition approach.

Core claim

We propose an inexact two-stepsize stochastic SQP algorithm that decomposes the search direction and applies different stepsizes to its components to accommodate stochastic gradient estimates. Under mild assumptions including bounded variance of stochastic gradients and appropriate step-size choices, we establish the first known O(ε_c^{-2}) worst-case complexity with respect to the infeasibility measure when no constraint qualification is assumed and O(ε_c^{-1}) when LICQ holds. In addition, the method achieves the optimal O(ε_L^{-4}) complexity with respect to the gradient of the Lagrangian regardless of constraint qualifications, providing the first such guarantees for the Byrd-Omojukun de

What carries the argument

The inexact two-stepsize stochastic SQP algorithm with step decomposition strategy that assigns different stepsizes to the normal and tangential components of the search direction.

If this is right

Infeasibility is driven to zero at rate O(ε_c^{-2}) even without any constraint qualification.
Under LICQ the infeasibility measure improves to rate O(ε_c^{-1}).
The Lagrangian gradient is reduced at the optimal rate O(ε_L^{-4}) independently of constraint qualifications.
The Byrd-Omojukun step decomposition receives its first rigorous complexity justification for stochastic equality-constrained problems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same stepsize separation technique could be tested on problems that include inequality constraints via merit functions.
Similar decomposition ideas may transfer to other first-order stochastic methods that must respect deterministic constraints.
Adaptive or variance-reduced variants of the stepsize rules might be derived to relax the bounded-variance assumption.
The approach offers a concrete template for obtaining complexity results in stochastic settings where exact constraint satisfaction is required.

Load-bearing premise

The analysis assumes bounded variance of the stochastic gradients and appropriate step-size choices; if these fail to hold in practice the stated rates may not apply.

What would settle it

Implement the algorithm on a problem where stochastic gradient variance is unbounded and check whether the observed infeasibility reduction fails to meet the O(ε_c^{-2}) rate (or the O(ε_c^{-1}) rate under LICQ).

read the original abstract

In this paper, we consider nonlinear optimization problems with a stochastic objective function and deterministic equality constraints. We propose an inexact two-stepsize stochastic sequential quadratic programming (SQP) algorithm and analyze its worst-case complexity under mild assumptions. The method utilizes a step decomposition strategy and handles stochastic gradient estimates by assigning different stepsizes to different components of the search direction. We establish the first known $\mathcal{O}(\epsilon_c^{-2})$ worst-case complexity with respect to the infeasibility measure when no constraint qualification is assumed and a worst-case complexity of $\mathcal{O}(\epsilon_c^{-1})$ when LICQ holds, matching the best known result in the literature. In addition, under mild conditions, our method achieves the optimal $\mathcal{O}(\epsilon_L^{-4})$ complexity with respect to the gradient of the Lagrangian regardless of constraint qualifications. Our results provide the first complexity guarantees for the popular Byrd-Omojukun step decomposition strategy and verify its theoretical efficacy. Numerical experiments show that our algorithm has a superior infeasibility convergence performance and a competitive KKT convergence rate compared to the state-of-the-art stochastic SQP method.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper gives the first complexity analysis for the Byrd-Omojukun decomposition inside an inexact stochastic SQP method, delivering O(ε_c^{-2}) on infeasibility without CQ and the optimal O(ε_L^{-4}) on the Lagrangian gradient.

read the letter

The central result is a set of worst-case bounds for an inexact two-stepsize stochastic SQP that splits the search direction via the Byrd-Omojukun strategy. Without any constraint qualification they obtain O(ε_c^{-2}) on the infeasibility measure; under LICQ this improves to O(ε_c^{-1}). They also recover the optimal O(ε_L^{-4}) rate on the Lagrangian gradient regardless of CQ. The analysis rests on standard bounded-variance assumptions for the stochastic gradients and a merit-function argument that tracks both feasibility and stationarity through the two separate step sizes. Numerical tests show faster infeasibility reduction than the existing stochastic SQP baseline while keeping comparable KKT progress. That combination of rates and the explicit treatment of the popular decomposition step is the genuinely new piece; prior stochastic SQP work either assumed exact gradients or used different direction strategies. The assumptions are mild and the derivation path looks standard rather than circular. The main soft spot is that the step-size sequences must satisfy explicit summability conditions; if an implementer cannot tune them reliably the stated rates become theoretical only. The inexactness handling is described at a high level in the abstract, so a referee would want to see the precise error tolerances and how they enter the potential-function decrease. Nothing in the claims appears over-stated or dependent on hidden regularity. This work is aimed at researchers who design or analyze constrained stochastic solvers and need concrete rates to compare algorithms. It is solid enough on novelty and grounding to deserve a full referee process rather than a desk reject.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes an inexact two-stepsize stochastic SQP algorithm employing the Byrd-Omojukun decomposition for equality-constrained nonlinear programs with stochastic objectives. Under standard assumptions (bounded variance of stochastic gradients and suitable step-size sequences), it derives a worst-case complexity of O(ε_c^{-2}) on the infeasibility measure without any constraint qualification, O(ε_c^{-1}) under LICQ, and the optimal O(ε_L^{-4}) on the Lagrangian gradient stationarity measure independent of CQ. Numerical experiments are presented to illustrate practical performance relative to existing stochastic SQP methods.

Significance. If the derivations hold, the work supplies the first complexity guarantees for the Byrd-Omojukun step decomposition in the stochastic SQP setting. The rates match or improve upon existing literature while remaining parameter-free in the stated sense and are supported by reproducible numerical comparisons; this strengthens the theoretical foundation for practical stochastic constrained solvers.

major comments (2)

[analysis section] The central O(ε_c^{-2}) infeasibility bound without CQ (stated in the abstract and presumably proved in the main analysis) rests on a potential-function argument that must explicitly control the interaction between the two distinct step-size sequences and the stochastic gradient noise; any gap in the variance bound propagation would invalidate the claimed rate.
[analysis section] The claim that the O(ε_L^{-4}) Lagrangian-gradient rate holds 'regardless of constraint qualifications' requires verification that the Byrd-Omojukun decomposition does not implicitly invoke a regularity condition when the normal step is computed; the proof sketch in the abstract does not yet rule this out.

minor comments (2)

[introduction] The abstract asserts 'first known' results; the introduction should contain an explicit comparison table or paragraph citing the closest prior stochastic SQP complexity papers to substantiate novelty.
[numerical experiments] Numerical section: the reported 'superior infeasibility convergence' would be strengthened by including iteration counts or function evaluations in a table rather than qualitative statements.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments on our manuscript. We have revised the analysis section to provide additional details on the potential-function argument and to explicitly confirm that no constraint qualification is invoked in the Byrd-Omojukun decomposition or the Lagrangian-gradient analysis.

read point-by-point responses

Referee: [analysis section] The central O(ε_c^{-2}) infeasibility bound without CQ (stated in the abstract and presumably proved in the main analysis) rests on a potential-function argument that must explicitly control the interaction between the two distinct step-size sequences and the stochastic gradient noise; any gap in the variance bound propagation would invalidate the claimed rate.

Authors: We appreciate this observation. The proof of the O(ε_c^{-2}) bound (Theorem 3.1) employs the potential function Φ_k = f(x_k) + (1/2)ρ ||c(x_k)||^2, where the two step sizes α_k (tangential) and β_k (normal) are chosen to satisfy the summability conditions in (3.5)–(3.7). The cross terms arising from the decomposition and the stochastic noise are bounded using the bounded-variance assumption together with the relation ||d_k|| ≤ α_k ||tangential component|| + β_k ||normal component||. We have expanded the derivation in the revised Section 3.2 to display each variance-propagation step explicitly, confirming that no gaps exist and that the rate holds without any constraint qualification. revision: yes
Referee: [analysis section] The claim that the O(ε_L^{-4}) Lagrangian-gradient rate holds 'regardless of constraint qualifications' requires verification that the Byrd-Omojukun decomposition does not implicitly invoke a regularity condition when the normal step is computed; the proof sketch in the abstract does not yet rule this out.

Authors: The normal step is computed as the (inexact) solution of the least-squares problem min ||c(x_k) + J(x_k) d_n||, which is always well-defined for any Jacobian matrix and requires no rank or regularity assumption. The proof of the O(ε_L^{-4}) bound (Theorem 4.3) proceeds by relating the Lagrangian-gradient norm to the tangential and normal components via the decomposition identity, then applying the standard stochastic-gradient analysis; LICQ or MFCQ is never used. We have added a clarifying paragraph immediately after the statement of the decomposition in Section 2 and an explicit remark in the proof of Theorem 4.3 stating that the argument is independent of constraint qualifications. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

full rationale

The paper derives worst-case complexity bounds for its inexact stochastic SQP method via standard potential-function analysis under explicit stochastic assumptions (bounded variance, step-size rules). The O(ε_c^{-2}) infeasibility bound (no CQ) and O(ε_L^{-4}) Lagrangian-gradient bound follow directly from the two-stepsize Byrd-Omojukun decomposition and inexactness handling; no equation reduces to a fitted parameter renamed as prediction, no self-definitional loop appears, and no load-bearing self-citation collapses the result to its inputs. The analysis is externally falsifiable against the stated assumptions and does not smuggle ansatzes or rename known empirical patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on standard domain assumptions from stochastic optimization literature; no free parameters or invented entities are introduced in the abstract description.

axioms (1)

domain assumption Mild assumptions on problem functions, including Lipschitz continuity of gradients and bounded variance of stochastic gradient estimates.
These are invoked to derive the worst-case complexity bounds for the proposed algorithm.

pith-pipeline@v0.9.0 · 5492 in / 1277 out tokens · 57653 ms · 2026-05-10T12:18:51.199175+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

4 extracted references · 4 canonical work pages

[1]

Arjevani, Y

[1]Y. Arjevani, Y. Carmon, J. C. Duchi, D. J. Foster, N. Srebro, and B. Woodworth, Lower bounds for non-convex stochastic optimization, Mathematical Programming, 199 (2023), pp. 165–214. [2]A. S. Berahas, R. Bollapragada, and B. Zhou,An adaptive sampling sequential qua- dratic programming method for equality constrained stochastic optimization, arXiv prep...

work page arXiv 2023
[2]

Bollapragada, C

[9]R. Bollapragada, C. Karamanli, B. Keith, B. Lazarov, S. Petrides, and J. W ang,An adaptive sampling augmented lagrangian method for stochastic optimization with determin- istic constraints, Computers & Mathematics with Applications, 149 (2023), pp. 239–258. [10]D. Boob, Q. Deng, and G. Lan,Level constrained first order methods for function constrained ...

work page 2023
[3]

[12]H. Chen, G. E. C. Flores, and C. Li,Physics-informed neural networks with hard linear equality constraints, Computers & Chemical Engineering, 189 (2024), p. 108764. [13]F. E. Curtis, D. P. Robinson, and B. Zhou,A stochastic inexact sequential quadratic opti- mization algorithm for nonlinear equality-constrained optimization, INFORMS Journal on Optimiz...

work page 2024
[4]

[25]M. J. O’Neill,A two stepsize sqp method for nonlinear equality constrained stochastic opti- mization, arXiv preprint arXiv:2408.16656, (2024). [26]Q. Shi, X. W ang, and H. W ang,A momentum-based linearized augmented lagrangian method for nonconvex constrained stochastic optimization, Mathematics of Operations Research, (2025). [27]T. Steihaug,The conj...

work page arXiv 2024

[1] [1]

Arjevani, Y

[1]Y. Arjevani, Y. Carmon, J. C. Duchi, D. J. Foster, N. Srebro, and B. Woodworth, Lower bounds for non-convex stochastic optimization, Mathematical Programming, 199 (2023), pp. 165–214. [2]A. S. Berahas, R. Bollapragada, and B. Zhou,An adaptive sampling sequential qua- dratic programming method for equality constrained stochastic optimization, arXiv prep...

work page arXiv 2023

[2] [2]

Bollapragada, C

[9]R. Bollapragada, C. Karamanli, B. Keith, B. Lazarov, S. Petrides, and J. W ang,An adaptive sampling augmented lagrangian method for stochastic optimization with determin- istic constraints, Computers & Mathematics with Applications, 149 (2023), pp. 239–258. [10]D. Boob, Q. Deng, and G. Lan,Level constrained first order methods for function constrained ...

work page 2023

[3] [3]

[12]H. Chen, G. E. C. Flores, and C. Li,Physics-informed neural networks with hard linear equality constraints, Computers & Chemical Engineering, 189 (2024), p. 108764. [13]F. E. Curtis, D. P. Robinson, and B. Zhou,A stochastic inexact sequential quadratic opti- mization algorithm for nonlinear equality-constrained optimization, INFORMS Journal on Optimiz...

work page 2024

[4] [4]

[25]M. J. O’Neill,A two stepsize sqp method for nonlinear equality constrained stochastic opti- mization, arXiv preprint arXiv:2408.16656, (2024). [26]Q. Shi, X. W ang, and H. W ang,A momentum-based linearized augmented lagrangian method for nonconvex constrained stochastic optimization, Mathematics of Operations Research, (2025). [27]T. Steihaug,The conj...

work page arXiv 2024