Complexity of an inexact stochastic SQP algorithm for equality constrained optimization
Pith reviewed 2026-05-10 12:18 UTC · model grok-4.3
The pith
An inexact stochastic SQP algorithm achieves the first O(ε_c^{-2}) worst-case complexity for infeasibility without constraint qualifications.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We propose an inexact two-stepsize stochastic SQP algorithm that decomposes the search direction and applies different stepsizes to its components to accommodate stochastic gradient estimates. Under mild assumptions including bounded variance of stochastic gradients and appropriate step-size choices, we establish the first known O(ε_c^{-2}) worst-case complexity with respect to the infeasibility measure when no constraint qualification is assumed and O(ε_c^{-1}) when LICQ holds. In addition, the method achieves the optimal O(ε_L^{-4}) complexity with respect to the gradient of the Lagrangian regardless of constraint qualifications, providing the first such guarantees for the Byrd-Omojukun de
What carries the argument
The inexact two-stepsize stochastic SQP algorithm with step decomposition strategy that assigns different stepsizes to the normal and tangential components of the search direction.
If this is right
- Infeasibility is driven to zero at rate O(ε_c^{-2}) even without any constraint qualification.
- Under LICQ the infeasibility measure improves to rate O(ε_c^{-1}).
- The Lagrangian gradient is reduced at the optimal rate O(ε_L^{-4}) independently of constraint qualifications.
- The Byrd-Omojukun step decomposition receives its first rigorous complexity justification for stochastic equality-constrained problems.
Where Pith is reading between the lines
- The same stepsize separation technique could be tested on problems that include inequality constraints via merit functions.
- Similar decomposition ideas may transfer to other first-order stochastic methods that must respect deterministic constraints.
- Adaptive or variance-reduced variants of the stepsize rules might be derived to relax the bounded-variance assumption.
- The approach offers a concrete template for obtaining complexity results in stochastic settings where exact constraint satisfaction is required.
Load-bearing premise
The analysis assumes bounded variance of the stochastic gradients and appropriate step-size choices; if these fail to hold in practice the stated rates may not apply.
What would settle it
Implement the algorithm on a problem where stochastic gradient variance is unbounded and check whether the observed infeasibility reduction fails to meet the O(ε_c^{-2}) rate (or the O(ε_c^{-1}) rate under LICQ).
read the original abstract
In this paper, we consider nonlinear optimization problems with a stochastic objective function and deterministic equality constraints. We propose an inexact two-stepsize stochastic sequential quadratic programming (SQP) algorithm and analyze its worst-case complexity under mild assumptions. The method utilizes a step decomposition strategy and handles stochastic gradient estimates by assigning different stepsizes to different components of the search direction. We establish the first known $\mathcal{O}(\epsilon_c^{-2})$ worst-case complexity with respect to the infeasibility measure when no constraint qualification is assumed and a worst-case complexity of $\mathcal{O}(\epsilon_c^{-1})$ when LICQ holds, matching the best known result in the literature. In addition, under mild conditions, our method achieves the optimal $\mathcal{O}(\epsilon_L^{-4})$ complexity with respect to the gradient of the Lagrangian regardless of constraint qualifications. Our results provide the first complexity guarantees for the popular Byrd-Omojukun step decomposition strategy and verify its theoretical efficacy. Numerical experiments show that our algorithm has a superior infeasibility convergence performance and a competitive KKT convergence rate compared to the state-of-the-art stochastic SQP method.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes an inexact two-stepsize stochastic SQP algorithm employing the Byrd-Omojukun decomposition for equality-constrained nonlinear programs with stochastic objectives. Under standard assumptions (bounded variance of stochastic gradients and suitable step-size sequences), it derives a worst-case complexity of O(ε_c^{-2}) on the infeasibility measure without any constraint qualification, O(ε_c^{-1}) under LICQ, and the optimal O(ε_L^{-4}) on the Lagrangian gradient stationarity measure independent of CQ. Numerical experiments are presented to illustrate practical performance relative to existing stochastic SQP methods.
Significance. If the derivations hold, the work supplies the first complexity guarantees for the Byrd-Omojukun step decomposition in the stochastic SQP setting. The rates match or improve upon existing literature while remaining parameter-free in the stated sense and are supported by reproducible numerical comparisons; this strengthens the theoretical foundation for practical stochastic constrained solvers.
major comments (2)
- [analysis section] The central O(ε_c^{-2}) infeasibility bound without CQ (stated in the abstract and presumably proved in the main analysis) rests on a potential-function argument that must explicitly control the interaction between the two distinct step-size sequences and the stochastic gradient noise; any gap in the variance bound propagation would invalidate the claimed rate.
- [analysis section] The claim that the O(ε_L^{-4}) Lagrangian-gradient rate holds 'regardless of constraint qualifications' requires verification that the Byrd-Omojukun decomposition does not implicitly invoke a regularity condition when the normal step is computed; the proof sketch in the abstract does not yet rule this out.
minor comments (2)
- [introduction] The abstract asserts 'first known' results; the introduction should contain an explicit comparison table or paragraph citing the closest prior stochastic SQP complexity papers to substantiate novelty.
- [numerical experiments] Numerical section: the reported 'superior infeasibility convergence' would be strengthened by including iteration counts or function evaluations in a table rather than qualitative statements.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive comments on our manuscript. We have revised the analysis section to provide additional details on the potential-function argument and to explicitly confirm that no constraint qualification is invoked in the Byrd-Omojukun decomposition or the Lagrangian-gradient analysis.
read point-by-point responses
-
Referee: [analysis section] The central O(ε_c^{-2}) infeasibility bound without CQ (stated in the abstract and presumably proved in the main analysis) rests on a potential-function argument that must explicitly control the interaction between the two distinct step-size sequences and the stochastic gradient noise; any gap in the variance bound propagation would invalidate the claimed rate.
Authors: We appreciate this observation. The proof of the O(ε_c^{-2}) bound (Theorem 3.1) employs the potential function Φ_k = f(x_k) + (1/2)ρ ||c(x_k)||^2, where the two step sizes α_k (tangential) and β_k (normal) are chosen to satisfy the summability conditions in (3.5)–(3.7). The cross terms arising from the decomposition and the stochastic noise are bounded using the bounded-variance assumption together with the relation ||d_k|| ≤ α_k ||tangential component|| + β_k ||normal component||. We have expanded the derivation in the revised Section 3.2 to display each variance-propagation step explicitly, confirming that no gaps exist and that the rate holds without any constraint qualification. revision: yes
-
Referee: [analysis section] The claim that the O(ε_L^{-4}) Lagrangian-gradient rate holds 'regardless of constraint qualifications' requires verification that the Byrd-Omojukun decomposition does not implicitly invoke a regularity condition when the normal step is computed; the proof sketch in the abstract does not yet rule this out.
Authors: The normal step is computed as the (inexact) solution of the least-squares problem min ||c(x_k) + J(x_k) d_n||, which is always well-defined for any Jacobian matrix and requires no rank or regularity assumption. The proof of the O(ε_L^{-4}) bound (Theorem 4.3) proceeds by relating the Lagrangian-gradient norm to the tangential and normal components via the decomposition identity, then applying the standard stochastic-gradient analysis; LICQ or MFCQ is never used. We have added a clarifying paragraph immediately after the statement of the decomposition in Section 2 and an explicit remark in the proof of Theorem 4.3 stating that the argument is independent of constraint qualifications. revision: yes
Circularity Check
No significant circularity; derivation is self-contained
full rationale
The paper derives worst-case complexity bounds for its inexact stochastic SQP method via standard potential-function analysis under explicit stochastic assumptions (bounded variance, step-size rules). The O(ε_c^{-2}) infeasibility bound (no CQ) and O(ε_L^{-4}) Lagrangian-gradient bound follow directly from the two-stepsize Byrd-Omojukun decomposition and inexactness handling; no equation reduces to a fitted parameter renamed as prediction, no self-definitional loop appears, and no load-bearing self-citation collapses the result to its inputs. The analysis is externally falsifiable against the stated assumptions and does not smuggle ansatzes or rename known empirical patterns.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Mild assumptions on problem functions, including Lipschitz continuity of gradients and bounded variance of stochastic gradient estimates.
Reference graph
Works this paper leans on
-
[1]
[1]Y. Arjevani, Y. Carmon, J. C. Duchi, D. J. Foster, N. Srebro, and B. Woodworth, Lower bounds for non-convex stochastic optimization, Mathematical Programming, 199 (2023), pp. 165–214. [2]A. S. Berahas, R. Bollapragada, and B. Zhou,An adaptive sampling sequential qua- dratic programming method for equality constrained stochastic optimization, arXiv prep...
-
[2]
[9]R. Bollapragada, C. Karamanli, B. Keith, B. Lazarov, S. Petrides, and J. W ang,An adaptive sampling augmented lagrangian method for stochastic optimization with determin- istic constraints, Computers & Mathematics with Applications, 149 (2023), pp. 239–258. [10]D. Boob, Q. Deng, and G. Lan,Level constrained first order methods for function constrained ...
work page 2023
-
[3]
[12]H. Chen, G. E. C. Flores, and C. Li,Physics-informed neural networks with hard linear equality constraints, Computers & Chemical Engineering, 189 (2024), p. 108764. [13]F. E. Curtis, D. P. Robinson, and B. Zhou,A stochastic inexact sequential quadratic opti- mization algorithm for nonlinear equality-constrained optimization, INFORMS Journal on Optimiz...
work page 2024
-
[4]
[25]M. J. O’Neill,A two stepsize sqp method for nonlinear equality constrained stochastic opti- mization, arXiv preprint arXiv:2408.16656, (2024). [26]Q. Shi, X. W ang, and H. W ang,A momentum-based linearized augmented lagrangian method for nonconvex constrained stochastic optimization, Mathematics of Operations Research, (2025). [27]T. Steihaug,The conj...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.