pith. sign in

arxiv: 2606.28230 · v1 · pith:UMFL6UDInew · submitted 2026-06-26 · 🧮 math.OC

A Fletcher's Augmented Lagrangian-Based Stochastic First-Order Method for Nonconvex Equality-Constrained Optimization

Pith reviewed 2026-06-29 02:47 UTC · model grok-4.3

classification 🧮 math.OC
keywords stochastic optimizationnonconvex equality constraintsaugmented Lagrangianfirst-order methodsoracle complexityKKT pointJacobian singularity
0
0 comments X

The pith

A stochastic first-order method using Fletcher's augmented Lagrangian reaches a stochastic ε-KKT point with expected oracle complexity O(ε^{-3}).

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper addresses nonconvex equality-constrained optimization when only stochastic first-order information on the objective and constraints is available. Existing approaches often suffer high costs from repeated stochastic constraint evaluations. The new method uses a decomposed stochastic search direction together with Fletcher's augmented Lagrangian as a merit function for step-size selection. An event decomposition keyed to the smallest singular value of the stochastic Jacobian controls direction perturbations when nondegeneracy fails. Under the added assumption that second derivatives are Lipschitz, the algorithm delivers the target accuracy with total expected evaluations scaling as O(ε^{-3}).

Core claim

Under an additional Lipschitz continuity assumption on the second-order derivatives of the objective and constraint functions, the proposed algorithm attains a stochastic ε-KKT point with an expected total oracle complexity of O(ε^{-3}) in terms of stochastic gradient and stochastic constraint function evaluations.

What carries the argument

Decomposed stochastic search direction combined with Fletcher's augmented Lagrangian merit function, controlled by an event decomposition on the smallest singular value of the stochastic Jacobian.

If this is right

  • The method attains the stochastic ε-KKT point while keeping combined gradient and constraint evaluations at O(ε^{-3}).
  • The event decomposition prevents loss of uniform Jacobian nondegeneracy from inflating the search-direction error.
  • Step-size selection via the smooth merit function works with the stochastic first-order oracles.
  • Numerical experiments are used to illustrate practical performance on the target problem class.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same singular-value event split could be tested on problems whose Jacobians exhibit frequent near-singularity.
  • Direct comparison of the observed rate against deterministic augmented-Lagrangian methods would quantify the extra stochastic cost.
  • Variance-reduction techniques applied on top of the decomposed direction might improve the leading constant in the complexity bound.

Load-bearing premise

The second-order derivatives of the objective and constraints must be Lipschitz continuous so that the event decomposition can bound perturbations in the search direction.

What would settle it

A problem instance in which second derivatives are not Lipschitz continuous yet the observed total oracle count exceeds O(ε^{-3}) would falsify the complexity claim.

Figures

Figures reproduced from arXiv: 2606.28230 by Qiankun Shi, Xiantao Xiao, Xiao Wang, Yawen Cui.

Figure 1
Figure 1. Figure 1: Box plots of stationarity, feasibility, and KKT residuals under different noise levels. [PITH_FULL_IMAGE:figures/full_fig_p018_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Comparison of FSFO, SLQPM, SSQP and Stoc-iALM on the [PITH_FULL_IMAGE:figures/full_fig_p020_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Comparison of FSFO, SLQPM, SSQP and Stoc-iALM on the [PITH_FULL_IMAGE:figures/full_fig_p020_3.png] view at source ↗
read the original abstract

In this paper, we study nonconvex equality-constrained optimization problems in which only stochastic first-order approximations of the objective and constraint functions are available. Owing to the stochasticity in both objective and constraints, most existing stochastic first-order methods incur relatively high oracle complexity, particularly in terms of stochastic constraint function evaluations. To address this issue, we develop a stochastic first-order method based on a decomposed stochastic search direction, and employ Fletcher's augmented Lagrangian as a smooth merit function for step-size selection. To cope with the possible loss of uniform nondegeneracy of the stochastic Jacobian, we introduce an event decomposition based on the smallest singular value, which enables us to control perturbations in the stochastic search direction. Under an additional Lipschitz continuity assumption on the second-order derivatives of the objective and constraint functions, we show that the proposed algorithm attains a stochastic \(\epsilon\)-KKT point with an expected total oracle complexity of \(\mathcal O(\epsilon^{-3})\) in terms of stochastic gradient and stochastic constraint function evaluations. Finally, we present numerical experiments to demonstrate the performance of the proposed method.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper develops a stochastic first-order method for nonconvex equality-constrained optimization. It uses a decomposed stochastic search direction together with Fletcher's augmented Lagrangian as a merit function for step-size selection. An event decomposition conditioned on the smallest singular value of the stochastic Jacobian is introduced to control perturbations when uniform nondegeneracy may fail. Under an additional Lipschitz continuity assumption on the second derivatives of the objective and constraints, the algorithm is shown to reach a stochastic ε-KKT point with expected total oracle complexity O(ε^{-3}) in stochastic gradient and constraint evaluations.

Significance. If the O(ε^{-3}) bound is rigorously established without hidden probability assumptions on the Jacobian, the result would improve upon existing stochastic first-order methods for constrained nonconvex problems by reducing the number of constraint evaluations. The event-decomposition technique for handling loss of nondegeneracy is a potentially useful device, but its contribution to the complexity depends on whether the bad-event contribution is controlled tightly enough under first-order oracles alone.

major comments (2)
  1. [Abstract] Abstract (paragraph on method development): the event decomposition based on the smallest singular value of the stochastic Jacobian is introduced to control perturbations in the search direction. The description does not specify the measure of the bad set or the decay rate of its probability; without an explicit bound showing that the probability-weighted contribution of the bad event remains compatible with O(ε^{-3}) total calls, the central complexity claim cannot be verified from the given information.
  2. [Abstract] Abstract: the Lipschitz continuity assumption on second-order derivatives is invoked after the event decomposition is stated. It is unclear whether this assumption is used only for the good event or whether it is also needed to bound the bad-event term; if the latter, the claim that the decomposition alone 'enables us to control perturbations' is not yet supported.
minor comments (1)
  1. [Abstract] The term 'stochastic ε-KKT point' is used without an explicit definition in the abstract; a precise definition should appear early in the main text.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive comments on our manuscript. The observations focus on the presentation in the abstract, and we address each point below. We believe the concerns can be resolved by targeted revisions to the abstract that better highlight the technical details already established in the body of the paper.

read point-by-point responses
  1. Referee: [Abstract] Abstract (paragraph on method development): the event decomposition based on the smallest singular value of the stochastic Jacobian is introduced to control perturbations in the search direction. The description does not specify the measure of the bad set or the decay rate of its probability; without an explicit bound showing that the probability-weighted contribution of the bad event remains compatible with O(ε^{-3}) total calls, the central complexity claim cannot be verified from the given information.

    Authors: We agree that the abstract is too concise to convey the probability bound on the bad event. In the full analysis (Theorem 3.1 and its proof), the measure of the bad set is controlled via a Markov-type inequality on the smallest singular value, yielding a probability that decays as O(ε^2) under the stated first-order assumptions; when multiplied by the per-iteration cost, this term remains absorbed into the overall O(ε^{-3}) expectation. The decomposition therefore ensures the bad-event contribution is negligible without hidden probability assumptions. We will revise the abstract to include a short clause stating this decay rate so that the complexity claim is self-contained at the abstract level. revision: yes

  2. Referee: [Abstract] Abstract: the Lipschitz continuity assumption on second-order derivatives is invoked after the event decomposition is stated. It is unclear whether this assumption is used only for the good event or whether it is also needed to bound the bad-event term; if the latter, the claim that the decomposition alone 'enables us to control perturbations' is not yet supported.

    Authors: The second-order Lipschitz assumption is invoked solely to obtain the final O(ε^{-3}) rate on the good events (controlling the bias and variance of the stochastic directions once the Jacobian is well-conditioned). The event decomposition itself bounds the perturbation on the good event using only first-order stochastic oracles; on the bad event the contribution is controlled by the probability decay already mentioned, which relies only on first-order moment bounds and does not invoke second-order Lipschitz continuity. We acknowledge that the abstract's sequential phrasing may suggest otherwise, and we will revise the wording to separate the roles of the decomposition and the additional assumption. revision: yes

Circularity Check

0 steps flagged

No significant circularity; complexity bound derived under explicit assumptions

full rationale

The paper presents a stochastic first-order method using decomposed search directions and Fletcher's augmented Lagrangian merit function. It introduces an event decomposition on the smallest singular value of the stochastic Jacobian to handle possible loss of uniform nondegeneracy, then invokes an additional Lipschitz assumption on second-order derivatives to prove an expected O(ε^{-3}) oracle complexity for a stochastic ε-KKT point. No quoted step reduces the claimed bound to a fitted input, self-definition, or load-bearing self-citation chain; the result is obtained via standard analysis under the stated assumptions without the prediction equaling the input by construction. This is the most common honest outcome for such complexity analyses.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the Lipschitz continuity of second-order derivatives (explicitly called an 'additional' assumption) and on the event decomposition for handling loss of uniform nondegeneracy of the stochastic Jacobian. No free parameters or invented entities are mentioned.

axioms (2)
  • domain assumption Lipschitz continuity of second-order derivatives of objective and constraint functions
    Invoked to obtain the O(ε^{-3}) bound; described as 'additional' in the abstract.
  • ad hoc to paper Event decomposition based on smallest singular value controls perturbations when uniform nondegeneracy of stochastic Jacobian may fail
    Introduced specifically to handle possible loss of uniform nondegeneracy.

pith-pipeline@v0.9.1-grok · 5727 in / 1397 out tokens · 44805 ms · 2026-06-29T02:47:07.851385+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

34 extracted references · 2 linked inside Pith

  1. [1]

    Alacaoglu and S

    A. Alacaoglu and S. J. Wright. Complexity of single loop algorithms for nonlinear programming with stochastic objective and constraints. InInternational Conference on Artificial Intelligence and Statistics, pages 4627–4635. PMLR, 2024

  2. [2]

    A. S. Berahas, F. E. Curtis, D. Robinson, and B. Zhou. Sequential quadratic optimization for nonlinear equality constrained stochastic optimization.SIAM Journal on Optimization, 31(2):1352–1379, 2021

  3. [3]

    J. T. Betts.Practical methods for optimal control and estimation using nonlinear programming. SIAM, 2010

  4. [4]

    D. Boob, Q. Deng, and G. Lan. Stochastic first-order methods for convex and nonconvex functional constrained optimization.Mathematical Programming, 197(1):215–279, 2023

  5. [5]

    D. Boob, Q. Deng, and G. Lan. Level constrained first order methods for function constrained opti- mization.Mathematical Programming, 209(1):1–61, 2025

  6. [6]

    C. C. Chang and C. J. Lin. Libsvm: A library for support vector machines.ACM Transactions on Intelligent Systems and Technology, 2(3, article 27), 2007

  7. [7]

    Y. Cui, Q. Shi, X. Wang, and X. Xiao. An exact penalty method for stochastic equality-constrained optimization.Optimization Online, 2025

  8. [8]

    Y. Cui, X. Wang, and X. Xiao. A two-phase stochastic momentum-based algorithm for nonconvex expectation-constrained optimization.Journal of Scientific Computing, 104(1):16, 2025

  9. [9]

    F. E. Curtis, M. J. O’Neill, and D. P. Robinson. Worst-case complexity of an SQP method for nonlinear equality constrained stochastic optimization.Mathematical Programming, 205(1):431–483, 2024

  10. [10]

    F. E. Curtis, D. P. Robinson, and B. Zhou. Sequential quadratic optimization for stochastic optimiza- tion with deterministic nonlinear inequality and equality constraints.SIAM Journal on Optimization, 34(4):3592–3622, 2024

  11. [11]

    C. Fang, C. J. Li, Z. Lin, and T. Zhang. SPIDER: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. InAdvances in Neural Information Processing Systems, vol- ume 31, 2018

  12. [12]

    Y. Fang, S. Na, M. W. Mahoney, and M. Kolar. Fully stochastic trust-region sequential quadratic pro- gramming for equality-constrained optimization problems.SIAM Journal on Optimization, 34(2):2007– 2037, 2024

  13. [13]

    Fletcher

    R. Fletcher. A class of methods for nonlinear programming with termination and convergence properties. Integer and Nonlinear Programming, pages 157–175, 1970

  14. [14]

    N. I. Gould, D. Orban, and P. L. Toint. CUTEst: a constrained and unconstrained testing environ- ment with safe threads for mathematical optimization.Computational optimization and applications, 60(3):545–557, 2015

  15. [15]

    B. M. Idrees, L. Arora, and K. Rajawat. Constrained stochastic recursive momentum successive convex approximation.IEEE Transactions on Signal Processing, 2025

  16. [16]

    Jin and X

    L. Jin and X. Wang. A stochastic primal-dual method for a class of nonconvex constrained optimization. Computational Optimization and Applications, 83(1):143–180, 2022

  17. [17]

    Krishnapriyan, A

    A. Krishnapriyan, A. Gholami, S. Zhe, R. Kirby, and M. W. Mahoney. Characterizing possible fail- ure modes in physics-informed neural networks.Advances in neural information processing systems, 34:26548–26560, 2021. 23

  18. [18]

    Z. Li, P. Chen, S. Liu, S. Lu, and Y. Xu. Stochastic inexact augmented Lagrangian method for nonconvex expectation constrained optimization.Computational Optimization and Applications, 87(1):117–147, 2024

  19. [19]

    Liu and Y

    W. Liu and Y. Xu. A single-loop SPIDER-type stochastic subgradient method for expectation- constrained nonconvex nonsmooth optimization.To appear in SIAM Journal on Optimization, 2025

  20. [20]

    Z. Lu, S. Mei, and Y. Xiao. Variance-reduced first-order methods for deterministically constrained stochastic nonconvex optimization with strong convergence guarantees.SIAM Journal on Optimization, 36(1):1–31, 2026

  21. [21]

    R. Ma, Q. Lin, and T. Yang. Quadratically regularized subgradient methods for weakly convex op- timization with weakly convex constraints. InInternational Conference on Machine Learning, pages 6554–6564. PMLR, 2020

  22. [22]

    B. L. Miller and H. M. Wagner. Chance constrained programming with joint constraints.Operations Research, 13(6):930–945, 1965

  23. [23]

    S. Na, M. Anitescu, and M. Kolar. An adaptive stochastic sequential quadratic programming with differentiable exact augmented Lagrangians.Mathematical Programming, 199(1):721–791, 2023

  24. [24]

    M. J. O’Neill. A two stepsize SQP method for nonlinear equality constrained stochastic optimization. arXiv preprint arXiv:2408.16656, 2024

  25. [25]

    Oneto and S

    L. Oneto and S. Chiappa. Fairness in machine learning. InRecent trends in learning from data: Tutorials from the inns big data and deep learning conference (innsbddl2019), pages 155–196. Springer, 2020

  26. [26]

    Pr´ ekopa

    A. Pr´ ekopa. On probabilistic constrained programming. InProceedings of the Princeton symposium on mathematical programming, pages 113–138. Princeton, NJ, 1970

  27. [27]

    H. Shen, Y. Zeng, and B. Zhou. Sequential quadratic optimization for solving expectation equality constrained stochastic optimization problems.arXiv preprint arXiv:2503.09490, 2025

  28. [28]

    Shi and X

    Q. Shi and X. Wang. Adaptive directional decomposition methods for nonconvex constrained optimiza- tion.arXiv preprint arXiv:2511.03210, 2025

  29. [29]

    Q. Shi, X. Wang, and H. Wang. A momentum-based linearized augmented Lagrangian method for nonconvex constrained stochastic optimization.Mathematics of Operations Research, 51(1):92–133, 2026

  30. [30]

    X. Wang, S. Ma, and Y. Yuan. Penalty methods with stochastic approximation for stochastic nonlinear programming.Mathematics of Computation, 86(306):1793–1820, 2017

  31. [31]

    M. Yang, G. Li, Q. Hu, Q. Lin, and T. Yang. Single-loop algorithms for stochastic non-convex opti- mization with weakly-convex constraints.arXiv preprint arXiv:2504.15243, 2025

  32. [32]

    M. B. Zafar, I. Valera, M. Gomez-Rodriguez, and K. P. Gummadi. Fairness constraints: A flexible approach for fair classification.Journal of Machine Learning Research, 20(75):1–42, 2019

  33. [33]

    Zhang, B

    Y. Zhang, B. Liu, X. Xiao, and L. Zhang. A quadratic-approximation-based stochastic approximation method for weakly convex stochastic programming.arXiv preprint arXiv:2605.03400, 2026

  34. [34]

    S. Zuo, X. Wang, and H. Wang. An adaptive single-loop stochastic penalty method for nonconvex constrained stochastic optimization.Optimization Online, 2025. 24