A Proximal Augmented Lagrangian Method Based on Quadratic Approximations for Weakly Convex Optimization
Pith reviewed 2026-05-07 15:44 UTC · model grok-4.3
The pith
QPALM solves weakly convex nonlinear programs by embedding quadratic approximations inside a proximal augmented Lagrangian framework and proves all three KKT metrics converge at O(T^{-1/3}).
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By replacing the nonlinear objective and constraints with their quadratic approximations inside the proximal augmented Lagrangian, the resulting algorithm reaches points whose three associated ε-KKT residuals—the squared norm of the gradient of the Moreau envelope of the Lagrangian, the average constraint violation, and the average complementarity violation—all decrease at the rate O(T^{-1/3}) after T iterations, provided the problem functions are weakly convex and a strictly feasible point exists.
What carries the argument
The quadratic proximal augmented Lagrangian iteration, which forms quadratic models of the objective and constraints, augments them with a proximal term, and solves the resulting strongly convex subproblems to drive the three KKT metrics to zero.
If this is right
- The method produces an explicit non-asymptotic bound on the number of subproblems needed to reach any prescribed ε-KKT tolerance.
- Each iteration reduces to solving a strongly convex quadratic program that is readily handled by off-the-shelf solvers.
- The same three-metric convergence applies to any problem whose functions satisfy the stated weak-convexity and feasibility conditions.
- Preliminary numerical tests indicate that the approach is competitive in practice on standard nonlinear programming benchmarks.
Where Pith is reading between the lines
- The quadratic-approximation idea could be transplanted to other augmented-Lagrangian variants to obtain similar rates without strong convexity.
- Because the subproblems remain strongly convex, the method may pair naturally with warm-starting or inexact inner solvers to reduce total work.
- The strict-feasibility assumption suggests a possible extension via an initial phase-1 problem or barrier regularization when feasibility is not obvious.
Load-bearing premise
The proof requires that every objective and constraint function is weakly convex and that a strictly feasible point exists.
What would settle it
On a weakly convex problem possessing a strictly feasible point, run QPALM for increasing T and check whether at least one of the three KKT metrics fails to fall below C/T^{1/3} for any constant C.
Figures
read the original abstract
This paper proposes QPALM, a proximal augmented Lagrangian method based on quadratic approximations, for solving nonlinear programming problems with weakly convex objective and constraint functions. The algorithm is constructed by incorporating quadratic approximations of both the objective and constraint functions into a proximal Lagrangian framework. We establish its non-asymptotic convergence rate in terms of the total number of subproblems solved. The convergence of QPALM is characterized by three metrics associated with the $\varepsilon$-KKT conditions: the squared norm of the gradient of the Moreau envelope of the Lagrangian, the average constraint violation, and the average complementarity violation. All three metrics are shown to converge at a rate of $O(T^{-1/3})$ after $T$ iterations. Preliminary numerical results demonstrate the practical efficiency of the proposed method. These results are established under two mild conditions: (i) weak convexity of all problem functions, and (ii) the existence of a strictly feasible point. The proposed QPALM is a sequentially strongly convex programming method that is readily implementable.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes QPALM, a proximal augmented Lagrangian method incorporating quadratic approximations of the objective and constraint functions for nonlinear programs with weakly convex data. It establishes non-asymptotic convergence rates in terms of the number T of subproblems solved, showing that three ε-KKT metrics—the squared norm of the gradient of the Moreau envelope of the Lagrangian, the average constraint violation, and the average complementarity violation—all converge as O(T^{-1/3}). The analysis holds under weak convexity of all functions and the existence of a strictly feasible point; the method produces sequentially strongly convex subproblems and is claimed to be readily implementable. Preliminary numerical results are presented to illustrate practical performance.
Significance. If the stated rates hold, the result is a useful addition to the literature on first-order methods for weakly convex optimization. The combination of proximal augmented Lagrangian with quadratic models to obtain strongly convex subproblems is technically clean, and the explicit non-asymptotic bounds on three distinct KKT metrics (via the Moreau envelope) provide a concrete, falsifiable guarantee. The paper correctly avoids requiring strong convexity or penalty escalation to infinity, relying instead on the strict-feasibility assumption to control multipliers. These features make the contribution substantive for both theory and implementation in nonconvex nonlinear programming.
minor comments (4)
- §3, Algorithm 1: the update rules for the quadratic approximation parameters (e.g., the Hessian approximations or step-size choices) are stated without explicit bounds on the approximation error relative to the weak-convexity modulus; a short remark clarifying how these are chosen in practice would improve reproducibility.
- §5, Theorem 3.1: the dependence of the hidden constants in the O(T^{-1/3}) bound on the weak-convexity parameter and the strict-feasibility margin is not displayed; adding an explicit statement of this dependence (even if only in the proof sketch) would strengthen the result.
- §6, numerical experiments: the test problems, solver tolerances, and comparison baselines are described only briefly; expanding Table 1 or 2 with iteration counts, CPU times, and final KKT residuals would make the practical-efficiency claim easier to assess.
- Notation: the Moreau envelope of the Lagrangian is introduced in §2 but its gradient norm is used as a central metric without a dedicated definition or reference to its Lipschitz properties; a short paragraph in §2.2 would aid readability.
Simulated Author's Rebuttal
We thank the referee for the careful reading and positive assessment of our manuscript. The summary accurately reflects the contributions of QPALM, including the non-asymptotic O(T^{-1/3}) rates on the three ε-KKT metrics under weak convexity and strict feasibility. We appreciate the recognition that the combination of proximal augmented Lagrangian with quadratic approximations yields sequentially strongly convex subproblems without requiring penalty escalation to infinity.
Circularity Check
No significant circularity; derivation rests on independent assumptions and standard potential-function analysis
full rationale
The paper derives an O(T^{-1/3}) rate on three explicitly defined ε-KKT metrics (Moreau-envelope gradient norm squared, average constraint violation, average complementarity violation) for the QPALM algorithm. These metrics are constructed from the Lagrangian and its Moreau envelope independently of the algorithm's internal parameters or fitted quantities. The proof relies on the stated assumptions of weak convexity of all functions and existence of a strictly feasible point, together with the proximal term ensuring strong convexity of each subproblem and a potential-function decrease argument. No self-definitional loop, fitted-input prediction, or load-bearing self-citation chain is present in the claimed derivation; the central result is not equivalent to its inputs by construction.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Weak convexity of the objective and all constraint functions
- domain assumption Existence of a strictly feasible point
Reference graph
Works this paper leans on
- [1]
-
[2]
Amir Beck,First-Order Methods in Optimization, SIAM, Philadelphia, 2017
work page 2017
-
[3]
Bertsekas,Constrained Optimization and Lagrange Multiplier Methods, Academic Press, New York, 1982
D.P. Bertsekas,Constrained Optimization and Lagrange Multiplier Methods, Academic Press, New York, 1982
work page 1982
-
[4]
Frank E. Curtis, Michael J. O’Neill and Daniel P. Robinson,Worst-case complexity of an SQP method for nonlinear equality constrained stochastic optimization, Mathematical Program- ming, 205(2024),431-483
work page 2024
-
[5]
Hari Dahal, Wei Liu and Yangyang Xu,Damped proximal augmented Lagrangian method for weakly-Convex problems with convex constraints, Mathematical Programming Computation https://doi.org/10.1007/s12532-025-00302-1
-
[6]
G. N. Grapiglia and Y . Yuan,On the complexity of an augmented Lagrangian method for non- convex optimization, IMA Journal of Numerical Analysis, 41:2(2021), 1546–1568
work page 2021
-
[7]
Isabelle Guyon, Steve Gunn, Aviel Ben-Hur, Gideon Dror,Result analysis of the NIPS 2003 feature selection challenge, Advances in Neural Information Processing Systems, V ol. 17, 2004, pp. 545–552 (MIT Press)
work page 2003
-
[8]
D. Hajinezhad and M. Hong,Perturbed proximal primal-dual algorithm for nonconvex nons- mooth optimization, Mathematical Programming, 176:1-2(2019), 207-245
work page 2019
-
[9]
M. R. Hestenes,Multiplier and gradient methods, Journal of Optimization Theory and Appli- cations, 4:5(1969), pp. 303–320
work page 1969
-
[10]
M. Hong, D. Hajinezhad, and M.-M. Zhao,Prox-PDA: the proximal primal-dual algorithm for fast distributed nonconvex optimization and learning over networks, Proceedings of the 34th International Conference on Machine Learning, Proceedings of Machine Learning Research, V ol. 70, 2017, pp. 1529–1538
work page 2017
-
[11]
Burges, The MNIST Database of Handwritten Digits, 2010.http://yann.lecun.com/exdb/mnist/
Yann LeCun, Corinna Cortes, Christopher J.C. Burges, The MNIST Database of Handwritten Digits, 2010.http://yann.lecun.com/exdb/mnist/
work page 2010
-
[12]
Powell,A method for nonlinear constraints in minimization problems, In R
M.J.D. Powell,A method for nonlinear constraints in minimization problems, In R. Fletcher, Editor,Optimization, Academic Press, New York, 1969, 283–298. 29
work page 1969
-
[13]
Tyrrell Rockafellar,Convex Analysis, Princeton University Press,1970
R. Tyrrell Rockafellar,Convex Analysis, Princeton University Press,1970
work page 1970
-
[14]
R. T. Rockafellar and R. J. -B. Wets,V ariational Analysis, Springer-Verlag, New York, 1998
work page 1998
-
[15]
Workbench Team C, A Marketing Dataset, 2008.http://www.causality.inf.ethz.ch/ data/CINA.html
work page 2008
-
[16]
Y . Xie and S. J. Wright,Complexity of Proximal Augmented Lagrangian for Nonconvex Opti- mization with Nonlinear Equality Constraints, Journal of Scientific Computing, 86:38(2021), https://doi.org/10.1007/s10915-021-01409-y. 130, 2021, pp. 2170–2178
-
[17]
X. Wang and Y .X. Yuan, An augmented Lagrangian trust region method for equality con- strained optimization. Optim. Methods Software, 30:3(2015), 559–582
work page 2015
-
[18]
X. Wang and H. C. Zhang, An augmented Lagrangian affine scaling method for nonlinear programming. Optim. Methods Software, 30:5(2015),934-964
work page 2015
-
[19]
L. W. Zhang, Y . L. Zhang, X. T. Xiao and J. Wu, Stochastic Approximation Proximal Method of Multipliers for Convex Stochastic Programming, Mathematics of Operations Research, 48:1(2023),177-193. 30
work page 2023
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.