A Proximal Augmented Lagrangian Method Based on Quadratic Approximations for Weakly Convex Optimization

Benqi Liu; Liwei Zhang; Xiantao Xiao; Yule Zhang

arxiv: 2605.03415 · v1 · submitted 2026-05-05 · 🧮 math.OC

A Proximal Augmented Lagrangian Method Based on Quadratic Approximations for Weakly Convex Optimization

Yule Zhang , Benqi Liu , Xiantao Xiao , Liwei Zhang This is my paper

Pith reviewed 2026-05-07 15:44 UTC · model grok-4.3

classification 🧮 math.OC

keywords proximal augmented Lagrangianweakly convex optimizationquadratic approximationsnon-asymptotic convergenceKKT conditionsMoreau envelopenonlinear programming

0 comments

The pith

QPALM solves weakly convex nonlinear programs by embedding quadratic approximations inside a proximal augmented Lagrangian framework and proves all three KKT metrics converge at O(T^{-1/3}).

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents QPALM as a new method that approximates both the objective and the constraints by quadratic models and then applies a proximal augmented Lagrangian iteration. It shows that after T subproblems are solved, the squared gradient norm of the Moreau envelope of the Lagrangian, the average constraint violation, and the average complementarity violation each decay like O(T^{-1/3}). A sympathetic reader would care because many engineering and machine-learning problems satisfy only weak convexity yet still need reliable first-order stationarity guarantees without requiring strong convexity or exact penalty tuning. The proof relies on the existence of a strictly feasible point and produces a sequence of strongly convex subproblems that can be solved by standard solvers. This yields a practical algorithm whose non-asymptotic rate is expressed directly in the total number of inner solves rather than in oracle calls.

Core claim

By replacing the nonlinear objective and constraints with their quadratic approximations inside the proximal augmented Lagrangian, the resulting algorithm reaches points whose three associated ε-KKT residuals—the squared norm of the gradient of the Moreau envelope of the Lagrangian, the average constraint violation, and the average complementarity violation—all decrease at the rate O(T^{-1/3}) after T iterations, provided the problem functions are weakly convex and a strictly feasible point exists.

What carries the argument

The quadratic proximal augmented Lagrangian iteration, which forms quadratic models of the objective and constraints, augments them with a proximal term, and solves the resulting strongly convex subproblems to drive the three KKT metrics to zero.

If this is right

The method produces an explicit non-asymptotic bound on the number of subproblems needed to reach any prescribed ε-KKT tolerance.
Each iteration reduces to solving a strongly convex quadratic program that is readily handled by off-the-shelf solvers.
The same three-metric convergence applies to any problem whose functions satisfy the stated weak-convexity and feasibility conditions.
Preliminary numerical tests indicate that the approach is competitive in practice on standard nonlinear programming benchmarks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The quadratic-approximation idea could be transplanted to other augmented-Lagrangian variants to obtain similar rates without strong convexity.
Because the subproblems remain strongly convex, the method may pair naturally with warm-starting or inexact inner solvers to reduce total work.
The strict-feasibility assumption suggests a possible extension via an initial phase-1 problem or barrier regularization when feasibility is not obvious.

Load-bearing premise

The proof requires that every objective and constraint function is weakly convex and that a strictly feasible point exists.

What would settle it

On a weakly convex problem possessing a strictly feasible point, run QPALM for increasing T and check whether at least one of the three KKT metrics fails to fall below C/T^{1/3} for any constant C.

Figures

Figures reproduced from arXiv: 2605.03415 by Benqi Liu, Liwei Zhang, Xiantao Xiao, Yule Zhang.

**Figure 1.** Figure 1: Numerical performance of QPALM, pALM, and ALM on nonconvex QCQP instances. view at source ↗

**Figure 2.** Figure 2: Theory validation and algorithmic comparison on the CINA dataset. view at source ↗

**Figure 3.** Figure 3: Theory validation and algorithmic comparison on the gisette dataset. view at source ↗

**Figure 4.** Figure 4: Theory validation and algorithmic comparison on the MNIST dataset. view at source ↗

read the original abstract

This paper proposes QPALM, a proximal augmented Lagrangian method based on quadratic approximations, for solving nonlinear programming problems with weakly convex objective and constraint functions. The algorithm is constructed by incorporating quadratic approximations of both the objective and constraint functions into a proximal Lagrangian framework. We establish its non-asymptotic convergence rate in terms of the total number of subproblems solved. The convergence of QPALM is characterized by three metrics associated with the $\varepsilon$-KKT conditions: the squared norm of the gradient of the Moreau envelope of the Lagrangian, the average constraint violation, and the average complementarity violation. All three metrics are shown to converge at a rate of $O(T^{-1/3})$ after $T$ iterations. Preliminary numerical results demonstrate the practical efficiency of the proposed method. These results are established under two mild conditions: (i) weak convexity of all problem functions, and (ii) the existence of a strictly feasible point. The proposed QPALM is a sequentially strongly convex programming method that is readily implementable.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

QPALM adds quadratic approximations to proximal AL for weakly convex NLPs and proves a concrete O(T^{-1/3}) rate on three epsilon-KKT metrics under mild assumptions.

read the letter

QPALM is a proximal augmented Lagrangian method that folds quadratic approximations of both the objective and the constraints into the framework for weakly convex nonlinear programs. It proves an O(T^{-1/3}) non-asymptotic rate on the squared Moreau gradient norm, average constraint violation, and average complementarity violation after T subproblem solves. The construction keeps each subproblem strongly convex and counts exact solves as the iteration measure T. The analysis absorbs the weak-convexity modulus into the Moreau envelope and uses the strict-feasibility assumption to bound multipliers without driving the penalty parameter to infinity. That combination looks internally consistent and gives a usable implementable scheme rather than just an existence result. The paper does a clean job of making the subproblems sequentially strongly convex and of stating the three-metric convergence explicitly. The assumptions are standard and mild for this setting. The soft spots are modest. The rate is the usual sublinear one for nonconvex methods, so high accuracy still requires many subproblem solves. The numerical section is labeled preliminary, which means we do not yet see scaling behavior or head-to-head comparisons on larger instances. In practice one would also need to check how inexact subproblem solves affect the rate, since the theory counts exact solves. This is for readers working on nonconvex optimization algorithms in machine learning or control who want explicit rates and an implementable method. It deserves a serious referee because the framework is new, the claims are specific, and the analysis appears to hold up without hidden circularity or uncontrolled approximation errors.

Referee Report

0 major / 4 minor

Summary. The paper proposes QPALM, a proximal augmented Lagrangian method incorporating quadratic approximations of the objective and constraint functions for nonlinear programs with weakly convex data. It establishes non-asymptotic convergence rates in terms of the number T of subproblems solved, showing that three ε-KKT metrics—the squared norm of the gradient of the Moreau envelope of the Lagrangian, the average constraint violation, and the average complementarity violation—all converge as O(T^{-1/3}). The analysis holds under weak convexity of all functions and the existence of a strictly feasible point; the method produces sequentially strongly convex subproblems and is claimed to be readily implementable. Preliminary numerical results are presented to illustrate practical performance.

Significance. If the stated rates hold, the result is a useful addition to the literature on first-order methods for weakly convex optimization. The combination of proximal augmented Lagrangian with quadratic models to obtain strongly convex subproblems is technically clean, and the explicit non-asymptotic bounds on three distinct KKT metrics (via the Moreau envelope) provide a concrete, falsifiable guarantee. The paper correctly avoids requiring strong convexity or penalty escalation to infinity, relying instead on the strict-feasibility assumption to control multipliers. These features make the contribution substantive for both theory and implementation in nonconvex nonlinear programming.

minor comments (4)

§3, Algorithm 1: the update rules for the quadratic approximation parameters (e.g., the Hessian approximations or step-size choices) are stated without explicit bounds on the approximation error relative to the weak-convexity modulus; a short remark clarifying how these are chosen in practice would improve reproducibility.
§5, Theorem 3.1: the dependence of the hidden constants in the O(T^{-1/3}) bound on the weak-convexity parameter and the strict-feasibility margin is not displayed; adding an explicit statement of this dependence (even if only in the proof sketch) would strengthen the result.
§6, numerical experiments: the test problems, solver tolerances, and comparison baselines are described only briefly; expanding Table 1 or 2 with iteration counts, CPU times, and final KKT residuals would make the practical-efficiency claim easier to assess.
Notation: the Moreau envelope of the Lagrangian is introduced in §2 but its gradient norm is used as a central metric without a dedicated definition or reference to its Lipschitz properties; a short paragraph in §2.2 would aid readability.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the careful reading and positive assessment of our manuscript. The summary accurately reflects the contributions of QPALM, including the non-asymptotic O(T^{-1/3}) rates on the three ε-KKT metrics under weak convexity and strict feasibility. We appreciate the recognition that the combination of proximal augmented Lagrangian with quadratic approximations yields sequentially strongly convex subproblems without requiring penalty escalation to infinity.

Circularity Check

0 steps flagged

No significant circularity; derivation rests on independent assumptions and standard potential-function analysis

full rationale

The paper derives an O(T^{-1/3}) rate on three explicitly defined ε-KKT metrics (Moreau-envelope gradient norm squared, average constraint violation, average complementarity violation) for the QPALM algorithm. These metrics are constructed from the Lagrangian and its Moreau envelope independently of the algorithm's internal parameters or fitted quantities. The proof relies on the stated assumptions of weak convexity of all functions and existence of a strictly feasible point, together with the proximal term ensuring strong convexity of each subproblem and a potential-function decrease argument. No self-definitional loop, fitted-input prediction, or load-bearing self-citation chain is present in the claimed derivation; the central result is not equivalent to its inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on standard domain assumptions in optimization rather than new invented entities or fitted constants.

axioms (2)

domain assumption Weak convexity of the objective and all constraint functions
Invoked as condition (i) to guarantee the subproblems are strongly convex and to derive the convergence rate.
domain assumption Existence of a strictly feasible point
Invoked as condition (ii) to ensure the analysis can proceed without additional constraint qualifications.

pith-pipeline@v0.9.0 · 5484 in / 1442 out tokens · 29193 ms · 2026-05-07T15:44:49.971668+00:00 · methodology

Review history (2 revisions) →

discussion (0)

Reference graph

Works this paper leans on

19 extracted references · 19 canonical work pages

[1]

A. D. Adeoye, P. Latafat, and A. Bemporad,A proximal augmented Lagrangian method for nonconvex optimization with equality and inequality constraints,arXiv preprint,2025, arXiv:2509.02894

work page arXiv 2025
[2]

Amir Beck,First-Order Methods in Optimization, SIAM, Philadelphia, 2017

work page 2017
[3]

Bertsekas,Constrained Optimization and Lagrange Multiplier Methods, Academic Press, New York, 1982

D.P. Bertsekas,Constrained Optimization and Lagrange Multiplier Methods, Academic Press, New York, 1982

work page 1982
[4]

Curtis, Michael J

Frank E. Curtis, Michael J. O’Neill and Daniel P. Robinson,Worst-case complexity of an SQP method for nonlinear equality constrained stochastic optimization, Mathematical Program- ming, 205(2024),431-483

work page 2024
[5]

Hari Dahal, Wei Liu and Yangyang Xu,Damped proximal augmented Lagrangian method for weakly-Convex problems with convex constraints, Mathematical Programming Computation https://doi.org/10.1007/s12532-025-00302-1

work page doi:10.1007/s12532-025-00302-1
[6]

G. N. Grapiglia and Y . Yuan,On the complexity of an augmented Lagrangian method for non- convex optimization, IMA Journal of Numerical Analysis, 41:2(2021), 1546–1568

work page 2021
[7]

17, 2004, pp

Isabelle Guyon, Steve Gunn, Aviel Ben-Hur, Gideon Dror,Result analysis of the NIPS 2003 feature selection challenge, Advances in Neural Information Processing Systems, V ol. 17, 2004, pp. 545–552 (MIT Press)

work page 2003
[8]

Hajinezhad and M

D. Hajinezhad and M. Hong,Perturbed proximal primal-dual algorithm for nonconvex nons- mooth optimization, Mathematical Programming, 176:1-2(2019), 207-245

work page 2019
[9]

M. R. Hestenes,Multiplier and gradient methods, Journal of Optimization Theory and Appli- cations, 4:5(1969), pp. 303–320

work page 1969
[10]

M. Hong, D. Hajinezhad, and M.-M. Zhao,Prox-PDA: the proximal primal-dual algorithm for fast distributed nonconvex optimization and learning over networks, Proceedings of the 34th International Conference on Machine Learning, Proceedings of Machine Learning Research, V ol. 70, 2017, pp. 1529–1538

work page 2017
[11]

Burges, The MNIST Database of Handwritten Digits, 2010.http://yann.lecun.com/exdb/mnist/

Yann LeCun, Corinna Cortes, Christopher J.C. Burges, The MNIST Database of Handwritten Digits, 2010.http://yann.lecun.com/exdb/mnist/

work page 2010
[12]

Powell,A method for nonlinear constraints in minimization problems, In R

M.J.D. Powell,A method for nonlinear constraints in minimization problems, In R. Fletcher, Editor,Optimization, Academic Press, New York, 1969, 283–298. 29

work page 1969
[13]

Tyrrell Rockafellar,Convex Analysis, Princeton University Press,1970

R. Tyrrell Rockafellar,Convex Analysis, Princeton University Press,1970

work page 1970
[14]

R. T. Rockafellar and R. J. -B. Wets,V ariational Analysis, Springer-Verlag, New York, 1998

work page 1998
[15]

Workbench Team C, A Marketing Dataset, 2008.http://www.causality.inf.ethz.ch/ data/CINA.html

work page 2008
[16]

Xie and S

Y . Xie and S. J. Wright,Complexity of Proximal Augmented Lagrangian for Nonconvex Opti- mization with Nonlinear Equality Constraints, Journal of Scientific Computing, 86:38(2021), https://doi.org/10.1007/s10915-021-01409-y. 130, 2021, pp. 2170–2178

work page doi:10.1007/s10915-021-01409-y 2021
[17]

Wang and Y .X

X. Wang and Y .X. Yuan, An augmented Lagrangian trust region method for equality con- strained optimization. Optim. Methods Software, 30:3(2015), 559–582

work page 2015
[18]

Wang and H

X. Wang and H. C. Zhang, An augmented Lagrangian affine scaling method for nonlinear programming. Optim. Methods Software, 30:5(2015),934-964

work page 2015
[19]

L. W. Zhang, Y . L. Zhang, X. T. Xiao and J. Wu, Stochastic Approximation Proximal Method of Multipliers for Convex Stochastic Programming, Mathematics of Operations Research, 48:1(2023),177-193. 30

work page 2023

[1] [1]

A. D. Adeoye, P. Latafat, and A. Bemporad,A proximal augmented Lagrangian method for nonconvex optimization with equality and inequality constraints,arXiv preprint,2025, arXiv:2509.02894

work page arXiv 2025

[2] [2]

Amir Beck,First-Order Methods in Optimization, SIAM, Philadelphia, 2017

work page 2017

[3] [3]

Bertsekas,Constrained Optimization and Lagrange Multiplier Methods, Academic Press, New York, 1982

D.P. Bertsekas,Constrained Optimization and Lagrange Multiplier Methods, Academic Press, New York, 1982

work page 1982

[4] [4]

Curtis, Michael J

Frank E. Curtis, Michael J. O’Neill and Daniel P. Robinson,Worst-case complexity of an SQP method for nonlinear equality constrained stochastic optimization, Mathematical Program- ming, 205(2024),431-483

work page 2024

[5] [5]

Hari Dahal, Wei Liu and Yangyang Xu,Damped proximal augmented Lagrangian method for weakly-Convex problems with convex constraints, Mathematical Programming Computation https://doi.org/10.1007/s12532-025-00302-1

work page doi:10.1007/s12532-025-00302-1

[6] [6]

G. N. Grapiglia and Y . Yuan,On the complexity of an augmented Lagrangian method for non- convex optimization, IMA Journal of Numerical Analysis, 41:2(2021), 1546–1568

work page 2021

[7] [7]

17, 2004, pp

Isabelle Guyon, Steve Gunn, Aviel Ben-Hur, Gideon Dror,Result analysis of the NIPS 2003 feature selection challenge, Advances in Neural Information Processing Systems, V ol. 17, 2004, pp. 545–552 (MIT Press)

work page 2003

[8] [8]

Hajinezhad and M

D. Hajinezhad and M. Hong,Perturbed proximal primal-dual algorithm for nonconvex nons- mooth optimization, Mathematical Programming, 176:1-2(2019), 207-245

work page 2019

[9] [9]

M. R. Hestenes,Multiplier and gradient methods, Journal of Optimization Theory and Appli- cations, 4:5(1969), pp. 303–320

work page 1969

[10] [10]

M. Hong, D. Hajinezhad, and M.-M. Zhao,Prox-PDA: the proximal primal-dual algorithm for fast distributed nonconvex optimization and learning over networks, Proceedings of the 34th International Conference on Machine Learning, Proceedings of Machine Learning Research, V ol. 70, 2017, pp. 1529–1538

work page 2017

[11] [11]

Burges, The MNIST Database of Handwritten Digits, 2010.http://yann.lecun.com/exdb/mnist/

Yann LeCun, Corinna Cortes, Christopher J.C. Burges, The MNIST Database of Handwritten Digits, 2010.http://yann.lecun.com/exdb/mnist/

work page 2010

[12] [12]

Powell,A method for nonlinear constraints in minimization problems, In R

M.J.D. Powell,A method for nonlinear constraints in minimization problems, In R. Fletcher, Editor,Optimization, Academic Press, New York, 1969, 283–298. 29

work page 1969

[13] [13]

Tyrrell Rockafellar,Convex Analysis, Princeton University Press,1970

R. Tyrrell Rockafellar,Convex Analysis, Princeton University Press,1970

work page 1970

[14] [14]

R. T. Rockafellar and R. J. -B. Wets,V ariational Analysis, Springer-Verlag, New York, 1998

work page 1998

[15] [15]

Workbench Team C, A Marketing Dataset, 2008.http://www.causality.inf.ethz.ch/ data/CINA.html

work page 2008

[16] [16]

Xie and S

Y . Xie and S. J. Wright,Complexity of Proximal Augmented Lagrangian for Nonconvex Opti- mization with Nonlinear Equality Constraints, Journal of Scientific Computing, 86:38(2021), https://doi.org/10.1007/s10915-021-01409-y. 130, 2021, pp. 2170–2178

work page doi:10.1007/s10915-021-01409-y 2021

[17] [17]

Wang and Y .X

X. Wang and Y .X. Yuan, An augmented Lagrangian trust region method for equality con- strained optimization. Optim. Methods Software, 30:3(2015), 559–582

work page 2015

[18] [18]

Wang and H

X. Wang and H. C. Zhang, An augmented Lagrangian affine scaling method for nonlinear programming. Optim. Methods Software, 30:5(2015),934-964

work page 2015

[19] [19]

L. W. Zhang, Y . L. Zhang, X. T. Xiao and J. Wu, Stochastic Approximation Proximal Method of Multipliers for Convex Stochastic Programming, Mathematics of Operations Research, 48:1(2023),177-193. 30

work page 2023