Smoothness of the Augmented Lagrangian Dual in Convex Optimization
Pith reviewed 2026-05-22 16:22 UTC · model grok-4.3
The pith
Under the condition that the standard dual has nonempty domain, the augmented Lagrangian dual is 1/ρ-smooth everywhere.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Under the fundamental condition that dom φ ≠ ∅, φ_ρ is 1/ρ-smooth everywhere and the solution to min_x L_ρ(x, λ) exists for any λ ∈ R^p. These properties are shown for any closed proper convex f and any linear map A, substantially weakening the assumptions commonly used to guarantee smoothness and existence in the augmented dual.
What carries the argument
The augmented Lagrangian dual φ_ρ(λ) obtained by infimizing the augmented Lagrangian L_ρ(x, λ) = f(x) + ⟨λ, Ax − b⟩ + (ρ/2)‖Ax − b‖², whose quadratic term supplies both the uniform smoothness and the existence of the inner minimizer.
If this is right
- Dual ascent or proximal-point algorithms applied to φ_ρ may employ a fixed step size ρ without local adjustment.
- The x-subproblem in augmented Lagrangian or ADMM iterations is guaranteed to be solvable for every dual iterate.
- Convergence proofs for augmented-Lagrangian schemes can proceed with only the nonempty-domain hypothesis instead of interior-point or strong-convexity assumptions.
Where Pith is reading between the lines
- The same quadratic augmentation may yield analogous smoothness for problems with inequality constraints once the dual is suitably lifted.
- Practical checks for nonempty dom φ remain difficult, yet the result clarifies the precise boundary between well-behaved and ill-behaved duals.
- Smoothness with explicit modulus could be used to derive explicit iteration complexity bounds for first-order dual methods.
Load-bearing premise
There exists at least one multiplier for which the ordinary dual function φ takes a finite value.
What would settle it
Take any convex f and A such that dom φ is nonempty, compute or estimate the gradient of φ_ρ at two points, and check whether its difference quotient exceeds 1/ρ.
read the original abstract
This paper focuses on the general linearly constrained optimization problem: $\min_{x \in \mathbb{R}^d} f(x) \ \text{s.t.} \ Ax = b$, where $f: \mathbb{R}^d \rightarrow \mathbb{R} \cup \{+\infty\}$ is a closed proper convex function, $A \in \mathbb{R}^{p \times d}$, and $b \in \mathbb{R}^p$. We define the standard dual function $\phi(\lambda) = \inf_x \{f(x) + \langle \lambda, A x - b \rangle\}$, the augmented Lagrangian $\mathcal{L}_{\rho}(x, \lambda) = f(x) + \langle \lambda, Ax - b \rangle + \frac{\rho}{2}\|Ax - b\|^2$ ($\rho > 0$), and the augmented Lagrangian dual function $\phi_{\rho}(\lambda) = \inf_x \mathcal{L}_{\rho}(x, \lambda)$. Under the fundamental condition that $\text{dom} \ \phi \neq \emptyset$, we establish that: (1) $\phi_{\rho}$ is $\frac{1}{\rho}$-smooth everywhere; and (2) the solution to $\min_{x \in \mathbb{R}^d} \mathcal{L}_{\rho}(x, \lambda)$ exists for any $\lambda \in \mathbb{R}^p$. These theoretical findings substantially weaken the stringent assumptions typically imposed in the literature to ensure such properties.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper considers the linearly constrained convex problem min f(x) s.t. Ax=b with f closed proper convex. It defines the standard dual φ(λ)=inf_x {f(x)+⟨λ,Ax-b⟩} and the augmented dual φ_ρ(λ)=inf_x L_ρ(x,λ) where L_ρ is the augmented Lagrangian. Under the sole assumption dom φ ≠ ∅ it claims that φ_ρ is 1/ρ-smooth on all of R^p and that the infimum defining φ_ρ(λ) is attained for every λ.
Significance. If the claims were correct they would relax common coercivity or strong-convexity assumptions used to guarantee smoothness and attainment in augmented-Lagrangian theory, potentially simplifying convergence analyses for first-order methods.
major comments (2)
- [Abstract and §3] Abstract and §3 (main theorem): the claim that dom φ ≠ ∅ implies attainment of min_x L_ρ(x,λ) for every λ is false. Counter-example: d=2, p=1, A=[1 0], b=0, f(x1,x2)=½x1² + exp(x2). Then φ(λ) is finite for all λ (dom φ = R), yet L_ρ(x,λ) = (½+ρ/2)x1² + λ x1 + exp(x2) has infimum equal to the completed-square value over x1 but is never attained as x2→-∞ along ker(A). This directly falsifies claim (2).
- [§4] §4 (proof of smoothness): the argument that 1/ρ-smoothness follows from the same dom φ ≠ ∅ condition appears to rely on the same attainment step used for claim (2); once attainment fails the subdifferential argument for smoothness cannot be completed for arbitrary λ.
minor comments (1)
- Notation: the paper uses φ_ρ for the augmented dual; it would help to explicitly relate it to the standard Moreau envelope or proximal mapping of the dual.
Simulated Author's Rebuttal
We thank the referee for the careful reading and the explicit counterexample, which has helped us identify an error in one of our claims. We acknowledge that the attainment result does not hold under the sole assumption dom φ ≠ ∅. However, we maintain that the smoothness result is correct and can be established by a revised argument that does not depend on attainment. We will update the manuscript to correct the inaccurate claim, replace the flawed proof, and clarify the conditions needed for attainment.
read point-by-point responses
-
Referee: [Abstract and §3] Abstract and §3 (main theorem): the claim that dom φ ≠ ∅ implies attainment of min_x L_ρ(x,λ) for every λ is false. Counter-example: d=2, p=1, A=[1 0], b=0, f(x1,x2)=½x1² + exp(x2). Then φ(λ) is finite for all λ (dom φ = R), yet L_ρ(x,λ) = (½+ρ/2)x1² + λ x1 + exp(x2) has infimum equal to the completed-square value over x1 but is never attained as x2→-∞ along ker(A). This directly falsifies claim (2).
Authors: We have verified the counterexample and agree that it is valid: the infimum of L_ρ is approached as x2 → -∞ but is never attained for any finite x. This shows that claim (2) is false under only dom φ ≠ ∅. We will revise the manuscript to remove this claim from the abstract and main theorem, and we will add a remark explaining that attainment requires additional assumptions such as coercivity of f along ker(A). revision: yes
-
Referee: [§4] §4 (proof of smoothness): the argument that 1/ρ-smoothness follows from the same dom φ ≠ ∅ condition appears to rely on the same attainment step used for claim (2); once attainment fails the subdifferential argument for smoothness cannot be completed for arbitrary λ.
Authors: We agree that the current proof of smoothness is invalid because it relies on attainment of the infimum in x. Nevertheless, the 1/ρ-smoothness statement itself remains true. We will replace the proof with a corrected argument that expresses φ_ρ via an auxiliary convex function ψ(y) = inf{f(x) : Ax = y} and recognizes φ_ρ as a composition involving the Moreau envelope of ψ. The gradient of φ_ρ is then given by a proximal mapping, which is nonexpansive and yields the desired 1/ρ-Lipschitz continuity without requiring attainment in the original variables. We will also confirm the result on the referee's counterexample, where φ_ρ is explicitly quadratic and hence smooth. revision: partial
Circularity Check
No significant circularity; derivation applies standard convex analysis to definitions.
full rationale
The paper derives the 1/ρ-smoothness of φ_ρ and attainment of the infimum in L_ρ(x, λ) directly from the definitions of φ and φ_ρ together with the assumption dom φ ≠ ∅, using properties of closed proper convex functions and infimal projections. No self-definitional reductions, fitted parameters presented as predictions, or load-bearing self-citations appear; the central claims rest on external convex-analysis facts rather than reducing to the paper's own inputs by construction. The derivation is therefore self-contained.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption f is a closed proper convex function from R^d to R union +∞
- domain assumption dom φ ≠ ∅
Forward citations
Cited by 2 Pith papers
-
Augmented Lagrangian Method for Last-Iterate Convergence for Constrained MDPs
An inexact augmented Lagrangian method with projected Q-ascent yields global last-iterate convergence guarantees for constrained MDP policy optimization, extending from tabular to log-linear and non-linear policies.
-
Accelerated Decentralized Constraint-Coupled Optimization: A Dual$^2$ Approach
The Dual² approach produces iD2A and MiD2A gradient methods that achieve asymptotic convergence under milder conditions on the public function and linear rates with reduced communication and computation complexity.
Reference graph
Works this paper leans on
-
[1]
Multiplier and gradient methods,
M. R. Hestenes, “Multiplier and gradient methods,” Journal of Optimization Theory and Applications , vol. 4, no. 5, pp. 303–320, 1969
work page 1969
-
[2]
A method for nonlinear constraints in mini mization problems,
M. J. Powell, “A method for nonlinear constraints in mini mization problems,” Optimization, pp. 283–298, 1969. 6
work page 1969
-
[3]
V . Nedelcu, I. Necoara, and Q. Tran-Dinh, “Computationa l complexity of inexact gradient augmented lagrangian methods: Application to constrained mpc,” SIAM Journal on Control and Optimization , vol. 52, no. 5, pp. 3109–3134, 2014
work page 2014
-
[4]
Iteration-complexity of first -order augmented lagrangian methods for convex programming,
G. Lan and R. D. Monteiro, “Iteration-complexity of first -order augmented lagrangian methods for convex programming,” Mathematical Programming , vol. 155, no. 1, pp. 511–547, 2016
work page 2016
-
[5]
First-order m ethods of smooth convex optimization with inexact oracle,
O. Devolder, F. Glineur, and Y . Nesterov, “First-order m ethods of smooth convex optimization with inexact oracle,” Mathematical Programming , vol. 146, no. 1, pp. 37–75, 2014
work page 2014
-
[6]
Convergence rates of in exact proximal-gradient methods for convex optimization,
M. Schmidt, N. Roux, and F. Bach, “Convergence rates of in exact proximal-gradient methods for convex optimization,” Advances in Neural Information Processing Systems , vol. 24, 2011
work page 2011
-
[7]
D. P . Bertsekas, Nonlinear Programming . Athena Scientific, 2016, vol. 4
work page 2016
-
[8]
Smooth minimization of non-smooth functi ons,
Y . Nesterov, “Smooth minimization of non-smooth functi ons,” Mathematical Programming, vol. 103, pp. 127– 152, 2005
work page 2005
-
[9]
On the linear convergence of the al ternating direction method of multipliers,
M. Hong and Z.-Q. Luo, “On the linear convergence of the al ternating direction method of multipliers,” Mathematical Programming , vol. 162, no. 1, pp. 165–199, 2017
work page 2017
-
[10]
V andenberghe, Optimization Methods for Large-Scale Systems
L. V andenberghe, Optimization Methods for Large-Scale Systems . Lecture Slides, UCLA, 2022. [Online]. Available: https://www.seas.ucla.edu/∼ vandenbe/236C
work page 2022
-
[11]
Nesterov, Lectures on Convex Optimization
Y . Nesterov, Lectures on Convex Optimization . Springer, 2018, vol. 137
work page 2018
-
[12]
On the acceleration of augmented lagr angian method for linearly constrained optimiza- tion,
B. He and X. Y uan, “On the acceleration of augmented lagr angian method for linearly constrained optimiza- tion,” Optimization Online , vol. 3, 2010
work page 2010
-
[13]
Accelerated bregma n method for linearly constrained–minimization,
M. Kang, S. Y un, H. Woo, and M. Kang, “Accelerated bregma n method for linearly constrained–minimization,” Journal of Scientific Computing , vol. 56, no. 3, pp. 515–534, 2013
work page 2013
-
[14]
Davis, Mathematical Programming I
D. Davis, Mathematical Programming I . Lecture Notes, Cornell University, 2016. [Online]. Avail able: https://people.orie.cornell.edu/dsd95/orie63002016.html
work page 2016
-
[15]
J.-B. Hiriart-Urruty and C. Lemar´ echal, Fundamentals of Convex Analysis . Springer Science & Business Media, 2004
work page 2004
-
[16]
R. T. Rockafellar, Convex Analysis. Princeton University Press, 1970
work page 1970
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.