Primal-Dual Methods for Nonsmooth Nonconvex Optimization with Orthogonality Constraints

Anthony Man-Cho So; Linglingzhi Zhu; Shangyuan Liu; Wentao Ding

arxiv: 2604.04130 · v1 · submitted 2026-04-05 · 🧮 math.OC · cs.LG· cs.NA· math.NA

Primal-Dual Methods for Nonsmooth Nonconvex Optimization with Orthogonality Constraints

Linglingzhi Zhu , Wentao Ding , Shangyuan Liu , Anthony Man-Cho So This is my paper

Pith reviewed 2026-05-13 17:06 UTC · model grok-4.3

classification 🧮 math.OC cs.LGcs.NAmath.NA

keywords nonsmooth nonconvex optimizationorthogonality constraintsaugmented Lagrangianprimal-dual methodsiteration complexityStiefel manifoldKurdyka-Lojasiewicz property

0 comments

The pith

A single-loop primal-dual algorithm reaches O(ε^{-3}) iteration complexity for nonsmooth nonconvex problems with orthogonality constraints.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a retraction-free primal-dual method based on a linearized smoothing augmented Lagrangian for solving optimization problems that mix nonsmooth nonconvex objectives with orthogonality constraints. The algorithm runs in a single loop and never solves an inner subproblem at any step. It proves that the number of iterations needed to reach an ε-KKT point scales as O(ε^{-3}), matching the best known bound from Riemannian methods on the Stiefel manifold. The same framework also yields asymptotic sequential convergence once the objective satisfies the Kurdyka-Lojasiewicz property. Experiments on both smooth and nonsmooth test cases indicate faster run times and better scaling than current state-of-the-art solvers.

Core claim

The paper establishes a linearized smoothing augmented Lagrangian method for nonsmooth nonconvex optimization problems subject to orthogonality constraints. This primal-dual approach is single-loop and does not require solving subproblems at each iteration. It achieves an iteration complexity of O(ε^{-3}) to find ε-KKT points, which matches the best-known complexity in the Riemannian optimization literature. Additionally, asymptotic sequential convergence is shown under the standard Kurdyka-Lojasiewicz property.

What carries the argument

The linearized smoothing augmented Lagrangian method, which smooths the nonsmooth objective while using dual variables to enforce orthogonality constraints directly in the primal-dual updates without retractions or manifold projections.

If this is right

The method applies to both smooth and nonsmooth objectives without changing the algorithmic structure.
It produces points that satisfy the approximate KKT conditions after O(ε^{-3}) iterations.
Asymptotic sequential convergence holds for any problem obeying the Kurdyka-Lojasiewicz property.
Experiments show improved computational efficiency and scalability over existing Riemannian solvers on test instances of varying size.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The single-loop, subproblem-free design may reduce cumulative floating-point error compared with methods that repeatedly retract onto the manifold.
Because dual variables handle the constraints, the approach could extend naturally to distributed or parallel implementations where each processor updates its own primal block.
Applications such as orthogonal dictionary learning or robust principal-component analysis could see larger problem sizes solved in practice if the observed scaling holds.

Load-bearing premise

The convergence analysis requires that the objective function satisfies the Kurdyka-Lojasiewicz property.

What would settle it

Numerical runs on a concrete nonsmooth nonconvex orthogonally constrained problem that measure whether the iteration count required to reach a prescribed ε-KKT accuracy grows exactly as O(ε^{-3}) or deviates from that rate.

Figures

Figures reproduced from arXiv: 2604.04130 by Anthony Man-Cho So, Linglingzhi Zhu, Shangyuan Liu, Wentao Ding.

**Figure 2.** Figure 2: Feasibility and Update of Primal Variables with Different [PITH_FULL_IMAGE:figures/full_fig_p019_2.png] view at source ↗

**Figure 3.** Figure 3: Convergence of LSALM with varying r and β problem is more complex because it requires enforcing both sparsity and orthogonality simultaneously. LSALM is designed to solve this more challenging case for larger values of n. To evaluate the performance of LSALM, we compare it against the following algorithms: ManPGAda [CMSZ20], PAMAL [CJY16], SOC [LO14], and RADMM [LMS25]. The experimental setup is as follow… view at source ↗

**Figure 4.** Figure 4: Comparison of Average CPU time and Per-Iteration Efficiency of the Algorithms [PITH_FULL_IMAGE:figures/full_fig_p021_4.png] view at source ↗

read the original abstract

Recent advancements in data science have significantly elevated the importance of orthogonally constrained optimization problems. The Riemannian approach has become a popular technique for addressing these problems due to the advantageous computational and analytical properties of the Stiefel manifold. Nonetheless, the interplay of nonsmoothness alongside orthogonality constraints introduces substantial challenges to current Riemannian methods, including scalability, parallelizability, complicated subproblems, and cumulative numerical errors that threaten feasibility. In this paper, we take a retraction-free primal-dual approach and propose a linearized smoothing augmented Lagrangian method specifically designed for nonsmooth and nonconvex optimization with orthogonality constraints. Our proposed method is single-loop and free of subproblem solving. We establish its iteration complexity of $O(\epsilon^{-3})$ for finding $\epsilon$-KKT points, matching the best-known results in the Riemannian optimization literature. Additionally, by invoking the standard Kurdyka-Lojasiewicz (KL) property, we demonstrate asymptotic sequential convergence of the proposed algorithm. Numerical experiments on both smooth and nonsmooth orthogonal constrained problems demonstrate the superior computational efficiency and scalability of the proposed method compared with state-of-the-art algorithms.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

A single-loop primal-dual method reaches the best-known complexity for nonsmooth nonconvex orthogonality problems but ties sequential convergence to the KL property.

read the letter

The key point here is a retraction-free primal-dual algorithm for nonsmooth nonconvex problems with orthogonality constraints. It uses a linearized smoothing augmented Lagrangian scheme that runs in one loop and avoids solving subproblems, which is a step forward for scalability. This stands out because it matches the O(ε^{-3}) iteration complexity for finding ε-KKT points that the Riemannian literature has achieved, but without the manifold-specific tools that can introduce numerical errors or limit parallelization. The experiments on smooth and nonsmooth cases show it running faster and scaling better than current methods. What works is the focus on practical implementation: no retractions means staying in the ambient space, and the smoothing handles the nonsmooth parts directly. This could be useful in data science applications where orthogonality appears, like in matrix factorization with regularizers. The soft spot is the convergence analysis. While the complexity holds under standard assumptions, the asymptotic sequential convergence invokes the KL property. That assumption does not apply to every nonsmooth nonconvex function, so the full-sequence guarantee is limited to problems where the objective satisfies KL. The abstract gives no proof details, which makes it tough to assess how tightly the arguments are constructed. This paper is for optimization researchers and ML practitioners working on constrained nonsmooth problems. A reader looking for implementable algorithms with solid complexity bounds would find it worthwhile. It has enough substance to merit peer review, as the claims are testable and the problem is timely. Recommendation: send it to referees.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a linearized smoothing augmented Lagrangian method for nonsmooth nonconvex optimization problems subject to orthogonality constraints. The algorithm is single-loop and retraction-free, avoiding explicit subproblem solves. It establishes an O(ε^{-3}) iteration complexity for finding ε-KKT points under standard Lipschitz and boundedness assumptions, matching the best-known rates in the Riemannian literature. Asymptotic sequential convergence of the full sequence is shown by invoking the Kurdyka-Łojasiewicz property. Numerical experiments on smooth and nonsmooth test problems illustrate improved efficiency and scalability relative to existing methods.

Significance. If the stated complexity and convergence results are rigorously established, the work supplies a practical primal-dual framework that sidesteps the scalability and feasibility issues of retraction-based Riemannian methods for nonsmooth orthogonal-constrained problems. The explicit O(ε^{-3}) rate without subproblem solving and the conditional KL-based sequential convergence constitute clear technical contributions with direct relevance to large-scale data-science applications.

major comments (2)

[§4, Theorem 4.2] §4, Theorem 4.2: the O(ε^{-3}) complexity derivation for ε-KKT points is stated to hold under Lipschitz continuity of the objective and boundedness of the dual sequence, yet the dependence of the hidden constants on the smoothing parameter sequence is not made explicit; this leaves open whether the rate remains uniform when the smoothing parameter is driven to zero at the rate required by the analysis.
[§5, Theorem 5.1] §5, Theorem 5.1: the asymptotic sequential convergence result invokes the standard KL inequality with exponent θ ∈ [0,1); the proof sketch does not delineate the precise range of θ for which the argument closes, nor does it supply a concrete example of a nonsmooth nonconvex orthogonal-constrained problem that satisfies the KL property with the claimed exponent.

minor comments (2)

[Abstract] The abstract claims the method 'matches the best-known results' but supplies no citations to the specific Riemannian papers whose O(ε^{-3}) rates are being matched.
[Numerical Experiments] In the numerical section, the reported CPU times and iteration counts for the proposed method versus baselines would benefit from error bars or statistics over multiple random initializations to substantiate the claimed superiority.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the careful reading of our manuscript and the constructive comments. We provide point-by-point responses to the major comments below and indicate the revisions we plan to make.

read point-by-point responses

Referee: [§4, Theorem 4.2] the O(ε^{-3}) complexity derivation for ε-KKT points is stated to hold under Lipschitz continuity of the objective and boundedness of the dual sequence, yet the dependence of the hidden constants on the smoothing parameter sequence is not made explicit; this leaves open whether the rate remains uniform when the smoothing parameter is driven to zero at the rate required by the analysis.

Authors: We appreciate the referee's observation regarding the explicit dependence on the smoothing parameter. In our analysis, the smoothing parameter sequence is chosen to decrease at a rate that ensures the constants remain bounded independently of the target accuracy ε. Specifically, by setting the smoothing parameter proportional to ε, the O(ε^{-3}) rate holds uniformly. We will revise the proof in Theorem 4.2 to make this dependence explicit and confirm the uniformity. revision: yes
Referee: [§5, Theorem 5.1] the asymptotic sequential convergence result invokes the standard KL inequality with exponent θ ∈ [0,1); the proof sketch does not delineate the precise range of θ for which the argument closes, nor does it supply a concrete example of a nonsmooth nonconvex orthogonal-constrained problem that satisfies the KL property with the claimed exponent.

Authors: We thank the referee for highlighting the need for precision in the KL analysis. The proof of Theorem 5.1 is valid for θ in [0, 1/2), as this range guarantees the necessary summability conditions for the sequence to converge. We will update the statement and proof sketch to specify this range explicitly. Regarding a concrete example, while the general theory applies to many problems in the literature (such as certain formulations of orthogonal nonnegative matrix factorization), providing a detailed verification of the KL exponent for a specific instance would require additional technical work that is beyond the current scope. We can add a remark pointing to relevant references. revision: partial

standing simulated objections not resolved

Supplying a fully verified concrete example of a nonsmooth nonconvex orthogonally constrained problem that satisfies the KL property with a specific exponent, including the explicit desingularizing function, cannot be provided without substantial new analysis not present in the manuscript.

Circularity Check

0 steps flagged

No circularity: complexity bound and convergence rest on standard external assumptions

full rationale

The paper derives an O(ε^{-3}) iteration complexity for ε-KKT points via a linearized smoothing augmented Lagrangian scheme under explicit Lipschitz and boundedness assumptions stated in the manuscript. This bound matches known Riemannian results but is obtained directly from the algorithm's update rules and penalty-parameter analysis rather than from any fitted quantity or self-referential definition. Asymptotic sequential convergence is obtained by invoking the standard Kurdyka-Łojasiewicz property as an external domain assumption, not derived from the paper's own iterates or parameters. No load-bearing step reduces to a self-citation chain, an ansatz smuggled via prior work, or a renaming of a known empirical pattern; the derivation remains self-contained against external optimization benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claims rest on the Kurdyka-Lojasiewicz property as a key assumption for convergence and standard nonsmooth optimization theory; no free parameters or new entities are mentioned.

axioms (1)

domain assumption Kurdyka-Lojasiewicz property
Invoked to establish asymptotic sequential convergence of the algorithm sequence.

pith-pipeline@v0.9.0 · 5513 in / 1275 out tokens · 55994 ms · 2026-05-13T17:06:20.671358+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We establish its iteration complexity of O(ε^{-3}) for finding ε-KKT points... by invoking the standard Kurdyka-Lojasiewicz (KL) property, we demonstrate asymptotic sequential convergence
IndisputableMonolith/Foundation/BranchSelection.lean branch_selection unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

the function X ↦ ½∥X⊤X − I_n∥²_F is 2-weakly convex

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

4 extracted references · 4 canonical work pages

[1]

24 [JC16] Ian T Jolliffe and Jorge Cadima. Principal component analysis: A review and recent developments.Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 374(2065):20150202,

work page 2065
[2]

MADMM: A generic algo- rithm for non-smooth optimization on manifolds

[KGB16] Artiom Kovnatsky, Klaus Glashoff, and Michael M Bronstein. MADMM: A generic algo- rithm for non-smooth optimization on manifolds. InComputer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part V 14, pages 680–696. Springer,

work page 2016
[3]

Keshavan, Andrea Montanari, and Sewoong Oh

[KMO10] Raghunandan H. Keshavan, Andrea Montanari, and Sewoong Oh. Matrix completion from noisy entries.Journal of Machine Learning Research, 11(69):2057–2078,

work page 2057
[4]

Rotation group synchronization via quotient manifold.arXiv preprint arXiv:2306.12730,

[ZLS23] Linglingzhi Zhu, Chong Li, and Anthony Man-Cho So. Rotation group synchronization via quotient manifold.arXiv preprint arXiv:2306.12730,

work page arXiv

[1] [1]

24 [JC16] Ian T Jolliffe and Jorge Cadima. Principal component analysis: A review and recent developments.Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 374(2065):20150202,

work page 2065

[2] [2]

MADMM: A generic algo- rithm for non-smooth optimization on manifolds

[KGB16] Artiom Kovnatsky, Klaus Glashoff, and Michael M Bronstein. MADMM: A generic algo- rithm for non-smooth optimization on manifolds. InComputer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part V 14, pages 680–696. Springer,

work page 2016

[3] [3]

Keshavan, Andrea Montanari, and Sewoong Oh

[KMO10] Raghunandan H. Keshavan, Andrea Montanari, and Sewoong Oh. Matrix completion from noisy entries.Journal of Machine Learning Research, 11(69):2057–2078,

work page 2057

[4] [4]

Rotation group synchronization via quotient manifold.arXiv preprint arXiv:2306.12730,

[ZLS23] Linglingzhi Zhu, Chong Li, and Anthony Man-Cho So. Rotation group synchronization via quotient manifold.arXiv preprint arXiv:2306.12730,

work page arXiv