An inexact variable metric proximal linearization method for composite optimization on manifolds

Hao He; Ruyu Liu; Shaohua Pan; Yitian Qian

arxiv: 2508.12003 · v2 · submitted 2025-08-16 · 🧮 math.OC

An inexact variable metric proximal linearization method for composite optimization on manifolds

Hao He , Ruyu Liu , Yitian Qian , Shaohua Pan This is my paper

Pith reviewed 2026-05-18 23:06 UTC · model grok-4.3

classification 🧮 math.OC

keywords composite optimizationmanifold optimizationproximal linearizationvariable metricoracle complexityKurdyka-Lojasiewicz propertynonsmooth nonconvex optimizationretraction

0 comments

The pith

An inexact variable metric proximal linearization method achieves O(ε^{-3}) oracle complexity for composite optimization on manifolds under bounded iterates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops an algorithm for minimizing the composition of a nonsmooth convex function and a C^{1,1} mapping over a C^2-smooth closed embedded manifold. The method applies a variable metric proximal linearization step that respects the manifold geometry through retractions and first-order information, solving each subproblem only inexactly via a practical criterion. Under the assumption that the generated sequence of points remains bounded, the approach reaches an approximate stationary point after order ε^{-3} calls to an oracle when a dual fast gradient method is used inside, and every cluster point satisfies the stationarity condition. Readers would care because many applied problems in data analysis and engineering involve nonsmooth objectives on curved domains where such complexity and convergence guarantees have been missing.

Core claim

We propose an inexact variable metric proximal linearization method for minimizing the composition of a nonsmooth convex function and a C^{1,1} mapping F over a C^2-smooth embedded closed submanifold M. By leveraging the composite structure together with retractions and first-order information on M, each iteration solves an inexact subspace-constrained strongly convex problem according to a practical inexactness criterion. Under the boundedness assumption on the iterate sequence, the method establishes O(ε^{-3}) oracle complexity with a dual fast gradient method as the inner solver and proves that any cluster point is a stationary point. When the constructed potential function additionally满足

What carries the argument

Inexact variable metric proximal linearization method that at each step seeks an inexact solution to a subspace-constrained strongly convex subproblem using manifold retractions and first-order information.

Load-bearing premise

The sequence of iterates generated by the method remains bounded.

What would settle it

A concrete composite manifold problem on which the generated iterates become unbounded or on which more than order ε^{-3} oracle calls are required to reach stationarity.

Figures

Figures reproduced from arXiv: 2508.12003 by Hao He, Ruyu Liu, Shaohua Pan, Yitian Qian.

**Figure 2.** Figure 2: The objective values, running time and NMI scores b [PITH_FULL_IMAGE:figures/full_fig_p034_2.png] view at source ↗

**Figure 3.** Figure 3: The objective value and running time of RiVMPL for ( [PITH_FULL_IMAGE:figures/full_fig_p034_3.png] view at source ↗

**Figure 4.** Figure 4: The objective values and running time of three meth [PITH_FULL_IMAGE:figures/full_fig_p036_4.png] view at source ↗

**Figure 5.** Figure 5: The infeasibility and running time of three methods for (4) under different ρ. when ρ is greater than 2.5, the one by RiVMPL and RiALM increases to 1 sharply, but the one by RADMM decreases to 0 abnormally. We also observe that as λ increases in [0.5, 3.2], the running time of RiALM increases remarkably, but that of RiVMPL and RADMM has no too much variation. 0.5 1 1.5 2 2.5 3 3.5 0 0.2 0.4 0.6 0.8 1 spars… view at source ↗

**Figure 6.** Figure 6: The row sparsity and running time of three methods under different λ. By Figures 5-6, we test the three methods for solving (4) for (m, λ, ρ) = (50, 2.0, 0.5) with different n and r [PITH_FULL_IMAGE:figures/full_fig_p038_6.png] view at source ↗

read the original abstract

This paper concerns the minimization of the composition of a nonsmooth convex function and a $\mathcal{C}^{1,1}$ mapping $F$ over a $\mathcal{C}^2$-smooth embedded closed submanifold $\mathcal{M}$. For this class of nonconvex and nonsmooth problems, we propose an inexact variable metric proximal linearization method by leveraging its composite structure and the retraction and first-order information of $\mathcal{M}$, which at each iteration seeks an inexact solution to a subspace constrained strongly convex problem by a practical inexactness criterion. Under the boundedness assumption on the iterate sequence, we establish the $O(\epsilon^{-3})$ oracle complexity with a dual fast gradient method as the inner solver, and prove that any cluster point of the iterate sequence is a stationary point. If in addition the constructed potential function has the Kurdyka-Lojasiewicz (KL) property on the set of cluster points, the iterate sequence converges to a stationary point, and if the potential function has the KL property of exponent $q\in[\frac{1}{2},1)$, the local convergence rate is characterized. We also provide a condition only involving the original data to identify the KL property of the potential function with an exponent $q\in[0,1)$. Numerical comparisons with the existing methods validate the efficiency of the proposed method.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives a concrete inexact proximal linearization scheme with O(ε^{-3}) complexity for composite nonsmooth problems on manifolds, but the guarantees rest on an unproven boundedness assumption for the iterates.

read the letter

The core contribution is an inexact variable-metric proximal linearization algorithm that exploits the composite structure, retractions on the manifold, and a dual fast gradient solver for the inner subproblem. They supply a practical inexactness criterion and derive O(ε^{-3}) oracle complexity plus stationarity of cluster points, all under the assumption that the iterate sequence stays bounded. They also characterize local rates when the potential satisfies the KL property and give a data-only condition to verify the exponent in [0,1). The numerical tests show competitive performance against existing manifold methods on the problems they tried.

Referee Report

2 major / 2 minor

Summary. This paper introduces an inexact variable metric proximal linearization method for minimizing the composition of a nonsmooth convex function and a C^{1,1} mapping over a C^2-smooth embedded closed submanifold. The method uses retractions and first-order information on the manifold to solve inexact subspace-constrained strongly convex subproblems at each iteration. Under the boundedness assumption on the iterate sequence, it establishes an O(ε^{-3}) oracle complexity result when a dual fast gradient method is used as the inner solver, proves that cluster points are stationary points, and derives convergence to a stationary point (with rates) when a constructed potential function satisfies the Kurdyka-Łojasiewicz property. A data-only condition is also supplied to guarantee the KL property with exponent in [0,1).

Significance. If the boundedness assumption can be justified or replaced by verifiable conditions on the problem data, the paper would offer a useful algorithmic framework with explicit complexity guarantees for composite nonsmooth nonconvex problems on manifolds. The combination of variable metrics, inexact proximal linearization, and a practical inner solver is a reasonable extension of Euclidean methods. The data-dependent KL identification condition is a constructive contribution. However, the current dependence on an unverified boundedness hypothesis limits the strength of the theoretical claims.

major comments (2)

[Abstract and §4] Abstract and main complexity theorem: Both the O(ε^{-3}) oracle complexity (via dual fast gradient inner solver) and the stationarity of cluster points are explicitly conditioned on the boundedness assumption for the iterate sequence {x_k}. No coercivity, level-set boundedness, or other condition involving only the original objective and manifold data is supplied that would imply this boundedness in general; without it the claims do not hold if the sequence diverges while the potential decreases.
[Main convergence theorem (§4)] Stationarity argument: The proof that any cluster point is stationary (stated after the complexity result) relies on boundedness to guarantee the existence of convergent subsequences on the compact manifold; the argument is therefore incomplete without either a proof of boundedness or an explicit sufficient condition on the composite function F and the nonsmooth term.

minor comments (2)

[Method description (§3)] The precise definition of the variable metric and its relation to the tangent space at each iterate should be stated more explicitly to avoid ambiguity in the subproblem formulation.
[Preliminaries] A few minor notation inconsistencies appear in the description of the retraction and its differential; these do not affect the results but should be standardized.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments. We address the major comments point by point below, indicating planned revisions to the manuscript.

read point-by-point responses

Referee: [Abstract and §4] Abstract and main complexity theorem: Both the O(ε^{-3}) oracle complexity (via dual fast gradient inner solver) and the stationarity of cluster points are explicitly conditioned on the boundedness assumption for the iterate sequence {x_k}. No coercivity, level-set boundedness, or other condition involving only the original objective and manifold data is supplied that would imply this boundedness in general; without it the claims do not hold if the sequence diverges while the potential decreases.

Authors: We acknowledge that the O(ε^{-3}) complexity and stationarity results are conditioned on boundedness of {x_k}, which is required to prevent potential divergence from invalidating the conclusions even as the potential decreases. This is a standard hypothesis in nonconvex manifold optimization analyses. In the revision we will state the assumption more prominently in the abstract and Theorem 4.1, and add a remark with verifiable sufficient conditions on the data (e.g., coercivity of the composite objective relative to the manifold or compactness of level sets) that guarantee bounded iterates. A fully general coercivity condition covering all nonsmooth nonconvex instances is difficult to obtain without further restrictions, but the added discussion will clarify the scope of the claims. revision: partial
Referee: [Main convergence theorem (§4)] Stationarity argument: The proof that any cluster point is stationary (stated after the complexity result) relies on boundedness to guarantee the existence of convergent subsequences on the compact manifold; the argument is therefore incomplete without either a proof of boundedness or an explicit sufficient condition on the composite function F and the nonsmooth term.

Authors: The referee is correct that the stationarity argument uses boundedness to ensure convergent subsequences exist, since the manifold is closed but not necessarily compact. We will revise the proof in §4 to explicitly note this dependence and to clarify that boundedness supplies the needed compactness for subsequence extraction. The same sufficient conditions on the problem data added in response to the first comment will be referenced here to strengthen the result. revision: yes

Circularity Check

0 steps flagged

No significant circularity; results are conditional on explicit external assumption

full rationale

The paper's central results—the O(ε^{-3}) oracle complexity and stationarity of cluster points—are explicitly derived under the boundedness assumption on the iterate sequence, which is stated as an input hypothesis rather than obtained from the algorithm or prior steps. The subsequent KL-property analysis, local convergence rates, and data-only condition for identifying the KL exponent are standard consequences of manifold geometry and nonsmooth optimization theory, without any self-definitional loops, fitted parameters renamed as predictions, or load-bearing self-citations that reduce the claims to their own inputs. The derivation chain remains self-contained once the boundedness hypothesis is granted, with no evidence of ansatzes smuggled via citation or renaming of known results as novel unifications.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The analysis rests on two standard but nontrivial domain assumptions common in nonsmooth optimization: boundedness of iterates and the Kurdyka-Lojasiewicz property of a constructed potential function. No free parameters or invented entities are introduced in the abstract.

axioms (2)

domain assumption Boundedness of the iterate sequence
Invoked to obtain the O(ε^{-3}) oracle complexity and to guarantee that cluster points are stationary.
domain assumption Kurdyka-Lojasiewicz property of the potential function on the set of cluster points
Required for global convergence of the sequence and for characterizing local convergence rates.

pith-pipeline@v0.9.0 · 5775 in / 1409 out tokens · 52938 ms · 2026-05-18T23:06:59.804931+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Under the boundedness assumption on the iterate sequence, we establish the O(ε^{-3}) oracle complexity with a dual fast gradient method as the inner solver, and prove that any cluster point of the iterate sequence is a stationary point.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

47 extracted references · 47 canonical work pages

[1]

Absil PA, Mahony R, Sepulchre R (2008) Optimization Algorithms on Matrix Manifolds (Princeton University Press, Princeton, NJ)

work page 2008
[2]

Mathematical Programming 116(1):5–16

Attouch H, Bolte J (2009) On the convergence of the proxim al algorithm for nonsmooth functions involving analytic features. Mathematical Programming 116(1):5–16

work page 2009
[3]

Mathematics of Operations Research 35(2):438–457

Attouch H, Bolte J, Redont P, Soubeyran A (2010) Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the Kurdyka- Łojasiewicz inequality. Mathematics of Operations Research 35(2):438–457

work page 2010
[4]

Mathematical Programming 137(1):91–129

Attouch H, Bolte J, Svaiter BF (2013) Convergence of desc ent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods. Mathematical Programming 137(1):91–129

work page 2013
[5]

IMA Journal of Nu- merical Analysis 8(1):141–148

Barzilai J, Borwein JM (1988) Two-point step size gradie nt methods. IMA Journal of Nu- merical Analysis 8(1):141–148

work page 1988
[6]

Beck A (2017) First-Order Methods in Optimization (Society for Industrial and Applied Mathematics, Philadelphia, US)

work page 2017
[7]

SIAM Journal on Optimization 33(3):1473–1493

Beck A, Rosset I (2023) A dynamic smoothing technique for a class of nonsmooth optimiza- tion problems on manifolds. SIAM Journal on Optimization 33(3):1473–1493

work page 2023
[8]

arXiv preprint arXiv:2108.12447

Bendokat T, Zimmermann R (2021) The real symplectic Stie fel and Grassmann manifolds: metrics, geodesics and applications. arXiv preprint arXiv:2108.12447

work page arXiv 2021
[9]

Mathematical Pro- gramming 117(1):5–19

Bolte J, Daniilidis A, Lewis A (2009) Tame functions are s emismooth. Mathematical Pro- gramming 117(1):5–19

work page 2009
[10]

Mathematical Programming 146(1):459–494

Bolte J, Sabach S, Teboulle M (2014) Proximal alternati ng linearized minimization for nonconvex and nonsmooth problems. Mathematical Programming 146(1):459–494

work page 2014
[11]

Boumal N (2023) An Introduction to Optimization on Smooth Manifolds (Cambridge Uni- versity Press, Cambridge, UK)

work page 2023
[12]

IMA Journal of Numerical Analysis 39(1):1–33

Boumal N, Absil PA, Cartis C (2019) Global rates of conve rgence for nonconvex optimization on manifolds. IMA Journal of Numerical Analysis 39(1):1–33

work page 2019
[13]

SIAM Journal on Optimization 30(1):210–239

Chen S, Ma S, Man-Cho So A, Zhang T (2020) Proximal gradie nt method for nonsmooth optimization over the stiefel manifold. SIAM Journal on Optimization 30(1):210–239

work page 2020
[14]

Clarke FH (1983) Optimization and Nonsmooth Analysis (Society for Industrial and Applied Mathematics, Philadelphia, US). 41

work page 1983
[15]

Facchinei F, Pang JS (2003) Finite-Dimensional Variational Inequalities and Complemen - tarity Problems (Springer, New York, US)

work page 2003
[16]

International Conference on Geometric Science of In- formation, 789–796 (Springer)

Gao B, Son NT, Absil PA, Stykel T (2021) Geometry of the sy mplectic Stiefel manifold endowed with the Euclidean metric. International Conference on Geometric Science of In- formation, 789–796 (Springer)

work page 2021
[17]

Linear Algebra and its Applications 682:50–85

Gao B, Son NT, Stykel T (2024) Optimization on the symple ctic Stiefel manifold: SR decomposition-based retraction and applications. Linear Algebra and its Applications 682:50–85

work page 2024
[18]

Applied Mathematics and Opti- mization 11(1):43–56

Hiriart-Urruty JB, Strodiot JJ, Nguyen VH (1984) Gener alized Hessian matrix and second- order optimality conditions for problems with C1,1 data. Applied Mathematics and Opti- mization 11(1):43–56

work page 1984
[19]

SIAM Journal on Optimization 28(1):596–619

Hosseini S, Huang W, Yousefpour R (2018) Line search alg orithms for locally Lipschitz functions on Riemannian manifolds. SIAM Journal on Optimization 28(1):596–619

work page 2018
[20]

Mathematical Program- ming 194(1):371–413

Huang W, Wei K (2022) Riemannian proximal gradient meth ods. Mathematical Program- ming 194(1):371–413

work page 2022
[21]

Computational Optimization and Applications 85(1):1–32

Huang W, Wei K (2023) An inexact Riemannian proximal gra dient method. Computational Optimization and Applications 85(1):1–32

work page 2023
[22]

SIAM Journal on Optimization 19(4):1894–1917

Ioﬀe AD (2009) An invitation to tame optimization. SIAM Journal on Optimization 19(4):1894–1917

work page 2009
[23]

arXiv preprint arXiv:2404.08463

Jensen R, Zimmermann R (2024) Riemannian optimization on the symplectic Stiefel mani- fold using second-order information. arXiv preprint arXiv:2404.08463

work page arXiv 2024
[24]

Foundations of Computational Mathematics 18(5):1199–1232

Li G, Pong TK (2018) Calculus of the exponent of Kurdyka– Łojasiewicz inequality and its applications to linear convergence of ﬁrst-order metho ds. Foundations of Computational Mathematics 18(5):1199–1232

work page 2018
[25]

Mathematics of Operations Research

Li J, Ma S, Srivastava T (2024) A Riemannian alternating direction method of multipliers. Mathematics of Operations Research

work page 2024
[26]

arXiv preprint arXiv:2411.15776

Li Q, Zhang N, Yan H, Feng J (2024) Proximal methods for st ructured nonsmooth opti- mization over Remannian submanifolds. arXiv preprint arXiv:2411.15776

work page arXiv 2024
[27]

Mathematical Programming 178(1):215–262

Liu H, So AMC, Wu W (2019) Quadratic optimization with or thogonality constraint: explicit Łojasiewicz exponent and linear convergence of re traction-based line-search and stochastic variance-reduced gradient methods. Mathematical Programming 178(1):215–262

work page 2019
[28]

Journal of Scientiﬁc Computing 99(2):30

Liu X, Xiao N, Yuan YX (2024) A penalty-free infeasible a pproach for a class of nonsmooth optimization problems over the Stiefel manifold. Journal of Scientiﬁc Computing 99(2):30

work page 2024
[29]

IEEE Transactions on Image Processing 25(6):2833–2843

Lu C, Yan S, Lin Z (2016) Convex sparse spectral clusteri ng: single-view to multi-view. IEEE Transactions on Image Processing 25(6):2833–2843

work page 2016
[30]

Mathematical Programming 135(1):149–193

Lu Z, Zhang Y (2012) An augmented Lagrangian approach fo r sparse principal component analysis. Mathematical Programming 135(1):149–193

work page 2012
[31]

Optimization Methods and Software 31(3):645–678

Necoara I, Patrascu A (2016) Iteration complexity anal ysis of dual ﬁrst-order methods for conic convex programming. Optimization Methods and Software 31(3):645–678

work page 2016
[32]

Dokl Akad Nauk Sssr , volume 269, 543

Nesterov Y (1983) A method for solving the convex progra mming problem with convergence rate o(1/k 2). Dokl Akad Nauk Sssr , volume 269, 543

work page 1983
[33]

Bioinformat- ics 34(12):2069–2076

Park S, Zhao H (2018) Spectral clustering based on learn ing similarity matrix. Bioinformat- ics 34(12):2069–2076

work page 2018
[34]

SIAM Jour- nal on Scientiﬁc Computing 38(1):A1–A27

Peng L, Mohseni K (2016) Symplectic model reduction of H amiltonian systems. SIAM Jour- nal on Scientiﬁc Computing 38(1):A1–A27. 42

work page 2016
[35]

Rockafellar RT, Wets RJ (1998) Variational Analysis (Springer, Berlin, Germany)

work page 1998
[36]

SIAM Journal on Optimization 34(1):654–681

Si W, Absil PA, Huang W, Jiang R, Vary S (2024) A Riemannia n proximal Newton method. SIAM Journal on Optimization 34(1):654–681

work page 2024
[37]

Journal of Machine Learning Research 3(Dec):583–617

Strehl A, Ghosh J (2002) Cluster ensembles—a knowledge r euse framework for combining multiple partitions. Journal of Machine Learning Research 3(Dec):583–617

work page 2002
[38]

arXiv preprint arXiv:2503.02494

Wang L, Liu X, Chen X (2025) The distributionally robust optimization model of sparse principal component analysis. arXiv preprint arXiv:2503.02494

work page arXiv 2025
[39]

Journal of Scientiﬁc Computing 95(2):39

Wang Q, Yang WH (2023) Proximal quasi-Newton method for composite optimization over the Stiefel manifold. Journal of Scientiﬁc Computing 95(2):39

work page 2023
[40]

Computational Optimization and Applications 89(2):419–457

Wang Q, Yang WH (2024) An adaptive regularized proximal Newton-type methods for com- posite optimization over the Stiefel manifold. Computational Optimization and Applications 89(2):419–457

work page 2024
[41]

INFORMS Journal on Optimization 4(2):200–214

Wang Z, Liu B, Chen S, Ma S, Xue L, Zhao H (2022) A manifold p roximal linear method for sparse spectral clustering with application to single- cell RNA sequencing data analysis. INFORMS Journal on Optimization 4(2):200–214

work page 2022
[42]

Mathematical Programming 142(1):397–434

Wen Z, Yin W (2013) A feasible method for optimization wi th orthogonality constraints. Mathematical Programming 142(1):397–434

work page 2013
[43]

SIAM Journal on Optimization 31(4):3097–3126

Xiao N, Liu X, Yuan YX (2021) Exact penalty function for ℓ2,1 norm minimization over the Stiefel manifold. SIAM Journal on Optimization 31(4):3097–3126

work page 2021
[44]

Optimization Letters 1–19

Xu M, Jiang B, Liu YF, So AMC (2025) On the oracle complexi ty of a Riemannian in- exact augmented Lagrangian method for nonsmooth composite problems over Riemannian submanifolds. Optimization Letters 1–19

work page 2025
[45]

Mathematics of Operations Research 49(3):1710–1733

Zhang C, Chen X, Ma S (2024) A Riemannian smoothing steep est descent method for non-Lipschitz optimization on embedded submanifolds of Rn. Mathematics of Operations Research 49(3):1710–1733

work page 2024
[46]

SIAM Journal on Optimization 20(4):1737–1765

Zhao XY, Sun D, Toh KC (2010) A Newton-CG augmented Lagra ngian method for semidef- inite programming. SIAM Journal on Optimization 20(4):1737–1765

work page 2010
[47]

Mathematical Program- ming 201(1):1–61

Zhou Y, Bao C, Ding C, Zhu J (2023) A semismooth Newton bas ed augmented La- grangian method for nonsmooth optimization on matrix manif olds. Mathematical Program- ming 201(1):1–61. 43

work page 2023

[1] [1]

Absil PA, Mahony R, Sepulchre R (2008) Optimization Algorithms on Matrix Manifolds (Princeton University Press, Princeton, NJ)

work page 2008

[2] [2]

Mathematical Programming 116(1):5–16

Attouch H, Bolte J (2009) On the convergence of the proxim al algorithm for nonsmooth functions involving analytic features. Mathematical Programming 116(1):5–16

work page 2009

[3] [3]

Mathematics of Operations Research 35(2):438–457

Attouch H, Bolte J, Redont P, Soubeyran A (2010) Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the Kurdyka- Łojasiewicz inequality. Mathematics of Operations Research 35(2):438–457

work page 2010

[4] [4]

Mathematical Programming 137(1):91–129

Attouch H, Bolte J, Svaiter BF (2013) Convergence of desc ent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods. Mathematical Programming 137(1):91–129

work page 2013

[5] [5]

IMA Journal of Nu- merical Analysis 8(1):141–148

Barzilai J, Borwein JM (1988) Two-point step size gradie nt methods. IMA Journal of Nu- merical Analysis 8(1):141–148

work page 1988

[6] [6]

Beck A (2017) First-Order Methods in Optimization (Society for Industrial and Applied Mathematics, Philadelphia, US)

work page 2017

[7] [7]

SIAM Journal on Optimization 33(3):1473–1493

Beck A, Rosset I (2023) A dynamic smoothing technique for a class of nonsmooth optimiza- tion problems on manifolds. SIAM Journal on Optimization 33(3):1473–1493

work page 2023

[8] [8]

arXiv preprint arXiv:2108.12447

Bendokat T, Zimmermann R (2021) The real symplectic Stie fel and Grassmann manifolds: metrics, geodesics and applications. arXiv preprint arXiv:2108.12447

work page arXiv 2021

[9] [9]

Mathematical Pro- gramming 117(1):5–19

Bolte J, Daniilidis A, Lewis A (2009) Tame functions are s emismooth. Mathematical Pro- gramming 117(1):5–19

work page 2009

[10] [10]

Mathematical Programming 146(1):459–494

Bolte J, Sabach S, Teboulle M (2014) Proximal alternati ng linearized minimization for nonconvex and nonsmooth problems. Mathematical Programming 146(1):459–494

work page 2014

[11] [11]

Boumal N (2023) An Introduction to Optimization on Smooth Manifolds (Cambridge Uni- versity Press, Cambridge, UK)

work page 2023

[12] [12]

IMA Journal of Numerical Analysis 39(1):1–33

Boumal N, Absil PA, Cartis C (2019) Global rates of conve rgence for nonconvex optimization on manifolds. IMA Journal of Numerical Analysis 39(1):1–33

work page 2019

[13] [13]

SIAM Journal on Optimization 30(1):210–239

Chen S, Ma S, Man-Cho So A, Zhang T (2020) Proximal gradie nt method for nonsmooth optimization over the stiefel manifold. SIAM Journal on Optimization 30(1):210–239

work page 2020

[14] [14]

Clarke FH (1983) Optimization and Nonsmooth Analysis (Society for Industrial and Applied Mathematics, Philadelphia, US). 41

work page 1983

[15] [15]

Facchinei F, Pang JS (2003) Finite-Dimensional Variational Inequalities and Complemen - tarity Problems (Springer, New York, US)

work page 2003

[16] [16]

International Conference on Geometric Science of In- formation, 789–796 (Springer)

Gao B, Son NT, Absil PA, Stykel T (2021) Geometry of the sy mplectic Stiefel manifold endowed with the Euclidean metric. International Conference on Geometric Science of In- formation, 789–796 (Springer)

work page 2021

[17] [17]

Linear Algebra and its Applications 682:50–85

Gao B, Son NT, Stykel T (2024) Optimization on the symple ctic Stiefel manifold: SR decomposition-based retraction and applications. Linear Algebra and its Applications 682:50–85

work page 2024

[18] [18]

Applied Mathematics and Opti- mization 11(1):43–56

Hiriart-Urruty JB, Strodiot JJ, Nguyen VH (1984) Gener alized Hessian matrix and second- order optimality conditions for problems with C1,1 data. Applied Mathematics and Opti- mization 11(1):43–56

work page 1984

[19] [19]

SIAM Journal on Optimization 28(1):596–619

Hosseini S, Huang W, Yousefpour R (2018) Line search alg orithms for locally Lipschitz functions on Riemannian manifolds. SIAM Journal on Optimization 28(1):596–619

work page 2018

[20] [20]

Mathematical Program- ming 194(1):371–413

Huang W, Wei K (2022) Riemannian proximal gradient meth ods. Mathematical Program- ming 194(1):371–413

work page 2022

[21] [21]

Computational Optimization and Applications 85(1):1–32

Huang W, Wei K (2023) An inexact Riemannian proximal gra dient method. Computational Optimization and Applications 85(1):1–32

work page 2023

[22] [22]

SIAM Journal on Optimization 19(4):1894–1917

Ioﬀe AD (2009) An invitation to tame optimization. SIAM Journal on Optimization 19(4):1894–1917

work page 2009

[23] [23]

arXiv preprint arXiv:2404.08463

Jensen R, Zimmermann R (2024) Riemannian optimization on the symplectic Stiefel mani- fold using second-order information. arXiv preprint arXiv:2404.08463

work page arXiv 2024

[24] [24]

Foundations of Computational Mathematics 18(5):1199–1232

Li G, Pong TK (2018) Calculus of the exponent of Kurdyka– Łojasiewicz inequality and its applications to linear convergence of ﬁrst-order metho ds. Foundations of Computational Mathematics 18(5):1199–1232

work page 2018

[25] [25]

Mathematics of Operations Research

Li J, Ma S, Srivastava T (2024) A Riemannian alternating direction method of multipliers. Mathematics of Operations Research

work page 2024

[26] [26]

arXiv preprint arXiv:2411.15776

Li Q, Zhang N, Yan H, Feng J (2024) Proximal methods for st ructured nonsmooth opti- mization over Remannian submanifolds. arXiv preprint arXiv:2411.15776

work page arXiv 2024

[27] [27]

Mathematical Programming 178(1):215–262

Liu H, So AMC, Wu W (2019) Quadratic optimization with or thogonality constraint: explicit Łojasiewicz exponent and linear convergence of re traction-based line-search and stochastic variance-reduced gradient methods. Mathematical Programming 178(1):215–262

work page 2019

[28] [28]

Journal of Scientiﬁc Computing 99(2):30

Liu X, Xiao N, Yuan YX (2024) A penalty-free infeasible a pproach for a class of nonsmooth optimization problems over the Stiefel manifold. Journal of Scientiﬁc Computing 99(2):30

work page 2024

[29] [29]

IEEE Transactions on Image Processing 25(6):2833–2843

Lu C, Yan S, Lin Z (2016) Convex sparse spectral clusteri ng: single-view to multi-view. IEEE Transactions on Image Processing 25(6):2833–2843

work page 2016

[30] [30]

Mathematical Programming 135(1):149–193

Lu Z, Zhang Y (2012) An augmented Lagrangian approach fo r sparse principal component analysis. Mathematical Programming 135(1):149–193

work page 2012

[31] [31]

Optimization Methods and Software 31(3):645–678

Necoara I, Patrascu A (2016) Iteration complexity anal ysis of dual ﬁrst-order methods for conic convex programming. Optimization Methods and Software 31(3):645–678

work page 2016

[32] [32]

Dokl Akad Nauk Sssr , volume 269, 543

Nesterov Y (1983) A method for solving the convex progra mming problem with convergence rate o(1/k 2). Dokl Akad Nauk Sssr , volume 269, 543

work page 1983

[33] [33]

Bioinformat- ics 34(12):2069–2076

Park S, Zhao H (2018) Spectral clustering based on learn ing similarity matrix. Bioinformat- ics 34(12):2069–2076

work page 2018

[34] [34]

SIAM Jour- nal on Scientiﬁc Computing 38(1):A1–A27

Peng L, Mohseni K (2016) Symplectic model reduction of H amiltonian systems. SIAM Jour- nal on Scientiﬁc Computing 38(1):A1–A27. 42

work page 2016

[35] [35]

Rockafellar RT, Wets RJ (1998) Variational Analysis (Springer, Berlin, Germany)

work page 1998

[36] [36]

SIAM Journal on Optimization 34(1):654–681

Si W, Absil PA, Huang W, Jiang R, Vary S (2024) A Riemannia n proximal Newton method. SIAM Journal on Optimization 34(1):654–681

work page 2024

[37] [37]

Journal of Machine Learning Research 3(Dec):583–617

Strehl A, Ghosh J (2002) Cluster ensembles—a knowledge r euse framework for combining multiple partitions. Journal of Machine Learning Research 3(Dec):583–617

work page 2002

[38] [38]

arXiv preprint arXiv:2503.02494

Wang L, Liu X, Chen X (2025) The distributionally robust optimization model of sparse principal component analysis. arXiv preprint arXiv:2503.02494

work page arXiv 2025

[39] [39]

Journal of Scientiﬁc Computing 95(2):39

Wang Q, Yang WH (2023) Proximal quasi-Newton method for composite optimization over the Stiefel manifold. Journal of Scientiﬁc Computing 95(2):39

work page 2023

[40] [40]

Computational Optimization and Applications 89(2):419–457

Wang Q, Yang WH (2024) An adaptive regularized proximal Newton-type methods for com- posite optimization over the Stiefel manifold. Computational Optimization and Applications 89(2):419–457

work page 2024

[41] [41]

INFORMS Journal on Optimization 4(2):200–214

Wang Z, Liu B, Chen S, Ma S, Xue L, Zhao H (2022) A manifold p roximal linear method for sparse spectral clustering with application to single- cell RNA sequencing data analysis. INFORMS Journal on Optimization 4(2):200–214

work page 2022

[42] [42]

Mathematical Programming 142(1):397–434

Wen Z, Yin W (2013) A feasible method for optimization wi th orthogonality constraints. Mathematical Programming 142(1):397–434

work page 2013

[43] [43]

SIAM Journal on Optimization 31(4):3097–3126

Xiao N, Liu X, Yuan YX (2021) Exact penalty function for ℓ2,1 norm minimization over the Stiefel manifold. SIAM Journal on Optimization 31(4):3097–3126

work page 2021

[44] [44]

Optimization Letters 1–19

Xu M, Jiang B, Liu YF, So AMC (2025) On the oracle complexi ty of a Riemannian in- exact augmented Lagrangian method for nonsmooth composite problems over Riemannian submanifolds. Optimization Letters 1–19

work page 2025

[45] [45]

Mathematics of Operations Research 49(3):1710–1733

Zhang C, Chen X, Ma S (2024) A Riemannian smoothing steep est descent method for non-Lipschitz optimization on embedded submanifolds of Rn. Mathematics of Operations Research 49(3):1710–1733

work page 2024

[46] [46]

SIAM Journal on Optimization 20(4):1737–1765

Zhao XY, Sun D, Toh KC (2010) A Newton-CG augmented Lagra ngian method for semidef- inite programming. SIAM Journal on Optimization 20(4):1737–1765

work page 2010

[47] [47]

Mathematical Program- ming 201(1):1–61

Zhou Y, Bao C, Ding C, Zhu J (2023) A semismooth Newton bas ed augmented La- grangian method for nonsmooth optimization on matrix manif olds. Mathematical Program- ming 201(1):1–61. 43

work page 2023