An inexact variable metric proximal linearization method for composite optimization on manifolds
Pith reviewed 2026-05-18 23:06 UTC · model grok-4.3
The pith
An inexact variable metric proximal linearization method achieves O(ε^{-3}) oracle complexity for composite optimization on manifolds under bounded iterates.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We propose an inexact variable metric proximal linearization method for minimizing the composition of a nonsmooth convex function and a C^{1,1} mapping F over a C^2-smooth embedded closed submanifold M. By leveraging the composite structure together with retractions and first-order information on M, each iteration solves an inexact subspace-constrained strongly convex problem according to a practical inexactness criterion. Under the boundedness assumption on the iterate sequence, the method establishes O(ε^{-3}) oracle complexity with a dual fast gradient method as the inner solver and proves that any cluster point is a stationary point. When the constructed potential function additionally满足
What carries the argument
Inexact variable metric proximal linearization method that at each step seeks an inexact solution to a subspace-constrained strongly convex subproblem using manifold retractions and first-order information.
Load-bearing premise
The sequence of iterates generated by the method remains bounded.
What would settle it
A concrete composite manifold problem on which the generated iterates become unbounded or on which more than order ε^{-3} oracle calls are required to reach stationarity.
Figures
read the original abstract
This paper concerns the minimization of the composition of a nonsmooth convex function and a $\mathcal{C}^{1,1}$ mapping $F$ over a $\mathcal{C}^2$-smooth embedded closed submanifold $\mathcal{M}$. For this class of nonconvex and nonsmooth problems, we propose an inexact variable metric proximal linearization method by leveraging its composite structure and the retraction and first-order information of $\mathcal{M}$, which at each iteration seeks an inexact solution to a subspace constrained strongly convex problem by a practical inexactness criterion. Under the boundedness assumption on the iterate sequence, we establish the $O(\epsilon^{-3})$ oracle complexity with a dual fast gradient method as the inner solver, and prove that any cluster point of the iterate sequence is a stationary point. If in addition the constructed potential function has the Kurdyka-Lojasiewicz (KL) property on the set of cluster points, the iterate sequence converges to a stationary point, and if the potential function has the KL property of exponent $q\in[\frac{1}{2},1)$, the local convergence rate is characterized. We also provide a condition only involving the original data to identify the KL property of the potential function with an exponent $q\in[0,1)$. Numerical comparisons with the existing methods validate the efficiency of the proposed method.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. This paper introduces an inexact variable metric proximal linearization method for minimizing the composition of a nonsmooth convex function and a C^{1,1} mapping over a C^2-smooth embedded closed submanifold. The method uses retractions and first-order information on the manifold to solve inexact subspace-constrained strongly convex subproblems at each iteration. Under the boundedness assumption on the iterate sequence, it establishes an O(ε^{-3}) oracle complexity result when a dual fast gradient method is used as the inner solver, proves that cluster points are stationary points, and derives convergence to a stationary point (with rates) when a constructed potential function satisfies the Kurdyka-Łojasiewicz property. A data-only condition is also supplied to guarantee the KL property with exponent in [0,1).
Significance. If the boundedness assumption can be justified or replaced by verifiable conditions on the problem data, the paper would offer a useful algorithmic framework with explicit complexity guarantees for composite nonsmooth nonconvex problems on manifolds. The combination of variable metrics, inexact proximal linearization, and a practical inner solver is a reasonable extension of Euclidean methods. The data-dependent KL identification condition is a constructive contribution. However, the current dependence on an unverified boundedness hypothesis limits the strength of the theoretical claims.
major comments (2)
- [Abstract and §4] Abstract and main complexity theorem: Both the O(ε^{-3}) oracle complexity (via dual fast gradient inner solver) and the stationarity of cluster points are explicitly conditioned on the boundedness assumption for the iterate sequence {x_k}. No coercivity, level-set boundedness, or other condition involving only the original objective and manifold data is supplied that would imply this boundedness in general; without it the claims do not hold if the sequence diverges while the potential decreases.
- [Main convergence theorem (§4)] Stationarity argument: The proof that any cluster point is stationary (stated after the complexity result) relies on boundedness to guarantee the existence of convergent subsequences on the compact manifold; the argument is therefore incomplete without either a proof of boundedness or an explicit sufficient condition on the composite function F and the nonsmooth term.
minor comments (2)
- [Method description (§3)] The precise definition of the variable metric and its relation to the tangent space at each iterate should be stated more explicitly to avoid ambiguity in the subproblem formulation.
- [Preliminaries] A few minor notation inconsistencies appear in the description of the retraction and its differential; these do not affect the results but should be standardized.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive comments. We address the major comments point by point below, indicating planned revisions to the manuscript.
read point-by-point responses
-
Referee: [Abstract and §4] Abstract and main complexity theorem: Both the O(ε^{-3}) oracle complexity (via dual fast gradient inner solver) and the stationarity of cluster points are explicitly conditioned on the boundedness assumption for the iterate sequence {x_k}. No coercivity, level-set boundedness, or other condition involving only the original objective and manifold data is supplied that would imply this boundedness in general; without it the claims do not hold if the sequence diverges while the potential decreases.
Authors: We acknowledge that the O(ε^{-3}) complexity and stationarity results are conditioned on boundedness of {x_k}, which is required to prevent potential divergence from invalidating the conclusions even as the potential decreases. This is a standard hypothesis in nonconvex manifold optimization analyses. In the revision we will state the assumption more prominently in the abstract and Theorem 4.1, and add a remark with verifiable sufficient conditions on the data (e.g., coercivity of the composite objective relative to the manifold or compactness of level sets) that guarantee bounded iterates. A fully general coercivity condition covering all nonsmooth nonconvex instances is difficult to obtain without further restrictions, but the added discussion will clarify the scope of the claims. revision: partial
-
Referee: [Main convergence theorem (§4)] Stationarity argument: The proof that any cluster point is stationary (stated after the complexity result) relies on boundedness to guarantee the existence of convergent subsequences on the compact manifold; the argument is therefore incomplete without either a proof of boundedness or an explicit sufficient condition on the composite function F and the nonsmooth term.
Authors: The referee is correct that the stationarity argument uses boundedness to ensure convergent subsequences exist, since the manifold is closed but not necessarily compact. We will revise the proof in §4 to explicitly note this dependence and to clarify that boundedness supplies the needed compactness for subsequence extraction. The same sufficient conditions on the problem data added in response to the first comment will be referenced here to strengthen the result. revision: yes
Circularity Check
No significant circularity; results are conditional on explicit external assumption
full rationale
The paper's central results—the O(ε^{-3}) oracle complexity and stationarity of cluster points—are explicitly derived under the boundedness assumption on the iterate sequence, which is stated as an input hypothesis rather than obtained from the algorithm or prior steps. The subsequent KL-property analysis, local convergence rates, and data-only condition for identifying the KL exponent are standard consequences of manifold geometry and nonsmooth optimization theory, without any self-definitional loops, fitted parameters renamed as predictions, or load-bearing self-citations that reduce the claims to their own inputs. The derivation chain remains self-contained once the boundedness hypothesis is granted, with no evidence of ansatzes smuggled via citation or renaming of known results as novel unifications.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Boundedness of the iterate sequence
- domain assumption Kurdyka-Lojasiewicz property of the potential function on the set of cluster points
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Under the boundedness assumption on the iterate sequence, we establish the O(ε^{-3}) oracle complexity with a dual fast gradient method as the inner solver, and prove that any cluster point of the iterate sequence is a stationary point.
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Absil PA, Mahony R, Sepulchre R (2008) Optimization Algorithms on Matrix Manifolds (Princeton University Press, Princeton, NJ)
work page 2008
-
[2]
Mathematical Programming 116(1):5–16
Attouch H, Bolte J (2009) On the convergence of the proxim al algorithm for nonsmooth functions involving analytic features. Mathematical Programming 116(1):5–16
work page 2009
-
[3]
Mathematics of Operations Research 35(2):438–457
Attouch H, Bolte J, Redont P, Soubeyran A (2010) Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the Kurdyka- Łojasiewicz inequality. Mathematics of Operations Research 35(2):438–457
work page 2010
-
[4]
Mathematical Programming 137(1):91–129
Attouch H, Bolte J, Svaiter BF (2013) Convergence of desc ent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods. Mathematical Programming 137(1):91–129
work page 2013
-
[5]
IMA Journal of Nu- merical Analysis 8(1):141–148
Barzilai J, Borwein JM (1988) Two-point step size gradie nt methods. IMA Journal of Nu- merical Analysis 8(1):141–148
work page 1988
-
[6]
Beck A (2017) First-Order Methods in Optimization (Society for Industrial and Applied Mathematics, Philadelphia, US)
work page 2017
-
[7]
SIAM Journal on Optimization 33(3):1473–1493
Beck A, Rosset I (2023) A dynamic smoothing technique for a class of nonsmooth optimiza- tion problems on manifolds. SIAM Journal on Optimization 33(3):1473–1493
work page 2023
-
[8]
arXiv preprint arXiv:2108.12447
Bendokat T, Zimmermann R (2021) The real symplectic Stie fel and Grassmann manifolds: metrics, geodesics and applications. arXiv preprint arXiv:2108.12447
-
[9]
Mathematical Pro- gramming 117(1):5–19
Bolte J, Daniilidis A, Lewis A (2009) Tame functions are s emismooth. Mathematical Pro- gramming 117(1):5–19
work page 2009
-
[10]
Mathematical Programming 146(1):459–494
Bolte J, Sabach S, Teboulle M (2014) Proximal alternati ng linearized minimization for nonconvex and nonsmooth problems. Mathematical Programming 146(1):459–494
work page 2014
-
[11]
Boumal N (2023) An Introduction to Optimization on Smooth Manifolds (Cambridge Uni- versity Press, Cambridge, UK)
work page 2023
-
[12]
IMA Journal of Numerical Analysis 39(1):1–33
Boumal N, Absil PA, Cartis C (2019) Global rates of conve rgence for nonconvex optimization on manifolds. IMA Journal of Numerical Analysis 39(1):1–33
work page 2019
-
[13]
SIAM Journal on Optimization 30(1):210–239
Chen S, Ma S, Man-Cho So A, Zhang T (2020) Proximal gradie nt method for nonsmooth optimization over the stiefel manifold. SIAM Journal on Optimization 30(1):210–239
work page 2020
-
[14]
Clarke FH (1983) Optimization and Nonsmooth Analysis (Society for Industrial and Applied Mathematics, Philadelphia, US). 41
work page 1983
-
[15]
Facchinei F, Pang JS (2003) Finite-Dimensional Variational Inequalities and Complemen - tarity Problems (Springer, New York, US)
work page 2003
-
[16]
International Conference on Geometric Science of In- formation, 789–796 (Springer)
Gao B, Son NT, Absil PA, Stykel T (2021) Geometry of the sy mplectic Stiefel manifold endowed with the Euclidean metric. International Conference on Geometric Science of In- formation, 789–796 (Springer)
work page 2021
-
[17]
Linear Algebra and its Applications 682:50–85
Gao B, Son NT, Stykel T (2024) Optimization on the symple ctic Stiefel manifold: SR decomposition-based retraction and applications. Linear Algebra and its Applications 682:50–85
work page 2024
-
[18]
Applied Mathematics and Opti- mization 11(1):43–56
Hiriart-Urruty JB, Strodiot JJ, Nguyen VH (1984) Gener alized Hessian matrix and second- order optimality conditions for problems with C1,1 data. Applied Mathematics and Opti- mization 11(1):43–56
work page 1984
-
[19]
SIAM Journal on Optimization 28(1):596–619
Hosseini S, Huang W, Yousefpour R (2018) Line search alg orithms for locally Lipschitz functions on Riemannian manifolds. SIAM Journal on Optimization 28(1):596–619
work page 2018
-
[20]
Mathematical Program- ming 194(1):371–413
Huang W, Wei K (2022) Riemannian proximal gradient meth ods. Mathematical Program- ming 194(1):371–413
work page 2022
-
[21]
Computational Optimization and Applications 85(1):1–32
Huang W, Wei K (2023) An inexact Riemannian proximal gra dient method. Computational Optimization and Applications 85(1):1–32
work page 2023
-
[22]
SIAM Journal on Optimization 19(4):1894–1917
Ioffe AD (2009) An invitation to tame optimization. SIAM Journal on Optimization 19(4):1894–1917
work page 2009
-
[23]
arXiv preprint arXiv:2404.08463
Jensen R, Zimmermann R (2024) Riemannian optimization on the symplectic Stiefel mani- fold using second-order information. arXiv preprint arXiv:2404.08463
-
[24]
Foundations of Computational Mathematics 18(5):1199–1232
Li G, Pong TK (2018) Calculus of the exponent of Kurdyka– Łojasiewicz inequality and its applications to linear convergence of first-order metho ds. Foundations of Computational Mathematics 18(5):1199–1232
work page 2018
-
[25]
Mathematics of Operations Research
Li J, Ma S, Srivastava T (2024) A Riemannian alternating direction method of multipliers. Mathematics of Operations Research
work page 2024
-
[26]
arXiv preprint arXiv:2411.15776
Li Q, Zhang N, Yan H, Feng J (2024) Proximal methods for st ructured nonsmooth opti- mization over Remannian submanifolds. arXiv preprint arXiv:2411.15776
-
[27]
Mathematical Programming 178(1):215–262
Liu H, So AMC, Wu W (2019) Quadratic optimization with or thogonality constraint: explicit Łojasiewicz exponent and linear convergence of re traction-based line-search and stochastic variance-reduced gradient methods. Mathematical Programming 178(1):215–262
work page 2019
-
[28]
Journal of Scientific Computing 99(2):30
Liu X, Xiao N, Yuan YX (2024) A penalty-free infeasible a pproach for a class of nonsmooth optimization problems over the Stiefel manifold. Journal of Scientific Computing 99(2):30
work page 2024
-
[29]
IEEE Transactions on Image Processing 25(6):2833–2843
Lu C, Yan S, Lin Z (2016) Convex sparse spectral clusteri ng: single-view to multi-view. IEEE Transactions on Image Processing 25(6):2833–2843
work page 2016
-
[30]
Mathematical Programming 135(1):149–193
Lu Z, Zhang Y (2012) An augmented Lagrangian approach fo r sparse principal component analysis. Mathematical Programming 135(1):149–193
work page 2012
-
[31]
Optimization Methods and Software 31(3):645–678
Necoara I, Patrascu A (2016) Iteration complexity anal ysis of dual first-order methods for conic convex programming. Optimization Methods and Software 31(3):645–678
work page 2016
-
[32]
Dokl Akad Nauk Sssr , volume 269, 543
Nesterov Y (1983) A method for solving the convex progra mming problem with convergence rate o(1/k 2). Dokl Akad Nauk Sssr , volume 269, 543
work page 1983
-
[33]
Bioinformat- ics 34(12):2069–2076
Park S, Zhao H (2018) Spectral clustering based on learn ing similarity matrix. Bioinformat- ics 34(12):2069–2076
work page 2018
-
[34]
SIAM Jour- nal on Scientific Computing 38(1):A1–A27
Peng L, Mohseni K (2016) Symplectic model reduction of H amiltonian systems. SIAM Jour- nal on Scientific Computing 38(1):A1–A27. 42
work page 2016
-
[35]
Rockafellar RT, Wets RJ (1998) Variational Analysis (Springer, Berlin, Germany)
work page 1998
-
[36]
SIAM Journal on Optimization 34(1):654–681
Si W, Absil PA, Huang W, Jiang R, Vary S (2024) A Riemannia n proximal Newton method. SIAM Journal on Optimization 34(1):654–681
work page 2024
-
[37]
Journal of Machine Learning Research 3(Dec):583–617
Strehl A, Ghosh J (2002) Cluster ensembles—a knowledge r euse framework for combining multiple partitions. Journal of Machine Learning Research 3(Dec):583–617
work page 2002
-
[38]
arXiv preprint arXiv:2503.02494
Wang L, Liu X, Chen X (2025) The distributionally robust optimization model of sparse principal component analysis. arXiv preprint arXiv:2503.02494
-
[39]
Journal of Scientific Computing 95(2):39
Wang Q, Yang WH (2023) Proximal quasi-Newton method for composite optimization over the Stiefel manifold. Journal of Scientific Computing 95(2):39
work page 2023
-
[40]
Computational Optimization and Applications 89(2):419–457
Wang Q, Yang WH (2024) An adaptive regularized proximal Newton-type methods for com- posite optimization over the Stiefel manifold. Computational Optimization and Applications 89(2):419–457
work page 2024
-
[41]
INFORMS Journal on Optimization 4(2):200–214
Wang Z, Liu B, Chen S, Ma S, Xue L, Zhao H (2022) A manifold p roximal linear method for sparse spectral clustering with application to single- cell RNA sequencing data analysis. INFORMS Journal on Optimization 4(2):200–214
work page 2022
-
[42]
Mathematical Programming 142(1):397–434
Wen Z, Yin W (2013) A feasible method for optimization wi th orthogonality constraints. Mathematical Programming 142(1):397–434
work page 2013
-
[43]
SIAM Journal on Optimization 31(4):3097–3126
Xiao N, Liu X, Yuan YX (2021) Exact penalty function for ℓ2,1 norm minimization over the Stiefel manifold. SIAM Journal on Optimization 31(4):3097–3126
work page 2021
-
[44]
Xu M, Jiang B, Liu YF, So AMC (2025) On the oracle complexi ty of a Riemannian in- exact augmented Lagrangian method for nonsmooth composite problems over Riemannian submanifolds. Optimization Letters 1–19
work page 2025
-
[45]
Mathematics of Operations Research 49(3):1710–1733
Zhang C, Chen X, Ma S (2024) A Riemannian smoothing steep est descent method for non-Lipschitz optimization on embedded submanifolds of Rn. Mathematics of Operations Research 49(3):1710–1733
work page 2024
-
[46]
SIAM Journal on Optimization 20(4):1737–1765
Zhao XY, Sun D, Toh KC (2010) A Newton-CG augmented Lagra ngian method for semidef- inite programming. SIAM Journal on Optimization 20(4):1737–1765
work page 2010
-
[47]
Mathematical Program- ming 201(1):1–61
Zhou Y, Bao C, Ding C, Zhu J (2023) A semismooth Newton bas ed augmented La- grangian method for nonsmooth optimization on matrix manif olds. Mathematical Program- ming 201(1):1–61. 43
work page 2023
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.