pith. sign in

arxiv: 2508.12003 · v2 · submitted 2025-08-16 · 🧮 math.OC

An inexact variable metric proximal linearization method for composite optimization on manifolds

Pith reviewed 2026-05-18 23:06 UTC · model grok-4.3

classification 🧮 math.OC
keywords composite optimizationmanifold optimizationproximal linearizationvariable metricoracle complexityKurdyka-Lojasiewicz propertynonsmooth nonconvex optimizationretraction
0
0 comments X

The pith

An inexact variable metric proximal linearization method achieves O(ε^{-3}) oracle complexity for composite optimization on manifolds under bounded iterates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops an algorithm for minimizing the composition of a nonsmooth convex function and a C^{1,1} mapping over a C^2-smooth closed embedded manifold. The method applies a variable metric proximal linearization step that respects the manifold geometry through retractions and first-order information, solving each subproblem only inexactly via a practical criterion. Under the assumption that the generated sequence of points remains bounded, the approach reaches an approximate stationary point after order ε^{-3} calls to an oracle when a dual fast gradient method is used inside, and every cluster point satisfies the stationarity condition. Readers would care because many applied problems in data analysis and engineering involve nonsmooth objectives on curved domains where such complexity and convergence guarantees have been missing.

Core claim

We propose an inexact variable metric proximal linearization method for minimizing the composition of a nonsmooth convex function and a C^{1,1} mapping F over a C^2-smooth embedded closed submanifold M. By leveraging the composite structure together with retractions and first-order information on M, each iteration solves an inexact subspace-constrained strongly convex problem according to a practical inexactness criterion. Under the boundedness assumption on the iterate sequence, the method establishes O(ε^{-3}) oracle complexity with a dual fast gradient method as the inner solver and proves that any cluster point is a stationary point. When the constructed potential function additionally满足

What carries the argument

Inexact variable metric proximal linearization method that at each step seeks an inexact solution to a subspace-constrained strongly convex subproblem using manifold retractions and first-order information.

Load-bearing premise

The sequence of iterates generated by the method remains bounded.

What would settle it

A concrete composite manifold problem on which the generated iterates become unbounded or on which more than order ε^{-3} oracle calls are required to reach stationarity.

Figures

Figures reproduced from arXiv: 2508.12003 by Hao He, Ruyu Liu, Shaohua Pan, Yitian Qian.

Figure 1
Figure 1. Figure 1: The total iterations and time (s) for all subproble [PITH_FULL_IMAGE:figures/full_fig_p032_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: The objective values, running time and NMI scores b [PITH_FULL_IMAGE:figures/full_fig_p034_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: The objective value and running time of RiVMPL for ( [PITH_FULL_IMAGE:figures/full_fig_p034_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: The objective values and running time of three meth [PITH_FULL_IMAGE:figures/full_fig_p036_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: The infeasibility and running time of three methods for (4) under different ρ. when ρ is greater than 2.5, the one by RiVMPL and RiALM increases to 1 sharply, but the one by RADMM decreases to 0 abnormally. We also observe that as λ increases in [0.5, 3.2], the running time of RiALM increases remarkably, but that of RiVMPL and RADMM has no too much variation. 0.5 1 1.5 2 2.5 3 3.5 0 0.2 0.4 0.6 0.8 1 spars… view at source ↗
Figure 6
Figure 6. Figure 6: The row sparsity and running time of three methods under different λ. By Figures 5-6, we test the three methods for solving (4) for (m, λ, ρ) = (50, 2.0, 0.5) with different n and r [PITH_FULL_IMAGE:figures/full_fig_p038_6.png] view at source ↗
read the original abstract

This paper concerns the minimization of the composition of a nonsmooth convex function and a $\mathcal{C}^{1,1}$ mapping $F$ over a $\mathcal{C}^2$-smooth embedded closed submanifold $\mathcal{M}$. For this class of nonconvex and nonsmooth problems, we propose an inexact variable metric proximal linearization method by leveraging its composite structure and the retraction and first-order information of $\mathcal{M}$, which at each iteration seeks an inexact solution to a subspace constrained strongly convex problem by a practical inexactness criterion. Under the boundedness assumption on the iterate sequence, we establish the $O(\epsilon^{-3})$ oracle complexity with a dual fast gradient method as the inner solver, and prove that any cluster point of the iterate sequence is a stationary point. If in addition the constructed potential function has the Kurdyka-Lojasiewicz (KL) property on the set of cluster points, the iterate sequence converges to a stationary point, and if the potential function has the KL property of exponent $q\in[\frac{1}{2},1)$, the local convergence rate is characterized. We also provide a condition only involving the original data to identify the KL property of the potential function with an exponent $q\in[0,1)$. Numerical comparisons with the existing methods validate the efficiency of the proposed method.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. This paper introduces an inexact variable metric proximal linearization method for minimizing the composition of a nonsmooth convex function and a C^{1,1} mapping over a C^2-smooth embedded closed submanifold. The method uses retractions and first-order information on the manifold to solve inexact subspace-constrained strongly convex subproblems at each iteration. Under the boundedness assumption on the iterate sequence, it establishes an O(ε^{-3}) oracle complexity result when a dual fast gradient method is used as the inner solver, proves that cluster points are stationary points, and derives convergence to a stationary point (with rates) when a constructed potential function satisfies the Kurdyka-Łojasiewicz property. A data-only condition is also supplied to guarantee the KL property with exponent in [0,1).

Significance. If the boundedness assumption can be justified or replaced by verifiable conditions on the problem data, the paper would offer a useful algorithmic framework with explicit complexity guarantees for composite nonsmooth nonconvex problems on manifolds. The combination of variable metrics, inexact proximal linearization, and a practical inner solver is a reasonable extension of Euclidean methods. The data-dependent KL identification condition is a constructive contribution. However, the current dependence on an unverified boundedness hypothesis limits the strength of the theoretical claims.

major comments (2)
  1. [Abstract and §4] Abstract and main complexity theorem: Both the O(ε^{-3}) oracle complexity (via dual fast gradient inner solver) and the stationarity of cluster points are explicitly conditioned on the boundedness assumption for the iterate sequence {x_k}. No coercivity, level-set boundedness, or other condition involving only the original objective and manifold data is supplied that would imply this boundedness in general; without it the claims do not hold if the sequence diverges while the potential decreases.
  2. [Main convergence theorem (§4)] Stationarity argument: The proof that any cluster point is stationary (stated after the complexity result) relies on boundedness to guarantee the existence of convergent subsequences on the compact manifold; the argument is therefore incomplete without either a proof of boundedness or an explicit sufficient condition on the composite function F and the nonsmooth term.
minor comments (2)
  1. [Method description (§3)] The precise definition of the variable metric and its relation to the tangent space at each iterate should be stated more explicitly to avoid ambiguity in the subproblem formulation.
  2. [Preliminaries] A few minor notation inconsistencies appear in the description of the retraction and its differential; these do not affect the results but should be standardized.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments. We address the major comments point by point below, indicating planned revisions to the manuscript.

read point-by-point responses
  1. Referee: [Abstract and §4] Abstract and main complexity theorem: Both the O(ε^{-3}) oracle complexity (via dual fast gradient inner solver) and the stationarity of cluster points are explicitly conditioned on the boundedness assumption for the iterate sequence {x_k}. No coercivity, level-set boundedness, or other condition involving only the original objective and manifold data is supplied that would imply this boundedness in general; without it the claims do not hold if the sequence diverges while the potential decreases.

    Authors: We acknowledge that the O(ε^{-3}) complexity and stationarity results are conditioned on boundedness of {x_k}, which is required to prevent potential divergence from invalidating the conclusions even as the potential decreases. This is a standard hypothesis in nonconvex manifold optimization analyses. In the revision we will state the assumption more prominently in the abstract and Theorem 4.1, and add a remark with verifiable sufficient conditions on the data (e.g., coercivity of the composite objective relative to the manifold or compactness of level sets) that guarantee bounded iterates. A fully general coercivity condition covering all nonsmooth nonconvex instances is difficult to obtain without further restrictions, but the added discussion will clarify the scope of the claims. revision: partial

  2. Referee: [Main convergence theorem (§4)] Stationarity argument: The proof that any cluster point is stationary (stated after the complexity result) relies on boundedness to guarantee the existence of convergent subsequences on the compact manifold; the argument is therefore incomplete without either a proof of boundedness or an explicit sufficient condition on the composite function F and the nonsmooth term.

    Authors: The referee is correct that the stationarity argument uses boundedness to ensure convergent subsequences exist, since the manifold is closed but not necessarily compact. We will revise the proof in §4 to explicitly note this dependence and to clarify that boundedness supplies the needed compactness for subsequence extraction. The same sufficient conditions on the problem data added in response to the first comment will be referenced here to strengthen the result. revision: yes

Circularity Check

0 steps flagged

No significant circularity; results are conditional on explicit external assumption

full rationale

The paper's central results—the O(ε^{-3}) oracle complexity and stationarity of cluster points—are explicitly derived under the boundedness assumption on the iterate sequence, which is stated as an input hypothesis rather than obtained from the algorithm or prior steps. The subsequent KL-property analysis, local convergence rates, and data-only condition for identifying the KL exponent are standard consequences of manifold geometry and nonsmooth optimization theory, without any self-definitional loops, fitted parameters renamed as predictions, or load-bearing self-citations that reduce the claims to their own inputs. The derivation chain remains self-contained once the boundedness hypothesis is granted, with no evidence of ansatzes smuggled via citation or renaming of known results as novel unifications.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The analysis rests on two standard but nontrivial domain assumptions common in nonsmooth optimization: boundedness of iterates and the Kurdyka-Lojasiewicz property of a constructed potential function. No free parameters or invented entities are introduced in the abstract.

axioms (2)
  • domain assumption Boundedness of the iterate sequence
    Invoked to obtain the O(ε^{-3}) oracle complexity and to guarantee that cluster points are stationary.
  • domain assumption Kurdyka-Lojasiewicz property of the potential function on the set of cluster points
    Required for global convergence of the sequence and for characterizing local convergence rates.

pith-pipeline@v0.9.0 · 5775 in / 1409 out tokens · 52938 ms · 2026-05-18T23:06:59.804931+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear
    ?
    unclear

    Relation between the paper passage and the cited Recognition theorem.

    Under the boundedness assumption on the iterate sequence, we establish the O(ε^{-3}) oracle complexity with a dual fast gradient method as the inner solver, and prove that any cluster point of the iterate sequence is a stationary point.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

47 extracted references · 47 canonical work pages

  1. [1]

    Absil PA, Mahony R, Sepulchre R (2008) Optimization Algorithms on Matrix Manifolds (Princeton University Press, Princeton, NJ)

  2. [2]

    Mathematical Programming 116(1):5–16

    Attouch H, Bolte J (2009) On the convergence of the proxim al algorithm for nonsmooth functions involving analytic features. Mathematical Programming 116(1):5–16

  3. [3]

    Mathematics of Operations Research 35(2):438–457

    Attouch H, Bolte J, Redont P, Soubeyran A (2010) Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the Kurdyka- Łojasiewicz inequality. Mathematics of Operations Research 35(2):438–457

  4. [4]

    Mathematical Programming 137(1):91–129

    Attouch H, Bolte J, Svaiter BF (2013) Convergence of desc ent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods. Mathematical Programming 137(1):91–129

  5. [5]

    IMA Journal of Nu- merical Analysis 8(1):141–148

    Barzilai J, Borwein JM (1988) Two-point step size gradie nt methods. IMA Journal of Nu- merical Analysis 8(1):141–148

  6. [6]

    Beck A (2017) First-Order Methods in Optimization (Society for Industrial and Applied Mathematics, Philadelphia, US)

  7. [7]

    SIAM Journal on Optimization 33(3):1473–1493

    Beck A, Rosset I (2023) A dynamic smoothing technique for a class of nonsmooth optimiza- tion problems on manifolds. SIAM Journal on Optimization 33(3):1473–1493

  8. [8]

    arXiv preprint arXiv:2108.12447

    Bendokat T, Zimmermann R (2021) The real symplectic Stie fel and Grassmann manifolds: metrics, geodesics and applications. arXiv preprint arXiv:2108.12447

  9. [9]

    Mathematical Pro- gramming 117(1):5–19

    Bolte J, Daniilidis A, Lewis A (2009) Tame functions are s emismooth. Mathematical Pro- gramming 117(1):5–19

  10. [10]

    Mathematical Programming 146(1):459–494

    Bolte J, Sabach S, Teboulle M (2014) Proximal alternati ng linearized minimization for nonconvex and nonsmooth problems. Mathematical Programming 146(1):459–494

  11. [11]

    Boumal N (2023) An Introduction to Optimization on Smooth Manifolds (Cambridge Uni- versity Press, Cambridge, UK)

  12. [12]

    IMA Journal of Numerical Analysis 39(1):1–33

    Boumal N, Absil PA, Cartis C (2019) Global rates of conve rgence for nonconvex optimization on manifolds. IMA Journal of Numerical Analysis 39(1):1–33

  13. [13]

    SIAM Journal on Optimization 30(1):210–239

    Chen S, Ma S, Man-Cho So A, Zhang T (2020) Proximal gradie nt method for nonsmooth optimization over the stiefel manifold. SIAM Journal on Optimization 30(1):210–239

  14. [14]

    Clarke FH (1983) Optimization and Nonsmooth Analysis (Society for Industrial and Applied Mathematics, Philadelphia, US). 41

  15. [15]

    Facchinei F, Pang JS (2003) Finite-Dimensional Variational Inequalities and Complemen - tarity Problems (Springer, New York, US)

  16. [16]

    International Conference on Geometric Science of In- formation, 789–796 (Springer)

    Gao B, Son NT, Absil PA, Stykel T (2021) Geometry of the sy mplectic Stiefel manifold endowed with the Euclidean metric. International Conference on Geometric Science of In- formation, 789–796 (Springer)

  17. [17]

    Linear Algebra and its Applications 682:50–85

    Gao B, Son NT, Stykel T (2024) Optimization on the symple ctic Stiefel manifold: SR decomposition-based retraction and applications. Linear Algebra and its Applications 682:50–85

  18. [18]

    Applied Mathematics and Opti- mization 11(1):43–56

    Hiriart-Urruty JB, Strodiot JJ, Nguyen VH (1984) Gener alized Hessian matrix and second- order optimality conditions for problems with C1,1 data. Applied Mathematics and Opti- mization 11(1):43–56

  19. [19]

    SIAM Journal on Optimization 28(1):596–619

    Hosseini S, Huang W, Yousefpour R (2018) Line search alg orithms for locally Lipschitz functions on Riemannian manifolds. SIAM Journal on Optimization 28(1):596–619

  20. [20]

    Mathematical Program- ming 194(1):371–413

    Huang W, Wei K (2022) Riemannian proximal gradient meth ods. Mathematical Program- ming 194(1):371–413

  21. [21]

    Computational Optimization and Applications 85(1):1–32

    Huang W, Wei K (2023) An inexact Riemannian proximal gra dient method. Computational Optimization and Applications 85(1):1–32

  22. [22]

    SIAM Journal on Optimization 19(4):1894–1917

    Ioffe AD (2009) An invitation to tame optimization. SIAM Journal on Optimization 19(4):1894–1917

  23. [23]

    arXiv preprint arXiv:2404.08463

    Jensen R, Zimmermann R (2024) Riemannian optimization on the symplectic Stiefel mani- fold using second-order information. arXiv preprint arXiv:2404.08463

  24. [24]

    Foundations of Computational Mathematics 18(5):1199–1232

    Li G, Pong TK (2018) Calculus of the exponent of Kurdyka– Łojasiewicz inequality and its applications to linear convergence of first-order metho ds. Foundations of Computational Mathematics 18(5):1199–1232

  25. [25]

    Mathematics of Operations Research

    Li J, Ma S, Srivastava T (2024) A Riemannian alternating direction method of multipliers. Mathematics of Operations Research

  26. [26]

    arXiv preprint arXiv:2411.15776

    Li Q, Zhang N, Yan H, Feng J (2024) Proximal methods for st ructured nonsmooth opti- mization over Remannian submanifolds. arXiv preprint arXiv:2411.15776

  27. [27]

    Mathematical Programming 178(1):215–262

    Liu H, So AMC, Wu W (2019) Quadratic optimization with or thogonality constraint: explicit Łojasiewicz exponent and linear convergence of re traction-based line-search and stochastic variance-reduced gradient methods. Mathematical Programming 178(1):215–262

  28. [28]

    Journal of Scientific Computing 99(2):30

    Liu X, Xiao N, Yuan YX (2024) A penalty-free infeasible a pproach for a class of nonsmooth optimization problems over the Stiefel manifold. Journal of Scientific Computing 99(2):30

  29. [29]

    IEEE Transactions on Image Processing 25(6):2833–2843

    Lu C, Yan S, Lin Z (2016) Convex sparse spectral clusteri ng: single-view to multi-view. IEEE Transactions on Image Processing 25(6):2833–2843

  30. [30]

    Mathematical Programming 135(1):149–193

    Lu Z, Zhang Y (2012) An augmented Lagrangian approach fo r sparse principal component analysis. Mathematical Programming 135(1):149–193

  31. [31]

    Optimization Methods and Software 31(3):645–678

    Necoara I, Patrascu A (2016) Iteration complexity anal ysis of dual first-order methods for conic convex programming. Optimization Methods and Software 31(3):645–678

  32. [32]

    Dokl Akad Nauk Sssr , volume 269, 543

    Nesterov Y (1983) A method for solving the convex progra mming problem with convergence rate o(1/k 2). Dokl Akad Nauk Sssr , volume 269, 543

  33. [33]

    Bioinformat- ics 34(12):2069–2076

    Park S, Zhao H (2018) Spectral clustering based on learn ing similarity matrix. Bioinformat- ics 34(12):2069–2076

  34. [34]

    SIAM Jour- nal on Scientific Computing 38(1):A1–A27

    Peng L, Mohseni K (2016) Symplectic model reduction of H amiltonian systems. SIAM Jour- nal on Scientific Computing 38(1):A1–A27. 42

  35. [35]

    Rockafellar RT, Wets RJ (1998) Variational Analysis (Springer, Berlin, Germany)

  36. [36]

    SIAM Journal on Optimization 34(1):654–681

    Si W, Absil PA, Huang W, Jiang R, Vary S (2024) A Riemannia n proximal Newton method. SIAM Journal on Optimization 34(1):654–681

  37. [37]

    Journal of Machine Learning Research 3(Dec):583–617

    Strehl A, Ghosh J (2002) Cluster ensembles—a knowledge r euse framework for combining multiple partitions. Journal of Machine Learning Research 3(Dec):583–617

  38. [38]

    arXiv preprint arXiv:2503.02494

    Wang L, Liu X, Chen X (2025) The distributionally robust optimization model of sparse principal component analysis. arXiv preprint arXiv:2503.02494

  39. [39]

    Journal of Scientific Computing 95(2):39

    Wang Q, Yang WH (2023) Proximal quasi-Newton method for composite optimization over the Stiefel manifold. Journal of Scientific Computing 95(2):39

  40. [40]

    Computational Optimization and Applications 89(2):419–457

    Wang Q, Yang WH (2024) An adaptive regularized proximal Newton-type methods for com- posite optimization over the Stiefel manifold. Computational Optimization and Applications 89(2):419–457

  41. [41]

    INFORMS Journal on Optimization 4(2):200–214

    Wang Z, Liu B, Chen S, Ma S, Xue L, Zhao H (2022) A manifold p roximal linear method for sparse spectral clustering with application to single- cell RNA sequencing data analysis. INFORMS Journal on Optimization 4(2):200–214

  42. [42]

    Mathematical Programming 142(1):397–434

    Wen Z, Yin W (2013) A feasible method for optimization wi th orthogonality constraints. Mathematical Programming 142(1):397–434

  43. [43]

    SIAM Journal on Optimization 31(4):3097–3126

    Xiao N, Liu X, Yuan YX (2021) Exact penalty function for ℓ2,1 norm minimization over the Stiefel manifold. SIAM Journal on Optimization 31(4):3097–3126

  44. [44]

    Optimization Letters 1–19

    Xu M, Jiang B, Liu YF, So AMC (2025) On the oracle complexi ty of a Riemannian in- exact augmented Lagrangian method for nonsmooth composite problems over Riemannian submanifolds. Optimization Letters 1–19

  45. [45]

    Mathematics of Operations Research 49(3):1710–1733

    Zhang C, Chen X, Ma S (2024) A Riemannian smoothing steep est descent method for non-Lipschitz optimization on embedded submanifolds of Rn. Mathematics of Operations Research 49(3):1710–1733

  46. [46]

    SIAM Journal on Optimization 20(4):1737–1765

    Zhao XY, Sun D, Toh KC (2010) A Newton-CG augmented Lagra ngian method for semidef- inite programming. SIAM Journal on Optimization 20(4):1737–1765

  47. [47]

    Mathematical Program- ming 201(1):1–61

    Zhou Y, Bao C, Ding C, Zhu J (2023) A semismooth Newton bas ed augmented La- grangian method for nonsmooth optimization on matrix manif olds. Mathematical Program- ming 201(1):1–61. 43