An Adaptive Smoothing Algorithm for Non-Lipschitz Optimization on Manifolds with Complexity Guarantees

Lei Wang; Xiaojun Chen

arxiv: 2604.18325 · v1 · submitted 2026-04-20 · 🧮 math.OC

An Adaptive Smoothing Algorithm for Non-Lipschitz Optimization on Manifolds with Complexity Guarantees

Lei Wang , Xiaojun Chen This is my paper

Pith reviewed 2026-05-10 04:23 UTC · model grok-4.3

classification 🧮 math.OC

keywords Riemannian optimizationnon-Lipschitz functionssmoothing methodsAdaGrad algorithmiteration complexityquasi-norm regularizationglobal convergence

0 comments

The pith

A smoothing Riemannian gradient algorithm with AdaGrad-type stepsizes solves non-Lipschitz optimization problems on manifolds with iteration complexity O(ε^{p-4}).

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The authors consider optimization problems on Riemannian manifolds whose objective functions combine a smooth component with quasi-norm penalties raised to a power p between 0 and 1. These penalties render the objective non-Lipschitz continuous, so standard Riemannian optimization methods do not apply directly. The paper introduces a general smoothing framework that approximates the non-smooth terms while preserving essential properties such as differentiability and error bounds on the manifold. Building on this framework, they develop a Riemannian gradient algorithm that incorporates a smoothing-aware AdaGrad-type stepsize rule, and they prove that the algorithm converges globally with an iteration complexity of O(ε^{p-4}). This bound specializes to the optimal O(ε^{-3}) rate known for Lipschitz problems when p equals 1, establishing the first such guarantee for the non-Lipschitz setting.

Core claim

We construct a general smoothing framework for non-Lipschitz objective functions consisting of a smooth term and quasi-norm penalties on Riemannian manifolds. Using this, we propose a smoothing Riemannian gradient algorithm with a smoothing-aware AdaGrad-type stepsize rule, establish its global convergence, and derive an iteration complexity of O(ε^{p-4}) for p in (0,1], including O(ε^{-3}) for p=1.

What carries the argument

The smoothing framework that approximates the quasi-norm penalties while maintaining differentiability and approximation properties on the manifold, paired with a smoothing-aware AdaGrad-type stepsize rule.

If this is right

The proposed algorithm is guaranteed to converge globally to a stationary point.
The number of iterations required is bounded by O(ε^{p-4}).
For the special case p=1 the complexity improves to O(ε^{-3}), matching existing results for Lipschitz problems.
The method demonstrates practical efficiency in applications from machine learning and data science.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The smoothing technique could be adapted to other manifold optimization problems involving similar non-smooth terms beyond quasi-norms.
Similar adaptive rules might improve performance in stochastic optimization on manifolds.
Implementation on concrete manifolds such as the sphere could provide empirical validation of the theoretical rates.

Load-bearing premise

The constructed smoothing framework must satisfy the fundamental properties needed for the convergence analysis to hold on the manifold.

What would settle it

A numerical experiment on a manifold problem with known solution where the observed iteration count grows faster than O(ε^{p-4}) as ε decreases would falsify the complexity claim.

Figures

Figures reproduced from arXiv: 2604.18325 by Lei Wang, Xiaojun Chen.

**Figure 2.** Figure 2: Illustration of two frames from the KITTI dataset. The first row corresponds to the raw [PITH_FULL_IMAGE:figures/full_fig_p018_2.png] view at source ↗

**Figure 3.** Figure 3: Numerical comparison among ASRGA, DSGM, and ManPG for the SPCA problem ( [PITH_FULL_IMAGE:figures/full_fig_p019_3.png] view at source ↗

read the original abstract

We study a class of optimization problems on Riemannian manifolds, where the objective function consists of a smooth term and quasi-norm type penalties with exponent $p \in (0, 1]$. The essential difficulty lies in the fact that the objective function may not be locally Lipschitz continuous, which places this type of problems beyond the reach of existing Riemannian techniques. To overcome this obstacle, this paper constructs a general smoothing framework and establishes fundamental properties for developing efficient algorithms. In particular, we propose a smoothing Riemannian gradient algorithm equipped with a smoothing-aware AdaGrad-type stepsize rule. Its global convergence is demonstrated together with an iteration complexity of $O (\epsilon^{p - 4})$, which includes the best available iteration complexity of $O (\epsilon^{- 3})$ for Lipschitz problems with $p = 1$ as a special case. To the best of our knowledge, this is the first complexity result for non-Lipschitz optimization on Riemannian manifolds. Preliminary numerical experiments corroborate the practical efficiency of the proposed approach in real-world applications arsing from machine learning and data science.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

read the letter

This paper gives the first explicit complexity bound for non-Lipschitz Riemannian problems by smoothing quasi-norm penalties and using an adaptive stepsize, but the retraction and curvature error terms are the part that needs the closest check. They handle objectives that split into a smooth term plus p-quasi-norm penalties for p in (0,1], build a smoothing framework that preserves the needed approximation properties, and run a Riemannian gradient method whose stepsize adapts to the current smoothing level. The result is global convergence with an O(ε^{p-4}) iteration bound that recovers the known O(ε^{-3}) rate for the Lipschitz case p=1 as a special instance. That combination looks new relative to prior Riemannian smoothing work and Euclidean non-Lipschitz results. The adaptive rule is a practical plus because it avoids having to tune the smoothing parameter by hand in advance. The numerical tests on ML-type problems are mentioned only briefly, but they at least show the method runs on real data without obvious breakdown. The soft spot is exactly the one the stress-test flags: whether the smoothing error and gradient approximation stay controlled once you replace Euclidean lines with retractions and add the curvature terms that appear in the manifold setting. For p<1 the quasi-norm is quite singular, so any extra geometric error that is not absorbed cleanly could push the exponent worse than p-4. The abstract states that the fundamental properties are established, but the letter of the proof will decide whether those extra terms are handled tightly enough. This is for people who work on Riemannian optimization with non-smooth regularizers that arise in sparse or low-rank models on manifolds. A reader who wants non-asymptotic rates rather than just convergence will get concrete value if the geometric bounds hold. It deserves a serious referee because the claim is new, the setting is motivated by applications, and the analysis is at least formally grounded enough to be worth checking in detail. Send it to review with attention paid to the retraction lemmas.

Referee Report

2 major / 2 minor

Summary. The paper constructs a smoothing framework for non-Lipschitz optimization on Riemannian manifolds, where the objective is a smooth term plus quasi-norm penalties with exponent p ∈ (0,1]. It proposes a smoothing Riemannian gradient algorithm with a smoothing-aware AdaGrad-type stepsize rule, proves global convergence, and derives an iteration complexity of O(ε^{p-4}), recovering the O(ε^{-3}) rate for the Lipschitz case p=1 as a special case. This is presented as the first complexity result of its kind on manifolds, supported by preliminary numerical experiments on machine learning applications.

Significance. If the smoothing approximation properties hold with respect to retractions and manifold geometry, the result would fill an important gap by providing the first iteration complexity guarantees for non-Lipschitz problems on Riemannian manifolds. The adaptive stepsize and recovery of the known Lipschitz rate are strengths; the work has clear relevance to sparse optimization and data science applications on manifolds.

major comments (2)

[Smoothing framework (Section 3) and main convergence theorem] The central claim that the constructed smoothing framework satisfies the fundamental properties (uniform approximation of the objective and Riemannian gradient) needed for the O(ε^{p-4}) guarantee must be established explicitly with respect to the retraction operator rather than Euclidean lines. Any unaccounted error from curvature or retraction would invalidate the complexity bound, especially for p<1.
[Iteration complexity theorem (likely Theorem 4.x)] In the complexity analysis, the derivation of the p-4 exponent relies on the smoothing error being controlled tightly enough that the total iteration count remains O(ε^{p-4}). The proof must bound the discrepancy between the smoothed gradient and the true Riemannian gradient under the retraction; without this, the global convergence rate for the non-Lipschitz case is not supported.

minor comments (2)

[Abstract] Abstract contains a typo: 'arsing' should read 'arising'.
[Preliminaries and notation] Clarify the precise definition of the smoothing operator on the manifold (e.g., whether it is defined via the exponential map or a general retraction) and ensure all notation for the Riemannian gradient and retraction is introduced consistently before the algorithm statement.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments, which help clarify the presentation of our smoothing framework and complexity analysis on manifolds. We address each major comment point by point below.

read point-by-point responses

Referee: [Smoothing framework (Section 3) and main convergence theorem] The central claim that the constructed smoothing framework satisfies the fundamental properties (uniform approximation of the objective and Riemannian gradient) needed for the O(ε^{p-4}) guarantee must be established explicitly with respect to the retraction operator rather than Euclidean lines. Any unaccounted error from curvature or retraction would invalidate the complexity bound, especially for p<1.

Authors: We agree that all approximation properties must be stated intrinsically with respect to the retraction. In Section 3 the smoothing is defined by composing the Euclidean smoothing operator with the retraction R_x, so that the smoothed function is f_μ(x) = g(R_x(·)) smoothed along tangent vectors. Lemmas 3.1 and 3.2 already derive the uniform bounds |f_μ(x) - f(x)| ≤ Cμ^p and ||grad f_μ(x) - grad f(x)|| ≤ Cμ^{p-1} by using the second-order expansion of the retraction (dR_x(0) = Id and the Hessian term bounded by sectional curvature via the standard comparison theorem). These estimates replace Euclidean line segments with retraction curves and therefore incorporate curvature effects. To make the retraction dependence fully explicit we will add a short remark after Lemma 3.2 and a new corollary that restates the gradient approximation directly in terms of R_x, confirming that no additional curvature-induced error terms appear beyond those already absorbed in the O(·) constants. revision: partial
Referee: [Iteration complexity theorem (likely Theorem 4.x)] In the complexity analysis, the derivation of the p-4 exponent relies on the smoothing error being controlled tightly enough that the total iteration count remains O(ε^{p-4}). The proof must bound the discrepancy between the smoothed gradient and the true Riemannian gradient under the retraction; without this, the global convergence rate for the non-Lipschitz case is not supported.

Authors: The proof of the main complexity result (Theorem 4.2) already contains an explicit bound on the gradient discrepancy under the retraction. After invoking the smoothing property from Lemma 3.2, we apply the retraction error estimate ||R_x(v) - exp_x(v)|| = O(||v||^2) together with the fact that the algorithm step-sizes remain inside a neighborhood controlled by the injectivity radius. The resulting discrepancy term is O(μ^{p-1} + s_k^2), where s_k is the step length; this term is then absorbed into the AdaGrad descent inequality without changing the leading exponent. Balancing the smoothing parameter μ ∼ ε^{(4-p)/p} with the iteration count yields the stated O(ε^{p-4}) rate. To address the referee’s request for transparency we will insert a dedicated lemma (new Lemma 4.1) that isolates the retraction-based gradient discrepancy bound and reference it explicitly in the proof of Theorem 4.2. revision: yes

Circularity Check

0 steps flagged

No circularity: complexity bound derived from independent convergence analysis of new smoothing framework.

full rationale

The paper constructs a smoothing framework on manifolds, proves its fundamental approximation properties, and then analyzes the proposed smoothing Riemannian gradient algorithm with AdaGrad-type stepsizes to obtain global convergence and the iteration complexity O(ε^{p-4}). This rate is obtained directly from the proof steps (including handling of non-Lipschitz quasi-norm terms for p in (0,1]) rather than by fitting parameters to data or renaming inputs. No self-citation chain, self-definitional loop, or ansatz smuggling is load-bearing for the central claim; the framework and analysis are self-contained within the manuscript. The 'first complexity result' statement is a novelty claim, not a derivation step that reduces to prior inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the existence and properties of a general smoothing framework for quasi-norm penalties on manifolds and on standard Riemannian geometry assumptions (e.g., completeness, retraction properties) that are not detailed in the abstract.

axioms (1)

domain assumption The objective is the sum of a smooth function and a quasi-norm penalty with p in (0,1] that is not locally Lipschitz.
Stated in the abstract as the problem class under study.

pith-pipeline@v0.9.0 · 5485 in / 1281 out tokens · 45462 ms · 2026-05-10T04:23:22.926634+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

45 extracted references · 45 canonical work pages

[1]

Absil, R

P.-A. Absil, R. Mahony, and R. Sepulchre.Optimization Algorithms on Matrix Manifolds. Prince- ton University Press, Princeton, 2008

work page 2008
[2]

Beck and I

A. Beck and I. Rosset. A dynamic smoothing technique for a class of nonsmooth optimization problems on manifolds.SIAM Journal on Optimization, 33(3):1473–1493, 2023

work page 2023
[3]

Bian and X

W. Bian and X. Chen. Worst-case complexity of smoothing quadratic regularization methods for non-Lipschitzian optimization.SIAM Journal on Optimization, 23(3):1718–1741, 2013

work page 2013
[4]

W. Bian, X. Chen, and Y. Ye. Complexity analysis of interior point algorithms for non-Lipschitz and nonconvex minimization.Mathematical Programming, 149(1):301–327, 2015

work page 2015
[5]

B¨ ohm and S

A. B¨ ohm and S. J. Wright. Variable smoothing for weakly convex composite functions.Journal of Optimization Theory and Applications, 188:628–649, 2021

work page 2021
[6]

R. I. Bot ¸ and C. Hendrich. A variable smoothing algorithm for solving convex optimization problems.TOP, 23:124–150, 2015

work page 2015
[7]

Boumal.An Introduction to Optimization on Smooth Manifolds

N. Boumal.An Introduction to Optimization on Smooth Manifolds. Cambridge University Press, Cambridge, 2023

work page 2023
[8]

Boumal, P.-A

N. Boumal, P.-A. Absil, and C. Cartis. Global rates of convergence for nonconvex optimization on manifolds.IMA Journal of Numerical Analysis, 39(1):1–33, 2018

work page 2018
[9]

S. Chen, S. Ma, A. M.-C. So, and T. Zhang. Proximal gradient method for nonsmooth optimization over the Stiefel manifold.SIAM Journal on Optimization, 30(1):210–239, 2020

work page 2020
[10]

S. Chen, S. Ma, A. M.-C. So, and T. Zhang. Nonsmooth optimization over the Stiefel manifold and beyond: Proximal gradient method and recent variants.SIAM Review, 66(2):319–352, 2024. 19

work page 2024
[11]

W. Chen, H. Ji, and Y. You. An augmented Lagrangian method forℓ 1-regularized optimization problems with orthogonality constraints.SIAM Journal on Scientific Computing, 38(4):B570– B592, 2016

work page 2016
[12]

X. Chen. Smoothing methods for nonsmooth, nonconvex minimization.Mathematical Program- ming, 134:71–99, 2012

work page 2012
[13]

X. Chen, Y. He, and Z. Zhang. Tight error bounds for the sign-constrained Stiefel manifold. SIAM Journal on Optimization, 35(1):302–329, 2025

work page 2025
[14]

X. Chen, W. Li, and Q. Luo. Orthogonal nonnegative matrix factorization via minimization over the null space.to appear in SIAM Journal on Matrix Analysis and Applications, 2026

work page 2026
[15]

Chen and P

X. Chen and P. L. Toint. High-order evaluation complexity for convexly-constrained optimization with non-Lipschitzian group sparsity terms.Mathematical Programming, 187(1):47–78, 2021

work page 2021
[16]

X. Chen, P. L. Toint, and H. Wang. Complexity of partially separable convexly constrained optimization with non-Lipschitzian singularities.SIAM Journal on Optimization, 29(1):874–903, 2019

work page 2019
[17]

X. Chen, F. Xu, and Y. Ye. Lower bound theory of nonzero entries in solutions ofℓ 2-ℓp minimiza- tion.SIAM Journal on Scientific Computing, 32(5):2832–2852, 2010

work page 2010
[18]

K. Deng, J. Hu, J. Wu, and Z. Wen. Oracle complexities of augmented Lagrangian methods for nonsmooth composite optimization on a compact submanifold.Mathematics of Operations Research, pages 1–27, 2025

work page 2025
[19]

T. Ding, Z. Zhu, T. Ding, Y. Yang, D. P. Robinson, M. C. Tsakiris, and R. Vidal. Noisy dual principal component pursuit. InProceedings of the 36th International Conference on Machine Learning, volume 97, pages 1617–1625. PMLR, 2019

work page 2019
[20]

Duchi, E

J. Duchi, E. Hazan, and Y. Singer. Adaptive subgradient methods for online learning and stochas- tic optimization.Journal of Machine Learning Research, 12(61):2121–2159, 2011

work page 2011
[21]

Garmanjani and L

R. Garmanjani and L. N. Vicente. Smoothing and worst-case complexity for direct-search methods in nonsmooth optimization.IMA Journal of Numerical Analysis, 33(3):1008–1028, 2013

work page 2013
[22]

Geiger, P

A. Geiger, P. Lenz, C. Stiller, and R. Urtasun. Vision meets robotics: The KITTI dataset.The International Journal of Robotics Research, 32(11):1231–1237, 2013

work page 2013
[23]

Gratton, S

S. Gratton, S. Jerad, and P. L. Toint. Complexity of a class of first-order objective-function-free optimization algorithms.Optimization Methods and Software, pages 1–31, 2024

work page 2024
[24]

Gratton and P

S. Gratton and P. L. Toint. A simple first-order algorithm for full-rank equality constrained optimization.arXiv:2510.16390, 2025

work page arXiv 2025
[25]

Huang and K

W. Huang and K. Wei. Riemannian proximal gradient methods.Mathematical Programming, 194(1-2):371–413, 2022

work page 2022
[26]

Huang and K

W. Huang and K. Wei. An inexact Riemannian proximal gradient method.Computational Opti- mization and Applications, 85(1):1–32, 2023

work page 2023
[27]

Jiang, X

B. Jiang, X. Meng, Z. Wen, and X. Chen. An exact penalty approach for optimization with nonnegative orthogonality constraints.Mathematical Programming, 198(1):855–897, 2023

work page 2023
[28]

Jiang, M

B. Jiang, M. Xu, X. Cai, and Y.-F. Liu. An inexact proximal framework for nonsmooth Rieman- nian difference-of-convex optimization.arXiv:2509.08561, 2025

work page arXiv 2025
[29]

LeCun, L

Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition.Proceedings of the IEEE, 86(11):2278–2324, 1998. 20

work page 1998
[30]

J. Li, S. Ma, and T. Srivastava. A Riemannian alternating direction method of multipliers. Mathematics of Operations Research, 50(4):3222–3242, 2025

work page 2025
[31]

X. Li, S. Chen, Z. Deng, Q. Qu, Z. Zhu, and A. M.-C. So. Weakly convex optimization over Stiefel manifold using Riemannian subgradient-type methods.SIAM Journal on Optimization, 31(3):1605–1634, 2021

work page 2021
[32]

Y. Li, D. Sun, and L. Zhang. Unsupervised feature selection via nonnegative orthogonal con- strained regularized minimization.Journal of Machine Learning Research, 27(39):1–44, 2026

work page 2026
[33]

Y.-F. Liu, S. Ma, Y.-H. Dai, and S. Zhang. A smoothing SQP framework for a class of composite Lq minimization over polyhedron.Mathematical Programming, 158(1):467–500, 2016

work page 2016
[34]

Z. Peng, W. Wu, J. Hu, and K. Deng. Riemannian smoothing gradient type algorithms for nonsmooth optimization problem on compact Riemannian submanifold embedded in Euclidean space.Applied Mathematics&Optimization, 88(3):85, 2023

work page 2023
[35]

Y. Qian, S. Pan, and L. Xiao. Error bound and exact penalty method for optimization problems with nonnegative orthogonal constraint.IMA Journal of Numerical Analysis, 44(1):120–156, 2024

work page 2024
[36]

M. C. Tsakiris and R. Vidal. Dual principal component pursuit.Journal of Machine Learning Research, 19(18):1–49, 2018

work page 2018
[37]

L. Wang, L. Bao, and X. Liu. A decentralized proximal gradient tracking algorithm for composite optimization on Riemannian manifolds.Journal of Machine Learning Research, 26(106):1–37, 2025

work page 2025
[38]

Wang and X

L. Wang and X. Liu. Decentralized optimization over the Stiefel manifold by an approximate augmented Lagrangian function.IEEE Transactions on Signal Processing, 70:3029–3041, 2022

work page 2022
[39]

Wang and X

L. Wang and X. Liu. Smoothing gradient tracking for decentralized optimization over the Stiefel manifold with non-smooth regularizers. InProceedings of the 62nd IEEE Conference on Decision and Control (CDC), pages 126–132. IEEE, 2023

work page 2023
[40]

L. Wang, X. Liu, and X. Chen. The distributionally robust optimization model of sparse principal component analysis.arXiv:2503.02494, 2025

work page arXiv 2025
[41]

L. Wang, X. Liu, and X. Chen. A support-set algorithm for optimization problems with nonneg- ative and orthogonal constraints.arXiv:2511.03443, 2025

work page arXiv 2025
[42]

L. Wang, X. Liu, and Y. Zhang. Seeking consensus on subspaces in federated principal component analysis.Journal of Optimization Theory and Applications, 203:529–561, 2024

work page 2024
[43]

Y. Zhai, Z. Yang, Z. Liao, J. Wright, and Y. Ma. Complete dictionary learning viaℓ 4-norm maximization over the orthogonal group.Journal of Machine Learning Research, 21(165):1–68, 2020

work page 2020
[44]

Zhang, X

C. Zhang, X. Chen, and S. Ma. A Riemannian smoothing steepest descent method for non- Lipschitz optimization on embedded submanifolds ofR n.Mathematics of Operations Research, 49(3):1710–1733, 2024

work page 2024
[45]

Zheng, X

Z. Zheng, X. Yu, S. Ma, and L. Xue. A new inexact manifold proximal linear algorithm with adaptive stopping criteria.arXiv:2508.19234, 2025. 21

work page arXiv 2025

[1] [1]

Absil, R

P.-A. Absil, R. Mahony, and R. Sepulchre.Optimization Algorithms on Matrix Manifolds. Prince- ton University Press, Princeton, 2008

work page 2008

[2] [2]

Beck and I

A. Beck and I. Rosset. A dynamic smoothing technique for a class of nonsmooth optimization problems on manifolds.SIAM Journal on Optimization, 33(3):1473–1493, 2023

work page 2023

[3] [3]

Bian and X

W. Bian and X. Chen. Worst-case complexity of smoothing quadratic regularization methods for non-Lipschitzian optimization.SIAM Journal on Optimization, 23(3):1718–1741, 2013

work page 2013

[4] [4]

W. Bian, X. Chen, and Y. Ye. Complexity analysis of interior point algorithms for non-Lipschitz and nonconvex minimization.Mathematical Programming, 149(1):301–327, 2015

work page 2015

[5] [5]

B¨ ohm and S

A. B¨ ohm and S. J. Wright. Variable smoothing for weakly convex composite functions.Journal of Optimization Theory and Applications, 188:628–649, 2021

work page 2021

[6] [6]

R. I. Bot ¸ and C. Hendrich. A variable smoothing algorithm for solving convex optimization problems.TOP, 23:124–150, 2015

work page 2015

[7] [7]

Boumal.An Introduction to Optimization on Smooth Manifolds

N. Boumal.An Introduction to Optimization on Smooth Manifolds. Cambridge University Press, Cambridge, 2023

work page 2023

[8] [8]

Boumal, P.-A

N. Boumal, P.-A. Absil, and C. Cartis. Global rates of convergence for nonconvex optimization on manifolds.IMA Journal of Numerical Analysis, 39(1):1–33, 2018

work page 2018

[9] [9]

S. Chen, S. Ma, A. M.-C. So, and T. Zhang. Proximal gradient method for nonsmooth optimization over the Stiefel manifold.SIAM Journal on Optimization, 30(1):210–239, 2020

work page 2020

[10] [10]

S. Chen, S. Ma, A. M.-C. So, and T. Zhang. Nonsmooth optimization over the Stiefel manifold and beyond: Proximal gradient method and recent variants.SIAM Review, 66(2):319–352, 2024. 19

work page 2024

[11] [11]

W. Chen, H. Ji, and Y. You. An augmented Lagrangian method forℓ 1-regularized optimization problems with orthogonality constraints.SIAM Journal on Scientific Computing, 38(4):B570– B592, 2016

work page 2016

[12] [12]

X. Chen. Smoothing methods for nonsmooth, nonconvex minimization.Mathematical Program- ming, 134:71–99, 2012

work page 2012

[13] [13]

X. Chen, Y. He, and Z. Zhang. Tight error bounds for the sign-constrained Stiefel manifold. SIAM Journal on Optimization, 35(1):302–329, 2025

work page 2025

[14] [14]

X. Chen, W. Li, and Q. Luo. Orthogonal nonnegative matrix factorization via minimization over the null space.to appear in SIAM Journal on Matrix Analysis and Applications, 2026

work page 2026

[15] [15]

Chen and P

X. Chen and P. L. Toint. High-order evaluation complexity for convexly-constrained optimization with non-Lipschitzian group sparsity terms.Mathematical Programming, 187(1):47–78, 2021

work page 2021

[16] [16]

X. Chen, P. L. Toint, and H. Wang. Complexity of partially separable convexly constrained optimization with non-Lipschitzian singularities.SIAM Journal on Optimization, 29(1):874–903, 2019

work page 2019

[17] [17]

X. Chen, F. Xu, and Y. Ye. Lower bound theory of nonzero entries in solutions ofℓ 2-ℓp minimiza- tion.SIAM Journal on Scientific Computing, 32(5):2832–2852, 2010

work page 2010

[18] [18]

K. Deng, J. Hu, J. Wu, and Z. Wen. Oracle complexities of augmented Lagrangian methods for nonsmooth composite optimization on a compact submanifold.Mathematics of Operations Research, pages 1–27, 2025

work page 2025

[19] [19]

T. Ding, Z. Zhu, T. Ding, Y. Yang, D. P. Robinson, M. C. Tsakiris, and R. Vidal. Noisy dual principal component pursuit. InProceedings of the 36th International Conference on Machine Learning, volume 97, pages 1617–1625. PMLR, 2019

work page 2019

[20] [20]

Duchi, E

J. Duchi, E. Hazan, and Y. Singer. Adaptive subgradient methods for online learning and stochas- tic optimization.Journal of Machine Learning Research, 12(61):2121–2159, 2011

work page 2011

[21] [21]

Garmanjani and L

R. Garmanjani and L. N. Vicente. Smoothing and worst-case complexity for direct-search methods in nonsmooth optimization.IMA Journal of Numerical Analysis, 33(3):1008–1028, 2013

work page 2013

[22] [22]

Geiger, P

A. Geiger, P. Lenz, C. Stiller, and R. Urtasun. Vision meets robotics: The KITTI dataset.The International Journal of Robotics Research, 32(11):1231–1237, 2013

work page 2013

[23] [23]

Gratton, S

S. Gratton, S. Jerad, and P. L. Toint. Complexity of a class of first-order objective-function-free optimization algorithms.Optimization Methods and Software, pages 1–31, 2024

work page 2024

[24] [24]

Gratton and P

S. Gratton and P. L. Toint. A simple first-order algorithm for full-rank equality constrained optimization.arXiv:2510.16390, 2025

work page arXiv 2025

[25] [25]

Huang and K

W. Huang and K. Wei. Riemannian proximal gradient methods.Mathematical Programming, 194(1-2):371–413, 2022

work page 2022

[26] [26]

Huang and K

W. Huang and K. Wei. An inexact Riemannian proximal gradient method.Computational Opti- mization and Applications, 85(1):1–32, 2023

work page 2023

[27] [27]

Jiang, X

B. Jiang, X. Meng, Z. Wen, and X. Chen. An exact penalty approach for optimization with nonnegative orthogonality constraints.Mathematical Programming, 198(1):855–897, 2023

work page 2023

[28] [28]

Jiang, M

B. Jiang, M. Xu, X. Cai, and Y.-F. Liu. An inexact proximal framework for nonsmooth Rieman- nian difference-of-convex optimization.arXiv:2509.08561, 2025

work page arXiv 2025

[29] [29]

LeCun, L

Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition.Proceedings of the IEEE, 86(11):2278–2324, 1998. 20

work page 1998

[30] [30]

J. Li, S. Ma, and T. Srivastava. A Riemannian alternating direction method of multipliers. Mathematics of Operations Research, 50(4):3222–3242, 2025

work page 2025

[31] [31]

X. Li, S. Chen, Z. Deng, Q. Qu, Z. Zhu, and A. M.-C. So. Weakly convex optimization over Stiefel manifold using Riemannian subgradient-type methods.SIAM Journal on Optimization, 31(3):1605–1634, 2021

work page 2021

[32] [32]

Y. Li, D. Sun, and L. Zhang. Unsupervised feature selection via nonnegative orthogonal con- strained regularized minimization.Journal of Machine Learning Research, 27(39):1–44, 2026

work page 2026

[33] [33]

Y.-F. Liu, S. Ma, Y.-H. Dai, and S. Zhang. A smoothing SQP framework for a class of composite Lq minimization over polyhedron.Mathematical Programming, 158(1):467–500, 2016

work page 2016

[34] [34]

Z. Peng, W. Wu, J. Hu, and K. Deng. Riemannian smoothing gradient type algorithms for nonsmooth optimization problem on compact Riemannian submanifold embedded in Euclidean space.Applied Mathematics&Optimization, 88(3):85, 2023

work page 2023

[35] [35]

Y. Qian, S. Pan, and L. Xiao. Error bound and exact penalty method for optimization problems with nonnegative orthogonal constraint.IMA Journal of Numerical Analysis, 44(1):120–156, 2024

work page 2024

[36] [36]

M. C. Tsakiris and R. Vidal. Dual principal component pursuit.Journal of Machine Learning Research, 19(18):1–49, 2018

work page 2018

[37] [37]

L. Wang, L. Bao, and X. Liu. A decentralized proximal gradient tracking algorithm for composite optimization on Riemannian manifolds.Journal of Machine Learning Research, 26(106):1–37, 2025

work page 2025

[38] [38]

Wang and X

L. Wang and X. Liu. Decentralized optimization over the Stiefel manifold by an approximate augmented Lagrangian function.IEEE Transactions on Signal Processing, 70:3029–3041, 2022

work page 2022

[39] [39]

Wang and X

L. Wang and X. Liu. Smoothing gradient tracking for decentralized optimization over the Stiefel manifold with non-smooth regularizers. InProceedings of the 62nd IEEE Conference on Decision and Control (CDC), pages 126–132. IEEE, 2023

work page 2023

[40] [40]

L. Wang, X. Liu, and X. Chen. The distributionally robust optimization model of sparse principal component analysis.arXiv:2503.02494, 2025

work page arXiv 2025

[41] [41]

L. Wang, X. Liu, and X. Chen. A support-set algorithm for optimization problems with nonneg- ative and orthogonal constraints.arXiv:2511.03443, 2025

work page arXiv 2025

[42] [42]

L. Wang, X. Liu, and Y. Zhang. Seeking consensus on subspaces in federated principal component analysis.Journal of Optimization Theory and Applications, 203:529–561, 2024

work page 2024

[43] [43]

Y. Zhai, Z. Yang, Z. Liao, J. Wright, and Y. Ma. Complete dictionary learning viaℓ 4-norm maximization over the orthogonal group.Journal of Machine Learning Research, 21(165):1–68, 2020

work page 2020

[44] [44]

Zhang, X

C. Zhang, X. Chen, and S. Ma. A Riemannian smoothing steepest descent method for non- Lipschitz optimization on embedded submanifolds ofR n.Mathematics of Operations Research, 49(3):1710–1733, 2024

work page 2024

[45] [45]

Zheng, X

Z. Zheng, X. Yu, S. Ma, and L. Xue. A new inexact manifold proximal linear algorithm with adaptive stopping criteria.arXiv:2508.19234, 2025. 21

work page arXiv 2025