pith. sign in

arxiv: 2604.18325 · v1 · submitted 2026-04-20 · 🧮 math.OC

An Adaptive Smoothing Algorithm for Non-Lipschitz Optimization on Manifolds with Complexity Guarantees

Pith reviewed 2026-05-10 04:23 UTC · model grok-4.3

classification 🧮 math.OC
keywords Riemannian optimizationnon-Lipschitz functionssmoothing methodsAdaGrad algorithmiteration complexityquasi-norm regularizationglobal convergence
0
0 comments X

The pith

A smoothing Riemannian gradient algorithm with AdaGrad-type stepsizes solves non-Lipschitz optimization problems on manifolds with iteration complexity O(ε^{p-4}).

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The authors consider optimization problems on Riemannian manifolds whose objective functions combine a smooth component with quasi-norm penalties raised to a power p between 0 and 1. These penalties render the objective non-Lipschitz continuous, so standard Riemannian optimization methods do not apply directly. The paper introduces a general smoothing framework that approximates the non-smooth terms while preserving essential properties such as differentiability and error bounds on the manifold. Building on this framework, they develop a Riemannian gradient algorithm that incorporates a smoothing-aware AdaGrad-type stepsize rule, and they prove that the algorithm converges globally with an iteration complexity of O(ε^{p-4}). This bound specializes to the optimal O(ε^{-3}) rate known for Lipschitz problems when p equals 1, establishing the first such guarantee for the non-Lipschitz setting.

Core claim

We construct a general smoothing framework for non-Lipschitz objective functions consisting of a smooth term and quasi-norm penalties on Riemannian manifolds. Using this, we propose a smoothing Riemannian gradient algorithm with a smoothing-aware AdaGrad-type stepsize rule, establish its global convergence, and derive an iteration complexity of O(ε^{p-4}) for p in (0,1], including O(ε^{-3}) for p=1.

What carries the argument

The smoothing framework that approximates the quasi-norm penalties while maintaining differentiability and approximation properties on the manifold, paired with a smoothing-aware AdaGrad-type stepsize rule.

If this is right

  • The proposed algorithm is guaranteed to converge globally to a stationary point.
  • The number of iterations required is bounded by O(ε^{p-4}).
  • For the special case p=1 the complexity improves to O(ε^{-3}), matching existing results for Lipschitz problems.
  • The method demonstrates practical efficiency in applications from machine learning and data science.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The smoothing technique could be adapted to other manifold optimization problems involving similar non-smooth terms beyond quasi-norms.
  • Similar adaptive rules might improve performance in stochastic optimization on manifolds.
  • Implementation on concrete manifolds such as the sphere could provide empirical validation of the theoretical rates.

Load-bearing premise

The constructed smoothing framework must satisfy the fundamental properties needed for the convergence analysis to hold on the manifold.

What would settle it

A numerical experiment on a manifold problem with known solution where the observed iteration count grows faster than O(ε^{p-4}) as ε decreases would falsify the complexity claim.

Figures

Figures reproduced from arXiv: 2604.18325 by Lei Wang, Xiaojun Chen.

Figure 1
Figure 1. Figure 1: Numerical comparison between ASRGA and RSSD for the SDL problem ( [PITH_FULL_IMAGE:figures/full_fig_p017_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Illustration of two frames from the KITTI dataset. The first row corresponds to the raw [PITH_FULL_IMAGE:figures/full_fig_p018_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Numerical comparison among ASRGA, DSGM, and ManPG for the SPCA problem ( [PITH_FULL_IMAGE:figures/full_fig_p019_3.png] view at source ↗
read the original abstract

We study a class of optimization problems on Riemannian manifolds, where the objective function consists of a smooth term and quasi-norm type penalties with exponent $p \in (0, 1]$. The essential difficulty lies in the fact that the objective function may not be locally Lipschitz continuous, which places this type of problems beyond the reach of existing Riemannian techniques. To overcome this obstacle, this paper constructs a general smoothing framework and establishes fundamental properties for developing efficient algorithms. In particular, we propose a smoothing Riemannian gradient algorithm equipped with a smoothing-aware AdaGrad-type stepsize rule. Its global convergence is demonstrated together with an iteration complexity of $O (\epsilon^{p - 4})$, which includes the best available iteration complexity of $O (\epsilon^{- 3})$ for Lipschitz problems with $p = 1$ as a special case. To the best of our knowledge, this is the first complexity result for non-Lipschitz optimization on Riemannian manifolds. Preliminary numerical experiments corroborate the practical efficiency of the proposed approach in real-world applications arsing from machine learning and data science.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper constructs a smoothing framework for non-Lipschitz optimization on Riemannian manifolds, where the objective is a smooth term plus quasi-norm penalties with exponent p ∈ (0,1]. It proposes a smoothing Riemannian gradient algorithm with a smoothing-aware AdaGrad-type stepsize rule, proves global convergence, and derives an iteration complexity of O(ε^{p-4}), recovering the O(ε^{-3}) rate for the Lipschitz case p=1 as a special case. This is presented as the first complexity result of its kind on manifolds, supported by preliminary numerical experiments on machine learning applications.

Significance. If the smoothing approximation properties hold with respect to retractions and manifold geometry, the result would fill an important gap by providing the first iteration complexity guarantees for non-Lipschitz problems on Riemannian manifolds. The adaptive stepsize and recovery of the known Lipschitz rate are strengths; the work has clear relevance to sparse optimization and data science applications on manifolds.

major comments (2)
  1. [Smoothing framework (Section 3) and main convergence theorem] The central claim that the constructed smoothing framework satisfies the fundamental properties (uniform approximation of the objective and Riemannian gradient) needed for the O(ε^{p-4}) guarantee must be established explicitly with respect to the retraction operator rather than Euclidean lines. Any unaccounted error from curvature or retraction would invalidate the complexity bound, especially for p<1.
  2. [Iteration complexity theorem (likely Theorem 4.x)] In the complexity analysis, the derivation of the p-4 exponent relies on the smoothing error being controlled tightly enough that the total iteration count remains O(ε^{p-4}). The proof must bound the discrepancy between the smoothed gradient and the true Riemannian gradient under the retraction; without this, the global convergence rate for the non-Lipschitz case is not supported.
minor comments (2)
  1. [Abstract] Abstract contains a typo: 'arsing' should read 'arising'.
  2. [Preliminaries and notation] Clarify the precise definition of the smoothing operator on the manifold (e.g., whether it is defined via the exponential map or a general retraction) and ensure all notation for the Riemannian gradient and retraction is introduced consistently before the algorithm statement.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments, which help clarify the presentation of our smoothing framework and complexity analysis on manifolds. We address each major comment point by point below.

read point-by-point responses
  1. Referee: [Smoothing framework (Section 3) and main convergence theorem] The central claim that the constructed smoothing framework satisfies the fundamental properties (uniform approximation of the objective and Riemannian gradient) needed for the O(ε^{p-4}) guarantee must be established explicitly with respect to the retraction operator rather than Euclidean lines. Any unaccounted error from curvature or retraction would invalidate the complexity bound, especially for p<1.

    Authors: We agree that all approximation properties must be stated intrinsically with respect to the retraction. In Section 3 the smoothing is defined by composing the Euclidean smoothing operator with the retraction R_x, so that the smoothed function is f_μ(x) = g(R_x(·)) smoothed along tangent vectors. Lemmas 3.1 and 3.2 already derive the uniform bounds |f_μ(x) - f(x)| ≤ Cμ^p and ||grad f_μ(x) - grad f(x)|| ≤ Cμ^{p-1} by using the second-order expansion of the retraction (dR_x(0) = Id and the Hessian term bounded by sectional curvature via the standard comparison theorem). These estimates replace Euclidean line segments with retraction curves and therefore incorporate curvature effects. To make the retraction dependence fully explicit we will add a short remark after Lemma 3.2 and a new corollary that restates the gradient approximation directly in terms of R_x, confirming that no additional curvature-induced error terms appear beyond those already absorbed in the O(·) constants. revision: partial

  2. Referee: [Iteration complexity theorem (likely Theorem 4.x)] In the complexity analysis, the derivation of the p-4 exponent relies on the smoothing error being controlled tightly enough that the total iteration count remains O(ε^{p-4}). The proof must bound the discrepancy between the smoothed gradient and the true Riemannian gradient under the retraction; without this, the global convergence rate for the non-Lipschitz case is not supported.

    Authors: The proof of the main complexity result (Theorem 4.2) already contains an explicit bound on the gradient discrepancy under the retraction. After invoking the smoothing property from Lemma 3.2, we apply the retraction error estimate ||R_x(v) - exp_x(v)|| = O(||v||^2) together with the fact that the algorithm step-sizes remain inside a neighborhood controlled by the injectivity radius. The resulting discrepancy term is O(μ^{p-1} + s_k^2), where s_k is the step length; this term is then absorbed into the AdaGrad descent inequality without changing the leading exponent. Balancing the smoothing parameter μ ∼ ε^{(4-p)/p} with the iteration count yields the stated O(ε^{p-4}) rate. To address the referee’s request for transparency we will insert a dedicated lemma (new Lemma 4.1) that isolates the retraction-based gradient discrepancy bound and reference it explicitly in the proof of Theorem 4.2. revision: yes

Circularity Check

0 steps flagged

No circularity: complexity bound derived from independent convergence analysis of new smoothing framework.

full rationale

The paper constructs a smoothing framework on manifolds, proves its fundamental approximation properties, and then analyzes the proposed smoothing Riemannian gradient algorithm with AdaGrad-type stepsizes to obtain global convergence and the iteration complexity O(ε^{p-4}). This rate is obtained directly from the proof steps (including handling of non-Lipschitz quasi-norm terms for p in (0,1]) rather than by fitting parameters to data or renaming inputs. No self-citation chain, self-definitional loop, or ansatz smuggling is load-bearing for the central claim; the framework and analysis are self-contained within the manuscript. The 'first complexity result' statement is a novelty claim, not a derivation step that reduces to prior inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the existence and properties of a general smoothing framework for quasi-norm penalties on manifolds and on standard Riemannian geometry assumptions (e.g., completeness, retraction properties) that are not detailed in the abstract.

axioms (1)
  • domain assumption The objective is the sum of a smooth function and a quasi-norm penalty with p in (0,1] that is not locally Lipschitz.
    Stated in the abstract as the problem class under study.

pith-pipeline@v0.9.0 · 5485 in / 1281 out tokens · 45462 ms · 2026-05-10T04:23:22.926634+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

45 extracted references · 45 canonical work pages

  1. [1]

    Absil, R

    P.-A. Absil, R. Mahony, and R. Sepulchre.Optimization Algorithms on Matrix Manifolds. Prince- ton University Press, Princeton, 2008

  2. [2]

    Beck and I

    A. Beck and I. Rosset. A dynamic smoothing technique for a class of nonsmooth optimization problems on manifolds.SIAM Journal on Optimization, 33(3):1473–1493, 2023

  3. [3]

    Bian and X

    W. Bian and X. Chen. Worst-case complexity of smoothing quadratic regularization methods for non-Lipschitzian optimization.SIAM Journal on Optimization, 23(3):1718–1741, 2013

  4. [4]

    W. Bian, X. Chen, and Y. Ye. Complexity analysis of interior point algorithms for non-Lipschitz and nonconvex minimization.Mathematical Programming, 149(1):301–327, 2015

  5. [5]

    B¨ ohm and S

    A. B¨ ohm and S. J. Wright. Variable smoothing for weakly convex composite functions.Journal of Optimization Theory and Applications, 188:628–649, 2021

  6. [6]

    R. I. Bot ¸ and C. Hendrich. A variable smoothing algorithm for solving convex optimization problems.TOP, 23:124–150, 2015

  7. [7]

    Boumal.An Introduction to Optimization on Smooth Manifolds

    N. Boumal.An Introduction to Optimization on Smooth Manifolds. Cambridge University Press, Cambridge, 2023

  8. [8]

    Boumal, P.-A

    N. Boumal, P.-A. Absil, and C. Cartis. Global rates of convergence for nonconvex optimization on manifolds.IMA Journal of Numerical Analysis, 39(1):1–33, 2018

  9. [9]

    S. Chen, S. Ma, A. M.-C. So, and T. Zhang. Proximal gradient method for nonsmooth optimization over the Stiefel manifold.SIAM Journal on Optimization, 30(1):210–239, 2020

  10. [10]

    S. Chen, S. Ma, A. M.-C. So, and T. Zhang. Nonsmooth optimization over the Stiefel manifold and beyond: Proximal gradient method and recent variants.SIAM Review, 66(2):319–352, 2024. 19

  11. [11]

    W. Chen, H. Ji, and Y. You. An augmented Lagrangian method forℓ 1-regularized optimization problems with orthogonality constraints.SIAM Journal on Scientific Computing, 38(4):B570– B592, 2016

  12. [12]

    X. Chen. Smoothing methods for nonsmooth, nonconvex minimization.Mathematical Program- ming, 134:71–99, 2012

  13. [13]

    X. Chen, Y. He, and Z. Zhang. Tight error bounds for the sign-constrained Stiefel manifold. SIAM Journal on Optimization, 35(1):302–329, 2025

  14. [14]

    X. Chen, W. Li, and Q. Luo. Orthogonal nonnegative matrix factorization via minimization over the null space.to appear in SIAM Journal on Matrix Analysis and Applications, 2026

  15. [15]

    Chen and P

    X. Chen and P. L. Toint. High-order evaluation complexity for convexly-constrained optimization with non-Lipschitzian group sparsity terms.Mathematical Programming, 187(1):47–78, 2021

  16. [16]

    X. Chen, P. L. Toint, and H. Wang. Complexity of partially separable convexly constrained optimization with non-Lipschitzian singularities.SIAM Journal on Optimization, 29(1):874–903, 2019

  17. [17]

    X. Chen, F. Xu, and Y. Ye. Lower bound theory of nonzero entries in solutions ofℓ 2-ℓp minimiza- tion.SIAM Journal on Scientific Computing, 32(5):2832–2852, 2010

  18. [18]

    K. Deng, J. Hu, J. Wu, and Z. Wen. Oracle complexities of augmented Lagrangian methods for nonsmooth composite optimization on a compact submanifold.Mathematics of Operations Research, pages 1–27, 2025

  19. [19]

    T. Ding, Z. Zhu, T. Ding, Y. Yang, D. P. Robinson, M. C. Tsakiris, and R. Vidal. Noisy dual principal component pursuit. InProceedings of the 36th International Conference on Machine Learning, volume 97, pages 1617–1625. PMLR, 2019

  20. [20]

    Duchi, E

    J. Duchi, E. Hazan, and Y. Singer. Adaptive subgradient methods for online learning and stochas- tic optimization.Journal of Machine Learning Research, 12(61):2121–2159, 2011

  21. [21]

    Garmanjani and L

    R. Garmanjani and L. N. Vicente. Smoothing and worst-case complexity for direct-search methods in nonsmooth optimization.IMA Journal of Numerical Analysis, 33(3):1008–1028, 2013

  22. [22]

    Geiger, P

    A. Geiger, P. Lenz, C. Stiller, and R. Urtasun. Vision meets robotics: The KITTI dataset.The International Journal of Robotics Research, 32(11):1231–1237, 2013

  23. [23]

    Gratton, S

    S. Gratton, S. Jerad, and P. L. Toint. Complexity of a class of first-order objective-function-free optimization algorithms.Optimization Methods and Software, pages 1–31, 2024

  24. [24]

    Gratton and P

    S. Gratton and P. L. Toint. A simple first-order algorithm for full-rank equality constrained optimization.arXiv:2510.16390, 2025

  25. [25]

    Huang and K

    W. Huang and K. Wei. Riemannian proximal gradient methods.Mathematical Programming, 194(1-2):371–413, 2022

  26. [26]

    Huang and K

    W. Huang and K. Wei. An inexact Riemannian proximal gradient method.Computational Opti- mization and Applications, 85(1):1–32, 2023

  27. [27]

    Jiang, X

    B. Jiang, X. Meng, Z. Wen, and X. Chen. An exact penalty approach for optimization with nonnegative orthogonality constraints.Mathematical Programming, 198(1):855–897, 2023

  28. [28]

    Jiang, M

    B. Jiang, M. Xu, X. Cai, and Y.-F. Liu. An inexact proximal framework for nonsmooth Rieman- nian difference-of-convex optimization.arXiv:2509.08561, 2025

  29. [29]

    LeCun, L

    Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition.Proceedings of the IEEE, 86(11):2278–2324, 1998. 20

  30. [30]

    J. Li, S. Ma, and T. Srivastava. A Riemannian alternating direction method of multipliers. Mathematics of Operations Research, 50(4):3222–3242, 2025

  31. [31]

    X. Li, S. Chen, Z. Deng, Q. Qu, Z. Zhu, and A. M.-C. So. Weakly convex optimization over Stiefel manifold using Riemannian subgradient-type methods.SIAM Journal on Optimization, 31(3):1605–1634, 2021

  32. [32]

    Y. Li, D. Sun, and L. Zhang. Unsupervised feature selection via nonnegative orthogonal con- strained regularized minimization.Journal of Machine Learning Research, 27(39):1–44, 2026

  33. [33]

    Y.-F. Liu, S. Ma, Y.-H. Dai, and S. Zhang. A smoothing SQP framework for a class of composite Lq minimization over polyhedron.Mathematical Programming, 158(1):467–500, 2016

  34. [34]

    Z. Peng, W. Wu, J. Hu, and K. Deng. Riemannian smoothing gradient type algorithms for nonsmooth optimization problem on compact Riemannian submanifold embedded in Euclidean space.Applied Mathematics&Optimization, 88(3):85, 2023

  35. [35]

    Y. Qian, S. Pan, and L. Xiao. Error bound and exact penalty method for optimization problems with nonnegative orthogonal constraint.IMA Journal of Numerical Analysis, 44(1):120–156, 2024

  36. [36]

    M. C. Tsakiris and R. Vidal. Dual principal component pursuit.Journal of Machine Learning Research, 19(18):1–49, 2018

  37. [37]

    L. Wang, L. Bao, and X. Liu. A decentralized proximal gradient tracking algorithm for composite optimization on Riemannian manifolds.Journal of Machine Learning Research, 26(106):1–37, 2025

  38. [38]

    Wang and X

    L. Wang and X. Liu. Decentralized optimization over the Stiefel manifold by an approximate augmented Lagrangian function.IEEE Transactions on Signal Processing, 70:3029–3041, 2022

  39. [39]

    Wang and X

    L. Wang and X. Liu. Smoothing gradient tracking for decentralized optimization over the Stiefel manifold with non-smooth regularizers. InProceedings of the 62nd IEEE Conference on Decision and Control (CDC), pages 126–132. IEEE, 2023

  40. [40]

    L. Wang, X. Liu, and X. Chen. The distributionally robust optimization model of sparse principal component analysis.arXiv:2503.02494, 2025

  41. [41]

    L. Wang, X. Liu, and X. Chen. A support-set algorithm for optimization problems with nonneg- ative and orthogonal constraints.arXiv:2511.03443, 2025

  42. [42]

    L. Wang, X. Liu, and Y. Zhang. Seeking consensus on subspaces in federated principal component analysis.Journal of Optimization Theory and Applications, 203:529–561, 2024

  43. [43]

    Y. Zhai, Z. Yang, Z. Liao, J. Wright, and Y. Ma. Complete dictionary learning viaℓ 4-norm maximization over the orthogonal group.Journal of Machine Learning Research, 21(165):1–68, 2020

  44. [44]

    Zhang, X

    C. Zhang, X. Chen, and S. Ma. A Riemannian smoothing steepest descent method for non- Lipschitz optimization on embedded submanifolds ofR n.Mathematics of Operations Research, 49(3):1710–1733, 2024

  45. [45]

    Zheng, X

    Z. Zheng, X. Yu, S. Ma, and L. Xue. A new inexact manifold proximal linear algorithm with adaptive stopping criteria.arXiv:2508.19234, 2025. 21