A generalized canonical metric for optimization on the indefinite Stiefel manifold
Pith reviewed 2026-05-18 15:24 UTC · model grok-4.3
The pith
A generalized canonical metric simplifies Riemannian optimization on the indefinite Stiefel manifold.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that the generalized canonical metric equips the indefinite Stiefel manifold with a Riemannian structure under which the gradient of the objective function has a closed-form expression, so no Lyapunov equation has to be solved at each iteration. The same structure still admits a quasi-geodesic and an associated retraction, which can be used to extend Euclidean optimization methods to the manifold.
What carries the argument
The generalized canonical metric, a Riemannian metric on the indefinite Stiefel manifold chosen so that the orthogonal projection and gradient formula become simpler and cheaper to evaluate.
If this is right
- The Riemannian gradient descent algorithm on this manifold becomes cheaper to run because each gradient step avoids a matrix equation solve.
- Quasi-geodesics provide a new way to move along the manifold that respects the new geometry.
- Retractions derived from the quasi-geodesic enable practical implementation of the optimization method.
- Performance comparisons indicate the new approach is at least as effective as the prior Lyapunov-based method.
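The gradient-step and retraction pattern in the list above can be sketched on a toy instance. This is a minimal illustration, not the paper's method: it assumes the smallest indefinite case (n = 2, p = 1, A = diag(1, -1), J = 1, so the feasible set is the hyperbola x0² - x1² = 1), uses the plain Euclidean metric for the projection, and swaps in a simple rescaling retraction instead of the quasi-geodesic one. All function names are hypothetical.

```python
import math

# Assumed toy instance: A = diag(1, -1), J = 1, feasible set x0^2 - x1^2 = 1.
A = (1.0, -1.0)  # diagonal of A


def a_form(u, v):
    """Indefinite bilinear form u^T A v for the diagonal A above."""
    return A[0] * u[0] * v[0] + A[1] * u[1] * v[1]


def project(x, g):
    """Euclidean-orthogonal projection of g onto the tangent space {xi : x^T A xi = 0}."""
    n = (A[0] * x[0], A[1] * x[1])  # normal direction A x
    coeff = (g[0] * n[0] + g[1] * n[1]) / (n[0] ** 2 + n[1] ** 2)
    return (g[0] - coeff * n[0], g[1] - coeff * n[1])


def retract(x, xi):
    """Rescaling retraction (a stand-in): step in the ambient space, then renormalize."""
    y = (x[0] + xi[0], x[1] + xi[1])
    s = a_form(y, y)
    if s <= 0.0:
        raise ValueError("step too long: left the region where rescaling is defined")
    r = math.sqrt(s)
    return (y[0] / r, y[1] / r)


def gradient_descent(egrad, x, step=0.1, iters=100):
    """Fixed-step Riemannian gradient descent; egrad returns the Euclidean gradient."""
    for _ in range(iters):
        xi = project(x, egrad(x))
        x = retract(x, (-step * xi[0], -step * xi[1]))
    return x


# Minimize f(x) = x0^2 + x1^2 on the branch x0 > 0; the minimizer is (1, 0).
x_star = gradient_descent(lambda x: (2.0 * x[0], 2.0 * x[1]),
                          (math.cosh(1.0), math.sinh(1.0)))
```

With a non-Euclidean metric, only `project` (and the gradient formula behind it) would change; the loop itself stays the same, which is why the cost of evaluating the gradient dominates the per-iteration comparison.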
Where Pith is reading between the lines
- If the metric generalizes well, it could reduce costs for larger-scale problems in scientific computing modeled on this manifold.
- The construction might suggest similar canonical metrics for related matrix manifolds with indefinite constraints.
- Testable extension: apply the method to specific problems like orthogonal Procrustes with indefinite signatures and measure wall-clock time savings.
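The wall-clock comparison suggested in the last item could be set up with a small timing harness. The sketch below is an assumption about how one might measure it, not something taken from the paper; `median_runtime` and `speedup` are hypothetical helper names, and the two callables would wrap one full run of each method on the same problem instance.

```python
import time


def median_runtime(fn, repeats=5):
    """Run fn several times and return the median wall-clock time in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    samples.sort()
    return samples[len(samples) // 2]


def speedup(baseline, candidate, repeats=5):
    """Ratio of median runtimes; values above 1 mean the candidate is faster."""
    t_base = median_runtime(baseline, repeats)
    # Guard against a zero reading from the timer on very fast callables.
    t_cand = max(median_runtime(candidate, repeats), 1e-12)
    return t_base / t_cand
```

Using the median rather than the mean keeps a single slow run (for example, a cold cache) from dominating the comparison.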
Load-bearing premise
That the proposed generalized canonical metric is positive definite and compatible with the manifold structure so that it truly defines a Riemannian metric.
What would settle it
A numerical example or algebraic counterexample in which the new metric fails to produce a positive definite inner product on the tangent space or in which the resulting optimization algorithm diverges while the previous method converges.
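One way to run such a numerical check is to form the Gram matrix G with entries g_X(ξ_i, ξ_j) over a basis of the tangent space and test whether it is positive definite. The sketch below assumes G has already been assembled and is symmetric; how to build it depends on the paper's metric formula, which is not reproduced here. The function name is hypothetical, and the test is a plain Cholesky attempt.

```python
import math


def is_positive_definite(G, tol=1e-12):
    """Return True if the symmetric matrix G admits a Cholesky factorization.

    G is a list of rows. A pivot at or below tol is treated as a failure,
    so nearly singular matrices are reported as not positive definite.
    """
    n = len(G)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = G[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= tol:
                    return False  # non-positive pivot: G is not positive definite
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True
```

A single failing base point X would already be a counterexample; passing at many sampled points is only supporting evidence, not a proof.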
Original abstract
Various tasks in scientific computing can be modeled as an optimization problem on the indefinite Stiefel manifold. We address this using the Riemannian approach, which basically consists of equipping the feasible set with a Riemannian metric, preparing geometric tools such as orthogonal projections, formulae for Riemannian gradient, retraction and then extending an unconstrained optimization algorithm on the Euclidean space to the established manifold. The choice for the metric undoubtedly has a great influence on the method. In the previous work [D.V. Tiep and N.T. Son, A Riemannian gradient descent method for optimization on the indefinite Stiefel manifold, arXiv:2410.22068v2[math.OC]], a tractable metric, which is indeed a family of Riemannian metrics defined by a symmetric positive-definite matrix depending on the contact point, has been used. In general, it requires solving a Lyapunov matrix equation every time when the gradient of the cost function is needed, which might significantly contribute to the computational cost. To address this issue, we propose a new Riemannian metric for the indefinite Stiefel manifold. Furthermore, we construct the associated geometric structure, including a so-called quasi-geodesic and propose a retraction based on this curve. We then numerically verify the performance of the Riemannian gradient descent method associated with the new geometry and compare it with the previous work.
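To make the role of the metric concrete, the standard definitions behind the abstract can be written out. The block below is a sketch under stated assumptions: it takes the constraint in the form X^T A X = J (A symmetric nonsingular, J² = I), which is how the cited prior work is understood to define the manifold, and it assumes a metric of trace form with a symmetric positive-definite matrix M_X. The symbols \bar f (a smooth extension of the cost to the ambient space) and P_X (the g_X-orthogonal projection onto the tangent space) are notation introduced here for illustration.

```latex
\begin{align*}
  \mathcal{M} &= \{ X \in \mathbb{R}^{n \times p} : X^T A X = J \}, \\
  T_X\mathcal{M} &= \{ \xi \in \mathbb{R}^{n \times p} : \xi^T A X + X^T A \xi = 0 \}, \\
  g_X(\operatorname{grad} f(X), \xi) &= \operatorname{tr}\!\big(\nabla \bar f(X)^T \xi\big)
    \quad \text{for all } \xi \in T_X\mathcal{M}, \\
  g_X(\xi, \eta) = \operatorname{tr}(\xi^T M_X \eta)
    \;&\Longrightarrow\;
    \operatorname{grad} f(X) = P_X\big( M_X^{-1} \nabla \bar f(X) \big).
\end{align*}
```

The last line shows where the cost sits: the gradient is only as cheap as the projection P_X, and the abstract's point is that for the earlier metric family evaluating it generally involves a Lyapunov equation.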
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a generalized canonical Riemannian metric on the indefinite Stiefel manifold as a computationally cheaper alternative to the Lyapunov-equation-based family from the authors' prior work. It develops the associated geometric objects, including a quasi-geodesic and a retraction based on this curve, derives the Riemannian gradient under the new metric, and numerically compares the performance of the resulting Riemannian gradient descent algorithm.
Significance. If the new metric defines a valid positive-definite structure and the derived tools are correct, the work could reduce per-iteration cost in Riemannian optimization on this manifold while preserving convergence behavior. The numerical experiments provide preliminary evidence of practical gains. The explicit construction of the quasi-geodesic adds a useful geometric contribution, but the absence of a definiteness verification limits the strength of the central claim.
major comments (1)
- [Metric definition section] The section defining the generalized canonical metric does not establish positive definiteness of g_X(ξ,η) for nonzero tangent vectors ξ. The manuscript derives the cheaper Riemannian gradient formula under the assumption that the metric is valid, yet supplies no eigenvalue analysis, lower bound, or explicit check (in contrast to the prior Lyapunov construction that enforced this via a positive-definite matrix solution). This property is load-bearing for the gradient, quasi-geodesic, and retraction.
minor comments (2)
- [Abstract and Introduction] The abstract and introduction refer to the previous Lyapunov-based family but could include one sentence recalling its explicit positive-definiteness guarantee for context.
- [Metric definition section] Notation for the operator A(X) in the metric definition should be cross-referenced to any earlier equation that defines it.
Simulated Author's Rebuttal
We thank the referee for the careful reading and for highlighting the need to verify positive definiteness of the proposed metric. This is a substantive point that strengthens the foundation of the work, and we address it directly below.
Point-by-point responses
- Referee: [Metric definition section] The section defining the generalized canonical metric does not establish positive definiteness of g_X(ξ,η) for nonzero tangent vectors ξ. The manuscript derives the cheaper Riemannian gradient formula under the assumption that the metric is valid, yet supplies no eigenvalue analysis, lower bound, or explicit check (in contrast to the prior Lyapunov construction that enforced this via a positive-definite matrix solution). This property is load-bearing for the gradient, quasi-geodesic, and retraction.
  Authors: We agree that the original manuscript does not contain an explicit proof or eigenvalue bound establishing positive definiteness of the generalized canonical metric, in contrast to the Lyapunov-based construction in our prior work. This is a valid observation. In the revised version we will add a dedicated paragraph (or short lemma) in the metric definition section that proves g_X(ξ,ξ) ≥ c‖ξ‖_F² for some c>0 and all nonzero tangent vectors ξ. The argument proceeds by writing the metric as a perturbation of the standard Frobenius inner product whose perturbation term is controlled by the signature matrix and the defining relation XᵀJX = I_p; we then obtain a uniform lower bound on the eigenvalues of the associated self-adjoint operator. A brief numerical verification for small (p,n) will also be included for illustration. With this addition the derivations of the Riemannian gradient, quasi-geodesic, and retraction rest on a rigorously verified Riemannian structure.
  revision: yes
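For a metric of trace form, a lower bound of the kind promised in the response follows in one line. The derivation below is a sketch that assumes g_X(ξ,η) = tr(ξ^T M_X η) with M_X symmetric positive definite; it is not the perturbation argument the authors describe, and ξ_j (the j-th column of ξ) is notation introduced here.

```latex
\begin{align*}
  g_X(\xi,\xi) = \operatorname{tr}(\xi^T M_X \xi)
    = \sum_{j=1}^{p} \xi_j^T M_X \xi_j
    \;\ge\; \lambda_{\min}(M_X) \sum_{j=1}^{p} \|\xi_j\|_2^2
    = \lambda_{\min}(M_X)\, \|\xi\|_F^2 .
\end{align*}
```

Under that assumption one may take c = λ_min(M_X) at each point X; the whole question then reduces to showing that M_X is positive definite, which is exactly what the referee asks for.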
Circularity Check
Minor self-citation to authors' prior Lyapunov metric work; new generalized canonical metric presented as independent proposal.
specific steps
- self citation load bearing [Abstract]
"In the previous work [D.V. Tiep and N.T. Son, A Riemannian gradient descent method for optimization on the indefinite Stiefel manifold, arXiv:2410.22068v2[math.OC]], a tractable metric, which is indeed a family of Riemannian metrics defined by a symmetric positive-definite matrix depending on the contact point, has been used. In general, it requires solving a Lyapunov matrix equation every time when the gradient of the cost function is needed, which might significantly contribute to the computational cost. To address this issue, we propose a new Riemannian metric for the indefinite Stiefel man"
The citation is to prior work by overlapping authors (Tiep and Son) describing the old metric's cost. However, because the current paper explicitly proposes a distinct new metric to replace it, the self-citation functions only as background motivation and does not make the new metric or its gradient formula equivalent to the prior construction by definition.
full rationale
The paper cites its own earlier arXiv:2410.22068 for the computational drawback of the Lyapunov-based family but introduces the new metric and associated quasi-geodesic/retraction as a fresh construction. No equation reduces the claimed cheaper gradient or validity to a fitted parameter or assumption taken from the cited paper. The derivation chain for the new geometry stands on its own definitions and is not forced by self-citation. This is a normal minor self-reference that does not affect the central independent content.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: A family of Riemannian metrics on the indefinite Stiefel manifold can be defined by a symmetric positive-definite matrix that depends on the base point.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: we propose a new Riemannian metric for the indefinite Stiefel manifold... $g_{\rho,X_\perp}(Z_1,Z_2) := \operatorname{tr}(Z_1^T M_X Z_2) = \frac{1}{\rho}\operatorname{tr}(W_1^T W_2) + \operatorname{tr}(K_1^T \Gamma_3^{-1} K_2)$
- IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean, J_uniquely_calibrated_via_higher_derivative (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: Proposition 3.3... $\operatorname{grad}_{gc} f(X) = \rho X J \operatorname{skew}(J X^T \nabla \bar f(X)) + \dots$
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Absil, P.A., Mahony, R., Sepulchre, R.: Optimization Algorithms on Matrix Manifolds. Princeton University Press, Princeton, NJ (2008)
- [2] Adler, R., Dedieu, J.P., Margulies, J., Martens, M., Shub, M.: Newton’s method on Riemannian manifolds and a geometric model for human spine. IMA J. Numer. Anal., 22, 359–390 (2002)
- [3] Bartels, R., Stewart, G.: Solution of the equation AX + XB = C. Comm. ACM, 15(9), 820–826 (1972)
- [4] Barzilai, J., Borwein, J.M.: Two-point step size gradient methods. IMA J. Numer. Anal., 8(1), 141–148 (1988)
- [5]
- [6] Boumal, N.: An Introduction to Optimization on Smooth Manifolds. Cambridge University Press (2023)
- [7] Boumal, N., Mishra, B., Absil, P.A., Sepulchre, R.: Manopt, a Matlab toolbox for optimization on manifolds. J. Mach. Learn. Res., 15, 1455–1459 (2014)
- [8] Deimling, K.: Ordinary Differential Equations in Banach Spaces. Springer Berlin, Heidelberg (1977)
- [9] Edelman, A., Arias, T., Smith, S.: The geometry of algorithms with orthogonality constraints. SIAM J. Matrix Anal. Appl., 20(2), 303–353 (1998)
- [10] Gabay, D.: Minimizing a differentiable function over a differential manifold. J. Optim. Theory Appl., 37(2), 177–219 (1982)
- [11] Gao, B., Son, N.T., Absil, P.A., Stykel, T.: Geometry of the symplectic Stiefel manifold endowed with the Euclidean metric. In: F. Nielsen, F. Barbaresco (eds.) Geometric Science of Information: GSI 2021, Lecture Notes in Computer Science, vol. 12829, 789–796. Springer Nature, Cham, Switzerland (2021)
- [12] Gao, B., Son, N.T., Absil, P.A., Stykel, T.: Riemannian optimization on the symplectic Stiefel manifold. SIAM J. Optim., 31(2), 1546–1575 (2021)
- [13]
- [14] Güttel, S., Nakatsukasa, Y.: Scaled and squared subdiagonal Padé approximation for the matrix exponential. SIAM J. Matrix Anal. Appl., 37(1), 145–170 (2016)
- [15] Hardoon, D.R., Szedmak, S., Shawe-Taylor, J.: Canonical correlation analysis: An overview with application to learning methods. Neural Comput., 16(12), 2639–2664 (2004)
- [16] He, X., Niyogi, P.: Locality preserving projections. In: S. Thrun, L. Saul, B. Schölkopf (eds.) Advances in Neural Information Processing Systems, vol. 16. MIT Press (2003)
- [17] Horn, R., Johnson, C.: Topics in Matrix Analysis. Cambridge University Press, Cambridge, UK (1991)
- [18] Hotelling, H.: Relations between two sets of variates. In: S. Kotz, N.L. Johnson (eds.) Breakthroughs in Statistics, Springer Ser. Statist., 162–190. Springer, New York, NY (1992)
- [19] Hu, J., Liu, X., Wen, Z.W., Yuan, Y.X.: A brief introduction to manifold optimization. J. Oper. Res. Soc. China, 18, 199–248 (2020)
- [20] Iannazzo, B., Porcelli, M.: The Riemannian Barzilai–Borwein method with nonmonotone line search and the matrix geometric mean computation. IMA J. Numer. Anal., 38, 495–517 (2018)
- [21] Jurdjevic, V., Markina, I., Silva Leite, F.: Extremal curves on Stiefel and Grassmann manifolds. J. Geom. Anal., 30, 3948–3978 (2020)
- [22] Kovač-Striko, J., Veselić, K.: Trace minimization and definiteness of symmetric pencils. Linear Algebra Appl., 216, 139–158 (1995)
- [23] Krakowski, K.A., Machado, L., Silva Leite, F., Batista, J.: A modified Casteljau algorithm to solve interpolation problems on Stiefel manifolds. J. Comput. Appl. Math., 311, 84–99 (2017)
- [24] Mishra, B., Sepulchre, R.: Riemannian preconditioning. SIAM J. Optim., 26(1), 635–660 (2016)
- [25] Nishimori, Y., Akaho, S.: Learning algorithms utilizing quasi-geodesic flows on the Stiefel manifold. Neurocomputing, 67, 106–135 (2005)
- [26] Sameh, A.H., Wisniewski, J.A.: A trace minimization algorithm for the generalized eigenvalue problem. SIAM J. Numer. Anal., 19(6), 1243–1259 (1982)
- [27] Sato, H.: Riemannian Optimization and Its Applications. SpringerBriefs Electr. Comput. Eng. Springer (2021)
- [28] Sato, H., Aihara, K.: Cholesky QR-based retraction on the generalized Stiefel manifold. Comput. Optim. Appl., 72, 293–308 (2019)
- [29] Shustin, B., Avron, H.: Riemannian optimization with a preconditioning scheme on the generalized Stiefel manifold. J. Comput. Appl. Math., 423, 114953 (2023)
- [30] Smith, S.T.: Optimization techniques on Riemannian manifolds. Fields Inst. Commun., 3, 113–136 (1994)
- [31] Tiep, D.V., Son, N.T.: A Riemannian gradient descent method for optimization on the indefinite Stiefel manifold. https://doi.org/10.48550/arXiv.2410.22068 (2025)
- [32] Udriste, C.: Convex Functions and Optimization Methods on Riemannian Manifolds. Mathematics and Its Applications. Springer (1994)
- [33] Wang, X., Deng, K., Peng, Z., Yan, C.: New vector transport operators extending a Riemannian CG algorithm to generalized Stiefel manifold with low-rank applications. J. Comput. Appl. Math., 451, 116024 (2024)
- [34] Wilcox, R.M.: Exponential operators and parameter differentiation in quantum physics. J. Math. Phys., 8(4), 962–982 (1967)
- [35] Yger, F., Berar, M., Gasso, G., Rakotomamonjy, A.: Adaptive canonical correlation analysis based on matrix manifolds. In: J. Langford, J. Pineau (eds.) ICML ’12 Proceedings of the 29th International Coference on International Conference on Machine Learning, 1071–1078. Omnipress, New York, NY, USA (2012)
- [36] Zhang, H., Hager, W.W.: A nonmonotone line search technique and its application to unconstrained optimization. SIAM J. Optim., 14(4), 1043–1056 (2004)