A Theory of the NEPv Approach for Optimization On the Stiefel Manifold
Pith reviewed 2026-05-24 08:44 UTC · model grok-4.3
The pith
A unifying framework guarantees global convergence to a stationary point for NEPv optimization on the Stiefel manifold when the objective meets basic assumptions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
If the objective satisfies the basic assumptions, the SCF iteration for the associated NEPv or NPDo produces a monotonically increasing sequence of objective values and converges globally to a stationary point. Atomic functions that cover traces of linear and quadratic forms satisfy the assumptions, as do convex compositions of atomic functions, thereby justifying the NEPv/NPDo approach for this large class of objectives.
What carries the argument
The unifying framework built on basic assumptions, with the notion of atomic functions as the central examples that meet those assumptions.
Load-bearing premise
The objective function must satisfy certain basic assumptions whose precise requirements and verification procedure remain unspecified.
What would settle it
An explicit objective function that meets the basic assumptions yet produces an SCF sequence that fails to increase the objective monotonically or fails to reach a stationary point.
read the original abstract
The NEPv approach has been increasingly used lately for optimization on the Stiefel manifold arising from machine learning. General speaking, the approach first turns the first order optimality condition, also known as the KKT condition, into a nonlinear eigenvalue problem with eigenvector dependency (NEPv) or a nonlinear polar decomposition with orthogonal factor dependency (NPDo) and then solve the nonlinear problem via some variations of the self-consistent-field (SCF) iteration. The difficulty, however, lies in designing a proper SCF iteration so that a maximizer is found at the end. Currently, each use of the approach is very much individualized, especially in its convergence analysis to show that the approach does work or otherwise. In this paper, a unifying framework is established. The framework is built upon some basic assumptions. If the basic assumptions are satisfied, globally convergence is guaranteed to a stationary point and during the SCF iterative process that leads to the stationary point, the objective function increases monotonically. Also a notion of atomic functions is proposed, which include commonly used matrix traces of linear and quadratic forms as special ones. It is shown that the basic assumptions are satisfied by atomic functions and by convex compositions of atomic functions. Together they provide a large collection of objectives for which the NEPv/NPDo approach is guaranteed to work.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript develops a unifying framework for the NEPv/NPDo approach to optimization on the Stiefel manifold. Under a set of basic assumptions on the objective, the SCF iteration is proven to produce a monotonic increase in the objective value and to converge globally to a stationary point. The paper introduces the class of atomic functions (encompassing trace-linear and trace-quadratic forms as special cases) and shows that the basic assumptions hold for atomic functions as well as for convex compositions of atomic functions, thereby covering a broad collection of objectives arising in machine learning.
Significance. If the central claims hold, the work supplies a general convergence theory that replaces case-by-case analyses for NEPv methods on the Stiefel manifold. The explicit verification that the assumptions are satisfied by atomic functions and their convex compositions is a concrete strength, as it identifies a sizable, readily applicable class of objectives for which the SCF iteration is guaranteed to behave as claimed.
minor comments (2)
- [Abstract] The abstract states that the framework is 'built upon some basic assumptions' but does not list them; a brief enumeration of the assumptions (with forward references to their precise statements in §2 or §3) would improve readability for readers who wish to check applicability without reading the full proofs first.
- [§1] Notation for the Stiefel manifold and the orthogonal factor in the NPDo formulation should be introduced once with a single consistent symbol (e.g., X or Q) rather than switching between several symbols across sections.
Simulated Author's Rebuttal
We thank the referee for the positive assessment of the manuscript, the accurate summary of its contributions, and the recommendation of minor revision. The report correctly identifies the unifying framework based on basic assumptions, the guaranteed monotonic increase and global convergence to stationary points, and the concrete class of atomic functions together with their convex compositions as the main strengths.
Circularity Check
No significant circularity
full rationale
The paper defines a set of basic assumptions on the objective function, proves that the SCF iteration yields monotonic increase and global convergence to a stationary point whenever the assumptions hold, and separately verifies that the assumptions are satisfied by the explicitly defined class of atomic functions (including trace-linear and trace-quadratic cases) together with their convex compositions. This structure is self-contained: the convergence theorem is conditional on the assumptions, and the verification step is an independent check rather than a reduction of the result to its own inputs, fitted parameters, or self-citations. No load-bearing step reduces by construction to a prior result from the same authors or to a renaming of known patterns.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Basic assumptions on the objective function and SCF iteration for global convergence and monotonicity
invented entities (1)
-
Atomic functions
no independent evidence
Reference graph
Works this paper leans on
- [1]
-
[2]
E. Anderson, Z. Bai, C. Bischof, J. Demmel, J. Dongarra, J. Du C roz, A. Greenbaum, S. Ham- marling, A. McKenney, S. Ostrouchov, and D. Sorensen. LAPACK Users’ Guide . SIAM, Philadelphia, 3rd edition, 1999
work page 1999
-
[3]
Z. Bai, J. Demmel, J. Dongarra, A. Ruhe, and H. van der Vorst (e ditors). Templates for the solution of Algebraic Eigenvalue Problems: A Practical Gui de. SIAM, Philadelphia, 2000
work page 2000
- [4]
- [5]
- [6]
-
[7]
P. Benner and X. Liang. Convergence analysis of vector extend ed locally optimal block pre- conditioned extended conjugate gradient method for computing e xtreme eigenvalues. Numer. Linear Algebra Appl. , 29(6):e2445, 2022. 24 pages
work page 2022
-
[8]
R. Bhatia. Matrix Analysis . Graduate Texts in Mathematics, vol. 169. Springer, New York, 1996
work page 1996
- [9]
- [10]
-
[11]
I. Borg and J. Lingoes. Multidimensional Similarity Structure Analysis . Springer-Verlag, New York, 1987
work page 1987
- [12]
-
[13]
S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, Cambridge, UK, 2004
work page 2004
- [14]
-
[15]
M. T. Chu and N. T. Trendafilov. The orthogonally constrained r egression revisited. J. Comput. Graph. Stat. , 10(4):746–771, 2001
work page 2001
-
[16]
J. P. Cunningham and Z. Ghahramani. Linear dimensionality reduc tion: Survey, insights, and generalizations. J. Mach. Learning Res. , 16:2859–2900, 2015
work page 2015
-
[17]
J. Demmel. Applied Numerical Linear Algebra . SIAM, Philadelphia, PA, 1997
work page 1997
-
[18]
A. Edelman, T. A. Arias, and S. T. Smith. The geometry of algorit hms with orthogonality constraints. SIAM J. Matrix Anal. Appl. , 20(2):303–353, 1999. 84
work page 1999
-
[19]
L. Eld´ en and H. Park. A procrustes problem on the Stiefel man ifold. Numer. Math. , 82:599– 619, 1999
work page 1999
-
[20]
K. Fan. On a theorem of Weyl concerning eigenvalues of linear tr ansformations. I. Proc. Natl. Acad. Sci. USA , 35(11):pp. 652–655, 1949
work page 1949
-
[21]
R. A. Fisher. The use of multiple measurements in taxonomic prob lems. Ann. Eugenics , 7(2):179–188, 1936
work page 1936
-
[22]
B. Gao, X. Liu, X. Chen, and Y.-X. Yuan. A new first-order algor ithmic framework for optimization problems with orthogonality constraints. SIAM J. Optim. , 28(1):302–332, 2018
work page 2018
-
[23]
G. Golub and Q. Ye. An inverse free preconditioned Krylov subsp ace methods for symmetric eigenvalue problems. SIAM J. Sci. Comput. , 24:312–334, 2002
work page 2002
-
[24]
G. H. Golub and C. F. Van Loan. Matrix Computations . Johns Hopkins University Press, Baltimore, Maryland, 4th edition, 2013
work page 2013
-
[25]
J. C. Gower and G. B. Dijksterhuis. Procrustes Problems. Oxford University Press, New York, 2004
work page 2004
-
[26]
N. J. Higham. Functions of Matrices: Theory and Computation . Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2008
work page 2008
-
[27]
P. Hohenberg and W. Kohn. Inhomogeneous electron gas. Phys. Rev. , 136:B864–B871, 1964
work page 1964
-
[28]
R. A. Horn and C. R. Johnson. Topics in Matrix Analysis . Cambridge University Press, Cambridge, 1991
work page 1991
-
[29]
R. A. Horn and C. R. Johnson. Matrix Analysis . Cambridge University Press, New York, NY, 2nd edition, 2013
work page 2013
-
[30]
J. R. Hurley and R. B. Cattell. The Procrustes program: produ cing direct rotation to test a hypothesized factor structure. Comput. Behav. Sci. , 7:258–262, 1962
work page 1962
-
[31]
A. Imakura, R.-C. Li, and S.-L. Zhang. Locally optimal and heavy ball GMRES methods. Japan J. Indust. Appl. Math. , 33:471–499, 2016
work page 2016
-
[32]
C. Kanzow and H.-D. Qi. A QP-free constrained Newton-type me thod for variational inequal- ity problems. Math. Program., 85:81–106, 1999
work page 1999
-
[33]
A. V. Knyazev. Toward the optimal preconditioned eigensolver : Locally optimal block pre- conditioned conjugate gradient method. SIAM J. Sci. Comput. , 23(2):517–541, 2001
work page 2001
-
[34]
W. Kohn and L. J. Sham. Self-consistent equations including exc hange and correlation effects. Phys. Rev. , 140:A1133–A1138, 1965
work page 1965
-
[35]
J. Kovaˇ c-Striko and K. Veseli´ c. Some remarks on the spect ra of Hermitian matrices. Linear Algebra Appl., 145:221–229, 1991
work page 1991
-
[36]
R. Lehoucq, D. C. Sorensen, and C. Yang. ARPACK User’s Guide . SIAM, Philadelphia, 1998
work page 1998
-
[37]
R.-C. Li. A perturbation bound for the generalized polar decomp osition. BIT, 33:304–308, 1993
work page 1993
-
[38]
R.-C. Li. New perturbation bounds for the unitary polar factor . SIAM J. Matrix Anal. Appl. , 16:327–332, 1995. 85
work page 1995
-
[39]
R.-C. Li. Accuracy of computed eigenvectors via optimizing a Ray leigh quotient. BIT, 44(3):585–593, 2004
work page 2004
-
[40]
R.-C. Li. Matrix perturbation theory. In L. Hogben, R. Brualdi, and G. W. Stewart, editors, Handbook of Linear Algebra, page Chapter 21. CRC Press, Boca Raton, FL, 2nd edition, 2014
work page 2014
-
[41]
R.-C. Li. Rayleigh quotient based optimization methods for eigenv alue problems. In Z. Bai, Weiguo Gao, and Yangfeng Su, editors, Matrix Functions and Matrix Equations , volume 19 of Series in Contemporary Applied Mathematics , pages 76–108. World Scientific, Singapore, 2015
work page 2015
-
[42]
X. Liang and R.-C. Li. On generalizing trace minimization principles, I I. Linear Algebra Appl., 687:8–37, 2024
work page 2024
-
[43]
X. Liang, R.-C. Li, and Z. Bai. Trace minimization principles for posit ive semi-definite pencils. Linear Algebra Appl. , 438:3085–3106, 2013
work page 2013
- [44]
- [45]
-
[46]
D. Lu and R.-C. Li. Locally unitarily invariantizable NEPv and conver gence analysis of SCF. Math. Comp. , 93(349):2291–2329, 2024. Published electronically: January 9, 2 024
work page 2024
-
[47]
J. Mor´ e and D. Sorensen. Computing a trust region step. SIAM J. Sci. Statist. Comput. , 4(3):553–572, 1983
work page 1983
-
[48]
T. Ngo, M. Bellalij, and Y. Saad. The trace ratio optimization prob lem for dimensionality reduction. SIAM J. Matrix Anal. Appl. , 31(5):2950–2971, 2010
work page 2010
-
[49]
F. Nie, R. Zhang, and X. Li. A generalized power iteration method for solving quadratic problem on the Stiefel manifold. SCIENCE CHINA Info. Sci. , 60(11):1–10, 2017
work page 2017
-
[50]
J. Nocedal and S. Wright. Numerical Optimization . Springer, 2nd edition, 2006
work page 2006
-
[51]
B. N. Parlett. The Symmetric Eigenvalue Problem . SIAM, Philadelphia, 1998. This SIAM edition is an unabridged, corrected reproduction of the work first published by Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1980
work page 1998
-
[52]
B. T. Polyak. Introduction to Optimization . Optimization Software, New York, 1987
work page 1987
-
[53]
P. Quillen and Q. Ye. A block inverse-free preconditioned Krylov s ubspace method for sym- metric generalized eigenvalue problems. J. Comput. Appl. Math. , 233(5):1298–1313, 2010
work page 2010
- [54]
-
[55]
J. D. Rutter. A serial implementation of Cuppen’s divide and conq uer algorithm for the symmetric eigenvalue problem. Technical Report UCB/CSD-94-799 , EECS Department, Uni- versity of California, Berkeley, February 1994
work page 1994
-
[56]
Y. Saad. Numerical Methods for Large Eigenvalue Problems . Manchester University Press, Manchester, UK, 1992
work page 1992
-
[57]
Y. Saad, J. R. Chelikowsky, and S. M. Shontz. Numerical metho ds for electronic structure calculations of materials. SIAM Rev. , 52(1):3–54, 2010. 86
work page 2010
-
[58]
G. W. Stewart and J. G. Sun. Matrix Perturbation Theory . Academic Press, Boston, 1990
work page 1990
-
[59]
J. G. Sun. Matrix Perturbation Analysis . Graduate Texts (Academia, Sinica). Science Pub- lisher, Beijing, 2nd edition, November 2001. in Chinese
work page 2001
- [60]
-
[61]
J. M. F. Ten Berge. Generalized approaches to the MAXBET pro blem and the MAXDIFF problem, with applications to canonical correlations. Psychometrika, 53(4):487–494, 1984
work page 1984
-
[62]
Z. Teng and R.-C. Li. Variations of orthonormal basis matrices o f subspaces. Numer. Alg., Contr. Optim. , 2024. to appear
work page 2024
-
[63]
J. P. Van de Geer. Linear relations among k sets of variables. Psychometrika, 49(1):70–94, 1984
work page 1984
-
[64]
J. von Neumann. Some matrix-inequalities and metrization of mat rix-space. Tomck. Univ. Rev., 1:286–300, 1937
work page 1937
-
[65]
L. Wang, B. Gao, and X. Liu. Multipliers correction methods for o ptimization problems over the Stiefel manifold. CSIAM Trans. Appl. Math. , 2(3):508–531, 2021
work page 2021
-
[66]
L. Wang, L.-H. Zhang, and R.-C. Li. Maximizing sum of coupled trac es with applications. Numer. Math. , 152:587–629, 2022. doi.org/10.1007/s00211-022-01322-y
-
[67]
L. Wang, L.-H. Zhang, and R.-C. Li. Trace ratio optimization with a n application to multi- view learning. Math. Program., 201:97–131, 2023. doi.org/10.1007/s10107-022-01900-w
-
[68]
H. F. Weinberger. Variational Methods for Eigenvalue Approximation , volume 15 of CBMS- NSF Regional Conference Series in Applied Mathematics . SIAM, Philadelphia, 1974
work page 1974
- [69]
-
[70]
C. Yang, J. C. Meza, B. Lee, and L.-W. Wang. KSSOL V—a MATLAB toolbox for solving the Kohn-Sham equations. ACM Trans. Math. Software , 36(2):1–35, 2009
work page 2009
-
[71]
M. Yang and R.-C. Li. Heavy ball flexible GMRES method for nonsym metric linear systems. J. Comp. Math. , 40(5):715–731, 2021
work page 2021
-
[72]
L.-H. Zhang, L.-Z. Liao, and M. K. Ng. Fast algorithms for the ge neralized Foley-Sammon discriminant analysis. SIAM J. Matrix Anal. Appl. , 31(4):1584–1605, 2010
work page 2010
-
[73]
L.-H. Zhang, L.-Z. Liao, and M. K. Ng. Superlinear convergence of a general algorithm for the generalized Foley-Sammon discriminant analysis. J. Optim. Theory Appl. , 157(3):853–865, 2013
work page 2013
- [74]
-
[75]
L.-H. Zhang and R.-C. Li. Maximization of the sum of the trace rat io on the Stiefel manifold, I: Theory. SCIENCE CHINA Math. , 57(12):2495–2508, 2014
work page 2014
-
[76]
L.-H. Zhang and R.-C. Li. Maximization of the sum of the trace rat io on the Stiefel manifold, II: Computation. SCIENCE CHINA Math. , 58(7):1549–1566, 2015
work page 2015
- [77]
-
[78]
Z. Zhang and K. Du. Successive projection method for solving t he unbalanced Procrustes problem. SCIENCE CHINA Math. , 49(7):971–986, 2006
work page 2006
- [79]
-
[80]
H. Zhao, Z. Wang, and F. Nie. Orthogonal least squares regre ssion for feature extraction. Neurocomputing, 216:200–207, 2016
work page 2016
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.