pith. sign in

arxiv: 2305.00091 · v6 · submitted 2023-04-28 · 🧮 math.OC

A Theory of the NEPv Approach for Optimization On the Stiefel Manifold

Pith reviewed 2026-05-24 08:44 UTC · model grok-4.3

classification 🧮 math.OC
keywords Stiefel manifoldNEPvSCF iterationoptimizationconvergenceatomic functionsmanifold optimization
0
0 comments X

The pith

A unifying framework guarantees global convergence to a stationary point for NEPv optimization on the Stiefel manifold when the objective meets basic assumptions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a general theory to justify the NEPv approach, which converts the first-order optimality condition into a nonlinear eigenvalue problem solved by SCF iteration. Under the basic assumptions the iteration is shown to increase the objective monotonically at every step while converging globally to a stationary point. Atomic functions, which include common matrix traces of linear and quadratic forms, and convex compositions of atomic functions are proven to satisfy the assumptions, supplying a sizable collection of objectives for which the method is reliable.

Core claim

If the objective satisfies the basic assumptions, the SCF iteration for the associated NEPv or NPDo produces a monotonically increasing sequence of objective values and converges globally to a stationary point. Atomic functions that cover traces of linear and quadratic forms satisfy the assumptions, as do convex compositions of atomic functions, thereby justifying the NEPv/NPDo approach for this large class of objectives.

What carries the argument

The unifying framework built on basic assumptions, with the notion of atomic functions as the central examples that meet those assumptions.

Load-bearing premise

The objective function must satisfy certain basic assumptions whose precise requirements and verification procedure remain unspecified.

What would settle it

An explicit objective function that meets the basic assumptions yet produces an SCF sequence that fails to increase the objective monotonically or fails to reach a stationary point.

read the original abstract

The NEPv approach has been increasingly used lately for optimization on the Stiefel manifold arising from machine learning. General speaking, the approach first turns the first order optimality condition, also known as the KKT condition, into a nonlinear eigenvalue problem with eigenvector dependency (NEPv) or a nonlinear polar decomposition with orthogonal factor dependency (NPDo) and then solve the nonlinear problem via some variations of the self-consistent-field (SCF) iteration. The difficulty, however, lies in designing a proper SCF iteration so that a maximizer is found at the end. Currently, each use of the approach is very much individualized, especially in its convergence analysis to show that the approach does work or otherwise. In this paper, a unifying framework is established. The framework is built upon some basic assumptions. If the basic assumptions are satisfied, globally convergence is guaranteed to a stationary point and during the SCF iterative process that leads to the stationary point, the objective function increases monotonically. Also a notion of atomic functions is proposed, which include commonly used matrix traces of linear and quadratic forms as special ones. It is shown that the basic assumptions are satisfied by atomic functions and by convex compositions of atomic functions. Together they provide a large collection of objectives for which the NEPv/NPDo approach is guaranteed to work.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The manuscript develops a unifying framework for the NEPv/NPDo approach to optimization on the Stiefel manifold. Under a set of basic assumptions on the objective, the SCF iteration is proven to produce a monotonic increase in the objective value and to converge globally to a stationary point. The paper introduces the class of atomic functions (encompassing trace-linear and trace-quadratic forms as special cases) and shows that the basic assumptions hold for atomic functions as well as for convex compositions of atomic functions, thereby covering a broad collection of objectives arising in machine learning.

Significance. If the central claims hold, the work supplies a general convergence theory that replaces case-by-case analyses for NEPv methods on the Stiefel manifold. The explicit verification that the assumptions are satisfied by atomic functions and their convex compositions is a concrete strength, as it identifies a sizable, readily applicable class of objectives for which the SCF iteration is guaranteed to behave as claimed.

minor comments (2)
  1. [Abstract] The abstract states that the framework is 'built upon some basic assumptions' but does not list them; a brief enumeration of the assumptions (with forward references to their precise statements in §2 or §3) would improve readability for readers who wish to check applicability without reading the full proofs first.
  2. [§1] Notation for the Stiefel manifold and the orthogonal factor in the NPDo formulation should be introduced once with a single consistent symbol (e.g., X or Q) rather than switching between several symbols across sections.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of the manuscript, the accurate summary of its contributions, and the recommendation of minor revision. The report correctly identifies the unifying framework based on basic assumptions, the guaranteed monotonic increase and global convergence to stationary points, and the concrete class of atomic functions together with their convex compositions as the main strengths.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper defines a set of basic assumptions on the objective function, proves that the SCF iteration yields monotonic increase and global convergence to a stationary point whenever the assumptions hold, and separately verifies that the assumptions are satisfied by the explicitly defined class of atomic functions (including trace-linear and trace-quadratic cases) together with their convex compositions. This structure is self-contained: the convergence theorem is conditional on the assumptions, and the verification step is an independent check rather than a reduction of the result to its own inputs, fitted parameters, or self-citations. No load-bearing step reduces by construction to a prior result from the same authors or to a renaming of known patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim rests on the basic assumptions (unspecified in abstract) and the newly introduced atomic functions as the class for which the assumptions hold; no free parameters or external benchmarks are mentioned.

axioms (1)
  • domain assumption Basic assumptions on the objective function and SCF iteration for global convergence and monotonicity
    The convergence guarantee is conditional on these assumptions being satisfied, which are the load-bearing premises of the framework.
invented entities (1)
  • Atomic functions no independent evidence
    purpose: To define a broad class of objective functions (including traces of linear and quadratic forms) that satisfy the basic assumptions
    Newly proposed concept in the paper to cover commonly used objectives in the NEPv approach.

pith-pipeline@v0.9.0 · 5756 in / 1315 out tokens · 52319 ms · 2026-05-24T08:44:40.971950+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

81 extracted references · 81 canonical work pages

  1. [1]

    Absil, R

    P.-A. Absil, R. Mahony, and R. Sepulchre. Optimization Algorithms On Matrix Manifolds . Princeton University Press, Princeton, NJ, 2008

  2. [2]

    Anderson, Z

    E. Anderson, Z. Bai, C. Bischof, J. Demmel, J. Dongarra, J. Du C roz, A. Greenbaum, S. Ham- marling, A. McKenney, S. Ostrouchov, and D. Sorensen. LAPACK Users’ Guide . SIAM, Philadelphia, 3rd edition, 1999

  3. [3]

    Z. Bai, J. Demmel, J. Dongarra, A. Ruhe, and H. van der Vorst (e ditors). Templates for the solution of Algebraic Eigenvalue Problems: A Practical Gui de. SIAM, Philadelphia, 2000

  4. [4]

    Bai, R.-C

    Z. Bai, R.-C. Li, and D. Lu. Sharp estimation of convergence rate for self-consistent field iteration to solve eigenvector-dependent nonlinear eigenvalue pro blems. SIAM J. Matrix Anal. Appl., 43(1):301–327, 2022

  5. [5]

    Bai and D

    Z. Bai and D. Lu. Variational characterization of monotone non linear eigenvector problems and geometry of self-consistent-field iteration. SIAM J. Matrix Anal. Appl. , 46(1):84–111, 2024

  6. [6]

    Balogh, T

    J. Balogh, T. Csendes, and T. Rapcs’a. Some global optimization p roblems on Stiefel mani- folds. J. Global Optim. , 30:91–101, 2004

  7. [7]

    Benner and X

    P. Benner and X. Liang. Convergence analysis of vector extend ed locally optimal block pre- conditioned extended conjugate gradient method for computing e xtreme eigenvalues. Numer. Linear Algebra Appl. , 29(6):e2445, 2022. 24 pages

  8. [8]

    R. Bhatia. Matrix Analysis . Graduate Texts in Mathematics, vol. 169. Springer, New York, 1996

  9. [9]

    Birtea, I

    P. Birtea, I. Ca¸ su, and D. Com˘ anescu. First order optimality c onditions and steepest descent algorithm on orthogonal Stiefel manifolds. Opt. Lett. , 13:1773–1791, 2019

  10. [10]

    Bolla, G

    M. Bolla, G. Michaletzky, G. Tusn´ ady, and M. Ziermann. Extrem a of sums of heterogeneous quadratic forms. Linear Algebra Appl. , 269(1):331–365, 1998

  11. [11]

    Borg and J

    I. Borg and J. Lingoes. Multidimensional Similarity Structure Analysis . Springer-Verlag, New York, 1987

  12. [12]

    Boumal, B

    N. Boumal, B. Mishra, P.-A. Absil, and R. Sepulchre. Manopt, a Ma tlab toolbox for opti- mization on manifolds. J. Mach. Learning Res. , 15(42):1455–1459, 2014

  13. [13]

    Boyd and L

    S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, Cambridge, UK, 2004

  14. [14]

    Cai, L.-H

    Y. Cai, L.-H. Zhang, Z. Bai, and R.-C. Li. On an eigenvector-depe ndent nonlinear eigenvalue problem. SIAM J. Matrix Anal. Appl. , 39(3):1360–1382, 2018

  15. [15]

    M. T. Chu and N. T. Trendafilov. The orthogonally constrained r egression revisited. J. Comput. Graph. Stat. , 10(4):746–771, 2001

  16. [16]

    J. P. Cunningham and Z. Ghahramani. Linear dimensionality reduc tion: Survey, insights, and generalizations. J. Mach. Learning Res. , 16:2859–2900, 2015

  17. [17]

    J. Demmel. Applied Numerical Linear Algebra . SIAM, Philadelphia, PA, 1997

  18. [18]

    Edelman, T

    A. Edelman, T. A. Arias, and S. T. Smith. The geometry of algorit hms with orthogonality constraints. SIAM J. Matrix Anal. Appl. , 20(2):303–353, 1999. 84

  19. [19]

    Eld´ en and H

    L. Eld´ en and H. Park. A procrustes problem on the Stiefel man ifold. Numer. Math. , 82:599– 619, 1999

  20. [20]

    K. Fan. On a theorem of Weyl concerning eigenvalues of linear tr ansformations. I. Proc. Natl. Acad. Sci. USA , 35(11):pp. 652–655, 1949

  21. [21]

    R. A. Fisher. The use of multiple measurements in taxonomic prob lems. Ann. Eugenics , 7(2):179–188, 1936

  22. [22]

    B. Gao, X. Liu, X. Chen, and Y.-X. Yuan. A new first-order algor ithmic framework for optimization problems with orthogonality constraints. SIAM J. Optim. , 28(1):302–332, 2018

  23. [23]

    Golub and Q

    G. Golub and Q. Ye. An inverse free preconditioned Krylov subsp ace methods for symmetric eigenvalue problems. SIAM J. Sci. Comput. , 24:312–334, 2002

  24. [24]

    G. H. Golub and C. F. Van Loan. Matrix Computations . Johns Hopkins University Press, Baltimore, Maryland, 4th edition, 2013

  25. [25]

    J. C. Gower and G. B. Dijksterhuis. Procrustes Problems. Oxford University Press, New York, 2004

  26. [26]

    N. J. Higham. Functions of Matrices: Theory and Computation . Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2008

  27. [27]

    Hohenberg and W

    P. Hohenberg and W. Kohn. Inhomogeneous electron gas. Phys. Rev. , 136:B864–B871, 1964

  28. [28]

    R. A. Horn and C. R. Johnson. Topics in Matrix Analysis . Cambridge University Press, Cambridge, 1991

  29. [29]

    R. A. Horn and C. R. Johnson. Matrix Analysis . Cambridge University Press, New York, NY, 2nd edition, 2013

  30. [30]

    J. R. Hurley and R. B. Cattell. The Procrustes program: produ cing direct rotation to test a hypothesized factor structure. Comput. Behav. Sci. , 7:258–262, 1962

  31. [31]

    Imakura, R.-C

    A. Imakura, R.-C. Li, and S.-L. Zhang. Locally optimal and heavy ball GMRES methods. Japan J. Indust. Appl. Math. , 33:471–499, 2016

  32. [32]

    Kanzow and H.-D

    C. Kanzow and H.-D. Qi. A QP-free constrained Newton-type me thod for variational inequal- ity problems. Math. Program., 85:81–106, 1999

  33. [33]

    A. V. Knyazev. Toward the optimal preconditioned eigensolver : Locally optimal block pre- conditioned conjugate gradient method. SIAM J. Sci. Comput. , 23(2):517–541, 2001

  34. [34]

    Kohn and L

    W. Kohn and L. J. Sham. Self-consistent equations including exc hange and correlation effects. Phys. Rev. , 140:A1133–A1138, 1965

  35. [35]

    Kovaˇ c-Striko and K

    J. Kovaˇ c-Striko and K. Veseli´ c. Some remarks on the spect ra of Hermitian matrices. Linear Algebra Appl., 145:221–229, 1991

  36. [36]

    Lehoucq, D

    R. Lehoucq, D. C. Sorensen, and C. Yang. ARPACK User’s Guide . SIAM, Philadelphia, 1998

  37. [37]

    R.-C. Li. A perturbation bound for the generalized polar decomp osition. BIT, 33:304–308, 1993

  38. [38]

    R.-C. Li. New perturbation bounds for the unitary polar factor . SIAM J. Matrix Anal. Appl. , 16:327–332, 1995. 85

  39. [39]

    R.-C. Li. Accuracy of computed eigenvectors via optimizing a Ray leigh quotient. BIT, 44(3):585–593, 2004

  40. [40]

    R.-C. Li. Matrix perturbation theory. In L. Hogben, R. Brualdi, and G. W. Stewart, editors, Handbook of Linear Algebra, page Chapter 21. CRC Press, Boca Raton, FL, 2nd edition, 2014

  41. [41]

    R.-C. Li. Rayleigh quotient based optimization methods for eigenv alue problems. In Z. Bai, Weiguo Gao, and Yangfeng Su, editors, Matrix Functions and Matrix Equations , volume 19 of Series in Contemporary Applied Mathematics , pages 76–108. World Scientific, Singapore, 2015

  42. [42]

    Liang and R.-C

    X. Liang and R.-C. Li. On generalizing trace minimization principles, I I. Linear Algebra Appl., 687:8–37, 2024

  43. [43]

    Liang, R.-C

    X. Liang, R.-C. Li, and Z. Bai. Trace minimization principles for posit ive semi-definite pencils. Linear Algebra Appl. , 438:3085–3106, 2013

  44. [44]

    Liang, L

    X. Liang, L. Wang, L.-H. Zhang, and R.-C. Li. On generalizing trac e minimization principles. Linear Algebra Appl. , 656:483–509, 2023

  45. [45]

    Liu, X.-F

    X.-G. Liu, X.-F. Wang, and W.-G. Wang. Maximization of matrix trac e function of product Stiefel manifolds. SIAM J. Matrix Anal. Appl. , 36(4):1489–1506, 2015

  46. [46]

    Lu and R.-C

    D. Lu and R.-C. Li. Locally unitarily invariantizable NEPv and conver gence analysis of SCF. Math. Comp. , 93(349):2291–2329, 2024. Published electronically: January 9, 2 024

  47. [47]

    Mor´ e and D

    J. Mor´ e and D. Sorensen. Computing a trust region step. SIAM J. Sci. Statist. Comput. , 4(3):553–572, 1983

  48. [48]

    T. Ngo, M. Bellalij, and Y. Saad. The trace ratio optimization prob lem for dimensionality reduction. SIAM J. Matrix Anal. Appl. , 31(5):2950–2971, 2010

  49. [49]

    F. Nie, R. Zhang, and X. Li. A generalized power iteration method for solving quadratic problem on the Stiefel manifold. SCIENCE CHINA Info. Sci. , 60(11):1–10, 2017

  50. [50]

    Nocedal and S

    J. Nocedal and S. Wright. Numerical Optimization . Springer, 2nd edition, 2006

  51. [51]

    B. N. Parlett. The Symmetric Eigenvalue Problem . SIAM, Philadelphia, 1998. This SIAM edition is an unabridged, corrected reproduction of the work first published by Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1980

  52. [52]

    B. T. Polyak. Introduction to Optimization . Optimization Software, New York, 1987

  53. [53]

    Quillen and Q

    P. Quillen and Q. Ye. A block inverse-free preconditioned Krylov s ubspace method for sym- metric generalized eigenvalue problems. J. Comput. Appl. Math. , 233(5):1298–1313, 2010

  54. [54]

    Rapcs´ ak

    T. Rapcs´ ak. On minimization on Stiefel manifolds. European J. Oper. Res. , 143(2):365–376, 2002

  55. [55]

    J. D. Rutter. A serial implementation of Cuppen’s divide and conq uer algorithm for the symmetric eigenvalue problem. Technical Report UCB/CSD-94-799 , EECS Department, Uni- versity of California, Berkeley, February 1994

  56. [56]

    Y. Saad. Numerical Methods for Large Eigenvalue Problems . Manchester University Press, Manchester, UK, 1992

  57. [57]

    Y. Saad, J. R. Chelikowsky, and S. M. Shontz. Numerical metho ds for electronic structure calculations of materials. SIAM Rev. , 52(1):3–54, 2010. 86

  58. [58]

    G. W. Stewart and J. G. Sun. Matrix Perturbation Theory . Academic Press, Boston, 1990

  59. [59]

    J. G. Sun. Matrix Perturbation Analysis . Graduate Texts (Academia, Sinica). Science Pub- lisher, Beijing, 2nd edition, November 2001. in Chinese

  60. [60]

    Takahashi

    I. Takahashi. A note on the conjugate gradient method. Inform. Process. Japan , 5:45–49, 1965

  61. [61]

    J. M. F. Ten Berge. Generalized approaches to the MAXBET pro blem and the MAXDIFF problem, with applications to canonical correlations. Psychometrika, 53(4):487–494, 1984

  62. [62]

    Teng and R.-C

    Z. Teng and R.-C. Li. Variations of orthonormal basis matrices o f subspaces. Numer. Alg., Contr. Optim. , 2024. to appear

  63. [63]

    J. P. Van de Geer. Linear relations among k sets of variables. Psychometrika, 49(1):70–94, 1984

  64. [64]

    von Neumann

    J. von Neumann. Some matrix-inequalities and metrization of mat rix-space. Tomck. Univ. Rev., 1:286–300, 1937

  65. [65]

    L. Wang, B. Gao, and X. Liu. Multipliers correction methods for o ptimization problems over the Stiefel manifold. CSIAM Trans. Appl. Math. , 2(3):508–531, 2021

  66. [66]

    Wang, L.-H

    L. Wang, L.-H. Zhang, and R.-C. Li. Maximizing sum of coupled trac es with applications. Numer. Math. , 152:587–629, 2022. doi.org/10.1007/s00211-022-01322-y

  67. [67]

    Wang, L.-H

    L. Wang, L.-H. Zhang, and R.-C. Li. Trace ratio optimization with a n application to multi- view learning. Math. Program., 201:97–131, 2023. doi.org/10.1007/s10107-022-01900-w

  68. [68]

    H. F. Weinberger. Variational Methods for Eigenvalue Approximation , volume 15 of CBMS- NSF Regional Conference Series in Applied Mathematics . SIAM, Philadelphia, 1974

  69. [69]

    Wen and W

    Z. Wen and W. Yin. A feasible method for optimization with orthogo nality constraints. Math. Program., 142(1-2):397–434, 2013

  70. [70]

    C. Yang, J. C. Meza, B. Lee, and L.-W. Wang. KSSOL V—a MATLAB toolbox for solving the Kohn-Sham equations. ACM Trans. Math. Software , 36(2):1–35, 2009

  71. [71]

    Yang and R.-C

    M. Yang and R.-C. Li. Heavy ball flexible GMRES method for nonsym metric linear systems. J. Comp. Math. , 40(5):715–731, 2021

  72. [72]

    Zhang, L.-Z

    L.-H. Zhang, L.-Z. Liao, and M. K. Ng. Fast algorithms for the ge neralized Foley-Sammon discriminant analysis. SIAM J. Matrix Anal. Appl. , 31(4):1584–1605, 2010

  73. [73]

    Zhang, L.-Z

    L.-H. Zhang, L.-Z. Liao, and M. K. Ng. Superlinear convergence of a general algorithm for the generalized Foley-Sammon discriminant analysis. J. Optim. Theory Appl. , 157(3):853–865, 2013

  74. [74]

    Zhang, W

    L.-H. Zhang, W. H. Yang, C. Shen, and J. Ying. An eigenvalue-ba sed method for the unbal- anced Procrustes problem. SIAM J. Matrix Anal. Appl. , 41(3):957–983, 2020

  75. [75]

    Zhang and R.-C

    L.-H. Zhang and R.-C. Li. Maximization of the sum of the trace rat io on the Stiefel manifold, I: Theory. SCIENCE CHINA Math. , 57(12):2495–2508, 2014

  76. [76]

    Zhang and R.-C

    L.-H. Zhang and R.-C. Li. Maximization of the sum of the trace rat io on the Stiefel manifold, II: Computation. SCIENCE CHINA Math. , 58(7):1549–1566, 2015

  77. [77]

    Zhang, L

    L.-H. Zhang, L. Wang, Z. Bai, and R.-C. Li. A self-consistent-fie ld iteration for orthogonal canonical correlation analysis. IEEE Trans. Pattern Anal. Mach. Intell. , 44(2):890–904, 2022. 87

  78. [78]

    Zhang and K

    Z. Zhang and K. Du. Successive projection method for solving t he unbalanced Procrustes problem. SCIENCE CHINA Math. , 49(7):971–986, 2006

  79. [79]

    Zhang, Z

    Z. Zhang, Z. Zhai, and L. Li. Uniform projection for multi-view lea rning. IEEE Trans. Pattern Anal. Mach. Intell. , 39(8):1675–1689, 2017

  80. [80]

    H. Zhao, Z. Wang, and F. Nie. Orthogonal least squares regre ssion for feature extraction. Neurocomputing, 216:200–207, 2016

Showing first 80 references.