pith. machine review for the scientific record. sign in

arxiv: 2602.13759 · v2 · submitted 2026-02-14 · 💻 cs.LG · cs.NA· math.NA· math.OC

Recognition: 2 theorem links

· Lean Theorem

Discrete Double-Bracket Flows for Isotropic-Noise Invariant Eigendecomposition

Authors on Pith no claims yet

Pith reviewed 2026-05-15 22:07 UTC · model grok-4.3

classification 💻 cs.LG cs.NAmath.NAmath.OC
keywords eigendecompositiondouble-bracket flowisotropic noiseSO(n)invariancestreaming algorithmStiefel manifoldLyapunov function
0
0 comments X

The pith

A discrete double-bracket flow on SO(n) performs eigendecomposition whose trajectory and step size depend only on the trace-free signal.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper builds a discrete flow that extracts eigenvectors from streaming matrices containing arbitrarily large isotropic noise of the form sigma squared times the identity. The construction uses a skew-symmetric generator in the Lie algebra so(n) so that any multiple of the identity is annihilated by antisymmetry and never enters the dynamics. As a result the entire trajectory, the Lyapunov function that certifies progress, and the largest stable step size are functions solely of the trace-free part of the observed matrix. The method proves input-to-state stability whose noise ball is governed only by the trace-free error and establishes global convergence to eigenvectors via a discrete strict-saddle argument.

Core claim

Under the streaming model C_k equals C_sig plus sigma_k squared I plus E_k, the discrete double-bracket iteration produces a trajectory on SO(n) whose evolution, associated Lyapunov function, and maximal stable step size eta_max equals 1 over L_C are determined exclusively by the trace-free signal C_e. The same generator yields input-to-state stability with a ball controlled solely by trace-free perturbations and guarantees global convergence to the eigendecomposition by strict-saddle geometry together with a discrete Lojasiewicz inequality.

What carries the argument

The skew-symmetric generator Omega equals [A, diag(A)] in the tangent Lie algebra so(n), whose antisymmetry forces every scalar multiple of the identity to vanish identically.

If this is right

  • Maximal stable step size and contraction rates stay fixed as the isotropic noise floor rises.
  • Input-to-state stability ball size is controlled only by the trace-free component E_k.
  • Global convergence to eigenvectors holds for any time-varying but isotropic noise level.
  • The same generator extends directly to top-k tracking on the Stiefel manifold at linear cost in k.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The invariance could let the algorithm run in sensor streams where a large common-mode offset varies over time without retuning.
  • Similar cancellations might be engineered on other matrix Lie groups whose center contains the identity.
  • Numerical checks with deliberately time-varying sigma squared would confirm that the discrete path itself never feels the isotropic level.

Load-bearing premise

The perturbation must be exactly a scalar multiple of the identity plus a small trace-free matrix so the algebraic cancellation in the generator remains exact.

What would settle it

Run the iteration on two observation sequences that share the same trace-free signal but differ by a constant isotropic offset and check whether the computed eigenvectors or the sequence of step sizes remain identical.

Figures

Figures reproduced from arXiv: 2602.13759 by JiaHe Feng, Zhiming Li.

Figure 1
Figure 1. Figure 1: Iterations to convergence vs. σ 2 (n = 10, 5 seeds). Left: All methods. Commutator methods remain flat at ≈ 1600 iterations; Raw Oja fails for σ 2 ≥ 10. Right: Commutator only, confirming O(1) complexity. 6.4. Minimax Lower Bound We formalize the iteration complexity gap as a minimax statement over the class of non-filtering algorithms. Definition 6.9 (Non-Filtering Algorithm Class). An algo￾rithm A belong… view at source ↗
Figure 2
Figure 2. Figure 2: provides diagnostic evidence for the theoretical mechanisms: (a) Direction Independence. Oja’s direction CM and commutator M[A, D] are essentially orthogonal: P(cos θ < 0) = 0.498 ± 0.016. The commutator is not a “better Oja” but a fundamentally different descent direc￾tion that happens to be σ-invariant. (b) Baseline Signal Vanishing. SI contraction ratio ρ = (λ2 + σ 2 )/(λ1 + σ 2 ) matches theory: ρ = 0.… view at source ↗
Figure 3
Figure 3. Figure 3: E1: Convergence trajectories for σ 2 ∈ {0, 103 , 106 }. All three curves overlap exactly (within machine precision), demonstrating pathwise σ 2 -invariance. Results [PITH_FULL_IMAGE:figures/full_fig_p062_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: E14: Iterations to convergence vs σ 2 . Commutator methods achieve O(1) complexity (flat line); Raw Oja exhibits O(σ 2 ) degradation and fails for σ 2 ≥ 10. H.3. Convergence Analysis H.3.1. ISS STEADY-STATE ERROR BALL (E3) Objective. Verify Theorem 4.3: steady-state error ball radius scales linearly with trace-free noise amplitude εE, indepen￾dent of σ 2 . Protocol. 1. For each (εE, σ2 ) pair with εE ∈ {0.… view at source ↗
Figure 5
Figure 5. Figure 5: E3: ISS steady-state error scales linearly with noise amplitude εE, independent of σ 2 . All three σ 2 curves overlap with R 2 > 0.93. Protocol. Two variants distinguish dynamics-level vs sampling-level σ 2 dependence: • Variant A (Matrix noise): Ck = Csig + σ 2 I + Ek with fixed Ek distribution. • Variant B (Sample covariance): Ck is sample covariance from xi ∼ N (0, Csig + σ 2 I). Use c = 0.5, k0 = 100, … view at source ↗
Figure 6
Figure 6. Figure 6: E4: Log-log convergence plot. The slope matches the theoretical O(1/k) rate (dashed line). 3. Compute monotone fraction across 10 seeds. Results. With n = 20 and ∥Ce∥2 ≈ 0.95, theoretical ηmax ≈ 0.03 [PITH_FULL_IMAGE:figures/full_fig_p066_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: E9: Spectral separation δ(Mk) trajectories for different noise levels. Low noise (εE ≤ 0.05) stays above the escape threshold; high noise crosses it. Results (n = 5, K = 1000, 10 seeds): • Entry success rate: 100% • Mean Tenter: 247 ± 89 iterations • Theoretical upper bound: O(n · poly(log(1/δ))) H.5. Baseline Failure Analysis H.5.1. COUNTEREXAMPLE VERIFICATION We verify that baseline methods fail both (P1… view at source ↗
Figure 8
Figure 8. Figure 8: E6: Normalized initial spectral separation E[δ(M0)]/g across eigenvalue gaps. The ratio is nearly constant (≈ 0.016), confirming linear scaling. Protocol. 1. For n ∈ {5, 10, 20}, sample 1000 pairs of nearby matrices M, M′ ∈ SO(n) with ∥M − M′∥F = ε = 10−4 . 2. Estimate Lest = max ∥∇f(M) − ∇f(M′ )∥F /∥M − M′∥F . 3. Compare to theoretical bound Lbound = cn∥Ce∥ 2 2 [PITH_FULL_IMAGE:figures/full_fig_p072_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: E8: Distribution of cos θ between Oja and commutator directions. The symmetric distribution centered at 0 confirms P(cos θ < 0) ≈ 0.5. H.7. MVP Cost Analysis: TF-Oja + Hutchinson A reviewer raised an important concern: the TF-Oja baseline “presumes access to trace per-iteration; this is not matrix-free unless one pays extra oracle cost.” We address this by comparing three approaches: 73 [PITH_FULL_IMAGE:f… view at source ↗
read the original abstract

We study eigendecomposition on $SO(n)$ under streaming observations $C_k = C_{\mathrm{sig}} + \sigma_k^2 I + E_k$, where the isotropic background $\sigma_k^2 I$ may be time-varying and arbitrarily large. Standard algorithms couple their stability to $\lVert C_k \rVert_2 \approx \sigma^2$, forcing step sizes, contraction rates, and iteration counts to degrade with the noise floor. We observe that $\sigma^2 I$ lies in the center of the matrix algebra and therefore *should never enter* the eigenspace dynamics. We construct a discrete double-bracket flow whose skew-symmetric generator $\Omega = [A, \operatorname{diag}(A)]$ operates in the tangent Lie algebra $\mathfrak{so}(n)$, where scalar multiples of the identity vanish by antisymmetry. The resulting trajectory, Lyapunov function, and maximal stable step size $\eta_{\max} = 1/L_C$ depend exclusively on the trace-free signal $C_e$ -- achieving pointwise, pathwise $\sigma^2$-invariance. We establish input-to-state stability with a noise ball governed solely by trace-free perturbations, prove global convergence via strict-saddle geometry and a discrete {\L}ojasiewicz argument, and extend the framework to top-$k$ eigentracking on the Stiefel manifold $\operatorname{St}(k,n)$ at cost $k$ matrix-vector products per step.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a discrete double-bracket flow on SO(n) for eigendecomposition under streaming covariances C_k = C_sig + σ_k² I + E_k. The skew-symmetric generator is defined as Ω = [A, diag(A)] with A = Q^T C_k Q; because scalar matrices commute to zero under the Lie bracket, the resulting trajectory, off-diagonal Frobenius Lyapunov function, and maximal stable step size η_max = 1/L_C depend only on the trace-free component C_e. The authors assert global convergence from strict-saddle geometry together with a discrete Łojasiewicz inequality, input-to-state stability with respect to trace-free perturbations alone, and an extension to top-k tracking on the Stiefel manifold St(k,n).

Significance. If the algebraic invariance and the accompanying stability and convergence arguments hold, the result supplies a principled way to perform streaming eigendecomposition whose iteration complexity and step-size selection are independent of the isotropic noise floor. This property is directly useful for online PCA and subspace tracking in high-noise regimes common in machine learning. The construction is parameter-free once the trace-free part is isolated, and the discrete formulation with an explicit Lipschitz-based step-size bound is a practical asset.

major comments (2)
  1. [discrete update rule and discrete Łojasiewicz section] The central invariance claim rests on the identity [σ² I, ·] ≡ 0. The manuscript must verify that this cancellation survives the discrete update exactly (i.e., that the one-step map on SO(n) remains a function of C_e alone) and that the discrete Łojasiewicz inequality inherits the same independence; otherwise the claimed pathwise σ²-invariance is only approximate.
  2. [input-to-state stability theorem] Input-to-state stability is stated with a noise ball governed solely by the trace-free part of E_k. The explicit radius of this ball and the associated ISS gain must be derived and shown to be independent of σ_k²; the current sketch leaves the quantitative bound implicit.
minor comments (2)
  1. [introduction] The notation C_e for the trace-free signal should be introduced with an explicit definition (C_e := C_k - (tr(C_k)/n) I) at first use rather than only in the abstract.
  2. [Stiefel extension paragraph] The extension to the Stiefel manifold is stated to cost k matrix-vector products per step; a short complexity comparison with existing top-k methods would help readers assess the practical overhead.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive evaluation and the recommendation for minor revision. The two major comments concern the exactness of the invariance under discretization and the explicit quantitative bounds in the ISS result. We address each point below and will incorporate the requested clarifications.

read point-by-point responses
  1. Referee: [discrete update rule and discrete Łojasiewicz section] The central invariance claim rests on the identity [σ² I, ·] ≡ 0. The manuscript must verify that this cancellation survives the discrete update exactly (i.e., that the one-step map on SO(n) remains a function of C_e alone) and that the discrete Łojasiewicz inequality inherits the same independence; otherwise the claimed pathwise σ²-invariance is only approximate.

    Authors: We agree that an explicit verification is useful. Because Ω_k = [A_k, diag(A_k)] with A_k = Q_k^T C_k Q_k and [σ_k² I, M] = 0 for any M, the generator Ω_k is identical to the generator computed from the trace-free part C_e alone. Consequently the one-step map Q_{k+1} = Q_k exp(η Ω_k) depends only on C_e. The discrete Lyapunov function is defined directly on the trace-free component, V(Q) = ||Q^T C_e Q - diag(Q^T C_e Q)||_F², and the discrete Łojasiewicz inequality is proved using the Lipschitz constant L_C of the map induced by C_e; both the inequality and its constants are therefore independent of σ_k². We will insert a short paragraph after the definition of the discrete flow that records this exact cancellation and the resulting independence of the Łojasiewicz constants. revision: yes

  2. Referee: [input-to-state stability theorem] Input-to-state stability is stated with a noise ball governed solely by the trace-free part of E_k. The explicit radius of this ball and the associated ISS gain must be derived and shown to be independent of σ_k²; the current sketch leaves the quantitative bound implicit.

    Authors: We accept the request for explicit constants. The ISS statement is with respect to the trace-free perturbation sequence {E_k}. Because both the vector field and the Lyapunov function are insensitive to the isotropic term, the radius of the ultimate bound is of the form r = γ · sup_k ||E_k||_F where the gain γ depends only on the step-size η and the Lipschitz constant L_C of the trace-free map; it is independent of σ_k². We will add the explicit expression for γ (γ = η / (1 - η L_C)) together with the corresponding ball radius in the revised ISS theorem statement and proof. revision: yes

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The derivation chain relies on the algebraic identity [σ²I, X] ≡ 0 for any matrix X, which is a standard property of the matrix commutator and the center of the Lie algebra. This directly implies that the generator Ω = [A, diag(A)], the discrete update, the off-diagonal Frobenius Lyapunov function, and the Lipschitz bound L_C are all functions of the trace-free part C_e alone. No parameters are fitted to data and then relabeled as predictions, no uniqueness theorems are imported from self-citations, and no ansatz is smuggled in. The input-to-state stability statement is likewise a direct consequence of the same commutator cancellation rather than a self-referential construction. The paper is therefore self-contained against external mathematical benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the algebraic fact that multiples of the identity lie in the center of the matrix algebra and therefore vanish under the skew-symmetric commutator; no free parameters are introduced beyond the derived Lipschitz constant L_C of the trace-free part.

axioms (2)
  • domain assumption σ² I lies in the center of the matrix algebra and produces zero under the skew-symmetric commutator [A, diag(A)]
    Invoked to obtain σ²-invariance of the trajectory and step-size bound
  • domain assumption The discrete flow admits a strict-saddle geometry and satisfies a discrete Lojasiewicz inequality
    Used to prove global convergence

pith-pipeline@v0.9.0 · 5574 in / 1349 out tokens · 35925 ms · 2026-05-15T22:07:23.635687+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

44 extracted references · 44 canonical work pages

  1. [1]

    Yuan, J., Xie, F., Nie, F., and Li, X

    URL https://openreview.net/forum? id=RURIyF9Vuu. Yuan, J., Xie, F., Nie, F., and Li, X. Riemannian optimiza- tion on relaxed indicator matrix manifold. InThe Four- teenth International Conference on Learning Represen- tations, 2026. URL https://openreview.net/ forum?id=ERJd7dMN6U. Zhang, Y ., Congliang, C., Li, Z., Ding, T., Wu, C., Bei, Y ., and Wang, Z....

  2. [2]

    The entrywise condition follows fromΩ ij = (Ajj −A ii)Aij

    SinceMis invertible, this holds iffΩ(M) = 0. The entrywise condition follows fromΩ ij = (Ajj −A ii)Aij. Lemma C.9(Critical Point Dichotomy).At any critical pointMoff: (a) Global Minimum:If off(A) = 0 , then f(M) = 0 and M maps eigenvectors of Ce to the standard basis (up to permutation and sign). (b)Degenerate Block:Ifoff(A)̸= 0, then there exist indicesi...

  3. [3]

    Our flow uses D= diag(A(M)) , which varies with state

    assumes ˙X= [X,[X, N]] withfixed N. Our flow uses D= diag(A(M)) , which varies with state. This prevents direct application of Brockett’s Morse–Bott framework. However, our convergence proof (Theorem C.24) relies only on: (i) Łojasiewicz inequality for real-analytic gradient flows (convergence to single critical point); (ii) Strict saddle property (negati...

  4. [4]

    computational waste

    and taking conditional expectation: E[fk+1 | F k]≤f k −η k∥Ωk∥2 F + LCη2 k 2 (∥Ωk∥2 F +σ 2 u).(81) Whenη k is small enough that1−L Cηk/2>0, applying the local PL condition∥Ω k∥2 F ≥2δ 2fk: E[fk+1 | F k]≤(1−2δ 2ηk)fk + LCη2 kσ2 u 2 . Step 2 (Recursive solution):Substitute ηk =c/(k+k 0) and use mathematical induction to prove E[fk]≤C/(k+k 0). The key is cho...

  5. [5]

    Euclidean SGD with Riemannian projection

    requires m≥ 2∥C∥2 F n2ϵ2ζ ≥ 2σ4 nϵ2ζ (120) MVP queries per iteration to achieve|ˆτ−τ| ≤ϵwith probability≥1−ζ. (iii) Algebraic filtering (commutator):The commutator generator ΩC(M) = [M ⊤CM,diag(M ⊤CM)] satisfies ΩC+αI = ΩC, achieving exactσ 2-elimination with0additional MVP cost. Proof. Part (i):By Lemmas F.16–F.17 below, non-filtering baselines exhibit s...

  6. [6]

    global phase

    (146) G.2.3. LIPSCHITZBOUND ONΩ(M) Step 1: Lipschitz bound onA: A(M)−A(N) =M ⊤Ce(M−N) + (M ⊤ −N ⊤)CeN.(147) 56 Discrete Double-Bracket Flows Using∥M∥ 2 =∥N∥ 2 = 1: ∥A(M)−A(N)∥ F ≤2∥C e∥2∥M−N∥ F . Step 2: Lipschitz bound onD: ∥D(M)−D(N)∥ F ≤ ∥A(M)−A(N)∥ F ≤2∥C e∥2∥M−N∥ F . Step 3: Lipschitz bound onΩ: Ω(M)−Ω(N) = [A(M)−A(N), D(M)] + [A(N), D(M)−D(N)].(148)...

  7. [7]

    FixC sig,∆,M 0 with deterministic seed (seed=42)

  8. [8]

    For eachσ 2 ∈ {0,10 3,10 6}: run Cayley method withuse trace free=True

  9. [9]

    Compute pairwise trajectory differencesmax k ∥M(σ1) k −M (σ2) k ∥F

  10. [10]

    61 Discrete Double-Bracket Flows Table 3.E1: Trajectory differences for Cayley method (n= 20,K= 5000)

    Run baselines (Orthogonal Iteration, Oja) for comparison. 61 Discrete Double-Bracket Flows Table 3.E1: Trajectory differences for Cayley method (n= 20,K= 5000). Pair(σ 2 1 vsσ 2 2)Max∥∆M k∥F Mean∥∆M k∥F 0vs10 3 2.1×10 −12 1.8×10 −12 0vs10 6 3.5×10 −12 2.9×10 −12 103 vs10 6 2.8×10 −12 2.4×10 −12 /uni00000013/uni00000014/uni00000013/uni00000013/uni00000013/...

  11. [11]

    Run each method to convergence (f <10 −6) or untilK max = 30000

  12. [12]

    Record iterations to convergence and convergence success rate

  13. [13]

    Results.Table 5 presents the mean iterations to convergence

    Repeat with 5 random seeds. Results.Table 5 presents the mean iterations to convergence. Table 5.E14: Iterations to convergence vsσ 2 (n= 10, 5 seeds). σ2 Cayley Riem-Polar Riem-QR tf-Oja Raw Oja 0 1594 1594 1600 1880 9372 1 1594 1594 1600 1880 11317 5 1594 1594 1600 1880 20939 10 1594 1594 1600 1880 FAIL 100 1594 1594 1600 1880 FAIL 1000 1594 1594 1600 1...

  14. [14]

    For each(ε E, σ2)pair withε E ∈ {0.1,0.2,0.5,1.0,2.0}andσ 2 ∈ {0,10 3,10 6}:

  15. [15]

    trace-free noiseE k at each step

    RunK= 5000iterations with i.i.d. trace-free noiseE k at each step

  16. [16]

    Compute steady-state estimatelim sup p f(M k)from last 10% of trajectory

  17. [17]

    Results.Table 6 shows linear regression fits forlim sup √fvsε E

    Repeat with 10 seeds per configuration. Results.Table 6 shows linear regression fits forlim sup √fvsε E. Table 6.E3: Linear regression of steady-state error vs noise amplitude (n= 20). σ2 Slope InterceptR 2 0 0.206−0.0010.998 103 0.207−0.0020.997 106 0.205−0.0010.998 Conclusion:The slopes are nearly identical across σ2 values (coefficient of variation <1%...

  18. [18]

    Run deterministic Cayley flow (no noise) with fixed step size

  19. [19]

    Count non-monotone events:#{k:f(M k+1)> f(M k)(1 + 10−6)}. 65 Discrete Double-Bracket Flows /uni00000014/uni00000013/uni00000016 /uni00000014/uni00000013/uni00000017 /uni0000002c/uni00000057/uni00000048/uni00000055/uni00000044/uni00000057/uni0000004c/uni00000052/uni00000051/uni00000003k /uni00000014/uni00000013/uni00000016 /uni00000014/uni00000013/uni0000...

  20. [20]

    Results.Withn= 20and∥C e∥2 ≈0.95, theoreticalη max ≈0.03

    Compute monotone fraction across 10 seeds. Results.Withn= 20and∥C e∥2 ≈0.95, theoreticalη max ≈0.03. Table 7.E5: Monotonicity vs step size (n= 20,K= 1000, 10 seeds). ηMonotone Fraction Mean Non-Monotone Count 0.001 1.00 0 0.005 1.00 0 0.01 1.00 0 0.02 1.00 0 0.03 0.90 2.3 0.05 0.00 47.8 Conclusion:Sharp phase transition nearη= 0.03matches theoretical pred...

  21. [21]

    66 Discrete Double-Bracket Flows

    SampleM 0 ∼Haar(SO(n))for 100 independent seeds. 66 Discrete Double-Bracket Flows

  22. [22]

    Run Cayley iteration untilf(M k)<10 −8 orK= 10000

  23. [23]

    Record convergence rate and iteration count distribution. Results (n= 10,K= 5000, 50 seeds): •Convergence rate:100% (50/50 trajectories reachedf <10 −8) •Mean iterations:2847±412 •Max finalf:3.2×10 −10 Conclusion:No trajectory was trapped at a saddle point, consistent with the measure-zero stable manifold of strict saddles. H.4. Domain Stability H.4.1. NO...

  24. [24]

    Initialize near eigenbasis:M 0 =Q true ·Cay(εK)with small skew-symmetricK, soδ(M 0)≈g

  25. [25]

    For noise levelsε E ∈ {0.01,0.02,0.05,0.1,0.2,0.5}:

  26. [26]

    RunK= 5000iterations with bounded trace-free noise

  27. [27]

    Table 8.E9: Escape probability vs noise level (20 seeds per level)

    Trackδ(M k)and detect escape events (δ < g/4). Table 8.E9: Escape probability vs noise level (20 seeds per level). Noiseε E P(escape) Meanδ min Mean finalδ 0.01 0.000.0996±0.0001 0.0999±0.0001 0.02 0.000.0991±0.0002 0.0999±0.0001 0.05 0.000.0982±0.0005 0.0998±0.0002 0.1 0.050.0721±0.0183 0.0987±0.0051 0.2 0.350.0412±0.0287 0.0891±0.0234 0.5 0.950.0089±0.0...

  28. [28]

    SampleM 0 ∼Haar(SO(n))for 20 seeds

  29. [29]

    Track first iterationT enter wheref(M k)≤f enter = (g−δ )2/8

  30. [30]

    signal vanishing

    Compare to theoretical bound. 67 Discrete Double-Bracket Flows /uni00000013/uni00000014/uni00000013/uni00000015/uni00000013/uni00000016/uni00000013/uni00000017/uni00000013/uni00000018/uni00000013 /uni0000002c/uni00000057/uni00000048/uni00000055/uni00000044/uni00000057/uni0000004c/uni00000052/uni00000051/uni00000003k /uni00000013/uni00000011/uni00000013/un...

  31. [31]

    Fixn= 20,g= 1, convergence thresholdf(M k)<10 −6

  32. [32]

    Sweepσ 2 ∈ {0,1,10,100,500,1000}

  33. [33]

    Measure wall-clock time and iteration count for each method

  34. [34]

    FAIL” if convergence not reached within10 6 iterations. Table 11.Wall-clock time (seconds) across methods. “—

    Declare “FAIL” if convergence not reached within10 6 iterations. Table 11.Wall-clock time (seconds) across methods. “—” indicates failure (>10 6 iterations). Methodσ 2 = 0 1 10 100 500 1000 Cayley-Neumann 2.3 2.3 2.3 2.3 2.3 2.3 Riem-QR 2.4 2.4 2.4 2.4 2.4 2.4 Riem-Polar 2.5 2.5 2.5 2.5 2.5 2.5 Raw Oja 0.18 0.52 — — — — Subspace Iter 0.21 0.89 4.7 — — — E...

  35. [35]

    SampleM 0 ∼Haar(SO(n))forn∈ {5,10,20,50}

  36. [36]

    Computeδ(M 0)/gwhereg= 1is the true spectral gap

  37. [37]

    Table 14.Haar initialization:δ(M 0)/gstatistics

    Repeat 1000 times per dimension; estimate mean and 5th percentile. Table 14.Haar initialization:δ(M 0)/gstatistics. nE[δ(M 0)/g]5th Percentile 5 0.031 0.008 10 0.016 0.004 20 0.008 0.002 50 0.003 0.0007 Results. Interpretation.The ratio E[δ(M0)/g]≈O(1/n) confirms that random initialization places M0 far from eigenbasis (small δ). Combined with E11 (100% g...

  38. [38]

    Forn∈ {5,10,20}, sample 1000 pairs of nearby matricesM, M ′ ∈SO(n)with∥M−M ′∥F =ε= 10 −4

  39. [39]

    EstimateL est = max∥∇f(M)− ∇f(M ′)∥F /∥M−M ′∥F

  40. [40]

    Table 15.Lipschitz constant: empirical vs

    Compare to theoretical boundL bound =c n∥Ce∥2 2. Table 15.Lipschitz constant: empirical vs. theoretical bound. n L est Lbound Ratio 5 18.3 494 0.037 10 42.7 1421 0.030 20 127 3782 0.034 Results. Interpretation.The theoretical bound is conservative (ratio ≈3% ), but crucially it remainsvalidacross all tested dimensions. The conservativeness arises from wor...

  41. [41]

    SampleCfrom symmetric distribution (Wishart) andM∼Haar(SO(n))

  42. [42]

    Compute Oja directionG Oja =CMand commutator directionG Ω =MΩwhereΩ = [A, D]

  43. [43]

    Computecosθ=⟨G Oja, GΩ⟩F /(∥GOja∥F ∥GΩ∥F )

  44. [44]

    better Oja

    Repeat 10000 times; estimateP(cosθ <0). Table 16.Direction alignment:P(cosθ <0)across dimensions. n P(cosθ <0)95% CI 5 0.498[0.488,0.508] 10 0.501[0.491,0.511] 20 0.499[0.489,0.509] Results. Interpretation.The probability is statistically indistinguishable from 0.5 (p >0.8 for all n), confirming that the commutator is not a “better Oja” but a fundamentall...