arxiv: 2602.13759 · v2 · submitted 2026-02-14 · 💻 cs.LG · cs.NA· math.NA· math.OC

Recognition: 2 theorem links

· Lean Theorem

Discrete Double-Bracket Flows for Isotropic-Noise Invariant Eigendecomposition

Zhiming Li , JiaHe Feng

Authors on Pith no claims yet

Pith reviewed 2026-05-15 22:07 UTC · model grok-4.3

classification 💻 cs.LG cs.NAmath.NAmath.OC

keywords eigendecompositiondouble-bracket flowisotropic noiseSO(n)invariancestreaming algorithmStiefel manifoldLyapunov function

0 comments

The pith

A discrete double-bracket flow on SO(n) performs eigendecomposition whose trajectory and step size depend only on the trace-free signal.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper builds a discrete flow that extracts eigenvectors from streaming matrices containing arbitrarily large isotropic noise of the form sigma squared times the identity. The construction uses a skew-symmetric generator in the Lie algebra so(n) so that any multiple of the identity is annihilated by antisymmetry and never enters the dynamics. As a result the entire trajectory, the Lyapunov function that certifies progress, and the largest stable step size are functions solely of the trace-free part of the observed matrix. The method proves input-to-state stability whose noise ball is governed only by the trace-free error and establishes global convergence to eigenvectors via a discrete strict-saddle argument.

Core claim

Under the streaming model C_k equals C_sig plus sigma_k squared I plus E_k, the discrete double-bracket iteration produces a trajectory on SO(n) whose evolution, associated Lyapunov function, and maximal stable step size eta_max equals 1 over L_C are determined exclusively by the trace-free signal C_e. The same generator yields input-to-state stability with a ball controlled solely by trace-free perturbations and guarantees global convergence to the eigendecomposition by strict-saddle geometry together with a discrete Lojasiewicz inequality.

What carries the argument

The skew-symmetric generator Omega equals [A, diag(A)] in the tangent Lie algebra so(n), whose antisymmetry forces every scalar multiple of the identity to vanish identically.

If this is right

Maximal stable step size and contraction rates stay fixed as the isotropic noise floor rises.
Input-to-state stability ball size is controlled only by the trace-free component E_k.
Global convergence to eigenvectors holds for any time-varying but isotropic noise level.
The same generator extends directly to top-k tracking on the Stiefel manifold at linear cost in k.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The invariance could let the algorithm run in sensor streams where a large common-mode offset varies over time without retuning.
Similar cancellations might be engineered on other matrix Lie groups whose center contains the identity.
Numerical checks with deliberately time-varying sigma squared would confirm that the discrete path itself never feels the isotropic level.

Load-bearing premise

The perturbation must be exactly a scalar multiple of the identity plus a small trace-free matrix so the algebraic cancellation in the generator remains exact.

What would settle it

Run the iteration on two observation sequences that share the same trace-free signal but differ by a constant isotropic offset and check whether the computed eigenvectors or the sequence of step sizes remain identical.

Figures

Figures reproduced from arXiv: 2602.13759 by JiaHe Feng, Zhiming Li.

**Figure 1.** Figure 1: Iterations to convergence vs. σ 2 (n = 10, 5 seeds). Left: All methods. Commutator methods remain flat at ≈ 1600 iterations; Raw Oja fails for σ 2 ≥ 10. Right: Commutator only, confirming O(1) complexity. 6.4. Minimax Lower Bound We formalize the iteration complexity gap as a minimax statement over the class of non-filtering algorithms. Definition 6.9 (Non-Filtering Algorithm Class). An algorithm A belong… view at source ↗

**Figure 2.** Figure 2: provides diagnostic evidence for the theoretical mechanisms: (a) Direction Independence. Oja’s direction CM and commutator M[A, D] are essentially orthogonal: P(cos θ < 0) = 0.498 ± 0.016. The commutator is not a “better Oja” but a fundamentally different descent direction that happens to be σ-invariant. (b) Baseline Signal Vanishing. SI contraction ratio ρ = (λ2 + σ 2 )/(λ1 + σ 2 ) matches theory: ρ = 0.… view at source ↗

**Figure 3.** Figure 3: E1: Convergence trajectories for σ 2 ∈ {0, 103 , 106 }. All three curves overlap exactly (within machine precision), demonstrating pathwise σ 2 -invariance. Results [PITH_FULL_IMAGE:figures/full_fig_p062_3.png] view at source ↗

**Figure 4.** Figure 4: E14: Iterations to convergence vs σ 2 . Commutator methods achieve O(1) complexity (flat line); Raw Oja exhibits O(σ 2 ) degradation and fails for σ 2 ≥ 10. H.3. Convergence Analysis H.3.1. ISS STEADY-STATE ERROR BALL (E3) Objective. Verify Theorem 4.3: steady-state error ball radius scales linearly with trace-free noise amplitude εE, independent of σ 2 . Protocol. 1. For each (εE, σ2 ) pair with εE ∈ {0.… view at source ↗

**Figure 5.** Figure 5: E3: ISS steady-state error scales linearly with noise amplitude εE, independent of σ 2 . All three σ 2 curves overlap with R 2 > 0.93. Protocol. Two variants distinguish dynamics-level vs sampling-level σ 2 dependence: • Variant A (Matrix noise): Ck = Csig + σ 2 I + Ek with fixed Ek distribution. • Variant B (Sample covariance): Ck is sample covariance from xi ∼ N (0, Csig + σ 2 I). Use c = 0.5, k0 = 100, … view at source ↗

**Figure 6.** Figure 6: E4: Log-log convergence plot. The slope matches the theoretical O(1/k) rate (dashed line). 3. Compute monotone fraction across 10 seeds. Results. With n = 20 and ∥Ce∥2 ≈ 0.95, theoretical ηmax ≈ 0.03 [PITH_FULL_IMAGE:figures/full_fig_p066_6.png] view at source ↗

**Figure 7.** Figure 7: E9: Spectral separation δ(Mk) trajectories for different noise levels. Low noise (εE ≤ 0.05) stays above the escape threshold; high noise crosses it. Results (n = 5, K = 1000, 10 seeds): • Entry success rate: 100% • Mean Tenter: 247 ± 89 iterations • Theoretical upper bound: O(n · poly(log(1/δ))) H.5. Baseline Failure Analysis H.5.1. COUNTEREXAMPLE VERIFICATION We verify that baseline methods fail both (P1… view at source ↗

**Figure 8.** Figure 8: E6: Normalized initial spectral separation E[δ(M0)]/g across eigenvalue gaps. The ratio is nearly constant (≈ 0.016), confirming linear scaling. Protocol. 1. For n ∈ {5, 10, 20}, sample 1000 pairs of nearby matrices M, M′ ∈ SO(n) with ∥M − M′∥F = ε = 10−4 . 2. Estimate Lest = max ∥∇f(M) − ∇f(M′ )∥F /∥M − M′∥F . 3. Compare to theoretical bound Lbound = cn∥Ce∥ 2 2 [PITH_FULL_IMAGE:figures/full_fig_p072_8.png] view at source ↗

**Figure 9.** Figure 9: E8: Distribution of cos θ between Oja and commutator directions. The symmetric distribution centered at 0 confirms P(cos θ < 0) ≈ 0.5. H.7. MVP Cost Analysis: TF-Oja + Hutchinson A reviewer raised an important concern: the TF-Oja baseline “presumes access to trace per-iteration; this is not matrix-free unless one pays extra oracle cost.” We address this by comparing three approaches: 73 [PITH_FULL_IMAGE:f… view at source ↗

read the original abstract

We study eigendecomposition on $SO(n)$ under streaming observations $C_k = C_{\mathrm{sig}} + \sigma_k^2 I + E_k$, where the isotropic background $\sigma_k^2 I$ may be time-varying and arbitrarily large. Standard algorithms couple their stability to $\lVert C_k \rVert_2 \approx \sigma^2$, forcing step sizes, contraction rates, and iteration counts to degrade with the noise floor. We observe that $\sigma^2 I$ lies in the center of the matrix algebra and therefore *should never enter* the eigenspace dynamics. We construct a discrete double-bracket flow whose skew-symmetric generator $\Omega = [A, \operatorname{diag}(A)]$ operates in the tangent Lie algebra $\mathfrak{so}(n)$, where scalar multiples of the identity vanish by antisymmetry. The resulting trajectory, Lyapunov function, and maximal stable step size $\eta_{\max} = 1/L_C$ depend exclusively on the trace-free signal $C_e$ -- achieving pointwise, pathwise $\sigma^2$-invariance. We establish input-to-state stability with a noise ball governed solely by trace-free perturbations, prove global convergence via strict-saddle geometry and a discrete {\L}ojasiewicz argument, and extend the framework to top-$k$ eigentracking on the Stiefel manifold $\operatorname{St}(k,n)$ at cost $k$ matrix-vector products per step.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper's core move is a discrete double-bracket flow on SO(n) whose skew-symmetric generator cancels isotropic noise exactly, so dynamics and step sizes depend only on the trace-free signal.

read the letter

The main takeaway is that this discrete flow achieves pointwise and pathwise invariance to additive isotropic noise. By choosing the generator Ω = [A, diag(A)] with A = Q^T C_k Q, the commutator kills any multiple of the identity because scalar matrices commute with everything. The trajectory, the off-diagonal Frobenius Lyapunov function, and the Lipschitz bound L_C used for η_max = 1/L_C therefore depend only on the trace-free part C_e. This is a direct algebraic consequence rather than an approximation or fitted trick.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a discrete double-bracket flow on SO(n) for eigendecomposition under streaming covariances C_k = C_sig + σ_k² I + E_k. The skew-symmetric generator is defined as Ω = [A, diag(A)] with A = Q^T C_k Q; because scalar matrices commute to zero under the Lie bracket, the resulting trajectory, off-diagonal Frobenius Lyapunov function, and maximal stable step size η_max = 1/L_C depend only on the trace-free component C_e. The authors assert global convergence from strict-saddle geometry together with a discrete Łojasiewicz inequality, input-to-state stability with respect to trace-free perturbations alone, and an extension to top-k tracking on the Stiefel manifold St(k,n).

Significance. If the algebraic invariance and the accompanying stability and convergence arguments hold, the result supplies a principled way to perform streaming eigendecomposition whose iteration complexity and step-size selection are independent of the isotropic noise floor. This property is directly useful for online PCA and subspace tracking in high-noise regimes common in machine learning. The construction is parameter-free once the trace-free part is isolated, and the discrete formulation with an explicit Lipschitz-based step-size bound is a practical asset.

major comments (2)

[discrete update rule and discrete Łojasiewicz section] The central invariance claim rests on the identity [σ² I, ·] ≡ 0. The manuscript must verify that this cancellation survives the discrete update exactly (i.e., that the one-step map on SO(n) remains a function of C_e alone) and that the discrete Łojasiewicz inequality inherits the same independence; otherwise the claimed pathwise σ²-invariance is only approximate.
[input-to-state stability theorem] Input-to-state stability is stated with a noise ball governed solely by the trace-free part of E_k. The explicit radius of this ball and the associated ISS gain must be derived and shown to be independent of σ_k²; the current sketch leaves the quantitative bound implicit.

minor comments (2)

[introduction] The notation C_e for the trace-free signal should be introduced with an explicit definition (C_e := C_k - (tr(C_k)/n) I) at first use rather than only in the abstract.
[Stiefel extension paragraph] The extension to the Stiefel manifold is stated to cost k matrix-vector products per step; a short complexity comparison with existing top-k methods would help readers assess the practical overhead.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive evaluation and the recommendation for minor revision. The two major comments concern the exactness of the invariance under discretization and the explicit quantitative bounds in the ISS result. We address each point below and will incorporate the requested clarifications.

read point-by-point responses

Referee: [discrete update rule and discrete Łojasiewicz section] The central invariance claim rests on the identity [σ² I, ·] ≡ 0. The manuscript must verify that this cancellation survives the discrete update exactly (i.e., that the one-step map on SO(n) remains a function of C_e alone) and that the discrete Łojasiewicz inequality inherits the same independence; otherwise the claimed pathwise σ²-invariance is only approximate.

Authors: We agree that an explicit verification is useful. Because Ω_k = [A_k, diag(A_k)] with A_k = Q_k^T C_k Q_k and [σ_k² I, M] = 0 for any M, the generator Ω_k is identical to the generator computed from the trace-free part C_e alone. Consequently the one-step map Q_{k+1} = Q_k exp(η Ω_k) depends only on C_e. The discrete Lyapunov function is defined directly on the trace-free component, V(Q) = ||Q^T C_e Q - diag(Q^T C_e Q)||_F², and the discrete Łojasiewicz inequality is proved using the Lipschitz constant L_C of the map induced by C_e; both the inequality and its constants are therefore independent of σ_k². We will insert a short paragraph after the definition of the discrete flow that records this exact cancellation and the resulting independence of the Łojasiewicz constants. revision: yes
Referee: [input-to-state stability theorem] Input-to-state stability is stated with a noise ball governed solely by the trace-free part of E_k. The explicit radius of this ball and the associated ISS gain must be derived and shown to be independent of σ_k²; the current sketch leaves the quantitative bound implicit.

Authors: We accept the request for explicit constants. The ISS statement is with respect to the trace-free perturbation sequence {E_k}. Because both the vector field and the Lyapunov function are insensitive to the isotropic term, the radius of the ultimate bound is of the form r = γ · sup_k ||E_k||_F where the gain γ depends only on the step-size η and the Lipschitz constant L_C of the trace-free map; it is independent of σ_k². We will add the explicit expression for γ (γ = η / (1 - η L_C)) together with the corresponding ball radius in the revised ISS theorem statement and proof. revision: yes

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The derivation chain relies on the algebraic identity [σ²I, X] ≡ 0 for any matrix X, which is a standard property of the matrix commutator and the center of the Lie algebra. This directly implies that the generator Ω = [A, diag(A)], the discrete update, the off-diagonal Frobenius Lyapunov function, and the Lipschitz bound L_C are all functions of the trace-free part C_e alone. No parameters are fitted to data and then relabeled as predictions, no uniqueness theorems are imported from self-citations, and no ansatz is smuggled in. The input-to-state stability statement is likewise a direct consequence of the same commutator cancellation rather than a self-referential construction. The paper is therefore self-contained against external mathematical benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the algebraic fact that multiples of the identity lie in the center of the matrix algebra and therefore vanish under the skew-symmetric commutator; no free parameters are introduced beyond the derived Lipschitz constant L_C of the trace-free part.

axioms (2)

domain assumption σ² I lies in the center of the matrix algebra and produces zero under the skew-symmetric commutator [A, diag(A)]
Invoked to obtain σ²-invariance of the trajectory and step-size bound
domain assumption The discrete flow admits a strict-saddle geometry and satisfies a discrete Lojasiewicz inequality
Used to prove global convergence

pith-pipeline@v0.9.0 · 5574 in / 1349 out tokens · 35925 ms · 2026-05-15T22:07:23.635687+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel matches

?

matches
MATCHES: this paper passage directly uses, restates, or depends on the cited Recognition theorem or module.

Ω(M):=[A(M),D(M)]=AD−DA ... [αI,·]≡0 ... Ω_{C+αI}(M)=Ω_C(M) ... trajectory {Mk} ... depends exclusively on the trace-free signal Ce
IndisputableMonolith/Foundation/LogicAsFunctionalEquation.lean Translation Theorem / J-uniqueness corollary echoes

?

echoes
ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

any generator achieving such invariance must align with the commutator Ω=[A,diag(A)] ... dissipation channel condition

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

44 extracted references · 44 canonical work pages

[1]

Yuan, J., Xie, F., Nie, F., and Li, X

URL https://openreview.net/forum? id=RURIyF9Vuu. Yuan, J., Xie, F., Nie, F., and Li, X. Riemannian optimiza- tion on relaxed indicator matrix manifold. InThe Four- teenth International Conference on Learning Represen- tations, 2026. URL https://openreview.net/ forum?id=ERJd7dMN6U. Zhang, Y ., Congliang, C., Li, Z., Ding, T., Wu, C., Bei, Y ., and Wang, Z....

work page arXiv 2026
[2]

The entrywise condition follows fromΩ ij = (Ajj −A ii)Aij

SinceMis invertible, this holds iffΩ(M) = 0. The entrywise condition follows fromΩ ij = (Ajj −A ii)Aij. Lemma C.9(Critical Point Dichotomy).At any critical pointMoff: (a) Global Minimum:If off(A) = 0 , then f(M) = 0 and M maps eigenvectors of Ce to the standard basis (up to permutation and sign). (b)Degenerate Block:Ifoff(A)̸= 0, then there exist indicesi...

work page 1991
[3]

Our flow uses D= diag(A(M)) , which varies with state

assumes ˙X= [X,[X, N]] withfixed N. Our flow uses D= diag(A(M)) , which varies with state. This prevents direct application of Brockett’s Morse–Bott framework. However, our convergence proof (Theorem C.24) relies only on: (i) Łojasiewicz inequality for real-analytic gradient flows (convergence to single critical point); (ii) Strict saddle property (negati...

work page 1991
[4]

computational waste

and taking conditional expectation: E[fk+1 | F k]≤f k −η k∥Ωk∥2 F + LCη2 k 2 (∥Ωk∥2 F +σ 2 u).(81) Whenη k is small enough that1−L Cηk/2>0, applying the local PL condition∥Ω k∥2 F ≥2δ 2fk: E[fk+1 | F k]≤(1−2δ 2ηk)fk + LCη2 kσ2 u 2 . Step 2 (Recursive solution):Substitute ηk =c/(k+k 0) and use mathematical induction to prove E[fk]≤C/(k+k 0). The key is cho...

work page 2018
[5]

Euclidean SGD with Riemannian projection

requires m≥ 2∥C∥2 F n2ϵ2ζ ≥ 2σ4 nϵ2ζ (120) MVP queries per iteration to achieve|ˆτ−τ| ≤ϵwith probability≥1−ζ. (iii) Algebraic filtering (commutator):The commutator generator ΩC(M) = [M ⊤CM,diag(M ⊤CM)] satisfies ΩC+αI = ΩC, achieving exactσ 2-elimination with0additional MVP cost. Proof. Part (i):By Lemmas F.16–F.17 below, non-filtering baselines exhibit s...

work page 2013
[6]

global phase

(146) G.2.3. LIPSCHITZBOUND ONΩ(M) Step 1: Lipschitz bound onA: A(M)−A(N) =M ⊤Ce(M−N) + (M ⊤ −N ⊤)CeN.(147) 56 Discrete Double-Bracket Flows Using∥M∥ 2 =∥N∥ 2 = 1: ∥A(M)−A(N)∥ F ≤2∥C e∥2∥M−N∥ F . Step 2: Lipschitz bound onD: ∥D(M)−D(N)∥ F ≤ ∥A(M)−A(N)∥ F ≤2∥C e∥2∥M−N∥ F . Step 3: Lipschitz bound onΩ: Ω(M)−Ω(N) = [A(M)−A(N), D(M)] + [A(N), D(M)−D(N)].(148)...

work page 2018
[7]

FixC sig,∆,M 0 with deterministic seed (seed=42)

work page
[8]

For eachσ 2 ∈ {0,10 3,10 6}: run Cayley method withuse trace free=True

work page
[9]

Compute pairwise trajectory differencesmax k ∥M(σ1) k −M (σ2) k ∥F

work page
[10]

61 Discrete Double-Bracket Flows Table 3.E1: Trajectory differences for Cayley method (n= 20,K= 5000)

Run baselines (Orthogonal Iteration, Oja) for comparison. 61 Discrete Double-Bracket Flows Table 3.E1: Trajectory differences for Cayley method (n= 20,K= 5000). Pair(σ 2 1 vsσ 2 2)Max∥∆M k∥F Mean∥∆M k∥F 0vs10 3 2.1×10 −12 1.8×10 −12 0vs10 6 3.5×10 −12 2.9×10 −12 103 vs10 6 2.8×10 −12 2.4×10 −12 /uni00000013/uni00000014/uni00000013/uni00000013/uni00000013/...

work page
[11]

Run each method to convergence (f <10 −6) or untilK max = 30000

work page
[12]

Record iterations to convergence and convergence success rate

work page
[13]

Results.Table 5 presents the mean iterations to convergence

Repeat with 5 random seeds. Results.Table 5 presents the mean iterations to convergence. Table 5.E14: Iterations to convergence vsσ 2 (n= 10, 5 seeds). σ2 Cayley Riem-Polar Riem-QR tf-Oja Raw Oja 0 1594 1594 1600 1880 9372 1 1594 1594 1600 1880 11317 5 1594 1594 1600 1880 20939 10 1594 1594 1600 1880 FAIL 100 1594 1594 1600 1880 FAIL 1000 1594 1594 1600 1...

work page
[14]

For each(ε E, σ2)pair withε E ∈ {0.1,0.2,0.5,1.0,2.0}andσ 2 ∈ {0,10 3,10 6}:

work page
[15]

trace-free noiseE k at each step

RunK= 5000iterations with i.i.d. trace-free noiseE k at each step

work page
[16]

Compute steady-state estimatelim sup p f(M k)from last 10% of trajectory

work page
[17]

Results.Table 6 shows linear regression fits forlim sup √fvsε E

Repeat with 10 seeds per configuration. Results.Table 6 shows linear regression fits forlim sup √fvsε E. Table 6.E3: Linear regression of steady-state error vs noise amplitude (n= 20). σ2 Slope InterceptR 2 0 0.206−0.0010.998 103 0.207−0.0020.997 106 0.205−0.0010.998 Conclusion:The slopes are nearly identical across σ2 values (coefficient of variation <1%...

work page
[18]

Run deterministic Cayley flow (no noise) with fixed step size

work page
[19]

Count non-monotone events:#{k:f(M k+1)> f(M k)(1 + 10−6)}. 65 Discrete Double-Bracket Flows /uni00000014/uni00000013/uni00000016 /uni00000014/uni00000013/uni00000017 /uni0000002c/uni00000057/uni00000048/uni00000055/uni00000044/uni00000057/uni0000004c/uni00000052/uni00000051/uni00000003k /uni00000014/uni00000013/uni00000016 /uni00000014/uni00000013/uni0000...

work page
[20]

Results.Withn= 20and∥C e∥2 ≈0.95, theoreticalη max ≈0.03

Compute monotone fraction across 10 seeds. Results.Withn= 20and∥C e∥2 ≈0.95, theoreticalη max ≈0.03. Table 7.E5: Monotonicity vs step size (n= 20,K= 1000, 10 seeds). ηMonotone Fraction Mean Non-Monotone Count 0.001 1.00 0 0.005 1.00 0 0.01 1.00 0 0.02 1.00 0 0.03 0.90 2.3 0.05 0.00 47.8 Conclusion:Sharp phase transition nearη= 0.03matches theoretical pred...

work page
[21]

66 Discrete Double-Bracket Flows

SampleM 0 ∼Haar(SO(n))for 100 independent seeds. 66 Discrete Double-Bracket Flows

work page
[22]

Run Cayley iteration untilf(M k)<10 −8 orK= 10000

work page
[23]

Record convergence rate and iteration count distribution. Results (n= 10,K= 5000, 50 seeds): •Convergence rate:100% (50/50 trajectories reachedf <10 −8) •Mean iterations:2847±412 •Max finalf:3.2×10 −10 Conclusion:No trajectory was trapped at a saddle point, consistent with the measure-zero stable manifold of strict saddles. H.4. Domain Stability H.4.1. NO...

work page
[24]

Initialize near eigenbasis:M 0 =Q true ·Cay(εK)with small skew-symmetricK, soδ(M 0)≈g

work page
[25]

For noise levelsε E ∈ {0.01,0.02,0.05,0.1,0.2,0.5}:

work page
[26]

RunK= 5000iterations with bounded trace-free noise

work page
[27]

Table 8.E9: Escape probability vs noise level (20 seeds per level)

Trackδ(M k)and detect escape events (δ < g/4). Table 8.E9: Escape probability vs noise level (20 seeds per level). Noiseε E P(escape) Meanδ min Mean finalδ 0.01 0.000.0996±0.0001 0.0999±0.0001 0.02 0.000.0991±0.0002 0.0999±0.0001 0.05 0.000.0982±0.0005 0.0998±0.0002 0.1 0.050.0721±0.0183 0.0987±0.0051 0.2 0.350.0412±0.0287 0.0891±0.0234 0.5 0.950.0089±0.0...

work page
[28]

SampleM 0 ∼Haar(SO(n))for 20 seeds

work page
[29]

Track first iterationT enter wheref(M k)≤f enter = (g−δ )2/8

work page
[30]

signal vanishing

Compare to theoretical bound. 67 Discrete Double-Bracket Flows /uni00000013/uni00000014/uni00000013/uni00000015/uni00000013/uni00000016/uni00000013/uni00000017/uni00000013/uni00000018/uni00000013 /uni0000002c/uni00000057/uni00000048/uni00000055/uni00000044/uni00000057/uni0000004c/uni00000052/uni00000051/uni00000003k /uni00000013/uni00000011/uni00000013/un...

work page
[31]

Fixn= 20,g= 1, convergence thresholdf(M k)<10 −6

work page
[32]

Sweepσ 2 ∈ {0,1,10,100,500,1000}

work page
[33]

Measure wall-clock time and iteration count for each method

work page
[34]

FAIL” if convergence not reached within10 6 iterations. Table 11.Wall-clock time (seconds) across methods. “—

Declare “FAIL” if convergence not reached within10 6 iterations. Table 11.Wall-clock time (seconds) across methods. “—” indicates failure (>10 6 iterations). Methodσ 2 = 0 1 10 100 500 1000 Cayley-Neumann 2.3 2.3 2.3 2.3 2.3 2.3 Riem-QR 2.4 2.4 2.4 2.4 2.4 2.4 Riem-Polar 2.5 2.5 2.5 2.5 2.5 2.5 Raw Oja 0.18 0.52 — — — — Subspace Iter 0.21 0.89 4.7 — — — E...

work page
[35]

SampleM 0 ∼Haar(SO(n))forn∈ {5,10,20,50}

work page
[36]

Computeδ(M 0)/gwhereg= 1is the true spectral gap

work page
[37]

Table 14.Haar initialization:δ(M 0)/gstatistics

Repeat 1000 times per dimension; estimate mean and 5th percentile. Table 14.Haar initialization:δ(M 0)/gstatistics. nE[δ(M 0)/g]5th Percentile 5 0.031 0.008 10 0.016 0.004 20 0.008 0.002 50 0.003 0.0007 Results. Interpretation.The ratio E[δ(M0)/g]≈O(1/n) confirms that random initialization places M0 far from eigenbasis (small δ). Combined with E11 (100% g...

work page
[38]

Forn∈ {5,10,20}, sample 1000 pairs of nearby matricesM, M ′ ∈SO(n)with∥M−M ′∥F =ε= 10 −4

work page
[39]

EstimateL est = max∥∇f(M)− ∇f(M ′)∥F /∥M−M ′∥F

work page
[40]

Table 15.Lipschitz constant: empirical vs

Compare to theoretical boundL bound =c n∥Ce∥2 2. Table 15.Lipschitz constant: empirical vs. theoretical bound. n L est Lbound Ratio 5 18.3 494 0.037 10 42.7 1421 0.030 20 127 3782 0.034 Results. Interpretation.The theoretical bound is conservative (ratio ≈3% ), but crucially it remainsvalidacross all tested dimensions. The conservativeness arises from wor...

work page
[41]

SampleCfrom symmetric distribution (Wishart) andM∼Haar(SO(n))

work page
[42]

Compute Oja directionG Oja =CMand commutator directionG Ω =MΩwhereΩ = [A, D]

work page
[43]

Computecosθ=⟨G Oja, GΩ⟩F /(∥GOja∥F ∥GΩ∥F )

work page
[44]

better Oja

Repeat 10000 times; estimateP(cosθ <0). Table 16.Direction alignment:P(cosθ <0)across dimensions. n P(cosθ <0)95% CI 5 0.498[0.488,0.508] 10 0.501[0.491,0.511] 20 0.499[0.489,0.509] Results. Interpretation.The probability is statistically indistinguishable from 0.5 (p >0.8 for all n), confirming that the commutator is not a “better Oja” but a fundamentall...

work page 2020