pith. machine review for the scientific record.

arxiv: 2605.02111 · v1 · submitted 2026-05-04 · 💻 cs.LG · math.DG

Recognition: unknown

Geometric and Spectral Alignment for Deep Neural Network II


Pith reviewed 2026-05-09 16:55 UTC · model grok-4.3

classification 💻 cs.LG math.DG
keywords Geometric Alignment · Spectral Alignment · Physical Alignment Matrix · Certificate Radius · Residual Jacobian Chains · Effective Rank Windows · Singular Subspace Transport · Gibbs-Cartan Tail Model

The pith

Given row groups and active supports, the Physical Alignment Matrix decomposes orthogonally into core, overlap, and noise, with a static certificate radius ensuring full and truncated transports match on key structures.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes deterministic margin-verified bounds for transporting dominant singular subspaces across layers in deep neural networks using Cartan-coordinate rigidity and fitted effective-rank windows. It shows that the Physical Alignment Matrix decomposes orthogonally into core, overlap, and noise once row groups and active supports are fixed, and that gaps, margins, and noise bounds combine into a certificate radius guaranteeing identical active supports, incidence graphs, and masks between full and truncated transports. These results apply to residual Jacobian chains and distinguish source-mode incidence from physical channel incidence. A sympathetic reader would care because the bounds allow certification of geometric and spectral consistency across layers without post-hoc adjustments to the data.
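The core/overlap/noise split can be illustrated with a minimal entrywise-mask sketch. This is a guess at the structure, assuming the decomposition is the Frobenius-orthogonal one induced by disjoint entry masks; the function name `decompose`, the group labels, and the convention that the first label in a column's support is its core group are all hypothetical, not the paper's definitions.

```python
import numpy as np

def decompose(M, row_groups, active_supports):
    """Split M into core + overlap + noise via disjoint entry masks.

    row_groups[i]      : group label of row i.
    active_supports[j] : list of group labels active for column j; the
                         first label is treated as the column's core group.
    Because the masks are disjoint, the split is orthogonal in the
    Frobenius inner product:
    ||M||_F^2 = ||core||_F^2 + ||overlap||_F^2 + ||noise||_F^2.
    """
    core, overlap, noise = (np.zeros_like(M) for _ in range(3))
    for j, support in enumerate(active_supports):
        primary = support[0]
        for i, g in enumerate(row_groups):
            if g == primary:
                core[i, j] = M[i, j]       # entry in the column's core group
            elif g in support:
                overlap[i, j] = M[i, j]    # entry in another active group
            else:
                noise[i, j] = M[i, j]      # entry outside the active support
    return core, overlap, noise
```

The orthogonality here is purely combinatorial (disjoint index sets), which is why fixing row groups and active supports is enough to make the three energies add up exactly.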

Core claim

Starting from Cartan-coordinate rigidity and fitted effective-rank windows, dominant singular subspaces are transported across adjacent layers and displayed in physical channel coordinates. The error between full interface transport and its dominant-window truncation is bounded, with fitted-tail errors added so empirical spectra can be certified against the Gibbs-Cartan tail model. Given row groups and active supports, the Physical Alignment Matrix decomposes orthogonally as core plus overlap plus noise. Active-column gaps, pairwise overlap margins, and noise bounds combine into a static certificate radius under which the full transport and the truncated transport induce the same active supports, pairwise incidence graph, SRS sets, hub columns, and core/overlap/noise masks.
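How the three certificate ingredients might combine can be sketched as a toy combination rule: the radius is whatever slack survives after the worst-case gap and margin absorb the noise bound. This is an illustrative stand-in, not the paper's formula, and the half-slack convention is an assumption.

```python
import numpy as np

def certificate_radius(gaps, margins, noise_bound):
    """Toy certificate radius (illustrative only).

    gaps        : active-column gaps, one per active column.
    margins     : pairwise overlap margins, one per overlapping pair.
    noise_bound : scalar bound on the noise component.
    A positive return value means perturbations up to that size cannot
    flip which columns are active or how their supports overlap, under
    this sketch's (assumed) worst-case combination rule.
    """
    slack = min(np.min(gaps), np.min(margins))  # tightest structural margin
    return 0.5 * slack - noise_bound            # half-slack minus noise
```

The point of the construction is that the radius is computed once, from static quantities, rather than re-derived per perturbation; whether the guarantee is "a priori" then turns on how the gaps, margins, and noise bound were obtained, which is exactly the referee's data-dependence objection below.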

What carries the argument

The Physical Alignment Matrix, which decomposes orthogonally into core, overlap, and noise components, together with the static certificate radius formed from active-column gaps, pairwise overlap margins, and noise bounds.

Load-bearing premise

Fitted effective-rank windows and the Gibbs-Cartan tail model provide accurate representations of empirical spectra so that the margin-verified certificates hold without post-hoc data selection affecting the central bounds.
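If the "25ER"/"50ER" windows that appear in the figures denote energy-based effective ranks, choosing such a window from a singular spectrum could look like the sketch below. The energy-fraction reading of "ER" is an assumption; the paper's fitted window definition may differ.

```python
import numpy as np

def effective_rank_window(singular_values, energy_fraction=0.25):
    """Smallest r whose top-r singular values capture energy_fraction
    of the total squared spectral energy (a guess at what a '25ER'
    window means; the paper's fitted definition may differ)."""
    s2 = np.sort(np.asarray(singular_values, dtype=float))[::-1] ** 2
    cum = np.cumsum(s2) / s2.sum()              # cumulative energy share
    return int(np.searchsorted(cum, energy_fraction) + 1)
```

Any such rule is fitted to the observed decay, which is the load-bearing point: the certificate inherits whatever data dependence the window and tail fit introduce.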

What would settle it

Observe a pair of adjacent layers in a trained CNN or transformer where the computed certificate radius is satisfied yet the active supports, pairwise incidence graph, or core/overlap/noise masks differ between the full transport and the truncated transport.
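That falsification test could be operationalized as a rough numerical check: extract active supports from both transports and compare. The threshold `tau` and the energy-share support definition are assumptions for illustration, not the paper's Section 10 protocol.

```python
import numpy as np

def active_supports(M, tau):
    """Per-column supports: row indices whose share of the column's
    squared energy exceeds tau (assumed support definition)."""
    energy = M ** 2
    share = energy / energy.sum(axis=0, keepdims=True)
    return [frozenset(np.flatnonzero(share[:, j] > tau)) for j in range(M.shape[1])]

def certificate_violated(M_full, M_trunc, tau=0.1):
    """The falsifying observation: the two transports disagree on active
    supports even though the radius condition is assumed satisfied upstream."""
    return active_supports(M_full, tau) != active_supports(M_trunc, tau)
```

A single adjacent-layer pair returning `True` while the computed radius holds would settle the claim in the negative.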

Figures

Figures reproduced from arXiv: 2605.02111 by Cihan Ruan, Jinhao Wang, Nam Ling, Pengcheng Wang, Wei Jiang, Wei Wang, Xinyi Sui, Ziran Liu.

Figure 1. Physical-alignment measurement: representative four-panel interface measurement. The panels show the permuted output-realized transport matrix M = W_{k+1} U_k^{(R)} with physical output rows and source-mode columns, its block-energy summary Er(M), the scale-free angular transport matrix M_s = U_{k+1}^{(R)} (V_{k+1}^{(R)})^⊤ U_k^{(R)}, and its block-energy summary Er(M_s). The block-dominant pattern is consistent with the dec…
Figure 2. Physical-alignment measured quantity: representative block-energy matrices from the systematic alignment M figure set. The panels are direct instances of Definition 6.1. Diagonal or block-dominant mass is the finite quantity controlled by Proposition 6.3; differences between M_s and M report how singular-value weighting and physical output realization change the scale-free angular organization; theorem-level…
Figure 3. Physical/rank-window bridge: the same Qwen3-8B interface is displayed under two energy-rank windows. The repeated block-energy structure is consistent with the spectral-tail window stability and static GSA stability theorems. (a) permuted M_s (b) Er(M_s) (c) permuted M (d) Er(M)
Figure 4. Physical-alignment measurement beyond Transformers: ResNet50 alignment measurement at layer 8 with the 25ER rank window. This figure tests Propositions 6.3, 6.6, and 6.18: the permuted matrices exhibit structured physical transport, while the Er heatmaps summarize the measured core/overlap/noise structure.
Figure 5. Effective-rank-window measurement. These panels compare 25ER and 50ER versions of the block-energy matrix for representative LLM interfaces. The persistence of a similar block-energy pattern under a larger retained spectral window indicates that the measured channel organization is not tied to a single truncation level. Corollary 6.17 states that exact pixels need not match; the invariant quantity is the b…
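The block-energy summaries Er(M) shown in the figures can be sketched as a normalized per-block Frobenius-energy table. This assumes Er denotes the fraction of squared energy landing in each (row-group, column-group) block, a guess at what Definition 6.1 specifies.

```python
import numpy as np

def block_energy(M, row_groups, col_groups):
    """Er(M) sketch: fraction of M's squared Frobenius energy in each
    (row-group, column-group) block; all entries sum to 1."""
    r_labels = sorted(set(row_groups))
    c_labels = sorted(set(col_groups))
    E = np.zeros((len(r_labels), len(c_labels)))
    total = np.sum(M ** 2)
    for a, rg in enumerate(r_labels):
        rows = [i for i, g in enumerate(row_groups) if g == rg]
        for b, cg in enumerate(c_labels):
            cols = [j for j, g in enumerate(col_groups) if g == cg]
            E[a, b] = np.sum(M[np.ix_(rows, cols)] ** 2) / total
    return E
```

Under this reading, "block-dominant mass" in the captions is simply a heavy diagonal in E, and comparing Er(M) with Er(M_s) shows how singular-value weighting redistributes that mass.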
Original abstract

This paper develops the angular and static-channel component of Geometric and Spectral Alignment for residual Jacobian chains. Starting from Cartan-coordinate rigidity and fitted effective-rank windows, we study how dominant singular subspaces are transported across adjacent layers and how the resulting finite matrices can be displayed in physical channel coordinates. The main results are deterministic, margin-verified results. We bound the error between full interface transport and its dominant-window truncation, add fitted-tail errors so that empirical spectra can be certified against the Gibbs--Cartan tail model, and distinguish source-mode incidence from fully physical input-output channel incidence. Given row groups and active supports, the Physical Alignment Matrix decomposes orthogonally as core plus overlap plus noise. Active-column gaps, pairwise overlap margins, and noise bounds combine into a static certificate radius under which the full transport and the truncated transport induce the same active supports, pairwise incidence graph, SRS sets, hub columns, and core/overlap/noise masks. The finer SC/SA/ST labels of the Invariant Channel Mapping require additional row-energy and profile-correlation margins, stated as explicit perturbation tests. The empirical section reports the matrices and block-energy heatmaps that measure these certificate quantities across CNNs, language models, and vision/diffusion backbones. The figures are interpreted as finite-dimensional measurements; complete membership in the Physical GSA certificate domain requires checking the numerical margin protocol stated in Section 10.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 3 minor

Summary. The paper develops the angular and static-channel aspects of Geometric and Spectral Alignment for residual Jacobian chains in deep networks. Starting from Cartan-coordinate rigidity and fitted effective-rank windows, it derives deterministic bounds on the error between full interface transport and its dominant-window truncation. Fitted-tail errors are added to certify empirical spectra against the Gibbs-Cartan tail model. The Physical Alignment Matrix is shown to decompose orthogonally into core, overlap, and noise components given row groups and active supports. Active-column gaps, pairwise overlap margins, and noise bounds are combined into a static certificate radius guaranteeing that full and truncated transports produce identical active supports, pairwise incidence graphs, SRS sets, hub columns, and core/overlap/noise masks. Finer SC/SA/ST labels require additional row-energy and profile-correlation margins. Empirical results across CNNs, language models, and vision/diffusion backbones report the corresponding matrices and block-energy heatmaps, with membership in the certificate domain requiring the numerical margin protocol in Section 10.

Significance. If the central certificate construction holds without circularity, the work would supply a concrete, margin-verified framework for certifying layer-to-layer transport of dominant singular subspaces in finite-dimensional network Jacobians, distinguishing source-mode from physical channel incidence. The deterministic error bounds between full and truncated transport, together with the orthogonal decomposition of the Physical Alignment Matrix, could serve as a practical tool for verifying alignment properties across architectures. The empirical reporting of certificate quantities on real models adds falsifiability, though the reliance on fitted parameters limits the scope of the guarantees.

major comments (3)
  1. [Abstract / main results] Abstract and main results paragraph: the static certificate radius is assembled from active-column gaps, pairwise overlap margins, and noise bounds after adding fitted-tail errors to certify spectra against the Gibbs-Cartan model. Because the effective-rank windows and tail parameters are fitted to the observed singular-value decay, the resulting radius is data-dependent; this undercuts the claim that the radius is a static, a-priori guarantee that full and truncated transports induce identical supports and masks.
  2. [Section 10] Section 10 (numerical margin protocol): the protocol for verifying margins relies on the same fitted effective-rank windows and Gibbs-Cartan tail model. If the tail model deviates from the empirical spectrum or if window selection involves post-hoc choices, the verified margins lose their deterministic status and the certificate radius may fail to enclose the true transport behavior.
  3. [Main results on Physical Alignment Matrix] Main results on Physical Alignment Matrix decomposition: the orthogonal decomposition into core plus overlap plus noise presupposes that the active supports and row groups are already correctly identified by the truncated transport. The circular dependence on fitted quantities means that any mismatch between the Gibbs-Cartan model and the actual tail can propagate into incorrect core/overlap/noise masks, weakening the central claim that the certificate radius preserves these structures.
minor comments (3)
  1. [Abstract] The abstract introduces the Physical Alignment Matrix and Gibbs-Cartan tail model without a brief forward reference to their definitions; a one-sentence pointer to the relevant sections would improve readability.
  2. [Introduction] Notation for SC/SA/ST labels and SRS sets is used before being fully expanded; a short glossary or table of acronyms in the introduction would aid readers.
  3. [Empirical section] The empirical figures are described as 'finite-dimensional measurements'; clarifying whether the reported heatmaps include error bars or sensitivity to window choice would strengthen the presentation.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thorough review and the identification of potential issues with data dependence and circularity in our claims. We address each major comment below, agreeing to make revisions to clarify the conditional nature of our deterministic certificates.

Point-by-point responses
  1. Referee: [Abstract / main results] Abstract and main results paragraph: the static certificate radius is assembled from active-column gaps, pairwise overlap margins, and noise bounds after adding fitted-tail errors to certify spectra against the Gibbs-Cartan model. Because the effective-rank windows and tail parameters are fitted to the observed singular-value decay, the resulting radius is data-dependent; this undercuts the claim that the radius is a static, a-priori guarantee that full and truncated transports induce identical supports and masks.

    Authors: The certificate is constructed as deterministic bounds once the effective-rank windows and tail parameters are fitted to the data. The 'static' descriptor indicates that the radius is fixed for a given network and fit, providing a guarantee for that instance rather than varying with the transport. However, we agree that it is not a priori in the sense of being independent of the data. We will revise the abstract and main results paragraph to explicitly note that the guarantees are conditional on the fitted model and verified margins, avoiding any implication of fully data-independent a-priori bounds. revision: yes

  2. Referee: [Section 10] Section 10 (numerical margin protocol): the protocol for verifying margins relies on the same fitted effective-rank windows and Gibbs-Cartan tail model. If the tail model deviates from the empirical spectrum or if window selection involves post-hoc choices, the verified margins lose their deterministic status and the certificate radius may fail to enclose the true transport behavior.

    Authors: Section 10 outlines a protocol that incorporates the fitted-tail errors to ensure the empirical spectrum is certified against the model. Window selection follows the effective-rank definition provided earlier in the paper, which is not post-hoc but based on the singular value decay. If the tail model deviates, the certificate does not apply, and we will add text in the revision to discuss how to assess model fit quality and the implications for the certificate's validity. revision: partial

  3. Referee: [Main results on Physical Alignment Matrix] Main results on Physical Alignment Matrix decomposition: the orthogonal decomposition into core plus overlap plus noise presupposes that the active supports and row groups are already correctly identified by the truncated transport. The circular dependence on fitted quantities means that any mismatch between the Gibbs-Cartan model and the actual tail can propagate into incorrect core/overlap/noise masks, weakening the central claim that the certificate radius preserves these structures.

    Authors: The decomposition is applied to the truncated transport after fitting the windows. The certificate radius is then used to bound the difference to the full transport, ensuring the supports and masks match if the margins hold. This avoids circularity by separating the fitting step from the certification step. We will revise the main results to clearly delineate this sequence and note that poor model fit could lead to incorrect masks, with the protocol serving as a check. revision: partial

Circularity Check

1 step flagged

Fitted effective-rank windows and tail errors reduce certificate radius to data-dependent construction

specific steps
  1. fitted input called prediction [Abstract]
"Starting from Cartan-coordinate rigidity and fitted effective-rank windows, we study how dominant singular subspaces are transported across adjacent layers... we bound the error between full interface transport and its dominant-window truncation, add fitted-tail errors so that empirical spectra can be certified against the Gibbs--Cartan tail model... Active-column gaps, pairwise overlap margins, and noise bounds combine into a static certificate radius under which the full transport and the truncated transport induce the same active supports, pairwise incidence graph, SRS sets, hub columns, and core/overlap/noise masks."

    The certificate radius is assembled only after adding fitted-tail errors chosen to make the empirical spectra certifiable against the model. The resulting radius is therefore a post-fit quantity whose numerical value and enclosing property are forced by the same data-dependent fitting procedure used to represent the spectra, rather than derived independently of those fitted parameters.

full rationale

The paper's central deterministic certificate relies on starting from fitted effective-rank windows and explicitly adding fitted-tail errors to certify empirical spectra against the Gibbs-Cartan model before combining gaps, margins, and noise bounds into the static radius. This makes the guarantee of identical supports, graphs, and masks hold only after the fitting step applied to the same data, matching the fitted-input-called-prediction pattern rather than an independent first-principles bound.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 2 invented entities

The framework rests on geometric assumptions from differential geometry applied to neural Jacobians, plus fitted parameters for rank windows and spectral tails; no independent evidence is provided for the new matrix decomposition beyond the stated bounds.

free parameters (2)
  • effective-rank windows
    Fitted to data to define truncation for dominant subspaces
  • tail errors
    Fitted to match empirical spectra to Gibbs-Cartan model
axioms (2)
  • domain assumption Cartan-coordinate rigidity holds for the residual Jacobian chains
    Invoked as the starting point for studying subspace transport
  • domain assumption Orthogonal decomposition of the Physical Alignment Matrix into core, overlap, and noise is valid under the given row groups
    Central to the certificate construction
invented entities (2)
  • Physical Alignment Matrix no independent evidence
    purpose: To represent and decompose the transport of singular subspaces in physical channel coordinates
    Newly defined construct for the alignment certificates
  • Gibbs-Cartan tail model no independent evidence
    purpose: To model the error in truncated spectra for certification
    Introduced to bound fitted-tail errors

pith-pipeline@v0.9.0 · 5565 in / 1581 out tokens · 52053 ms · 2026-05-09T16:55:59.520959+00:00 · methodology

