Signed-Permutation Coordinate Transport for RMSNorm Transformers
Pith reviewed 2026-07-01 06:22 UTC · model grok-4.3
The pith
RMSNorm residual streams admit a signed-permutation gauge that permutation-only alignments miss, so composing local gauges along fine-tuning paths recovers most cross-run coordinates.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The residual-stream gauge for RMSNorm with generic per-channel gain is the signed-permutation group B_d = S_d ⋉ {±1}^d. Composing saved-checkpoint local B_d gauges along same-base fine-tuning trajectories recovers 91.1% of cross-run coordinates at 1500 steps versus 60.3% for endpoint matching. The recovered gauge transfers tools that permutation-only alignment breaks: TinyLlama SAE reconstruction has NMSE 0.004 under B_d versus 1.08 under S_d; Qwen sentiment steering preserves 95.8% of its effect versus 17.2%; refusal steering reverses sign under S_d; coordinate-preserving merges behave the same way. Signed transport of AdamW state preserves the resumed trajectory while permutation-only stat
What carries the argument
The signed-permutation gauge B_d together with sign-marginalized Hungarian matching for coordinate transport.
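One way to read "sign-marginalized Hungarian matching" is as a linear assignment on absolute correlations, with the signs read off afterwards. The sketch below assumes that reading. The function name `sign_marginalized_match`, the activation arrays, and the use of SciPy's `linear_sum_assignment` are illustrative choices, not the paper's implementation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def sign_marginalized_match(acts_a, acts_b):
    """Estimate (perm, sign) with acts_b[:, j] ~ sign[j] * acts_a[:, perm[j]].

    acts_a, acts_b: hypothetical (n_samples, d) arrays of residual-stream
    activations from two checkpoints evaluated on the same inputs.
    """
    a = (acts_a - acts_a.mean(0)) / (acts_a.std(0) + 1e-8)
    b = (acts_b - acts_b.mean(0)) / (acts_b.std(0) + 1e-8)
    corr = b.T @ a / a.shape[0]          # corr[j, i] = corr(b_j, a_i)

    # Assumption: "marginalizing" the sign means scoring each pair by |corr|,
    # so a flipped coordinate is as attractive as an unflipped one.
    rows, cols = linear_sum_assignment(np.abs(corr), maximize=True)

    # For a square matrix rows is 0..d-1, so cols is the permutation itself.
    sign = np.where(corr[rows, cols] >= 0, 1.0, -1.0)
    return cols, sign


# Synthetic usage: hide a known signed permutation and recover it.
rng = np.random.default_rng(1)
acts_a = rng.normal(size=(2000, 16))
true_perm = rng.permutation(16)
true_sign = rng.choice([-1.0, 1.0], size=16)
acts_b = true_sign * acts_a[:, true_perm] + 0.05 * rng.normal(size=(2000, 16))

perm, sign = sign_marginalized_match(acts_a, acts_b)
print(np.array_equal(perm, true_perm), np.array_equal(sign, true_sign))
```

The synthetic data here has independent coordinates and low noise, which is far easier than real residual streams; it only illustrates the mechanics of the matching step.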
If this is right
- Sparse autoencoder reconstruction error on TinyLlama drops from 1.08 to 0.004 when the correct gauge is used.
- Sentiment steering vectors retain 95.8% of their effect size under signed transport but only 17.2% under permutation transport.
- Refusal steering vectors reverse sign when transported under the wrong gauge.
- AdamW optimizer state transported with the signed gauge resumes the original training trajectory; permutation transport produces a different trajectory from a functionally identical checkpoint.
- Index-level interpretability claims hold only relative to an explicit gauge.
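The AdamW point in the list above can be illustrated on a toy vector. Under a signed permutation the gradient and the first moment pick up the signs, while the second moment (a running average of squared gradients) is only permuted. The sketch below is a textbook AdamW step on hypothetical random state, not the paper's experiment; `adamw_step`, `act`, and all numeric values are assumptions for illustration.

```python
import numpy as np


def adamw_step(w, m, v, grad, t, lr=1e-2, b1=0.9, b2=0.999, eps=1e-8, wd=0.01):
    """One textbook AdamW update (decoupled weight decay) on a flat vector."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad**2
    m_hat = m / (1 - b1**t)
    v_hat = v / (1 - b2**t)
    w = w - lr * (m_hat / (np.sqrt(v_hat) + eps) + wd * w)
    return w, m, v


rng = np.random.default_rng(0)
d, t = 8, 100
w, m, grad = rng.normal(size=(3, d))      # hypothetical weights, moment, gradient
v = rng.uniform(0.5, 1.5, size=d)         # hypothetical second moment

perm = rng.permutation(d)
sign = np.array([1, -1, 1, -1, 1, 1, -1, 1], dtype=float)


def act(x):
    # Signed permutation acting on a vector: out[i] = sign[i] * x[perm[i]]
    return sign * x[perm]


# Reference: step in the original coordinates, then transport the result.
w_ref = act(adamw_step(w, m, v, grad, t)[0])

# Signed transport: first moment carries signs, second moment is permuted only.
w_signed = adamw_step(act(w), act(m), v[perm], act(grad), t)[0]

# Permutation-only state on the same (functionally identical) weights.
w_perm = adamw_step(act(w), m[perm], v[perm], act(grad), t)[0]

signed_gap = np.max(np.abs(w_signed - w_ref))
perm_gap = np.max(np.abs(w_perm - w_ref))
print(signed_gap, perm_gap)
```

In this toy setting the signed transport commutes with the update up to floating-point error, while the permutation-only state gives a different step on every sign-flipped coordinate.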
Where Pith is reading between the lines
- Many published neuron or feature attributions on RMSNorm models may be gauge-dependent and require re-checking once the signed gauge is fixed.
- The same local-gauge composition technique could be tested on other training regimes such as continued pre-training or reinforcement learning from human feedback.
- If the signed gauge is architecture-dependent, then any method that aligns checkpoints across mixed LayerNorm and RMSNorm families must first convert between the two gauges.
- Gauge-sweep audits could become a standard reproducibility check for any claim that names specific coordinate indices.
Load-bearing premise
RMSNorm residual charts with generic per-channel gain really do have signed-permutation gauge freedom rather than pure permutation freedom.
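A quick numerical check makes the premise concrete for the normalization layer in isolation. The sketch assumes the usual definitions (RMSNorm divides by the root mean square, LayerNorm subtracts the mean first) and uses a bias-free LayerNorm for simplicity; it says nothing about the rest of the network. The helper names are hypothetical.

```python
import numpy as np


def rmsnorm(x, gain, eps=1e-6):
    return gain * x / np.sqrt(np.mean(x**2) + eps)


def layernorm(x, gain, eps=1e-6):
    # Bias-free LayerNorm, an assumption made to keep the comparison minimal.
    xc = x - x.mean()
    return gain * xc / np.sqrt(np.mean(xc**2) + eps)


def act(perm, sign, x):
    # Signed permutation acting on a vector: out[i] = sign[i] * x[perm[i]]
    return sign * x[perm]


rng = np.random.default_rng(0)
d = 8
x = rng.normal(size=d)
gain = rng.uniform(0.5, 2.0, size=d)      # generic per-channel gain
perm = rng.permutation(d)
sign = np.array([1, -1, 1, 1, -1, -1, 1, -1], dtype=float)

# The gain is relabelled by the permutation; signs do not touch it.
rms_equivariant = np.allclose(
    rmsnorm(act(perm, sign, x), gain[perm]),
    act(perm, sign, rmsnorm(x, gain)),
)
ln_perm_only = np.allclose(
    layernorm(x[perm], gain[perm]),
    layernorm(x, gain)[perm],
)
ln_signed = np.allclose(
    layernorm(act(perm, sign, x), gain[perm]),
    act(perm, sign, layernorm(x, gain)),
)
print(rms_equivariant, ln_perm_only, ln_signed)
```

Per-coordinate sign flips leave the root mean square unchanged but shift the mean, which is why the LayerNorm check fails for mixed signs while the pure-permutation check passes.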
What would settle it
The claim would be falsified if, on a suite of RMSNorm models, sign-marginalized matching failed to raise coordinate recovery above the raw signed-correlation ceiling once sign correlations are removed from the data.
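The ceiling this test refers to can be reproduced on synthetic data. The sketch below assumes independent Gaussian coordinates and a hidden signed permutation with a fixed positive-sign fraction, then compares assignment on raw correlations against assignment on absolute correlations. All sizes, the noise level, and the helper `perm_accuracy` are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
n, d = 4000, 64
acts_a = rng.normal(size=(n, d))          # decorrelated synthetic coordinates

true_perm = rng.permutation(d)
true_sign = np.ones(d)
true_sign[rng.choice(d, size=24, replace=False)] = -1.0
pos_frac = np.mean(true_sign > 0)         # 40 of 64 coordinates keep their sign

acts_b = true_sign * acts_a[:, true_perm] + 0.05 * rng.normal(size=(n, d))

a = (acts_a - acts_a.mean(0)) / acts_a.std(0)
b = (acts_b - acts_b.mean(0)) / acts_b.std(0)
corr = b.T @ a / n                        # corr[j, i] = corr(b_j, a_i)


def perm_accuracy(score):
    # Fraction of coordinates whose matched index equals the hidden one.
    _, cols = linear_sum_assignment(score, maximize=True)
    return float(np.mean(cols == true_perm))


raw_acc = perm_accuracy(corr)             # raw signed-correlation matching
marg_acc = perm_accuracy(np.abs(corr))    # sign-marginalized matching
print(pos_frac, raw_acc, marg_acc)
```

Raw matching avoids the true partner of every flipped coordinate because that pair has correlation near -1, so its accuracy tracks the positive-sign fraction in this setup. Real activations are correlated, so the simulation illustrates the mechanism rather than predicting a number for any model.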
Original abstract
Modern LLM workflows move coordinate-indexed objects across checkpoints: steering vectors, sparse autoencoders, top-$k$ neuron sets, attribution lists, and merge alignments. This is only well posed after fixing the model's residual-stream gauge, which we show is architecture-dependent: LayerNorm residual charts have permutation gauge $S_d$ (up to a global sign flip), while RMSNorm charts with generic per-channel gain have signed-permutation gauge $B_d = S_d \ltimes \{\pm 1\}^d$. Permutation-only alignment is therefore symmetry-incomplete for RMSNorm models. We introduce sign-marginalized Hungarian matching and prove a sharp failure mode: with decorrelated coordinates, raw signed-correlation matching has a structural permutation-accuracy ceiling at the positive-sign fraction of the true gauge, which sign-marginalization removes. We then make coordinate-preserving transport, not function-level merging, the primary object: composing saved-checkpoint local $B_d$ gauges along same-base fine-tuning trajectories recovers 91.1% of cross-run coordinates at 1500 steps versus 60.3% for endpoint matching, and the gain is not explained by merely routing through the base. The recovered gauge transfers tools that permutation-only alignment breaks: TinyLlama SAE reconstruction has NMSE 0.004 under $B_d$ versus 1.08 under $S_d$; Qwen sentiment steering preserves 95.8% of its effect versus 17.2%; refusal steering reverses sign under $S_d$; coordinate-preserving merges behave the same way. The same covariance governs stateful training: signed transport of AdamW state preserves the resumed trajectory, while permutation-only state follows a different one from a functionally identical checkpoint. Finally, gauge-sweep audits show index-level interpretability claims are reproducible only relative to an explicit gauge.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that RMSNorm residual streams admit a signed-permutation gauge B_d = S_d ⋉ {±1}^d (due to generic per-channel gains) while LayerNorm admits only a permutation gauge S_d. It introduces sign-marginalized Hungarian matching, proves a sharp accuracy ceiling for raw signed-correlation matching under decorrelated coordinates, and shows that composing saved-checkpoint local B_d gauges along same-base fine-tuning trajectories recovers 91.1% of cross-run coordinates at 1500 steps (vs. 60.3% for endpoint matching). The recovered gauge improves tool transfer: TinyLlama SAE NMSE drops to 0.004 (vs. 1.08 under S_d), Qwen sentiment steering retains 95.8% effect (vs. 17.2%), refusal steering avoids sign reversal, and signed AdamW state transport preserves resumed trajectories.
Significance. If the gauge distinction and empirical recovery numbers hold, the work supplies a concrete, architecture-aware procedure for coordinate transport that directly improves reproducibility of interpretability artifacts (SAEs, steering vectors, neuron sets) and stateful training resumption in RMSNorm models. The proof of the matching ceiling and the explicit separation of gauge transport from function-level merging are reusable contributions.
major comments (2)
- [§4] §4 (proof of sharp ceiling): the decorrelation assumption used to derive the positive-sign-fraction bound should be checked against the actual coordinate covariances of the TinyLlama and Qwen models at the layers where the 91.1% recovery is reported; if coordinates remain correlated, the ceiling may not be tight and the advantage of sign-marginalization could be overstated.
- [§5.2] Table 2 / §5.2 (recovery percentages): the 91.1% vs 60.3% figures at 1500 steps are load-bearing for the central claim; the manuscript should report the number of independent fine-tuning runs, the exact definition of “recovered coordinate,” and a control that isolates the contribution of intermediate checkpoints from simple base-model routing.
minor comments (2)
- [§2] Notation: the semidirect product B_d = S_d ⋉ {±1}^d is introduced without an explicit action or matrix representation; adding a one-sentence definition or small example matrix in §2 would remove ambiguity.
- [Figure 3] Figure 3 (SAE NMSE): the reported values 0.004 and 1.08 lack error bars or the number of SAE training seeds; adding these would strengthen the transfer claim.
Simulated Author's Rebuttal
We thank the referee for the careful review and the recommendation of minor revision. The comments help strengthen the empirical grounding of the theoretical claims and improve experimental reporting. We address each major comment below.
- Referee: [§4] §4 (proof of sharp ceiling): the decorrelation assumption used to derive the positive-sign-fraction bound should be checked against the actual coordinate covariances of the TinyLlama and Qwen models at the layers where the 91.1% recovery is reported; if coordinates remain correlated, the ceiling may not be tight and the advantage of sign-marginalization could be overstated.
  Authors: We agree that validating the decorrelation assumption is necessary to assess the tightness of the bound. In the revised manuscript we will compute and report the average absolute pairwise coordinate correlations at the layers and checkpoints used for the 91.1% recovery figures in both TinyLlama and Qwen. If non-negligible correlations are observed, we will explicitly discuss the implications for the sign-marginalization advantage and note that the reported gains remain an empirical lower bound under the observed covariance structure.
  Revision: yes
- Referee: [§5.2] Table 2 / §5.2 (recovery percentages): the 91.1% vs 60.3% figures at 1500 steps are load-bearing for the central claim; the manuscript should report the number of independent fine-tuning runs, the exact definition of “recovered coordinate,” and a control that isolates the contribution of intermediate checkpoints from simple base-model routing.
  Authors: We will add the requested details. The reported percentages are means over four independent fine-tuning runs initialized from the identical base checkpoint. A coordinate is counted as recovered when both its sign and its permuted index match the reference trajectory under the composed gauge. We will also include a new control that routes only through the base-model gauge (without composing intermediate checkpoints) to isolate the contribution of the trajectory composition.
  Revision: yes
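The recovery criterion in the last response (both index and sign must match) and the idea of composing local gauges can be written down compactly. The sketch below represents a signed permutation as a `(perm, sign)` pair acting by `out[i] = sign[i] * x[perm[i]]`; that convention, and the helpers `compose`, `inverse`, and `recovery_rate`, are assumptions for illustration rather than the paper's code.

```python
import numpy as np


def act(g, x):
    perm, sign = g
    return sign * x[perm]


def compose(g2, g1):
    """Gauge equivalent to applying g1 first, then g2."""
    p1, s1 = g1
    p2, s2 = g2
    return p1[p2], s2 * s1[p2]


def inverse(g):
    perm, sign = g
    inv = np.argsort(perm)
    return inv, sign[inv]


def recovery_rate(est, ref):
    # A coordinate counts as recovered only if index AND sign both match.
    return float(np.mean((est[0] == ref[0]) & (est[1] == ref[1])))


rng = np.random.default_rng(0)
d = 16


def random_gauge():
    return rng.permutation(d), rng.choice([-1.0, 1.0], size=d)


# Hypothetical local gauges between consecutive saved checkpoints.
local = [random_gauge() for _ in range(3)]
total = local[0]
for g in local[1:]:
    total = compose(g, total)

x = rng.normal(size=d)
y = x
for g in local:
    y = act(g, y)

chain_ok = np.allclose(act(total, x), y)
round_trip_ok = np.allclose(act(inverse(total), act(total, x)), x)

# An estimate with one wrong sign loses exactly one coordinate out of d.
est = (total[0].copy(), total[1].copy())
est[1][0] *= -1
rate = recovery_rate(est, total)
print(chain_ok, round_trip_ok, rate)
```

With an inverse available, a cross-run map between two fine-tunes of the same base could be formed by composing one run's gauge with the inverse of the other's, though how the paper assembles that map is not specified here.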
Circularity Check
No significant circularity identified
The paper's central claims rest on an architecture-dependent gauge derivation (RMSNorm admits signed-permutation freedom via per-channel gain, LayerNorm admits only permutation) presented as a direct mathematical observation from the normalization definitions, a new sign-marginalized Hungarian matching procedure, a proved sharp failure ceiling for raw signed-correlation under decorrelation, and empirical recovery/transfer numbers (91.1% vs 60.3%, NMSE 0.004 vs 1.08). None of these steps reduce by construction to fitted parameters renamed as predictions, self-definitional loops, or load-bearing self-citations; the gauge modeling and proofs are independent of the reported recovery statistics, and the empirical results are externally falsifiable measurements rather than tautological outputs of the inputs.