Signed-Permutation Coordinate Transport for RMSNorm Transformers

John Sweeney

arxiv: 2606.31963 · v1 · pith:H4PQ55B2new · submitted 2026-06-30 · 💻 cs.LG · cs.CL · stat.ML

Pith reviewed 2026-07-01 06:22 UTC · model grok-4.3

keywords: RMSNorm · gauge symmetry · coordinate alignment · transformers · sparse autoencoders · steering vectors · model merging · AdamW state

The pith

RMSNorm residual streams admit a signed-permutation gauge that permutation-only alignment misses; composing local gauges along fine-tuning trajectories recovers most cross-run coordinates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that LayerNorm and RMSNorm fix different residual-stream gauges: LayerNorm permits only permutations of coordinates, up to a global sign flip, while RMSNorm with generic per-channel gains permits signed permutations. Standard permutation matching therefore leaves a per-coordinate sign degree of freedom unaccounted for. The author replaces raw signed-correlation matching with sign-marginalized Hungarian matching and shows that chaining the local signed-permutation gauges recovered at each saved checkpoint along a fine-tuning trajectory recovers 91.1 percent of cross-run coordinates at 1500 steps, versus 60.3 percent for endpoint matching. The same gauge choice determines whether downstream tools survive transport: sparse-autoencoder reconstruction, steering-vector effect size, and even the sign of refusal steering all degrade sharply under the wrong gauge.

Core claim

The residual-stream gauge for RMSNorm with generic per-channel gain is the signed-permutation group B_d = S_d ⋉ {±1}^d. Composing saved-checkpoint local B_d gauges along same-base fine-tuning trajectories recovers 91.1% of cross-run coordinates at 1500 steps versus 60.3% for endpoint matching. The recovered gauge transfers tools that permutation-only alignment breaks: TinyLlama SAE reconstruction has NMSE 0.004 under B_d versus 1.08 under S_d; Qwen sentiment steering preserves 95.8% of its effect versus 17.2%; refusal steering reverses sign under S_d; coordinate-preserving merges behave the same way. Signed transport of AdamW state preserves the resumed trajectory, while permutation-only state follows a different one from a functionally identical checkpoint.

What carries the argument

The signed-permutation gauge B_d together with sign-marginalized Hungarian matching for coordinate transport.
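
To make the matching step concrete, here is a minimal sketch on synthetic data. It assumes that "sign-marginalized" means solving the assignment on the absolute value of the cross-run correlations and reading each sign off the matched entry afterwards; the paper's actual cost may differ. The helper names (`cross_correlation`, `match_raw`, `match_sign_marginalized`) and the toy data are hypothetical, not taken from the paper.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def cross_correlation(a, b):
    """Pearson correlation between each column of a and each column of b."""
    a = (a - a.mean(axis=0)) / a.std(axis=0)
    b = (b - b.mean(axis=0)) / b.std(axis=0)
    return a.T @ b / a.shape[0]


def match_raw(corr):
    """Permutation-only baseline: maximize the sum of signed correlations."""
    _, cols = linear_sum_assignment(corr, maximize=True)
    return cols


def match_sign_marginalized(corr):
    """Assumed variant: assign on |corr|, then read each sign off the matched entry."""
    rows, cols = linear_sum_assignment(np.abs(corr), maximize=True)
    return cols, np.sign(corr[rows, cols])


# Synthetic data with a known signed permutation (illustrative only).
rng = np.random.default_rng(0)
n, d = 4000, 32
x = rng.standard_normal((n, d))             # run A: decorrelated coordinates
true_perm = rng.permutation(d)              # A's coordinate i sits at B's index true_perm[i]
true_sign = rng.choice([-1.0, 1.0], size=d)
y = np.empty_like(x)
y[:, true_perm] = x * true_sign             # run B: same features, signed and shuffled
y += 0.1 * rng.standard_normal((n, d))      # small observation noise

corr = cross_correlation(x, y)
cols_raw = match_raw(corr)
cols_sm, sign_sm = match_sign_marginalized(corr)

raw_accuracy = np.mean(cols_raw == true_perm)
sm_accuracy = np.mean((cols_sm == true_perm) & (sign_sm == true_sign))
print(f"positive-sign fraction:          {np.mean(true_sign > 0):.3f}")
print(f"raw signed-correlation accuracy: {raw_accuracy:.3f}")
print(f"sign-marginalized accuracy:      {sm_accuracy:.3f}")
```

In this toy setting the raw matcher never selects a true partner whose correlation is close to -1, so its index accuracy is pinned at the fraction of positive signs, which is the ceiling the paper describes for decorrelated coordinates. Matching on absolute values removes that penalty.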

If this is right

Sparse autoencoder reconstruction NMSE on TinyLlama falls from 1.08 under S_d transport to 0.004 under B_d transport.
Sentiment steering vectors retain 95.8 percent of their effect size under signed transport but only 17.2 percent under permutation transport.
Refusal steering vectors reverse sign when transported under the wrong gauge.
AdamW optimizer state transported with the signed gauge resumes the original training trajectory; permutation transport produces a different trajectory from a functionally identical checkpoint.
Index-level interpretability claims hold only relative to an explicit gauge.
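
The AdamW item in the list above can be illustrated with a single update step. The sketch below assumes a toy weight vector and the textbook AdamW rule with decoupled weight decay. It relies only on the fact that the first moment is linear in the gradient (so it picks up signs) while the second moment is built from squared gradients (so it only permutes). The names `adamw_step`, `signed`, and `permuted` are hypothetical, and one step stands in for the resumed trajectory the paper reports.

```python
import numpy as np


def adamw_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8, wd=0.01):
    """One AdamW update with decoupled weight decay; returns new (w, m, v)."""
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g**2
    m_hat = m / (1 - b1**t)
    v_hat = v / (1 - b2**t)
    w = w - lr * (m_hat / (np.sqrt(v_hat) + eps) + wd * w)
    return w, m, v


rng = np.random.default_rng(0)
d, t = 6, 1000
w = rng.standard_normal(d)            # toy weights at the saved checkpoint
g = rng.standard_normal(d)            # gradient at the next step
m = rng.standard_normal(d)            # saved first-moment state
v = rng.uniform(0.5, 1.5, size=d)     # saved second-moment state
perm = rng.permutation(d)
sign = np.array([1.0, -1.0, -1.0, 1.0, -1.0, 1.0])


def signed(u):
    """Signed-permutation action: output coordinate i is sign[i] * u[perm[i]]."""
    return sign * u[perm]


def permuted(u):
    """Permutation-only action: output coordinate i is u[perm[i]]."""
    return u[perm]


# Reference step in the original coordinates.
w_ref, _, _ = adamw_step(w, g, m, v, t)

# Weights and gradient live in the new gauge in both cases below;
# only the transport of the first moment differs.
w_signed, _, _ = adamw_step(signed(w), signed(g), signed(m), permuted(v), t)
w_perm, _, _ = adamw_step(signed(w), signed(g), permuted(m), permuted(v), t)

print("signed state matches reference:     ", np.allclose(w_signed, signed(w_ref)))
print("permutation state matches reference:", np.allclose(w_perm, signed(w_ref)))
```

The second moment is transported with `permuted` in both runs because squaring removes the sign; the discrepancy comes entirely from the momentum term on the sign-flipped coordinates.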

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Many published neuron or feature attributions on RMSNorm models may be gauge-dependent and require re-checking once the signed gauge is fixed.
The same local-gauge composition technique could be tested on other training regimes such as continued pre-training or reinforcement learning from human feedback.
If the signed gauge is architecture-dependent, then any method that aligns checkpoints across mixed LayerNorm and RMSNorm families must first convert between the two gauges.
Gauge-sweep audits could become a standard reproducibility check for any claim that names specific coordinate indices.

Load-bearing premise

RMSNorm residual charts with generic per-channel gain really do have signed-permutation gauge freedom rather than pure permutation freedom.
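
The premise can be checked numerically on a single vector. The sketch below uses the standard definitions of RMSNorm and LayerNorm with a per-channel gain (bias omitted) and tests whether each commutes with a signed permutation once the gain is permuted along with the coordinates. The function names and the action convention `(Q x)[i] = sign[i] * x[perm[i]]` are assumptions made for illustration, not the paper's notation.

```python
import numpy as np


def rms_norm(x, gain, eps=1e-6):
    """RMSNorm: rescale by the root mean square, no mean subtraction."""
    return gain * x / np.sqrt(np.mean(x**2) + eps)


def layer_norm(x, gain, eps=1e-6):
    """LayerNorm with a per-channel gain (bias omitted for brevity)."""
    centered = x - x.mean()
    return gain * centered / np.sqrt(np.mean(centered**2) + eps)


rng = np.random.default_rng(0)
d = 8
x = rng.standard_normal(d)
gain = rng.uniform(0.5, 2.0, size=d)                        # generic: distinct gains
perm = rng.permutation(d)
sign = np.array([1.0, -1.0, 1.0, 1.0, -1.0, 1.0, -1.0, 1.0])  # mixed signs


def q(u):
    """Signed permutation: output coordinate i is sign[i] * u[perm[i]]."""
    return sign * u[perm]


# The gain is a positive scale, so it is transported by the permutation alone.
gain_q = gain[perm]

rms_ok = np.allclose(rms_norm(q(x), gain_q), q(rms_norm(x, gain)))
ln_signed_ok = np.allclose(layer_norm(q(x), gain_q), q(layer_norm(x, gain)))
ln_perm_ok = np.allclose(layer_norm(x[perm], gain_q), layer_norm(x, gain)[perm])
ln_global_flip_ok = np.allclose(layer_norm(-x, gain), -layer_norm(x, gain))

print("RMSNorm commutes with signed permutation:  ", rms_ok)
print("LayerNorm commutes with signed permutation:", ln_signed_ok)
print("LayerNorm commutes with permutation:       ", ln_perm_ok)
print("LayerNorm commutes with global sign flip:  ", ln_global_flip_ok)
```

Mean subtraction is what breaks the per-coordinate sign freedom for LayerNorm: flipping a subset of coordinates changes the mean, whereas the root mean square is unaffected. If all gains were equal, RMSNorm would commute with any rotation, which is presumably why the premise is stated for generic gain.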

What would settle it

On a suite of RMSNorm models, sign-marginalized matching fails to raise coordinate recovery above the raw signed-correlation ceiling once sign correlations are removed from the data.

Figures

Figures reproduced from arXiv: 2606.31963 by John Sweeney.

**Figure 2.** Per-pair transport advantage Δ = Transport − Endpoint (percentage points) for cross-seed and cross-dataset pairs on Qwen2.5-1.5B. Each dot is a run pair; vertical bars denote group means. Transport consistently improves cross-seed recovery, while cross-dataset gains are heterogeneous and can be negative when endpoint matching is already near-ceiling.

Original abstract

Modern LLM workflows move coordinate-indexed objects across checkpoints: steering vectors, sparse autoencoders, top-$k$ neuron sets, attribution lists, and merge alignments. This is only well posed after fixing the model's residual-stream gauge, which we show is architecture-dependent: LayerNorm residual charts have permutation gauge $S_d$ (up to a global sign flip), while RMSNorm charts with generic per-channel gain have signed-permutation gauge $B_d = S_d \ltimes \{\pm 1\}^d$. Permutation-only alignment is therefore symmetry-incomplete for RMSNorm models. We introduce sign-marginalized Hungarian matching and prove a sharp failure mode: with decorrelated coordinates, raw signed-correlation matching has a structural permutation-accuracy ceiling at the positive-sign fraction of the true gauge, which sign-marginalization removes. We then make coordinate-preserving transport, not function-level merging, the primary object: composing saved-checkpoint local $B_d$ gauges along same-base fine-tuning trajectories recovers 91.1% of cross-run coordinates at 1500 steps versus 60.3% for endpoint matching, and the gain is not explained by merely routing through the base. The recovered gauge transfers tools that permutation-only alignment breaks: TinyLlama SAE reconstruction has NMSE 0.004 under $B_d$ versus 1.08 under $S_d$; Qwen sentiment steering preserves 95.8% of its effect versus 17.2%; refusal steering reverses sign under $S_d$; coordinate-preserving merges behave the same way. The same covariance governs stateful training: signed transport of AdamW state preserves the resumed trajectory, while permutation-only state follows a different one from a functionally identical checkpoint. Finally, gauge-sweep audits show index-level interpretability claims are reproducible only relative to an explicit gauge.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

RMSNorm has a signed-permutation gauge that permutation alignment misses, and transporting local gauges along trajectories recovers coordinates and fixes downstream tool breakage where standard methods fail.

The core point is that RMSNorm residual streams carry an extra sign freedom per coordinate on top of permutations, so plain permutation matching is incomplete for these models. The paper defines the gauge as B_d = S_d ⋉ {±1}^d when per-channel gains are generic, introduces sign-marginalized Hungarian matching to handle it, and shows that composing saved local gauges along fine-tuning trajectories recovers 91.1% of cross-run coordinates at 1500 steps versus 60.3% for endpoint matching.

What stands out is the clean separation between the symmetry claim and the procedure. The proof that raw signed-correlation matching hits a structural ceiling at the positive-sign fraction under decorrelation is straightforward and useful. The empirical transfers are concrete: TinyLlama SAE NMSE drops from 1.08 to 0.004, Qwen sentiment steering holds 95.8% effect instead of 17.2%, and refusal steering avoids sign reversal. The AdamW state transport result is also direct evidence that the gauge choice affects training dynamics, not just post-hoc alignment.

The soft spots are mostly scope and verification. The recovery numbers rely on same-base trajectories; it is not yet clear how well the method extends to unrelated checkpoints or different optimizers. The audits on interpretability reproducibility are mentioned but not quantified in detail here. The TinyLlama and Qwen experiments are narrow, so the practical gain could be model- or scale-specific until more cases are shown.

This is aimed at people doing mechanistic interpretability, SAE training, or steering on RMSNorm models who already move features across checkpoints. The math and the failure-mode proof are solid enough that a serious referee should see it; the empirical claims are falsifiable and worth checking in review.

Referee Report

2 major / 2 minor

Summary. The paper claims that RMSNorm residual streams admit a signed-permutation gauge B_d = S_d ⋉ {±1}^d (due to generic per-channel gains) while LayerNorm admits only a permutation gauge S_d (up to a global sign flip). It introduces sign-marginalized Hungarian matching, proves a sharp accuracy ceiling for raw signed-correlation matching under decorrelated coordinates, and shows that composing saved-checkpoint local B_d gauges along same-base fine-tuning trajectories recovers 91.1% of cross-run coordinates at 1500 steps (vs. 60.3% for endpoint matching). The recovered gauge improves tool transfer: TinyLlama SAE NMSE drops to 0.004 (vs. 1.08 under S_d), Qwen sentiment steering retains 95.8% effect (vs. 17.2%), refusal steering avoids sign reversal, and signed AdamW state transport preserves resumed trajectories.

Significance. If the gauge distinction and empirical recovery numbers hold, the work supplies a concrete, architecture-aware procedure for coordinate transport that directly improves reproducibility of interpretability artifacts (SAEs, steering vectors, neuron sets) and stateful training resumption in RMSNorm models. The proof of the matching ceiling and the explicit separation of gauge transport from function-level merging are reusable contributions.

major comments (2)

[§4] §4 (proof of sharp ceiling): the decorrelation assumption used to derive the positive-sign-fraction bound should be checked against the actual coordinate covariances of the TinyLlama and Qwen models at the layers where the 91.1% recovery is reported; if coordinates remain correlated, the ceiling may not be tight and the advantage of sign-marginalization could be overstated.
[§5.2] Table 2 / §5.2 (recovery percentages): the 91.1% vs 60.3% figures at 1500 steps are load-bearing for the central claim; the manuscript should report the number of independent fine-tuning runs, the exact definition of “recovered coordinate,” and a control that isolates the contribution of intermediate checkpoints from simple base-model routing.

minor comments (2)

[§2] Notation: the semidirect product B_d = S_d ⋉ {±1}^d is introduced without an explicit action or matrix representation; adding a one-sentence definition or small example matrix in §2 would remove ambiguity.
[Figure 3] Figure 3 (SAE NMSE): the reported values 0.004 and 1.08 lack error bars or the number of SAE training seeds; adding these would strengthen the transfer claim.
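
On the notation comment above, one common way to write the signed-permutation group explicitly is given below. This is the standard matrix description of the hyperoctahedral group; whether the paper uses the same ordering of permutation and sign factors is an assumption.

```latex
% Assumed convention: every element factors as a permutation matrix P
% times a diagonal sign matrix D.
\[
  B_d = \bigl\{\, P D \;:\; P \in S_d,\; D = \operatorname{diag}(s_1,\dots,s_d),\; s_i \in \{\pm 1\} \,\bigr\},
  \qquad |B_d| = 2^d \, d! .
\]
% S_d acts on the sign part by conjugation, which gives the semidirect product law.
\[
  (P_1 D_1)(P_2 D_2) = (P_1 P_2)\,\bigl(P_2^{\top} D_1 P_2\bigr)\, D_2 ,
  \qquad P_2^{\top} D_1 P_2 \ \text{diagonal with entries in } \{\pm 1\}.
\]
% Example for d = 3.
\[
  Q = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & -1 \end{pmatrix},
  \qquad
  Q \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
  = \begin{pmatrix} -x_2 \\ x_1 \\ -x_3 \end{pmatrix}.
\]
```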

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful review and the recommendation of minor revision. The comments help strengthen the empirical grounding of the theoretical claims and improve experimental reporting. We address each major comment below.

Referee: [§4] §4 (proof of sharp ceiling): the decorrelation assumption used to derive the positive-sign-fraction bound should be checked against the actual coordinate covariances of the TinyLlama and Qwen models at the layers where the 91.1% recovery is reported; if coordinates remain correlated, the ceiling may not be tight and the advantage of sign-marginalization could be overstated.

Authors: We agree that validating the decorrelation assumption is necessary to assess the tightness of the bound. In the revised manuscript we will compute and report the average absolute pairwise coordinate correlations at the layers and checkpoints used for the 91.1% recovery figures in both TinyLlama and Qwen. If non-negligible correlations are observed, we will explicitly discuss the implications for the sign-marginalization advantage and note that the reported gains remain an empirical lower bound under the observed covariance structure. revision: yes
Referee: [§5.2] Table 2 / §5.2 (recovery percentages): the 91.1% vs 60.3% figures at 1500 steps are load-bearing for the central claim; the manuscript should report the number of independent fine-tuning runs, the exact definition of “recovered coordinate,” and a control that isolates the contribution of intermediate checkpoints from simple base-model routing.

Authors: We will add the requested details. The reported percentages are means over four independent fine-tuning runs initialized from the identical base checkpoint. A coordinate is counted as recovered when both its sign and its permuted index match the reference trajectory under the composed gauge. We will also include a new control that routes only through the base-model gauge (without composing intermediate checkpoints) to isolate the contribution of the trajectory composition. revision: yes
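
The recovery criterion in the response above, and the idea of composing local gauges, can be sketched as follows. Each gauge is stored as a `(perm, sign)` pair acting as `(Q v)[i] = sign[i] * v[perm[i]]`; that representation, the helper names, and the random local gauges are assumptions for illustration. How the paper estimates each local gauge is not specified here.

```python
from functools import reduce

import numpy as np


def compose(outer, inner):
    """Return the gauge 'apply inner, then outer'.

    A gauge is (perm, sign) acting as (Q v)[i] = sign[i] * v[perm[i]].
    """
    p_out, s_out = outer
    p_in, s_in = inner
    return p_in[p_out], s_out * s_in[p_out]


def to_matrix(gauge):
    """Dense signed-permutation matrix, used here only to check compose()."""
    perm, sign = gauge
    q = np.zeros((len(perm), len(perm)))
    q[np.arange(len(perm)), perm] = sign
    return q


def recovery(estimate, reference):
    """Fraction of coordinates whose index and sign both match the reference."""
    (p_est, s_est), (p_ref, s_ref) = estimate, reference
    return float(np.mean((p_est == p_ref) & (s_est == s_ref)))


rng = np.random.default_rng(0)
d, num_checkpoints = 8, 5
# Hypothetical local gauges, one per consecutive pair of saved checkpoints.
local = [(rng.permutation(d), rng.choice([-1.0, 1.0], size=d))
         for _ in range(num_checkpoints)]

# Chain them in trajectory order: the first local gauge is applied first.
total = reduce(lambda acc, g: compose(g, acc), local)

# An estimate with every index right but two signs wrong.
bad_sign = total[1].copy()
bad_sign[[0, 3]] *= -1
estimate = (total[0], bad_sign)
print(recovery(estimate, total))
```

Under this criterion the flawed estimate scores 6 of 8 coordinates, even though a permutation-only score would count all eight as correct.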

Circularity Check

0 steps flagged

No significant circularity identified

The paper's central claims rest on an architecture-dependent gauge derivation (RMSNorm admits signed-permutation freedom via per-channel gain, LayerNorm admits only permutation) presented as a direct mathematical observation from the normalization definitions, a new sign-marginalized Hungarian matching procedure, a proved sharp failure ceiling for raw signed-correlation under decorrelation, and empirical recovery/transfer numbers (91.1% vs 60.3%, NMSE 0.004 vs 1.08). None of these steps reduce by construction to fitted parameters renamed as predictions, self-definitional loops, or load-bearing self-citations; the gauge modeling and proofs are independent of the reported recovery statistics, and the empirical results are externally falsifiable measurements rather than tautological outputs of the inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract introduces the mathematical group B_d as the gauge symmetry for RMSNorm but does not list numerical free parameters, additional axioms, or new physical entities. The gauge is a definitional object rather than a fitted quantity.

Reference graph

Works this paper leans on

2 extracted references · 1 canonical work pages · 1 internal anchor

[1]

Llama 2: Open Foundation and Fine-Tuned Chat Models

doi: 10.18653/v1/2024.acl-long.828. URL https://aclanthology.org/2024.acl-long.828/. Sidak Pal Singh and Martin Jaggi. Model fusion via optimal transport. In Advances in Neural Information Processing Systems, 2020. URL https://proceedings.neurips.cc/paper/2020/hash/fb2697869f56484404c8ceee2985b01d-Abstract.html. George Stoica, Daniel Bolya, Jakob Bjorner...

[2]

identify $w^{(9)}_{2141}$ and $w^{(10)}_{1122}$ as carrying the Ireland–Dublin association, and report similar coordinate triples for other facts. Direction-level steering work [Rimsky et al., 2024, Arditi et al., 2024] makes single-model claims that are gauge-invariant in isolation but require a coordinate map to be compared or transferred across runs or related model...

2024
