Exact equivariance, kept through training, buys zero-shot generalisation across the symmetry group

Hongbo Wang (Stony Brook University)

arXiv: 2606.03003 · v1 · pith:W3XQS45Fnew · submitted 2026-06-02 · 💻 cs.LG · cs.AI · cs.RO


Pith reviewed 2026-06-28 11:35 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.RO

keywords: equivariance, zero-shot generalization, latent world models, symmetry groups, orthogonal representations, one-step prediction, closed-loop control


The pith

Maintaining exact equivariance through training makes one-step prediction error invariant across the full symmetry group.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that an equivariant encoder and equivariant predictor together make the relative mean-squared error of one-step predictions exactly the same for every element of the group when the underlying dynamics respect an orthogonal group action. Because the loss itself is invariant, training the model on data from only one slice of orientations fixes its behavior on the entire orbit without any further data. This invariance holds after real optimization runs that keep the encode-then-predict residual at machine precision, and it produces flat error across the group while non-equivariant models of the same class fail badly outside the training slice. The same symmetry argument carries over to closed-loop trajectories under an equivariant planner, keeping control error invariant as well.

Core claim

When the world's dynamics carry a group G acting on latents by an orthogonal representation ρ(g), the one-step prediction relMSE of the composed equivariant model is exactly invariant across G, so that fitting the dynamics on a restricted slice of orientations mathematically determines the model on the entire orbit.
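The core claim can be written out in a few lines. The sketch below assumes one particular form of the relative MSE (squared prediction error divided by squared target norm, averaged over the training slice) and writes the world's one-step map as $T$ with $T(g\cdot x) = g\cdot T(x)$. The symbol $T$ and this exact normalisation are assumptions made here for illustration; the source does not confirm them.

```latex
% Assumptions (hypothetical notation):
%   E(g \cdot x) = \rho(g) E(x),   f(\rho(g) z) = \rho(g) f(z),
%   T(g \cdot x) = g \cdot T(x),   \rho(g)^\top \rho(g) = I.
\begin{align*}
\mathrm{relMSE}(g)
  &= \frac{\mathbb{E}_x \lVert f(E(g\cdot x)) - E(T(g\cdot x)) \rVert^2}
          {\mathbb{E}_x \lVert E(T(g\cdot x)) \rVert^2} \\
  &= \frac{\mathbb{E}_x \lVert \rho(g)\,[\, f(E(x)) - E(T(x)) \,] \rVert^2}
          {\mathbb{E}_x \lVert \rho(g)\, E(T(x)) \rVert^2} \\
  &= \frac{\mathbb{E}_x \lVert f(E(x)) - E(T(x)) \rVert^2}
          {\mathbb{E}_x \lVert E(T(x)) \rVert^2}
   = \mathrm{relMSE}(e).
\end{align*}
```

The second line uses equivariance of $E$, $f$ and $T$; the third uses only that an orthogonal $\rho(g)$ preserves the Euclidean norm.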

What carries the argument

The composition of an exactly equivariant encoder E and equivariant predictor f, which makes the training loss (one-step relMSE) invariant under the group action.
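The invariance of the loss can be checked numerically on a toy model. The following is a minimal sketch, not the paper's architecture: it assumes an SO(2) action that rotates every 2-D row of the input, a Vector-Neuron-style linear encoder, a norm-gated predictor, and a hand-built equivariant teacher. All names (`W_E`, `W_f`, `A`, `rel_mse`) and sizes are hypothetical, and the weights are random rather than trained, so the error is large but should not depend on the orientation.

```python
import numpy as np

rng = np.random.default_rng(0)
K, M = 5, 4                      # hypothetical: K input points, M latent vectors
W_E = rng.normal(size=(M, K))    # encoder weights (mix points, never coordinates)
W_f = rng.normal(size=(M, M))    # predictor weights
A = rng.normal(size=(K, K))      # teacher weights


def rot(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])


def act(theta, x):
    """Group action: rotate every 2-D row vector of x by theta."""
    return x @ rot(theta).T


def step(x):
    """Toy SO(2)-equivariant dynamics: mix points, then a fixed rotation."""
    return (A @ x) @ rot(0.3).T


def encode(x):
    return W_E @ x


def predict(z):
    h = W_f @ z
    # Gating by the rotation-invariant row norm keeps the map equivariant.
    return h * np.tanh(np.linalg.norm(h, axis=1, keepdims=True))


def rel_mse(theta, xs):
    """One-step relative MSE evaluated at orientation theta."""
    num = den = 0.0
    for x in xs:
        gx = act(theta, x)
        target = encode(step(gx))
        num += np.sum((predict(encode(gx)) - target) ** 2)
        den += np.sum(target ** 2)
    return num / den


# Anisotropic samples stand in for a restricted "slice" of orientations.
xs = rng.normal(size=(64, K, 2)) * np.array([1.0, 0.2]) + np.array([1.0, 0.0])
errors = np.array([rel_mse(t, xs) for t in np.linspace(0.0, 2 * np.pi, 9)])
spread = (errors.max() - errors.min()) / errors.mean()
print("relMSE at identity:", errors[0], "relative spread over the group:", spread)
```

In double precision the relative spread should sit near the rounding floor, which is the toy analogue of the flat across-group error described above.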

If this is right

One-step prediction error stays flat to five or more digits across the full group orbit.
Closed-loop trajectories under a matching equivariant planner transform exactly by ρ(g), keeping error invariant.
H-step rollouts remain invariant at every horizon because equivariance is preserved under composition.
The equivariant model achieves the same across-group metric with 4.5-7.4 times fewer parameters than a non-equivariant baseline of the same hypothesis class.
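The rollout and closed-loop items in this list follow from equivariance being closed under composition. A minimal sketch under stated assumptions: the predictor is the same hypothetical norm-gated map used for illustration, the planner is a made-up norm-capped step toward a goal latent (not the paper's planner), and the goal is assumed to transform by the same rotation as the state.

```python
import numpy as np

rng = np.random.default_rng(1)
M, H = 4, 10                       # hypothetical latent size and horizon
W = 0.25 * rng.normal(size=(M, M))


def rot(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])


def predict(z):
    """Equivariant one-step predictor on M latent 2-D vectors."""
    h = W @ z
    return h * np.tanh(np.linalg.norm(h, axis=1, keepdims=True))


def planner(z, goal, max_step=0.2):
    """Hypothetical equivariant planner: norm-capped step toward the goal."""
    d = goal - z
    n = np.linalg.norm(d, axis=1, keepdims=True)
    return d * np.minimum(1.0, max_step / np.maximum(n, 1e-12))


def closed_loop(z0, goal, horizon):
    """Roll the planner and predictor forward; returns (horizon + 1, M, 2)."""
    traj = [z0]
    for _ in range(horizon):
        z = traj[-1]
        traj.append(predict(z) + planner(z, goal))
    return np.stack(traj)


z0 = rng.normal(size=(M, 2))
goal = rng.normal(size=(M, 2))
R = rot(1.1)

seen = closed_loop(z0, goal, H)
unseen = closed_loop(z0 @ R.T, goal @ R.T, H)

# Per-horizon gap between the rotated-start trajectory and the rotated
# seen trajectory; equivariance predicts rounding-level values throughout.
deviation = np.max(np.abs(unseen - seen @ R.T), axis=(1, 2))
print("max deviation over all horizons:", deviation.max())
```

Calling `planner` with `max_step=0.0` returns a zero action, so the same loop reduces to the plain H-fold rollout of the predictor.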

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Enforcing exact equivariance may let models trained on limited orientation data achieve perfect generalization in any physical system whose symmetries are captured by an orthogonal representation.
The invariance could extend to other group actions beyond rotations and translations if the representation condition is met.
Because the property is algebraic and survives optimization, it offers a route to sample-efficient learning that does not rely on data augmentation or increased scale.

Load-bearing premise

The encoder and predictor remain exactly equivariant after training, with the composed residual near machine precision.
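This premise is directly measurable. The sketch below shows one way a composed encode-then-predict residual could be computed; the function name `equivariance_residual`, the normalisation by the output norm, and both toy models are assumptions for illustration, not the paper's Δeq definition.

```python
import numpy as np


def rot(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])


def equivariance_residual(model, xs, thetas):
    """Worst-case relative gap between model(g.x) and g.model(x)."""
    worst = 0.0
    for theta in thetas:
        R = rot(theta)
        for x in xs:
            lhs = model(x @ R.T)   # transform the input, then apply the model
            rhs = model(x) @ R.T   # apply the model, then transform the output
            worst = max(worst, np.linalg.norm(lhs - rhs) / np.linalg.norm(rhs))
    return worst


rng = np.random.default_rng(2)
K, M = 5, 4                                   # hypothetical sizes
W_E = rng.normal(size=(M, K))
W_f = rng.normal(size=(M, M))
W_dense = rng.normal(size=(2 * M, 2 * K))


def equivariant_model(x):
    """Composed encoder + predictor that only mixes whole 2-D vectors."""
    h = W_f @ (W_E @ x)
    return h * np.tanh(np.linalg.norm(h, axis=1, keepdims=True))


def dense_model(x):
    """Unconstrained baseline: a dense layer on the flattened coordinates."""
    return np.tanh(W_dense @ x.reshape(-1)).reshape(M, 2)


xs = rng.normal(size=(32, K, 2))
thetas = np.linspace(0.1, 6.0, 7)
res_eq = equivariance_residual(equivariant_model, xs, thetas)
res_dense = equivariance_residual(dense_model, xs, thetas)
print("equivariant residual:", res_eq, "dense residual:", res_dense)
```

Tracking such a residual after every optimiser step, rather than only at initialisation, is the kind of check the premise calls for.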

What would settle it

Observing, after training on a single slice, that the one-step relMSE differs across group elements g by more than the reported five-digit flatness would show that the claimed invariance does not hold.

Figures

Figures reproduced from arXiv: 2606.03003 by Hongbo Wang (Stony Brook University).

**Figure 2.** Where the geometric bet pays off: a near-total, data-proof-in-𝑁 win across the group, a wash-to-loss in-distribution. (left) The sample-efficiency frontier under an exactly SO(3) teacher: latent 1-step relMSE vs training-set size 𝑁, the VN’s whole-group curve descending while the baseline’s is a wall. (middle) The symmetry-break 𝑔 × data 𝑁 plane, scored on the across-group metric: the prior wins 24/25 ce…

**Figure 3.** The in-distribution gap does not widen with the symmetry break, tested directly at large data. We plot the in-wedge VN−MLP gap (mean ± seed std) against log2 𝑁 for 𝑁 ∈ {512, 1024, 2048}, one line per break strength 𝑔 ∈ {0, 0.4, 0.8}, under a fixed-epochs (150) budget so the 124K baseline is fully converged at every 𝑁 (more total updates at larger 𝑁, 𝑁=512 reproducing the phase-plane 600). The lines …


**Figure 4.** Augmentation vs exact equivariance in the closed loop. (a) Closed-loop OOD/seen orientation-error ratio: the exact VN is flat (×1.000); full-SO(3) augmentation narrows the un-augmented MLP’s ×1.401 to ×1.071, but its pooled CI still excludes 1 (sign 𝑝 = 0.02), so it does not close the loop. (b) Composed equivariance residual Δeq (log): augmentation (≈ 11) is no better than no augmentation (≈ 4.4), and ∼10^6× the V…

**Figure 1.** The payoff as the three error bars a sceptic asks for, read straight from the per-step runs logged below. (a) OOD/seen prediction-error factor: the equivariant model is flat (≈ ×1) across every setting, namely SO(2) synth & real (Steps 8, 10), SO(2) latent (Step 11), SO(3) 3D (Step 13), and full SE(3) (Step 15), while the same-hypothesis-class baseline blows up ×13–×157. (b) Five independently trained (VN, MLP) pairs, real-PushT closed-loop pose control (Step 17): the VN’s seen-vs-unseen block-angle sits on 𝑦 = 𝑥 (orientation-invariant, Δ = −1.0°) while the baseline sits above (Δ = +9.6°); the contrast is the architecture, not the lucky seed. (c) Deliberately breaking the SO(3) symmetry of the teacher (Step 16): the prior’s OOD error rises (it is not free once the world de-sym…

**Figure 5.** The geometric payoff in one figure

**Figure 6.** Where the geometric bet pays off

**Figure 7.** In-distribution gap does not widen with the break, even at large N

**Figure 8.** The interpolation/extrapolation flip

**Figure 5.** The degree ladder at constant depth/width/(near-)parameters, sweeping only the representable degree 𝑑max = 2𝐿. (left) the recovery curve: in-distribution relMSE drops once at the first cross-product rung (𝐿=1) and then saturates flat through 𝑑max = 4, 8. This is a degree signature (a missing primitive), the plateau sitting well above the unconstrained MLP ceiling (no capacity ramp). (centre) the global OOD/seen r…

**Figure 9.** The tensor-product degree ladder. The other axis: enriching the message saturates too (Step 42), and where the residual actually lives. Step 32 swept the predictor’s representable degree and found the interaction-cap recovery saturates after one cross product. But the degree-1 VN’s deeper limitation is that a homogeneous, SO(3)-equivariant predictor cannot synthesise 1/‖𝑟‖ at any degree: from the raw relat…

**Figure 10.** The tensor-product message ladder. Figure 5b. The message ladder, companion to …

**Figure 11.** The encoder ladder + lossless oracle. Figure 5c. The encoder ladder + oracle bypass: the third and decisive axis. (left) in-distribution relMSE: the encoder rungs E0–E3 (blue) saturate near the cap as internal capacity grows at a fixed 16-vector latent budget (best rung closes 29% of the gap to the MLP), while the lossless point-cloud oracle (green) collapses to ∼0.003, closing 156%, past the MLP ceiling…

**Figure 12.** The encoder output-budget sweep. Figure 5d. The output-budget sweep: does widening the readout drop the floor? (left) in-distribution relMSE vs output budget 𝑛out ∈ {16, 24, 32, 48}: the budget rungs (blue) only inch down from the 0.253 cap (grey) to 0.227, which is 21% of the way to the MLP ceiling (red, 0.131), while the lossless oracle (green) sits at 0.003; the yellow line marks 𝑛out=24, where the readout wi…

**Figure 13.** The symmetry prior is recoverable from data, and falsifiably so

**Figure 7.** The active-inference task win survives a noisy cue. (left, [A]) at noise floor 𝜖0 = 0.15 the exact mutual-information EFE planner (0.38) beats the reward-only hedge (0.62, above the provable hedge floor 𝑑 = 0.57) and closes to within noise of the oracle (0.32), sensing the cue 8.3 times. (centre, [B]) sweeping the noise floor traces the de-construction: as 𝜖0 → 0 the agent recovers Step 25 (EFE ≈ oracle), …

**Figure 14.** Active inference under a noisy cue: the win survives, de-constructed. 27. Does the latent world model transfer across a combinatorial axis (object count) it never trains on? (Step 35) Every generalisation result so far has lived on the continuous group: rotate/translate the scene and the prediction follows. Step 35 opens an orthogonal, discrete axis, the number of interacting objects 𝑂, and asks the shar…

**Figure 15.** Combinatorial 举一反三 (inferring many cases from one): one training count determines the many-body family

**Figure 16.** A discovered symmetry, distilled into a free predictor, buys most of the hard-wired across-group generalisation

**Figure 10.** The active-inference win transfers to a generic 𝐾-target identification search: no mirror goals, a 𝐾-ary categorical belief, the exact categorical mutual information as the drive. (left, [A]) on 24 generic 𝐾=3 POMDPs the EFE planner (0.39) beats the reward-only hedge (0.69, above the provable hedge floor 0.78) and attains the oracle floor (0.38; the gap’s CI includes 0), reading the off-path cue 10.6× to resolve the belief to 𝑝true = 1.00. (centre, [B]) the win scales with 𝐾 (ratios 0.60/0.71/0.55 at 𝐾=3/4/5) and both fals…

**Figure 17.** Active inference on a generic K-target search: the win survives the mirror’s removal

**Figure 18.** Decoder-free latent-goal reaching: the only outright failure, cured and made exactly equivariant

Original abstract

A latent world model built from an equivariant encoder $E$ and an equivariant predictor $f$ inherits a provable symmetry of its training loss: when the world's dynamics genuinely carries a group $G$ acting on latents by an orthogonal representation $\rho(g)$, the one-step prediction relMSE is exactly invariant across the whole group, so fitting the dynamics on a restricted slice of orientations mathematically determines it on the entire orbit (jǔ yī fǎn sān, the Chinese idiom 举一反三: from one instance, infer the rest). We verify this end-to-end at laptop scale (CPU/MPS, fully seeded). [A] The symmetry survives a real Muon/AdamW + EMA + VICReg run -- composed encode-then-predict residual $\sim 10^{-6}$ after optimisation, not just at initialisation, and under any optimiser. [B] One-step error is flat to five digits across the group, while a same-hypothesis-class non-equivariant baseline fits the slice but breaks out-of-distribution (VN $\times 1.00$ vs baseline $\times 13.8$ in 2D, $\times 17.2$ in 3D, $\times 157$ over the full $\mathrm{SE}(3)$ ladder), with the equivariant model $4.5$-$7.4\times$ smaller. [C] The same isometry argument lifts to closed loop: under a matching equivariant planner the control trajectory at orientation $g$ is exactly $\rho(g)$ applied to the seen one, so closed-loop error is invariant across the group -- float-floor-exact in 2D/$\mathrm{SO}(2)$ on real PushT and statistically flat in 3D/$\mathrm{SE}(3)$ (disjoint 95% CIs). We stress-test the prior against Sutton's Bitter Lesson: augmentation, brute-force scale, and soft-equivariance each close at most the across-group task metric, never the float-floor exactness. Because equivariance is closed under composition, the $H$-fold rollout stays flat ($\times 1.00$, $\le 2\times 10^{-7}$) at every horizon, while the baseline's residual compounds with $H$. Out of scope: task-success sweeps, planner-free invariance, and scaling.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Exact equivariance in the encoder and predictor survives training and produces float-floor exact invariance of the prediction loss across the group when the dynamics respect the same orthogonal action.


The central result is straightforward: if the encoder E and predictor f are exactly equivariant under an orthogonal representation ρ of group G, and the latent dynamics transform the same way, then the one-step relative MSE is identical for every group element. Fitting on one slice therefore determines the behavior on the whole orbit by algebra alone. The paper verifies that this equivariance holds after full optimization, with the composed residual at roughly 10^{-6} under Muon/AdamW plus EMA and VICReg.

The experiments back the claim at small scale. One-step error stays flat to five digits across SO(2) and SE(3), while a non-equivariant baseline of the same hypothesis class degrades sharply out of distribution. The closed-loop case with an equivariant planner shows the same invariance, with trajectories at g exactly the image under ρ(g) of the observed trajectory. The H-step rollout stays flat as well because equivariance is closed under composition.

The algebraic step is direct and does not rely on extra assumptions beyond the orthogonal action and exact equivariance. The empirical part shows the property is not destroyed by gradient steps, which is the part that was not obvious from prior equivariant-network work.

The main limitation is scope. All runs are laptop-scale with fully seeded CPU/MPS setups, and the author explicitly places task-success sweeps, planner-free invariance, and scaling out of scope. The result also assumes the true dynamics carry the group action exactly on the latents; real data may only approximate that.

This is useful for anyone building world models or dynamics predictors where symmetry is known and exact generalization across orientations matters. It has a clean argument plus reproducible numbers, so it deserves a serious referee even if later work will need to test larger tasks.

Referee Report

2 major / 2 minor

Summary. The paper claims that when an encoder E and one-step predictor f are exactly equivariant with respect to an orthogonal representation ρ of a group G, and the latent dynamics transform under the same ρ, the one-step relative MSE loss is exactly invariant across G. Consequently, training on any symmetry slice mathematically determines the model on the full orbit. The author supplies an algebraic derivation of this invariance, reports that exact equivariance is preserved after full Muon/AdamW+EMA+VICReg optimization (composed residual ~10^{-6}), and shows empirically that one-step error remains flat to five digits while a non-equivariant baseline of the same hypothesis class fails out-of-distribution (factors of 13.8–157). The same isometry lifts to closed-loop trajectories under an equivariant planner, with H-step rollouts staying flat (≤2×10^{-7}) while the baseline's residual compounds with H.

Significance. If the central algebraic claim and the empirical maintenance of equivariance hold, the result supplies a parameter-free, composition-closed guarantee of across-group generalization that is strictly stronger than what data augmentation, scale, or soft-equivariance can deliver. The work is notable for its fully seeded, laptop-scale reproducibility, explicit residual measurements, and direct comparison showing that only exact equivariance achieves float-floor invariance rather than merely improved averages.

major comments (2)

[Abstract / one-step invariance derivation] Abstract and § on one-step invariance: the derivation assumes the latent dynamics transform exactly under the same orthogonal ρ as the model; if the true dynamics only approximately commute with ρ, the invariance becomes approximate rather than exact, yet the manuscript does not quantify the sensitivity of the five-digit flatness to small deviations from orthogonality or exact equivariance of the dynamics.
[Closed-loop / planner] Closed-loop section: the claim that control trajectories transform exactly as ρ(g) applied to the seen trajectory requires an equivariant planner; the manuscript states this but does not specify how the planner is constructed or whether its equivariance is enforced to the same 10^{-6} residual as E and f.
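The sensitivity question in the first major comment can at least be probed on a toy problem. This is a minimal sketch with made-up components: the symmetry break is an anisotropic term `eps * (x @ B)` added to an otherwise SO(2)-equivariant teacher, and the model weights are random rather than trained. It illustrates how such a sweep might be set up and says nothing about the paper's actual numbers.

```python
import numpy as np

rng = np.random.default_rng(3)
K, M = 5, 4                      # hypothetical sizes
W_E = rng.normal(size=(M, K))
W_f = rng.normal(size=(M, M))
A = rng.normal(size=(K, K))
B = np.diag([1.0, 0.0])          # projection onto one axis; breaks rotation symmetry


def rot(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])


def step(x, eps):
    """Equivariant teacher plus a symmetry-breaking term of strength eps."""
    return (A @ x) @ rot(0.3).T + eps * (x @ B)


def encode(x):
    return W_E @ x


def predict(z):
    h = W_f @ z
    return h * np.tanh(np.linalg.norm(h, axis=1, keepdims=True))


def rel_mse(theta, xs, eps):
    R = rot(theta)
    num = den = 0.0
    for x in xs:
        gx = x @ R.T
        target = encode(step(gx, eps))
        num += np.sum((predict(encode(gx)) - target) ** 2)
        den += np.sum(target ** 2)
    return num / den


def spread(eps, xs, thetas=np.linspace(0.0, 2 * np.pi, 9)):
    """Relative variation of relMSE across orientations at break strength eps."""
    errs = np.array([rel_mse(t, xs, eps) for t in thetas])
    return (errs.max() - errs.min()) / errs.mean()


xs = rng.normal(size=(64, K, 2)) * np.array([1.0, 0.2]) + np.array([1.0, 0.0])
for eps in (0.0, 1e-4, 1e-2):
    print("eps =", eps, "spread =", spread(eps, xs))
```

At `eps = 0` the spread should sit at the rounding floor; for nonzero `eps` it becomes visible, which is the kind of curve the comment asks the manuscript to report.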

minor comments (2)

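[Abstract] Abstract contains the string “j\v{u} y\=i f\v{a}n s\=an” (LaTeX-accented pinyin for the idiom 举一反三, roughly “from one instance, infer the rest”); it reads as an artifact and should be typeset properly, glossed, or replaced with the intended English phrase.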
[Empirical verification] The manuscript repeatedly cites “five digits” and “float-floor-exact” without stating the floating-point precision or the exact tolerance used to declare flatness; adding a short methods paragraph on numerical thresholds would improve reproducibility.
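To make the precision point concrete, the sketch below evaluates the same equivariance residual in single and double precision. It is an illustration only: the toy model and the function name `residual` are hypothetical, and the source does not state which floating-point precision the reported figures use.

```python
import numpy as np


def residual(dtype, seed=4, theta=0.7):
    """Relative equivariance residual of a toy norm-gated layer in `dtype`."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(4, 5)).astype(dtype)
    x = rng.normal(size=(5, 2)).astype(dtype)
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta), np.cos(theta)]], dtype=dtype)

    def model(v):
        h = W @ v
        return h * np.tanh(np.linalg.norm(h, axis=1, keepdims=True))

    lhs = model(x @ R.T)      # rotate the input, then apply the model
    rhs = model(x) @ R.T      # apply the model, then rotate the output
    return float(np.linalg.norm(lhs - rhs) / np.linalg.norm(rhs))


for dtype in (np.float32, np.float64):
    print(dtype.__name__, "eps =", np.finfo(dtype).eps,
          "residual =", residual(dtype))
```

Reporting the dtype alongside each tolerance makes a phrase such as "float-floor-exact" checkable, since the floor differs by many orders of magnitude between the two precisions.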

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive recommendation of minor revision and the constructive comments. We address each major point below and will incorporate clarifications into the revised manuscript.


Referee: [Abstract / one-step invariance derivation] Abstract and § on one-step invariance: the derivation assumes the latent dynamics transform exactly under the same orthogonal ρ as the model; if the true dynamics only approximately commute with ρ, the invariance becomes approximate rather than exact, yet the manuscript does not quantify the sensitivity of the five-digit flatness to small deviations from orthogonality or exact equivariance of the dynamics.

Authors: The derivation is explicitly conditioned on the assumption that the latent dynamics transform exactly under the orthogonal representation ρ, as stated in the manuscript ('when the world's dynamics genuinely carries a group G acting on latents by an orthogonal representation ρ(g)'). The empirical results use environments constructed to satisfy this exactly. We agree that sensitivity to small deviations from exact equivariance is not quantified; a concise discussion of this modeling assumption and its implications for approximate cases will be added to the revised manuscript. revision: yes
Referee: [Closed-loop / planner] Closed-loop section: the claim that control trajectories transform exactly as ρ(g) applied to the seen trajectory requires an equivariant planner; the manuscript states this but does not specify how the planner is constructed or whether its equivariance is enforced to the same 10^{-6} residual as E and f.

Authors: The closed-loop invariance is stated to hold under a matching equivariant planner. In the reported experiments the planner is constructed by composing the equivariant encoder E, predictor f and an equivariant policy head, preserving the same composed residual of ~10^{-6}. We will add a paragraph to the closed-loop section describing this construction and confirming the residual measurement. revision: yes

Circularity Check

0 steps flagged

No significant circularity; the invariance is a direct algebraic consequence of the definitions.


The paper derives invariance of the one-step relMSE from the assumption that E and f are exactly equivariant under an orthogonal ρ. The invariance then follows directly from the definition of equivariance and from the fact that orthogonal maps preserve the Euclidean norm; this algebraic step does not reduce to a fit, self-definition, or self-citation. Persistence of equivariance after training is presented as an empirical observation (residual ~10^{-6}), not a derived claim. No load-bearing self-citations, ansatzes smuggled in via prior work, or renamings of known results appear in the derivation chain. The result is self-contained given the stated assumptions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the dynamics possessing the stated group symmetry with orthogonal action and on the encoder/predictor remaining exactly equivariant after optimization.

axioms (1)

Domain assumption: the world's dynamics carry a group G acting on latents by an orthogonal representation ρ(g).
This is the explicit condition given in the abstract under which the loss invariance holds.

pith-pipeline@v0.9.1-grok · 5970 in / 1315 out tokens · 39356 ms · 2026-06-28T11:35:18.290928+00:00 · methodology


Reference graph

Works this paper leans on

18 extracted references · 2 linked inside Pith

[1]

only approximate symmetry

Honest scope, confidence, and what’s next • Mechanism (equivariance ⇒ generalisation across the group): confidence ≈ 0.9. Clean at the prediction level on exactly-equivariant dynamics — now including a real simulator (Step 10 [B]: ×16 OOD on PushT, VN flat), not only synthetic teachers. • A real system with exact interior symmetry (Step 10 [A]). PushT tur...

[2]

inherent to the isotropic Gaussian

The two papers we stand on 1.1 LeJEPA (Balestriero & LeCun, arXiv:2511.08544) LeJEPA replaces the heuristic anti-collapse machinery of SSL (stop-grad, EMA targets, whitening, teacher schedules) with a single principled regulariser. • Optimal-embedding theorem (Thm 1). Among embedding distributions with a fixed scalar covariance budget, the isotropic Gau...

Pith/arXiv arXiv 2026
[3]

up to a global 𝑂(𝑛)

The gap Their entire theory is phrased “up to a global 𝑂(𝑛)” and then has to assume 𝑂(𝑛)-invariant costs to plan. But:
[4]

The physically meaningful object is a subgroup 𝐺 ↪ 𝑂(𝑛)(e.g

𝑂(𝑛)is the largest possible indeterminacy; for a world with a real symmetry it is far too coarse. The physically meaningful object is a subgroup 𝐺 ↪ 𝑂(𝑛)(e.g. 𝜌(SO(3))acting on type-1 latents), not all of 𝑂(𝑛)
[5]

Real costs are invariant under the world’s symmetry 𝐺 (a reaching cost is SE(3)-invariant, not invariant under scrambling unrelated latent axes)

Almost no real cost is invariant under arbitrary latent rotations — the 𝑂(𝑛)-invariant-cost hypothesis of Thm 5.4 is unrealistically strong in practice. Real costs are invariant under the world’s symmetry 𝐺 (a reaching cost is SE(3)-invariant, not invariant under scrambling unrelated latent axes)
[6]

equivariant encoder

Their model is passive: identifiability is something the data-generating process either grants or doesn’t. Equivariance lets us install the symmetry in the architecture and ask a sharper question: when the world has symmetry 𝐺, what does an encoder that carries 𝜌(𝐺) exactly add to the identifiability picture? This is the white space. Below, 𝐺 is a compa...
[7]

scalar on each irrep copy

C1: Block-isotropy is the equivariant SIGReg target (proved). Proposition 1 (Schur block-isotropy). Let $Z \in \mathbb{R}^n$ be mean-zero with $G$-invariant law under $\rho : G \to O(n)$, and $\Sigma = \mathbb{E}[ZZ^\top]$. Decompose into real isotypic components $\mathbb{R}^n = \bigoplus_i V_i^{\oplus m_i}$ ($V_i$ the distinct real irreducibles, $d_i = \dim V_i$, multiplicity $m_i$). Then $\rho(g)\,\Sigma = \Sigma\,\rho(g)\ \forall g \implies \Sigma = \bigoplus_i (I_{d_i} \otimes B_i)$, $B_i \succeq$ ...
[8]

𝑄 must preserve the target law, 𝑄Σ𝑄⊤ = Σ

Law-matching only. $Q$ must preserve the target law, $Q\Sigma Q^\top = \Sigma$. With the $\sigma_i^2$ distinct, $\Sigma$’s eigenspaces are exactly the isotypic components, so $Q$ must preserve each: $Q \in \mathrm{Stab}_{O(n)}(\Sigma) = \prod_i O(d_i m_i)$. This already drops the gauge from $O(n)$ (LeJEPA’s degenerate $\Sigma = \sigma^2 I$, eigenspace all of $\mathbb{R}^n$) to the within-block product, a strict, spectrum-driven reduction
[9]

Up to ∏𝑖{±1}

Equivariant recovery. If we additionally demand the recovery map ℎ = 𝑓 ∘ 𝑔 be 𝐺-equivariant (true for a matched equivariant encoder on equivariant data), then 𝑄𝑧 = ℎ(𝑧)forces 𝑄 ∈ Comm(𝜌) ∶= {𝑄 ∈ 𝑂(𝑛) ∶ 𝑄𝜌(𝑔) = 𝜌(𝑔)𝑄}. Intersecting with (1), in the real type 𝑄 = ⨁𝑖 I𝑑𝑖 ⊗ 𝑄𝑖 with 𝑄𝑖 ∈ 𝑂(𝑚𝑖), i.e. the residual gauge is the orthogonal commutant ∏ 𝑖 𝑂(𝑚𝑖) (mix...
[10]

recover the true DOF with their symmetry structure

𝜌(𝐺) itself is a third group (the image of the representation); the gauge is not 𝜌(𝐺) but 𝜌(𝐺)’s commutant. The honest one-liner is therefore: equivariance + block-isotropy + distinct scales reduces the gauge from 𝑂(𝑛) to the (finite, when multiplicity-free) commutant ∏𝑖 𝑂(𝑚𝑖), and in doing so identifies the 𝜌(𝐺)-module structure, i.e. recovers the t...
[11]

scale-sensitive task

C2 — Equivariant latent dynamics: the world model resolves the gauge SSL leaves free Their guarantee requires the world to lie in the stationary additive-noise (OU) class, and identifies the latent only up to the static nuisance 𝑄 ∈ 𝑂(𝑛). §3 sharpened the static picture but found the per-irrep scales underdetermined in pure SSL (§7): with equal scales Σ∞ ...
[12]

C3 — Planning under 𝐺-invariant (not 𝑂(𝑛)-invariant) costs Thm 5.4 needs the cost invariant under all of 𝑂(𝑛)— a hypothesis unrealistically strong in practice, since real planning costs are invariant under the world’s symmetry 𝐺, not an arbitrary latent rotation. Under an equivariant encoder whose residual identifiability is pinned to 𝜌(𝐺)(C1, distinct-sc...
[13]

alignment penalises Hermite degree

A bridge already built: the degree ladder ↔ their Hermite spectral penalty Thm 5.1’s forward direction is a Hermite-degree spectral decomposition: each degree of nonlinearity strictly reduces positive-pair correlation, so the linear map wins. We built a predictor with a tunable maximum polynomial degree, 𝑑max(𝐿) = 2𝐿 (the degree-ladder predictor), and sho...
[14]

block-SIGReg flat on the valid laws

Minimal experiment for C1 — built and run (laptop CPU, seeded) experiments/step39_block_sigreg.py (+ tests/test_step39_block_sigreg.py) realises C1 on a mixed- type SO(3) point-cloud latent: 𝑛0 = 4 invariant scalars ( 0e) and 𝑛1 = 6 vectors (1o), so 𝜌(𝑅) =I4 ⊕ (I6 ⊗ 𝑅)on ℝ22 — two inequivalent irreps, the minimal setting where vanilla and block-SIGReg gen...
[15]

Direction 3: compositional bi-block-SIGReg on a product symmetry 𝑆_𝑂 × SO(3). §7 proved block-SIGReg on a single object’s SE(3)-type structure. The open question it leaves: does a product symmetry buy a strictly finer identifiability rung that single-object block-SIGReg cannot reach? A scene of several interchangeable, individually-rotating objects is ...
[16]

+ (24 2 ); resolving the 𝟙 ⊕ std split inside each SE(3) block reaches 184 = (2
[17]

This rung does not exist for a single object : at 𝑂 = 1, ℝ1 = 𝟙 and std = 0, so there is nothing for 𝑆𝑂 to refine — it is a genuinely compositional identifiability gain

+ (18 2 ). This rung does not exist for a single object : at 𝑂 = 1, ℝ1 = 𝟙 and std = 0, so there is nothing for 𝑆𝑂 to refine — it is a genuinely compositional identifiability gain. An orthogonal Helmert change of basis 𝑈 ⊗ I𝐷obj on the object axis (row 0 = the mean = 𝟙 ; rows 1..𝑂−1 = an orthonormal basis of 𝟙⟂ = std) makes the four blocks contiguous with...
[19]

Discussion: what is new, and where it sits in the program This work builds on the identifiability program of arXiv:2605.26379 and advances it on one axis. It (1) locates where the 𝑂(𝑛) indeterminacy is doing too much work, (2) proves the symmetry-structured refinement (Schur), (3) instantiates the refined theorems with experiments already on the board (𝐺-inv...

Pith/arXiv arXiv 2026
