Spectral Edge Dynamics Reveal Functional Modes of Learning
Pith reviewed 2026-05-10 19:04 UTC · model grok-4.3 · Recognition: 2 theorem links
The pith
Dominant training update directions form a spectral edge that induces low-dimensional functional modes over inputs, adapted to each task's algebraic symmetry.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Training dynamics during grokking concentrate along a small number of dominant update directions -- the spectral edge -- which reliably distinguishes grokking from non-grokking regimes. Standard mechanistic interpretability tools fail to capture these directions because their structure is not localized in parameter or feature space. Instead, each direction induces a structured function over the input domain, revealing low-dimensional functional modes invisible to representation-level analysis. For modular addition, all leading directions collapse to a single Fourier mode; for multiplication, the same collapse appears only in the discrete-log basis; for subtraction, the edge spans a small multi-mode family.
What carries the argument
The spectral edge: the leading eigenvectors of the parameter-update covariance, each of which induces a low-dimensional structured function over the entire input domain.
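The paper does not spell out its estimator, but the construction it describes is standard. A minimal numpy sketch, assuming the edge is taken from an SVD of the centered matrix of flattened per-step updates; the function name spectral_edge, the choice of k, and the toy data are illustrative, not from the paper:

```python
import numpy as np

def spectral_edge(updates, k=4):
    """Top-k eigenvectors of the empirical covariance of parameter updates.

    updates: (T, P) array, row t = flattened update theta_{t+1} - theta_t.
    Returns (eigvals, eigvecs) with eigvecs of shape (k, P).
    Works via SVD of the centered update matrix, so the full P x P
    covariance is never formed.
    """
    U = updates - updates.mean(axis=0, keepdims=True)
    # Rows of Vt are eigenvectors of U.T @ U / (T - 1).
    _, s, Vt = np.linalg.svd(U, full_matrices=False)
    eigvals = s ** 2 / (len(U) - 1)
    return eigvals[:k], Vt[:k]

# Toy demo: 200 synthetic "updates" concentrated on one hidden direction.
rng = np.random.default_rng(0)
direction = rng.standard_normal(512)
direction /= np.linalg.norm(direction)
steps = np.outer(rng.standard_normal(200) * 5.0, direction)
steps += 0.1 * rng.standard_normal((200, 512))   # isotropic noise floor
vals, vecs = spectral_edge(steps, k=3)
print(vals[0] / vals[1])            # sharp edge: first eigenvalue dominates
print(abs(vecs[0] @ direction))     # top eigenvector recovers the direction
```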
Load-bearing premise
The dominant update directions are not localized in parameter or feature space but instead induce structured functions over the input domain that standard mechanistic interpretability tools fail to capture.
What would settle it
The claim would be falsified by extracting the top eigenvectors of the update covariance from a modular-addition grokking run and showing either that they do not each correspond to a single Fourier mode over the inputs, or that activation probing and sparse autoencoders do recover them as localized features.
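A hedged sketch of the Fourier-collapse half of this test, assuming the induced function has been tabulated as a p x p grid over input pairs (a, b); the statistic and the name fourier_concentration are illustrative:

```python
import numpy as np

def fourier_concentration(g):
    """Fraction of non-DC spectral power in the single largest 2-D DFT mode.

    g: (p, p) array, g[a, b] = value of the induced function on input (a, b).
    A value near 1 (or 0.5 for a real-valued function, where power splits
    between conjugate modes) means the function collapses to one Fourier
    mode, as the paper claims for modular addition; a diffuse spectrum
    with no dominant mode would falsify that reading.
    """
    G = np.fft.fft2(g)
    power = np.abs(G) ** 2
    power[0, 0] = 0.0                # ignore the constant (DC) component
    return power.max() / power.sum()

# Sanity check on a function that IS a single Fourier mode over Z_97 x Z_97.
p, k = 97, 5
a, b = np.meshgrid(np.arange(p), np.arange(p), indexing="ij")
g = np.cos(2 * np.pi * k * (a + b) / p)   # single mode at frequency k in a+b
print(fourier_concentration(g))           # ~0.5: power split across +k and -k
```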
Original abstract
Training dynamics during grokking concentrate along a small number of dominant update directions -- the spectral edge -- which reliably distinguishes grokking from non-grokking regimes. We show that standard mechanistic interpretability tools (head attribution, activation probing, sparse autoencoders) fail to capture these directions: their structure is not localized in parameter or feature space. Instead, each direction induces a structured function over the input domain, revealing low-dimensional functional modes invisible to representation-level analysis. For modular addition, all leading directions collapse to a single Fourier mode. For multiplication, the same collapse appears only in the discrete-log basis, yielding a 5.9x improvement in concentration. For subtraction, the edge spans a small multi-mode family. For $x^2+y^2$, no single harmonic basis suffices, but cross-terms of additive and multiplicative features provide a 4x variance boost, consistent with the decomposition $(a+b)^2 - 2ab$. Multitask training amplifies this compositional structure, with the $x^2+y^2$ spectral edge inheriting the addition circuit's characteristic frequency (2.3x concentration increase). These results suggest that training discovers low-dimensional functional modes over the input domain, whose structure depends on the algebraic symmetry of the task. These results suggest that spectral edge dynamics identify low-dimensional functional subspaces governing learning, whose representation depends on the algebraic structure of the task. Simple harmonic structure emerges only when the task admits a symmetry-adapted basis; more complex tasks require richer functional descriptions.
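Two standard algebraic identities sit behind the multiplication and $x^2+y^2$ results quoted above; spelling them out clarifies why the discrete-log basis and the cross-terms are the natural coordinates (this is textbook background, not material from the paper):

```latex
% For a prime modulus p with primitive root g, the discrete log ind(x),
% defined by x = g^{ind(x)} mod p, turns modular multiplication into
% modular addition of indices:
\[
  x \cdot y \equiv g^{\,\operatorname{ind}(x) + \operatorname{ind}(y)} \pmod{p},
\]
% so Fourier modes in the index coordinate play the role for multiplication
% that ordinary Fourier modes play for addition. For the x^2 + y^2 task,
% the decomposition cited in the abstract is
\[
  x^2 + y^2 = (x + y)^2 - 2xy,
\]
% a cross-term combination of an additive feature (x + y) and a
% multiplicative feature (xy), consistent with the reported 4x variance
% boost from additive-multiplicative cross-terms.
```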
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that during grokking, neural network training dynamics concentrate along a small number of dominant update directions termed the 'spectral edge.' These directions reliably distinguish grokking from non-grokking regimes across modular addition, multiplication, subtraction, and x²+y² tasks (plus a multitask setting). The directions induce structured functions over the input domain aligned with algebraic symmetries (e.g., Fourier modes for addition, discrete-log basis for multiplication with 5.9x concentration, cross-terms for x²+y² with 4x boost), but are invisible to standard mechanistic interpretability tools because their structure is non-localized in parameter or feature space. The results suggest training discovers low-dimensional functional modes whose representation depends on task symmetry.
Significance. If the central observations hold, the work offers a dynamics-focused complement to representation-level interpretability, highlighting how update directions can reveal symmetry-adapted functional subspaces that govern generalization in grokking. The reported alignments (Fourier collapse, discrete-log concentration, compositional cross-terms, multitask amplification) provide concrete, task-specific evidence that could inform theories of how networks exploit algebraic structure. The emphasis on non-localized structure is a useful caution against over-reliance on localized probes.
Major comments (2)
- [Abstract] Abstract and main results: The claim that spectral edge directions 'reveal functional modes of learning' and 'govern' the grokking transition rests on correlational alignments (projections of update vectors onto candidate bases yielding variance explained). No interventional evidence is presented, such as constraining optimization to the orthogonal complement of the spectral edge or ablating updates along these directions to test necessity/sufficiency for the phase transition. Without such tests, the directions could be high-variance consequences rather than causal drivers. A minimal sketch of one such ablation is given after this list.
- [Abstract] Abstract: No quantitative details are supplied on spectral edge extraction (e.g., eigenvalue threshold, number of leading directions retained, or how 'dominant' is defined), error bars across runs, data exclusion criteria, or statistical controls for alternative explanations such as random high-variance directions or post-grokking stabilization.
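One concrete form the requested intervention could take, as a hedged PyTorch sketch: project each gradient step onto the orthogonal complement of the edge subspace and check whether grokking still occurs. The names ablated_step and project_out, and the use of a fixed precomputed edge_basis, are illustrative choices, not the paper's method:

```python
import torch

def project_out(grad_vec, edge_basis):
    """Remove the component of a flattened gradient that lies in the
    spectral-edge subspace. edge_basis: (k, P) with orthonormal rows."""
    coeffs = edge_basis @ grad_vec          # (k,) projection coefficients
    return grad_vec - edge_basis.T @ coeffs

def ablated_step(model, loss, edge_basis, lr=1e-3):
    """One SGD step constrained to the orthogonal complement of the edge.

    If the spectral edge is causally necessary for grokking, training with
    this step should fail to reach the generalizing phase; if the model
    grokks anyway, the edge is a byproduct of learning, not a driver.
    """
    model.zero_grad()
    loss.backward()
    flat = torch.cat([p.grad.reshape(-1) for p in model.parameters()])
    flat = project_out(flat, edge_basis)
    i = 0
    with torch.no_grad():
        for p in model.parameters():
            n = p.numel()
            p -= lr * flat[i:i + n].reshape(p.shape)
            i += n
```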
Minor comments (2)
- [Abstract] Abstract: The final two sentences are near-duplicates with minor rephrasing; consolidate into a single concluding statement.
- [Abstract] Abstract: The 5.9x and 4x concentration factors are reported without specifying the baseline (e.g., random basis, full space, or alternative harmonic expansions), making the magnitude hard to interpret.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive comments. We agree that the manuscript would benefit from more precise language on the nature of our evidence and from expanded methodological details. We respond to each major comment below and outline the corresponding revisions.
Point-by-point responses
- Referee: [Abstract] Abstract and main results: The claim that spectral edge directions 'reveal functional modes of learning' and 'govern' the grokking transition rests on correlational alignments (projections of update vectors onto candidate bases yielding variance explained). No interventional evidence is presented, such as constraining optimization to the orthogonal complement of the spectral edge or ablating updates along these directions to test necessity/sufficiency for the phase transition. Without such tests, the directions could be high-variance consequences rather than causal drivers.
  Authors: We agree that the evidence is correlational, consisting of observed alignments between spectral edge directions and task-specific bases (Fourier for addition, discrete-log for multiplication, cross-terms for x²+y²) together with their ability to distinguish grokking from non-grokking regimes. The current work does not contain interventional experiments that would establish necessity or sufficiency. In the revised manuscript we will replace the phrasing 'govern the grokking transition' and 'governing learning' with more accurate language emphasizing that the directions 'reveal' low-dimensional functional modes via their structured projections. We will also add a short limitations paragraph noting the correlational character of the results and outlining possible future interventional tests.
  Revision: partial.
- Referee: [Abstract] Abstract: No quantitative details are supplied on spectral edge extraction (e.g., eigenvalue threshold, number of leading directions retained, or how 'dominant' is defined), error bars across runs, data exclusion criteria, or statistical controls for alternative explanations such as random high-variance directions or post-grokking stabilization.
  Authors: We will revise the Methods section to supply the requested quantitative details: the precise eigenvalue threshold and criterion for retaining leading directions, the definition of 'dominant' used throughout the study, standard deviations across independent runs for all reported concentration factors, explicit data exclusion criteria, and additional controls that compare the observed alignments against random directions drawn from the same update covariance as well as against post-grokking update directions.
  Revision: yes.
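A minimal sketch of the promised random-direction control, assuming the null directions are sampled from the same empirical update covariance; induced_fn stands in for whatever concentration score is applied to the real edge directions and is deliberately left abstract, since the paper does not specify it:

```python
import numpy as np

def baseline_concentration(updates, induced_fn, n_random=100, seed=0):
    """Null distribution for the concentration statistic.

    Draws random directions with the same covariance as the updates
    (via random Gaussian mixtures of the centered update vectors) and
    scores each with induced_fn, which maps a parameter-space direction
    to its scalar concentration score.
    """
    rng = np.random.default_rng(seed)
    U = updates - updates.mean(axis=0, keepdims=True)
    scores = []
    for _ in range(n_random):
        w = rng.standard_normal(len(U))
        d = w @ U                    # sample from N(0, cov) up to scale
        d /= np.linalg.norm(d)
        scores.append(induced_fn(d))
    return np.array(scores)

# Usage: p-value-style comparison against the top edge direction's score.
# null = baseline_concentration(updates, induced_fn)
# print((null >= edge_score).mean())
```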
Circularity Check
No circularity: the spectral edge is an observed empirical property, not a self-referential construct.
Full rationale
The paper defines the spectral edge directly from the leading eigenvectors of the training update covariance matrix and then reports measured alignments (e.g., collapse to Fourier modes for modular addition) via projection onto candidate bases. This is a straightforward observational pipeline with no equations that equate the claimed functional modes to the definition of the edge itself, no fitted parameters renamed as predictions, and no load-bearing self-citations. The central claim remains an empirical finding about concentration of variance rather than a tautology.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · tag: unclear
  Relation between the paper passage and the cited Recognition theorem is unclear.
  Passage: "Training dynamics during grokking concentrate along a small number of dominant update directions—the spectral edge—which reliably distinguishes grokking from non-grokking regimes... each direction induces a structured function over the input domain"
- IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · tag: unclear
  Relation between the paper passage and the cited Recognition theorem is unclear.
  Passage: "For modular addition, all leading directions collapse to a single Fourier mode... For multiplication, the same collapse appears only in the discrete-log basis"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] T. Bricken et al. Towards monosemanticity: Decomposing language models with dictionary learning. Transformer Circuits Thread, 2023.
- [2] B. Chughtai, L. Chan, and N. Nanda. A toy model of universality: Reverse engineering how networks learn group operations. In ICML, 2023.
- [3] H. Cunningham, A. Ewart, L. Riggs, R. Huben, and L. Sharkey. Sparse autoencoders find highly interpretable features in language models. In ICLR, 2024.
- [4] N. Elhage et al. A mathematical framework for transformer circuits. Transformer Circuits Thread, 2021.
- [5] N. Elhage et al. Toy models of superposition. Transformer Circuits Thread, 2022.
- [6] C. Olah, N. Elhage, T. Henighan, and others. A toy model of interference weights. Transformer Circuits Thread, 2025. https://transformer-circuits.pub/2025/interference-weights/index.html
- [7] C. Li, H. Farkhoor, R. Liu, and J. Yosinski. Measuring the intrinsic dimension of objective landscapes. In ICLR, 2018.
- [8] N. Nanda, L. Chan, T. Lieberum, J. Smith, and J. Steinhardt. Progress measures for grokking via mechanistic interpretability. In ICLR, 2023.
- [9] A. Power, Y. Burda, H. Edwards, I. Babuschkin, and V. Misra. Grokking: Generalization beyond overfitting on small algorithmic datasets. arXiv:2201.02177, 2022.
- [10] D. Stander, Q. Yu, H. Fan, and E. Vonosek. Grokking group multiplication with cosets. arXiv:2312.06581, 2024.
- [11] Y. Xu. Low-dimensional and transversely curved optimization dynamics in grokking. arXiv:2602.16746, 2026.
- [12] Y. Xu. The geometry of multi-task grokking: Transverse instability, superposition, and weight decay phase structure. arXiv:2602.18523, 2026.
- [13] Y. Xu. The spectral edge thesis: A mathematical framework for intra-signal phase transitions in neural network training. arXiv:2603.28964, 2026.
- [14] Z. Zhong, Z. Liu, M. Tegmark, and J. Andreas. The clock and the pizza: Two stories in mechanistic explanation of neural networks. In NeurIPS, 2024.