Weight decay controls distinct learning regimes in grokking transformers on modular arithmetic, tracked by new cheap attention-based diagnostics with empirical critical value and exponent fits.
Title resolution pending
4 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.LG 4years
2026 4representative citing papers
Experiments on modular arithmetic with heavy label noise show that over-parameterized networks form a distributed internal generalization structure that can be extracted via frequency methods to achieve high accuracy despite 80% noise.
Gradient-based SVD diagnostic uncovers hidden SED-LCH coupling in single and multitask settings and shows rank-3 subspace constraints speed up grokking by 2.3x.
Empirical tests confirm robust feature repulsion signs but reveal activation-dependent spectral lock-in in grokking, with x^2 yielding rank-2 updates at epoch ~174 and ReLU remaining rank-1.
citing papers explorer
-
Weight Decay Regimes in Grokking Transformers: Cheap Online Diagnostics
Weight decay controls distinct learning regimes in grokking transformers on modular arithmetic, tracked by new cheap attention-based diagnostics with empirical critical value and exponent fits.
-
Unveiling Memorization-Generalization Coexistence: A Case Study on Arithmetic Tasks with Label Noise
Experiments on modular arithmetic with heavy label noise show that over-parameterized networks form a distributed internal generalization structure that can be extracted via frequency methods to achieve high accuracy despite 80% noise.
-
Gradient-Direction Sensitivity Reveals Linear-Centroid Coupling Hidden by Optimizer Trajectories
Gradient-based SVD diagnostic uncovers hidden SED-LCH coupling in single and multitask settings and shows rank-3 subspace constraints speed up grokking by 2.3x.
-
Feature Repulsion and Spectral Lock-in: An Empirical Study of Two-Layer Network Grokking
Empirical tests confirm robust feature repulsion signs but reveal activation-dependent spectral lock-in in grokking, with x^2 yielding rank-2 updates at epoch ~174 and ReLU remaining rank-1.