Grokking delay under cross-entropy is mediated primarily by logit scale and resulting softmax saturation, with weight norm acting only as an upstream handle that adds 1-2% beyond the scale.
Grokking in the Ising model. arXiv preprint arXiv:2510.25966.
Cited by 1 Pith paper (field: cs.LG; year: 2026; verdict: CONDITIONAL). Representative citing paper:
What Does the Weight Norm Control in Grokking? Logit-Scale Mediation under Cross-Entropy
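The softmax saturation named in the summary can be illustrated with a small numerical sketch. It does not reproduce the cited paper's experiments. The logit values, the scale factors, and the helper names `softmax` and `ce_grad_norm` are all hypothetical and chosen only for illustration. The sketch multiplies a fixed logit vector by a growing scale and reports how the cross-entropy gradient with respect to the logits shrinks as the softmax saturates.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ce_grad_norm(logits, target):
    """L2 norm of the cross-entropy gradient with respect to the logits.

    For cross-entropy on softmax outputs this gradient is
    softmax(logits) - one_hot(target).
    """
    probs = softmax(logits)
    grad = [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]
    return math.sqrt(sum(g * g for g in grad))


# Hypothetical logits for one example; index 0 is the correct class.
base_logits = [2.0, 1.0, 0.5, -1.0]
target = 0

for scale in (1.0, 4.0, 16.0):
    scaled = [scale * z for z in base_logits]
    p_correct = softmax(scaled)[target]
    grad_norm = ce_grad_norm(scaled, target)
    print(f"scale={scale:5.1f}  p_correct={p_correct:.8f}  grad_norm={grad_norm:.3e}")
```

In this toy setting the prediction is already correct, so a larger scale pushes the correct-class probability toward 1 and the gradient norm toward 0. That is one way to read "saturation": the loss provides very little signal once the logits are large, whatever the weights that produced them.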