Experiments on modular arithmetic with heavy label noise show that over-parameterized networks form a distributed internal generalization structure, which frequency-based methods can extract to achieve high accuracy despite 80% label noise.
Progress measures for grokking via mechanistic interpretability
2 Pith papers cite this work. Polarity classification is still indexing.
Citation-role summary: background (1)