Grokking emerges near the model size where memorization timescale T_mem(P) intersects generalization timescale T_gen(P) on modular arithmetic.
International Conference on Learning Representations (ICLR) , year =
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.LG 2years
2026 2representative citing papers
Machine learning research should prioritize ideas by testing their predicted behavioral signatures in modern models through custom experiments instead of leaderboard chasing or abstract theorems.
citing papers explorer
-
Model Capacity Determines Grokking through Competing Memorisation and Generalisation Speeds
Grokking emerges near the model size where memorization timescale T_mem(P) intersects generalization timescale T_gen(P) on modular arithmetic.
-
Position: Ideas Should be the Center of Machine Learning Research
Machine learning research should prioritize ideas by testing their predicted behavioral signatures in modern models through custom experiments instead of leaderboard chasing or abstract theorems.