Experiments on modular arithmetic with heavy label noise show that over-parameterized networks form a distributed internal generalization structure that can be extracted via frequency methods to achieve high accuracy despite 80% noise.
Memorization without overfitting: Analyzing the training dynamics of large language models.Advances in Neural Information Processing Systems, 35:38274–38290, 2022
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
verdicts
UNVERDICTED 2representative citing papers
Set-level data entropy estimators show linear correlation with LLM memorization scores, forming the Entropy-Memorization Linearity.
citing papers explorer
-
Unveiling Memorization-Generalization Coexistence: A Case Study on Arithmetic Tasks with Label Noise
Experiments on modular arithmetic with heavy label noise show that over-parameterized networks form a distributed internal generalization structure that can be extracted via frequency methods to achieve high accuracy despite 80% noise.
-
Data Compressibility Quantifies LLM Memorization
Set-level data entropy estimators show linear correlation with LLM memorization scores, forming the Entropy-Memorization Linearity.