Memorization without overfitting: Analyzing the training dynamics of large language models. Advances in Neural Information Processing Systems, 35:38274–38290, 2022

Kushal Tirumala, Aram Markosyan, Luke Zettlemoyer, Armen Aghajanyan · 2022

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Unveiling Memorization-Generalization Coexistence: A Case Study on Arithmetic Tasks with Label Noise

cs.LG · 2026-05-18 · unverdicted · novelty 5.0

Experiments on modular arithmetic with heavy label noise show that over-parameterized networks form a distributed internal generalization structure that can be extracted via frequency methods to achieve high accuracy despite 80% noise.

Data Compressibility Quantifies LLM Memorization

cs.CL · 2025-07-08 · unverdicted · novelty 5.0

Set-level data entropy estimators show linear correlation with LLM memorization scores, forming the Entropy-Memorization Linearity.

citing papers explorer

Showing 2 of 2 citing papers.

Unveiling Memorization-Generalization Coexistence: A Case Study on Arithmetic Tasks with Label Noise

cs.LG · 2026-05-18 · unverdicted · none · ref 16

Experiments on modular arithmetic with heavy label noise show that over-parameterized networks form a distributed internal generalization structure that can be extracted via frequency methods to achieve high accuracy despite 80% noise.

Data Compressibility Quantifies LLM Memorization

cs.CL · 2025-07-08 · unverdicted · none · ref 27

Set-level data entropy estimators show linear correlation with LLM memorization scores, forming the Entropy-Memorization Linearity.
