pith. machine review for the scientific record.

arxiv: 2605.12394 · v2 · submitted 2026-05-12 · 💻 cs.LG · cs.AI

Recognition: 2 theorem links · Lean Theorem

Detecting overfitting in Neural Networks during long-horizon grokking using Random Matrix Theory

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 05:28 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords: overfitting · random matrix theory · grokking · neural networks · correlation traps · Marchenko-Pastur · anti-grokking · large language models

The pith

Neural networks form correlation traps in weight spectra that signal the start of overfitting during extended grokking.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops a random matrix theory technique for detecting overfitting in neural networks without access to train or test data. For each layer, the weight matrix is randomized element-wise and the resulting spectrum is fitted to a Marchenko-Pastur distribution; large outliers that survive the randomization are termed correlation traps. These traps emerge and grow during an anti-grokking phase in which training accuracy stays high while test accuracy declines. The method also applies to large language models and includes a check, based on passing random inputs through the model, for whether a given trap actually harms generalization.
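
A minimal sketch of that per-layer diagnostic, assuming element-wise randomization by permuting the entries, a variance estimate taken from the randomized matrix, and an arbitrary `tol` margin above the Marchenko-Pastur bulk edge; none of these choices are confirmed details of the authors' implementation.

```python
import numpy as np

def count_correlation_traps(W, n_shuffles=10, tol=1.05, rng=None):
    """Count eigenvalues of a randomized layer that escape the MP bulk.

    W is a 2-D weight matrix of one layer. The entries are permuted (destroying
    learned correlations while keeping the empirical value distribution), the
    spectrum of the resulting correlation matrix is computed, and eigenvalues
    lying above the Marchenko-Pastur bulk edge by more than `tol` are counted.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    W = np.asarray(W, dtype=np.float64)
    N, M = W.shape
    if N < M:                       # work in the tall orientation so q <= 1
        W, N, M = W.T, M, N
    q = M / N                       # aspect ratio of the layer
    counts = []
    for _ in range(n_shuffles):
        W_rand = rng.permutation(W.reshape(-1)).reshape(N, M)
        evals = np.linalg.eigvalsh(W_rand.T @ W_rand / N)
        sigma2 = np.var(W_rand)     # scale parameter for the MP fit
        lam_plus = sigma2 * (1.0 + np.sqrt(q)) ** 2
        counts.append(int(np.sum(evals > tol * lam_plus)))
    return float(np.mean(counts))
```

On a layer whose weights still look like an i.i.d. bulk this count should stay near zero; under the paper's claim it rises as training enters anti-grokking.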

Core claim

The central claim is that correlation traps, identified as large outliers violating the Marchenko-Pastur law in the empirical spectral distribution of element-wise randomized weight matrices, form and grow in number and scale during the anti-grokking phase of long-horizon grokking, coinciding with decreasing test accuracy while train accuracy remains high.

What carries the argument

Correlation traps: spectral outliers from Marchenko-Pastur fitting after element-wise weight randomization that mark overfitting onset.

Load-bearing premise

Randomizing weights element-wise and fitting to Marchenko-Pastur reliably isolates signals of overfitting, with outliers corresponding directly to generalization failure.
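
For reference, the Marchenko-Pastur law being invoked is the standard one; restated here under the assumption that the fit is parameterized by the entry variance $\sigma^2$ and the layer aspect ratio $q = M/N \le 1$:

$$\rho_{\mathrm{MP}}(\lambda) = \frac{1}{2\pi\sigma^{2} q\,\lambda}\sqrt{(\lambda_{+}-\lambda)(\lambda-\lambda_{-})}, \qquad \lambda_{\pm} = \sigma^{2}\left(1 \pm \sqrt{q}\right)^{2}.$$

A correlation trap is then an eigenvalue of the randomized layer's correlation matrix sitting well above $\lambda_{+}$, where a purely random matrix of the same shape and scale should carry essentially no spectral weight.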

What would settle it

A counterexample would be a training run where correlation traps form and grow but test accuracy does not decrease, or where test accuracy decreases without corresponding trap growth.

Figures

Figures reproduced from arXiv: 2605.12394 by Charles H. Martin and Hari K. Prakash.

Figure 1
Figure 1. Grokking dynamics and Correlation Traps. Reproduction of known grokking experiments, extended to long-horizon training. Train accuracy (red), test accuracy (purple), and Correlation Traps (blue) are shown for (a) an MLP on MNIST, (b) Modular Addition (MA), and (c) GPT2. Shaded regions denote pre-grokking (gray), grokking (yellow), and anti-grokking (green). Correlation Traps arise at the onset of late-sta… view at source ↗
Figure 2
Figure 2. Randomized ESDs, MP fits, and emergence of Correlation Traps. (a) Example of the ESD of Xrand compared with a Marchenko–Pastur (MP) fit. In a self-averaging randomized layer, the spectrum is described by the MP bulk and no large right-edge outliers remain. (b,c) Examples of trapped randomized spectra from Layer 2 of the MLP. Right before collapse, the randomized spectrum already shows a dominant spike near… view at source ↗
Figure 3
Figure 3. Grokking dynamics and test loss. Train accuracy (red), test accuracy (purple), and test loss (blue) are shown for (a) an MLP on MNIST, (b) Modular Addition (MA), and (c) GPT2. Trap count separates this regime from the earlier phases. During pre-grokking, when the model has already fit the training set but has not yet generalized, the average number of detected traps is effectively zero. During grokking, … view at source ↗
Figure 4
Figure 4. JSD diagnostic ablation test for selected MLP, MA, and GPT2 anti-grokking checkpoints. Trap removal score Jk(T = 1) vs. delta test error for individual traps. Benign traps have Jk(T = 1) ≈ 0 while harmful traps have relatively larger Jk(T = 1) scores. … view at source ↗
Figure 5
Figure 5. Layer-wise Correlation Trap profiles in frontier-scale open-weight LLMs. Number of detected Correlation Traps per layer for the OpenAI gpt-oss-20b and gpt-oss-120b models. Taken together, these observations suggest that Correlation Traps may provide a practical spectral diagnostic for identifying potentially harmful overfitting structures in frontier-scale foundation models. More broadly, the large number … view at source ↗
Figure 6
Figure 6. Evidence for norm-based prototype collapse in the anti-grokking MLP. (a) The original … view at source ↗
Figure 7
Figure 7. Prototype-like directions associated with trapped layers. Largest right singular vector v(1) of W1 in pixel space. The corresponding direction evolves from unstructured noise in pre-grokking, to a smooth global template during grokking, to a localized prototype-like image in anti-grokking. By mapping the trap back to the FC1 layer, one can observe the overfit sector W1 … view at source ↗
Figure 8
Figure 8. Trap intervention in the anti-grokking phase. (a) Confusion matrix for the full model in anti-grokking. (b) Confusion matrix after replacing the FC1 trap direction with a matched Gaussian random vector. The strong bias toward one prototype class (i.e., 5) is substantially reduced. view at source ↗
Figure 9
Figure 9. Rows of W1 selected by leading-vector localization. The selected receptive fields are noise-like in pre-grokking, remain diffuse around peak generalization, and become recognizable digit templates in anti-grokking. view at source ↗
Figure 10
Figure 10. Weight distributions with extreme coordinates across checkpoints. Anti-grokking introduces structural outliers into the weight values of all three layers. These extreme coordinates are consistent with the shuffled-spectrum outliers counted as Correlation Traps. … view at source ↗
read the original abstract

Training Neural Networks (NNs) without overfitting is difficult; detecting that overfitting is difficult as well. We present a novel Random Matrix Theory method that detects the onset of overfitting in deep learning models without access to train or test data. For each model layer, we randomize each weight matrix element-wise, $\mathbf{W} \to \mathbf{W}^{\mathrm{rand}}$, fit the randomized empirical spectral distribution with a Marchenko-Pastur distribution, and identify large outliers that violate self-averaging. We call these outliers Correlation Traps. During the onset of overfitting, which we call the "anti-grokking" phase in long-horizon grokking, Correlation Traps form and grow in number and scale as test accuracy decreases while train accuracy remains high. Traps may be benign or may harm generalization; we provide an empirical approach to distinguish between them by passing random data through the trained model and evaluating the JS divergence of output logits. Our findings show that anti-grokking is an additional grokking phase with high train accuracy and decreasing test accuracy, structurally distinct from pre-grokking through its Correlation Traps. More broadly, we find that some foundation-scale LLMs exhibit the same Correlation Traps, indicating potentially harmful overfitting.
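
The benign-versus-harmful check described above can be made concrete with a small sketch: pass the same batch of random inputs through the full model and through a copy with one trap direction ablated, and compare the output distributions with a Jensen-Shannon divergence at temperature $T$. The function names, the softmax-at-$T$ convention, and the ablation interface below are illustrative assumptions, not the authors' code.

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete distributions."""
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: float(np.sum(a * (np.log(a + eps) - np.log(b + eps))))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def softmax(z, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = (z - z.max(axis=-1, keepdims=True)) / T
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def trap_removal_score(logits_full, logits_ablated, T=1.0):
    """Mean JS divergence between output distributions of the full model and a
    trap-ablated copy, both evaluated on the same random inputs.

    logits_full, logits_ablated: arrays of shape (n_samples, n_classes).
    """
    p = softmax(np.asarray(logits_full, dtype=np.float64), T)
    q = softmax(np.asarray(logits_ablated, dtype=np.float64), T)
    return float(np.mean([js_divergence(pi, qi) for pi, qi in zip(p, q)]))
```

A score near zero means removing the trap barely changes the model's behaviour on random probes (benign); a larger score flags a trap whose removal visibly shifts the logits, matching the Jk(T = 1) reading in Figure 4.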

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript presents a Random Matrix Theory method for detecting overfitting in neural networks during long-horizon grokking without access to data. The approach involves element-wise randomization of each layer's weight matrix, fitting the resulting empirical spectral distribution to the Marchenko-Pastur law, and flagging large outliers as 'Correlation Traps'. These traps are claimed to emerge and increase during the 'anti-grokking' phase, marked by sustained high training accuracy and declining test accuracy. An empirical procedure using Jensen-Shannon divergence on random inputs is proposed to differentiate benign from harmful traps, with observations extended to large language models.

Significance. If substantiated, this work could offer a practical, data-independent diagnostic for overfitting, which is particularly relevant for foundation models where test sets are limited or unavailable. It introduces the notion of anti-grokking as a distinct phase and links spectral properties to generalization failure. The use of RMT provides a theoretical grounding that could be extended, though the current lack of quantitative benchmarks limits immediate impact.

major comments (3)
  1. [Method (abstract and procedure description)] The core procedure (randomization followed by Marchenko-Pastur fitting) assumes that spectral outliers specifically isolate overfitting signals, but no derivation or controlled experiment demonstrates why these outliers arise from overfitting rather than weight scale, optimization trajectory, or layer geometry (see skeptic note on weakest assumption and abstract description of the method).
  2. [Empirical results and validation sections] The manuscript provides no quantitative validation, error analysis, baseline comparisons, or statistical details on how traps are distinguished from benign outliers; claims rest on unspecified empirical observations (reader's soundness assessment).
  3. [Anti-grokking phase analysis] The claim that Correlation Traps form and grow specifically during anti-grokking (high train acc, decreasing test acc) lacks ablation studies or falsifiable tests showing the method's specificity to generalization failure versus other training artifacts.
minor comments (2)
  1. [Notation and preliminaries] Notation for the randomized weight matrix (W to W^rand) should be explicitly defined once in the main text for consistency.
  2. [Introduction and related work] Additional citations to prior RMT applications in deep learning and existing grokking literature would improve context.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We agree that the manuscript requires additional validation, clarifications, and experiments to substantiate the claims. We address each major comment below and outline the revisions we will make.

read point-by-point responses
  1. Referee: [Method (abstract and procedure description)] The core procedure (randomization followed by Marchenko-Pastur fitting) assumes that spectral outliers specifically isolate overfitting signals, but no derivation or controlled experiment demonstrates why these outliers arise from overfitting rather than weight scale, optimization trajectory, or layer geometry (see skeptic note on weakest assumption and abstract description of the method).

    Authors: We acknowledge that the current version lacks a formal derivation showing why outliers isolate overfitting signals rather than other factors such as scale or geometry. The randomization is intended to eliminate learned correlations while preserving the scale of the empirical spectral distribution, with outliers indicating deviations from random-matrix behavior induced by training. In revision we will add controlled experiments on small MLPs trained on synthetic data, varying training-set size to induce controlled overfitting, and demonstrate that trap count and magnitude increase specifically with the generalization gap (a minimal sketch of this protocol appears after these responses). We will also revise the abstract and method section to explicitly state the assumptions and limitations of the randomization step. revision: partial

  2. Referee: [Empirical results and validation sections] The manuscript provides no quantitative validation, error analysis, baseline comparisons, or statistical details on how traps are distinguished from benign outliers; claims rest on unspecified empirical observations (reader's soundness assessment).

    Authors: This criticism is correct. The revised manuscript will include quantitative validation: error bars from five independent random seeds, a table reporting trap count, maximum outlier deviation, and Jensen-Shannon divergence values across phases, baseline comparisons against randomly initialized networks and networks trained with weight decay or early stopping, and statistical tests (t-tests and p-values) on the increase in traps during anti-grokking. We will also provide the exact procedure and sensitivity analysis for the JS-divergence threshold used to label traps as benign or harmful. revision: yes

  3. Referee: [Anti-grokking phase analysis] The claim that Correlation Traps form and grow specifically during anti-grokking (high train acc, decreasing test acc) lacks ablation studies or falsifiable tests showing the method's specificity to generalization failure versus other training artifacts.

    Authors: We agree that specificity has not been demonstrated. In the revision we will add ablation studies: (i) training runs on the same architecture and data that avoid the anti-grokking phase via stronger regularization or different learning-rate schedules, showing that traps remain stable; (ii) application of the method to standard CIFAR-10 training without grokking dynamics, confirming that traps appear whenever test accuracy declines. These experiments will provide falsifiable evidence that trap growth is tied to generalization failure rather than other training artifacts. revision: partial
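
A minimal sketch of the controlled-overfitting protocol promised in response 1, assuming hypothetical `make_data`, `train_mlp`, and `evaluate` callables plus the trap counter sketched earlier in this review; it fixes only the bookkeeping, not the authors' actual experimental setup.

```python
import numpy as np

def trap_vs_gap_curve(train_sizes, make_data, train_mlp, evaluate, count_traps):
    """Record trap count against the generalization gap as the training set shrinks.

    make_data(n_train)   -> (X_tr, y_tr, X_te, y_te)      # hypothetical data factory
    train_mlp(X, y)      -> model with .weight_matrices()  # hypothetical trainer
    evaluate(model, X, y) -> accuracy in [0, 1]
    count_traps(W)       -> trap count for one layer (e.g. count_correlation_traps)
    """
    rows = []
    for n in train_sizes:
        X_tr, y_tr, X_te, y_te = make_data(n_train=n)
        model = train_mlp(X_tr, y_tr)          # smaller n -> stronger overfitting
        gap = evaluate(model, X_tr, y_tr) - evaluate(model, X_te, y_te)
        n_traps = sum(count_traps(W) for W in model.weight_matrices())
        rows.append((n, gap, n_traps))
    return np.array(rows)                      # columns: n_train, gap, trap count
```

The claimed relationship would show up as trap count rising with the gap column; a flat trap column against a growing gap would argue against the load-bearing premise.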

Circularity Check

0 steps flagged

No significant circularity: detection method uses standard MP fitting on randomized weights with empirical observation of traps during overfitting phases

full rationale

The paper defines Correlation Traps explicitly as spectral outliers in the empirical spectral distribution of element-wise randomized weight matrices after Marchenko-Pastur fitting. This definition is independent of any overfitting signal or train/test accuracy. The association between trap formation/growth and the anti-grokking phase (high train accuracy with falling test accuracy) is presented as an empirical finding from training trajectories, not derived by construction from the inputs or via a self-citation chain that assumes the result. The randomization step and MP fitting are standard RMT tools applied without data access, and no load-bearing step reduces the central claim to a fitted parameter renamed as a prediction or to an ansatz imported from prior self-work. The procedure is checked against an external reference, the Marchenko-Pastur law, rather than against quantities that the claim itself defines.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The method assumes standard applicability of Marchenko-Pastur to randomized neural weight matrices and introduces Correlation Traps as a new interpretive entity without external falsification in the provided abstract.

axioms (1)
  • domain assumption Marchenko-Pastur distribution describes the eigenvalue spectrum of randomized neural network weight matrices
    Invoked directly in the procedure for fitting and outlier detection.
invented entities (1)
  • Correlation Traps · no independent evidence
    purpose: Spectral outliers indicating the onset of overfitting
    Newly defined based on deviations from Marchenko-Pastur in randomized weights.

pith-pipeline@v0.9.0 · 5521 in / 1321 out tokens · 32041 ms · 2026-05-15T05:28:14.791371+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

25 extracted references · 25 canonical work pages · 2 internal anchors

  1. [1]

    Amit, Hanoch Gutfreund, and Haim Sompolinsky

    Daniel J. Amit, Hanoch Gutfreund, and Haim Sompolinsky. Spin-glass models of neural networks. Physical Review A, 32(2):1007–1018, 1985. doi: 10.1103/PhysRevA.32.1007

  2. [2]

    Anderson

    Philip W. Anderson. Absence of diffusion in certain random lattices. Physical Review, 109(5):1492–1505, 1958. doi: 10.1103/PhysRev.109.1492

  3. [3]

    Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices. The Annals of Probability, 33(5):1643–1697, 2005

    Jinho Baik, Gérard Ben Arous, and Sandrine Péché. Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices. The Annals of Probability, 33(5):1643–1697, 2005

  4. [4]

    Statistical mechanics approach to early stopping and weight decay. Physical Review E, 58(1):833–847, 1998

    Siegfried Bös. Statistical mechanics approach to early stopping and weight decay. Physical Review E, 58(1):833–847, 1998. doi: 10.1103/PhysRevE.58.833

  5. [5]

    Plancks Gesetz und Lichtquantenhypothese. Zeitschrift für Physik, 26(1):178–181, 1924

    Satyendra Nath Bose. Plancks Gesetz und Lichtquantenhypothese. Zeitschrift für Physik, 26(1):178–181, 1924. doi: 10.1007/BF01327326

  6. [6]

    Quantentheorie des einatomigen idealen Gases. Sitzungsberichte der Preussischen Akademie der Wissenschaften, pages 3–14, 1925

    Albert Einstein. Quantentheorie des einatomigen idealen Gases. Sitzungsberichte der Preussischen Akademie der Wissenschaften, pages 3–14, 1925

  7. [7]

    E. Gardner. The space of interactions in neural network models. Journal of Physics A: Mathematical and General, 21(1):257–270, 1988. doi: 10.1088/0305-4470/21/1/030

  8. [8]

    Hopfield

    John J. Hopfield. Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, 79(8):2554–2558, 1982. doi: 10.1073/pnas.79.8.2554

  9. [9]

    Grokfast: Accelerated grokking by amplifying slow gradients, 2024

    Jaerin Lee, Bong Gyun Kang, Kihoon Kim, and Kyoung Mu Lee. Grokfast: Accelerated grokking by amplifying slow gradients, 2024. URL https://arxiv.org/abs/2405.20233

  10. [10]

    Risk phase transitions in spiked regression: Alignment driven benign and catastrophic overfitting. arXiv preprint arXiv:2510.01414, 2025

    Jiping Li and Rishi Sonthalia. Risk phase transitions in spiked regression: Alignment driven benign and catastrophic overfitting. arXiv preprint arXiv:2510.01414, 2025

  11. [11]

    Liu, Kitouni, Nolte, Michaud, Tegmark, and Williams

    Ziming Liu, Ouail Kitouni, Niklas S. Nolte, Eric J. Michaud, Max Tegmark, and Mike Williams. Towards understanding grokking: An effective theory of representation learning. In Advances in Neural Information Processing Systems, volume 35, pages 34651–34663. Curran Associates, Inc., 2022. URL https://arxiv.org/abs/2205.10343

  12. [12]

    Marchenko and Leonid Andreevich Pastur

    Vladimir A. Marchenko and Leonid Andreevich Pastur. Distribution of eigenvalues for some sets of random matrices. Matematicheskii Sbornik, 72(114)(4):507–536, 1967

  13. [13]

    Charles H. Martin. WeightWatcher: Analyze Deep Learning Models without Training or Data. https://github.com/CalculatedContent/WeightWatcher, 2018-2024. Version 0.7.5.5 used in this study. Accessed May 12, 2025

  14. [14]

    Martin and Christopher Hinrichs

    Charles H. Martin and Christopher Hinrichs. SETOL: A semi-empirical theory of (deep) learning. arXiv preprint arXiv:2507.17912, 2025. URL https://arxiv.org/abs/2507.17912

  15. [15]

    Martin and Mahoney

    Charles H. Martin and Michael W. Mahoney. Implicit self-regularization in deep neural networks: Evidence from random matrix theory and implications for learning. Journal of Machine Learning Research, 22(1):165, January 2021. URL http://jmlr.org/papers/v22/20-410.html

  16. [16]

    Martin, Peng, and Mahoney

    Charles H. Martin, Tian Peng, and Michael W. Mahoney. Predicting trends in the quality of state-of-the-art neural networks without access to training or testing data. Nature Communications, 12:4122, July 2021. doi: 10.1038/s41467-021-24025-8. URL https://doi.org/10.1038/s41467-021-24025-8

  17. [17]

    Progress measures for grokking via mechanistic interpretability

    Neel Nanda, Lawrence Chan, Tom Lieberum, Jess Smith, and Jacob Steinhardt. Progress measures for grokking via mechanistic interpretability, 2023. URL https://arxiv.org/abs/2301.05217

  18. [18]

    Introducing gpt-oss

    OpenAI. Introducing gpt-oss. https://openai.com/index/introducing-gpt-oss/, August 2025. Accessed: 2026-03-28

  19. [19]

    gpt-oss-120b & gpt-oss-20b model card

    OpenAI. gpt-oss-120b & gpt-oss-20b model card. https://openai.com/index/gpt-oss-model-card/, August 2025. Accessed: 2026-03-28

  20. [20]

    C. E. Porter and R. G. Thomas. Fluctuations of nuclear reaction widths. Physical Review, 104(2):483–491, 1956

  21. [21]

    Cambridge University Press, 2021

    Marc Potters and Jean-Philippe Bouchaud. A First Course in Random Matrix Theory: For Physicists, Engineers and Data Scientists. Cambridge University Press, 2021. ISBN 9781108768900

  22. [22]

    Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets

    Alethea Power, Yuri Burda, Harri Edwards, Igor Babuschkin, and Vedant Misra. Grokking: Generalization beyond overfitting on small algorithmic datasets, 2022. URL https://arxiv.org/abs/2201.02177

  23. [23]

    Sebastian Seung, Haim Sompolinsky, and Naftali Tishby

    H. Sebastian Seung, Haim Sompolinsky, and Naftali Tishby. Statistical mechanics of learning from examples. Physical Review A, 45(8):6056–6091, 1992. doi: 10.1103/PhysRevA.45.6056

  24. [24]

    Grokked transformers are implicit reasoners: A mechanistic journey to the edge of generalization

    Boshi Wang, Xiang Yue, Yu Su, and Huan Sun. Grokked transformers are implicit reasoners: A mechanistic journey to the edge of generalization. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. URL https://arxiv.org/pdf/2405.15071

  25. [25]

    Weiss

    Pierre Weiss. L'hypothèse du champ moléculaire et la propriété ferromagnétique. Journal de Physique Théorique et Appliquée, 6(1):661–690, 1907. doi: 10.1051/jphystap:019070060066100