pith. machine review for the scientific record.

arxiv: 2605.12394 · v2 · submitted 2026-05-12 · 💻 cs.LG · cs.AI

Recognition: 2 theorem links · Lean Theorem

Detecting overfitting in Neural Networks during long-horizon grokking using Random Matrix Theory

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 05:28 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords: overfitting · random matrix theory · grokking · neural networks · correlation traps · Marchenko-Pastur · anti-grokking · large language models

The pith

Neural networks form correlation traps in weight spectra that signal the start of overfitting during extended grokking.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops a random matrix theory technique for detecting overfitting in neural networks without access to train or test data. For each layer, the weight matrix is randomized element-wise and the resulting spectrum is fitted to a Marchenko-Pastur distribution; large outliers that survive the randomization are termed correlation traps. These traps emerge and grow during an anti-grokking phase in which training accuracy stays high while test accuracy declines. The method also applies to large language models and includes a check, based on passing random inputs through the model, for whether a given trap actually harms generalization.
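
A minimal sketch of that per-layer diagnostic, assuming element-wise randomization by permuting the entries, a variance estimate taken from the randomized matrix, and an arbitrary `tol` margin above the Marchenko-Pastur bulk edge; none of these choices are confirmed details of the authors' implementation.

```python
import numpy as np

def count_correlation_traps(W, n_shuffles=10, tol=1.05, rng=None):
    """Count eigenvalues of a randomized layer that escape the MP bulk.

    W is a 2-D weight matrix of one layer. The entries are permuted (destroying
    learned correlations while keeping the empirical value distribution), the
    spectrum of the resulting correlation matrix is computed, and eigenvalues
    lying above the Marchenko-Pastur bulk edge by more than `tol` are counted.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    W = np.asarray(W, dtype=np.float64)
    N, M = W.shape
    if N < M:                       # work in the tall orientation so q <= 1
        W, N, M = W.T, M, N
    q = M / N                       # aspect ratio of the layer
    counts = []
    for _ in range(n_shuffles):
        W_rand = rng.permutation(W.reshape(-1)).reshape(N, M)
        evals = np.linalg.eigvalsh(W_rand.T @ W_rand / N)
        sigma2 = np.var(W_rand)     # scale parameter for the MP fit
        lam_plus = sigma2 * (1.0 + np.sqrt(q)) ** 2
        counts.append(int(np.sum(evals > tol * lam_plus)))
    return float(np.mean(counts))
```

On a layer whose weights still look like an i.i.d. bulk this count should stay near zero; under the paper's claim it rises as training enters anti-grokking.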

Core claim

The central claim is that correlation traps, identified as large outliers violating the Marchenko-Pastur law in the empirical spectral distribution of element-wise randomized weight matrices, form and grow in number and scale during the anti-grokking phase of long-horizon grokking, coinciding with decreasing test accuracy while train accuracy remains high.

What carries the argument

Correlation traps: spectral outliers from Marchenko-Pastur fitting after element-wise weight randomization that mark overfitting onset.

Load-bearing premise

Randomizing weights element-wise and fitting to Marchenko-Pastur reliably isolates signals of overfitting, with outliers corresponding directly to generalization failure.
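
For reference, the Marchenko-Pastur law being invoked is the standard one; restated here under the assumption that the fit is parameterized by the entry variance $\sigma^2$ and the layer aspect ratio $q = M/N \le 1$:

$$\rho_{\mathrm{MP}}(\lambda) = \frac{1}{2\pi\sigma^{2} q\,\lambda}\sqrt{(\lambda_{+}-\lambda)(\lambda-\lambda_{-})}, \qquad \lambda_{\pm} = \sigma^{2}\left(1 \pm \sqrt{q}\right)^{2}.$$

A correlation trap is then an eigenvalue of the randomized layer's correlation matrix sitting well above $\lambda_{+}$, where a purely random matrix of the same shape and scale should carry essentially no spectral weight.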

What would settle it

A counterexample would be a training run where correlation traps form and grow but test accuracy does not decrease, or where test accuracy decreases without corresponding trap growth.

Figures

Figures reproduced from arXiv: 2605.12394 by Charles H. Martin and Hari K. Prakash.

Figure 1
Figure 1. Grokking dynamics and Correlation Traps. Reproduction of known grokking experiments, extended to long-horizon training. Train accuracy (red), test accuracy (purple), and Correlation Traps (blue) are shown for (a) an MLP on MNIST, (b) Modular Addition (MA), and (c) GPT2. Shaded regions denote pre-grokking (gray), grokking (yellow), and anti-grokking (green). Correlation Traps arise at the onset of late-sta… view at source ↗
Figure 2
Figure 2. Randomized ESDs, MP fits, and emergence of Correlation Traps. (a) Example of the ESD of Xrand compared with a Marchenko–Pastur (MP) fit. In a self-averaging randomized layer, the spectrum is described by the MP bulk and no large right-edge outliers remain. (b,c) Examples of trapped randomized spectra from Layer 2 of the MLP. Right before collapse, the randomized spectrum already shows a dominant spike near… view at source ↗
Figure 3
Figure 3. Grokking dynamics and test loss. Train accuracy (red), test accuracy (purple), and test loss (blue) are shown for (a) an MLP on MNIST, (b) Modular Addition (MA), and (c) GPT2. Trap count separates this regime from the earlier phases. During pre-grokking, when the model has already fit the training set but has not yet generalized, the average number of detected traps is effectively zero. During grokking, … view at source ↗
Figure 4
Figure 4. JSD diagnostic ablation test for selected MLP, MA, and GPT2 anti-grokking checkpoints. Trap removal score Jk(T = 1) vs. delta test error for individual traps. Benign traps have Jk(T = 1) ≈ 0 while harmful traps have relatively larger Jk(T = 1) scores. … view at source ↗
Figure 5
Figure 5. Layer-wise Correlation Trap profiles in frontier-scale open-weight LLMs. Number of detected Correlation Traps per layer for the OpenAI gpt-oss-20b and gpt-oss-120b models. Taken together, these observations suggest that Correlation Traps may provide a practical spectral diagnostic for identifying potentially harmful overfitting structures in frontier-scale foundation models. More broadly, the large number … view at source ↗
Figure 6
Figure 6. Evidence for norm-based prototype collapse in the anti-grokking MLP. (a) The original … view at source ↗
Figure 7
Figure 7. Prototype-like directions associated with trapped layers. Largest right singular vector v(1) of W1 in pixel space. The corresponding direction evolves from unstructured noise in pre-grokking, to a smooth global template during grokking, to a localized prototype-like image in anti-grokking. By mapping the trap back to the FC1 layer, one can observe the overfit sector W1 … view at source ↗
Figure 8
Figure 8. Trap intervention in the anti-grokking phase. (a) Confusion matrix for the full model in anti-grokking. (b) Confusion matrix after replacing the FC1 trap direction with a matched Gaussian random vector. The strong bias toward one prototype class (i.e., 5) is substantially reduced. view at source ↗
Figure 9
Figure 9. Rows of W1 selected by leading-vector localization. The selected receptive fields are noise-like in pre-grokking, remain diffuse around peak generalization, and become recognizable digit templates in anti-grokking. view at source ↗
Figure 10
Figure 10. Weight distributions with extreme coordinates across checkpoints. Anti-grokking introduces structural outliers into the weight values of all three layers. These extreme coordinates are consistent with the shuffled-spectrum outliers counted as Correlation Traps. … view at source ↗
read the original abstract

Training Neural Networks (NNs) without overfitting is difficult; detecting that overfitting is difficult as well. We present a novel Random Matrix Theory method that detects the onset of overfitting in deep learning models without access to train or test data. For each model layer, we randomize each weight matrix element-wise, $\mathbf{W} \to \mathbf{W}^{\mathrm{rand}}$, fit the randomized empirical spectral distribution with a Marchenko-Pastur distribution, and identify large outliers that violate self-averaging. We call these outliers Correlation Traps. During the onset of overfitting, which we call the "anti-grokking" phase in long-horizon grokking, Correlation Traps form and grow in number and scale as test accuracy decreases while train accuracy remains high. Traps may be benign or may harm generalization; we provide an empirical approach to distinguish between them by passing random data through the trained model and evaluating the JS divergence of output logits. Our findings show that anti-grokking is an additional grokking phase with high train accuracy and decreasing test accuracy, structurally distinct from pre-grokking through its Correlation Traps. More broadly, we find that some foundation-scale LLMs exhibit the same Correlation Traps, indicating potentially harmful overfitting.
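
The benign-versus-harmful check described above can be made concrete with a small sketch: pass the same batch of random inputs through the full model and through a copy with one trap direction ablated, and compare the output distributions with a Jensen-Shannon divergence at temperature $T$. The function names, the softmax-at-$T$ convention, and the ablation interface below are illustrative assumptions, not the authors' code.

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete distributions."""
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: float(np.sum(a * (np.log(a + eps) - np.log(b + eps))))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def softmax(z, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = (z - z.max(axis=-1, keepdims=True)) / T
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def trap_removal_score(logits_full, logits_ablated, T=1.0):
    """Mean JS divergence between output distributions of the full model and a
    trap-ablated copy, both evaluated on the same random inputs.

    logits_full, logits_ablated: arrays of shape (n_samples, n_classes).
    """
    p = softmax(np.asarray(logits_full, dtype=np.float64), T)
    q = softmax(np.asarray(logits_ablated, dtype=np.float64), T)
    return float(np.mean([js_divergence(pi, qi) for pi, qi in zip(p, q)]))
```

A score near zero means removing the trap barely changes the model's behaviour on random probes (benign); a larger score flags a trap whose removal visibly shifts the logits, matching the Jk(T = 1) reading in Figure 4.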

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript presents a Random Matrix Theory method for detecting overfitting in neural networks during long-horizon grokking without access to data. The approach involves element-wise randomization of each layer's weight matrix, fitting the resulting empirical spectral distribution to the Marchenko-Pastur law, and flagging large outliers as 'Correlation Traps'. These traps are claimed to emerge and increase during the 'anti-grokking' phase, marked by sustained high training accuracy and declining test accuracy. An empirical procedure using Jensen-Shannon divergence on random inputs is proposed to differentiate benign from harmful traps, with observations extended to large language models.

Significance. If substantiated, this work could offer a practical, data-independent diagnostic for overfitting, which is particularly relevant for foundation models where test sets are limited or unavailable. It introduces the notion of anti-grokking as a distinct phase and links spectral properties to generalization failure. The use of RMT provides a theoretical grounding that could be extended, though the current lack of quantitative benchmarks limits immediate impact.

major comments (3)
  1. [Method (abstract and procedure description)] The core procedure (randomization followed by Marchenko-Pastur fitting) assumes that spectral outliers specifically isolate overfitting signals, but no derivation or controlled experiment demonstrates why these outliers arise from overfitting rather than weight scale, optimization trajectory, or layer geometry (see skeptic note on weakest assumption and abstract description of the method).
  2. [Empirical results and validation sections] The manuscript provides no quantitative validation, error analysis, baseline comparisons, or statistical details on how traps are distinguished from benign outliers; claims rest on unspecified empirical observations (reader's soundness assessment).
  3. [Anti-grokking phase analysis] The claim that Correlation Traps form and grow specifically during anti-grokking (high train acc, decreasing test acc) lacks ablation studies or falsifiable tests showing the method's specificity to generalization failure versus other training artifacts.
minor comments (2)
  1. [Notation and preliminaries] Notation for the randomized weight matrix (W to W^rand) should be explicitly defined once in the main text for consistency.
  2. [Introduction and related work] Additional citations to prior RMT applications in deep learning and existing grokking literature would improve context.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We agree that the manuscript requires additional validation, clarifications, and experiments to substantiate the claims. We address each major comment below and outline the revisions we will make.

read point-by-point responses
  1. Referee: [Method (abstract and procedure description)] The core procedure (randomization followed by Marchenko-Pastur fitting) assumes that spectral outliers specifically isolate overfitting signals, but no derivation or controlled experiment demonstrates why these outliers arise from overfitting rather than weight scale, optimization trajectory, or layer geometry (see skeptic note on weakest assumption and abstract description of the method).

    Authors: We acknowledge that the current version lacks a formal derivation showing why outliers isolate overfitting signals rather than other factors such as scale or geometry. The randomization is intended to eliminate learned correlations while preserving the scale of the empirical spectral distribution, with outliers indicating deviations from random-matrix behavior induced by training. In revision we will add controlled experiments on small MLPs trained on synthetic data, varying training-set size to induce controlled overfitting, and demonstrate that trap count and magnitude increase specifically with the generalization gap (a minimal sketch of this protocol appears after these responses). We will also revise the abstract and method section to explicitly state the assumptions and limitations of the randomization step. revision: partial

  2. Referee: [Empirical results and validation sections] The manuscript provides no quantitative validation, error analysis, baseline comparisons, or statistical details on how traps are distinguished from benign outliers; claims rest on unspecified empirical observations (reader's soundness assessment).

    Authors: This criticism is correct. The revised manuscript will include quantitative validation: error bars from five independent random seeds, a table reporting trap count, maximum outlier deviation, and Jensen-Shannon divergence values across phases, baseline comparisons against randomly initialized networks and networks trained with weight decay or early stopping, and statistical tests (t-tests and p-values) on the increase in traps during anti-grokking. We will also provide the exact procedure and sensitivity analysis for the JS-divergence threshold used to label traps as benign or harmful. revision: yes

  3. Referee: [Anti-grokking phase analysis] The claim that Correlation Traps form and grow specifically during anti-grokking (high train acc, decreasing test acc) lacks ablation studies or falsifiable tests showing the method's specificity to generalization failure versus other training artifacts.

    Authors: We agree that specificity has not been demonstrated. In the revision we will add ablation studies: (i) training runs on the same architecture and data that avoid the anti-grokking phase via stronger regularization or different learning-rate schedules, showing that traps remain stable; (ii) application of the method to standard CIFAR-10 training without grokking dynamics, confirming that traps appear whenever test accuracy declines. These experiments will provide falsifiable evidence that trap growth is tied to generalization failure rather than other training artifacts. revision: partial
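
A minimal sketch of the controlled-overfitting protocol promised in response 1, assuming hypothetical `make_data`, `train_mlp`, and `evaluate` callables plus the trap counter sketched earlier in this review; it fixes only the bookkeeping, not the authors' actual experimental setup.

```python
import numpy as np

def trap_vs_gap_curve(train_sizes, make_data, train_mlp, evaluate, count_traps):
    """Record trap count against the generalization gap as the training set shrinks.

    make_data(n_train)   -> (X_tr, y_tr, X_te, y_te)      # hypothetical data factory
    train_mlp(X, y)      -> model with .weight_matrices()  # hypothetical trainer
    evaluate(model, X, y) -> accuracy in [0, 1]
    count_traps(W)       -> trap count for one layer (e.g. count_correlation_traps)
    """
    rows = []
    for n in train_sizes:
        X_tr, y_tr, X_te, y_te = make_data(n_train=n)
        model = train_mlp(X_tr, y_tr)          # smaller n -> stronger overfitting
        gap = evaluate(model, X_tr, y_tr) - evaluate(model, X_te, y_te)
        n_traps = sum(count_traps(W) for W in model.weight_matrices())
        rows.append((n, gap, n_traps))
    return np.array(rows)                      # columns: n_train, gap, trap count
```

The claimed relationship would show up as trap count rising with the gap column; a flat trap column against a growing gap would argue against the load-bearing premise.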

Circularity Check

0 steps flagged

No significant circularity: detection method uses standard MP fitting on randomized weights with empirical observation of traps during overfitting phases

full rationale

The paper defines Correlation Traps explicitly as spectral outliers in the empirical spectral distribution of element-wise randomized weight matrices after Marchenko-Pastur fitting. This definition is independent of any overfitting signal or train/test accuracy. The association between trap formation/growth and the anti-grokking phase (high train accuracy with falling test accuracy) is presented as an empirical finding from training trajectories, not derived by construction from the inputs or via a self-citation chain that assumes the result. The randomization step and MP fitting are standard RMT tools applied without data access, and no load-bearing step reduces the central claim to a fitted parameter renamed as a prediction or to an ansatz imported from prior self-work. The procedure is checked against an external reference, the Marchenko-Pastur law, rather than against quantities that the claim itself defines.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The method assumes standard applicability of Marchenko-Pastur to randomized neural weight matrices and introduces Correlation Traps as a new interpretive entity without external falsification in the provided abstract.

axioms (1)
  • domain assumption Marchenko-Pastur distribution describes the eigenvalue spectrum of randomized neural network weight matrices
    Invoked directly in the procedure for fitting and outlier detection.
invented entities (1)
  • Correlation Traps · no independent evidence
    purpose: Spectral outliers indicating the onset of overfitting
    Newly defined based on deviations from Marchenko-Pastur in randomized weights.

pith-pipeline@v0.9.0 · 5521 in / 1321 out tokens · 32041 ms · 2026-05-15T05:28:14.791371+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

25 extracted references · 25 canonical work pages · 2 internal anchors

  1. [1]

    Amit, Hanoch Gutfreund, and Haim Sompolinsky

    Daniel J. Amit, Hanoch Gutfreund, and Haim Sompolinsky. Spin-glass models of neural networks. Physical Review A, 32(2):1007–1018, 1985. doi: 10.1103/PhysRevA.32.1007

  2. [2]

    Anderson

    Philip W. Anderson. Absence of diffusion in certain random lattices. Physical Review, 109(5):1492–1505, 1958. doi: 10.1103/PhysRev.109.1492

  3. [3]

    Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices. The Annals of Probability, 33(5):1643–1697, 2005

    Jinho Baik, Gérard Ben Arous, and Sandrine Péché. Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices. The Annals of Probability, 33(5):1643–1697, 2005

  4. [4]

    Statistical mechanics approach to early stopping and weight decay. Physical Review E, 58(1):833–847, 1998

    Siegfried Bös. Statistical mechanics approach to early stopping and weight decay. Physical Review E, 58(1):833–847, 1998. doi: 10.1103/PhysRevE.58.833

  5. [5]

    Plancks Gesetz und Lichtquantenhypothese. Zeitschrift für Physik, 26(1):178–181, 1924

    Satyendra Nath Bose. Plancks Gesetz und Lichtquantenhypothese. Zeitschrift für Physik, 26(1):178–181, 1924. doi: 10.1007/BF01327326

  6. [6]

    Quantentheorie des einatomigen idealen Gases. Sitzungsberichte der Preussischen Akademie der Wissenschaften, pages 3–14, 1925

    Albert Einstein. Quantentheorie des einatomigen idealen Gases. Sitzungsberichte der Preussischen Akademie der Wissenschaften, pages 3–14, 1925

  7. [7]

    E. Gardner. The space of interactions in neural network models. Journal of Physics A: Mathematical and General, 21(1):257–270, 1988. doi: 10.1088/0305-4470/21/1/030

  8. [8]

    Hopfield

    John J. Hopfield. Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, 79(8):2554–2558, 1982. doi: 10.1073/pnas.79.8.2554

  9. [9]

    Grokfast: Accelerated grokking by amplifying slow gradients, 2024

    Jaerin Lee, Bong Gyun Kang, Kihoon Kim, and Kyoung Mu Lee. Grokfast: Accelerated grokking by amplifying slow gradients, 2024. URL https://arxiv.org/abs/2405.20233

  10. [10]

    Risk phase transitions in spiked regression: Alignment driven benign and catastrophic overfitting. arXiv preprint arXiv:2510.01414, 2025

    Jiping Li and Rishi Sonthalia. Risk phase transitions in spiked regression: Alignment driven benign and catastrophic overfitting. arXiv preprint arXiv:2510.01414, 2025

  11. [11]

    Liu, Kitouni, Nolte, Michaud, Tegmark, and Williams

    Ziming Liu, Ouail Kitouni, Niklas S. Nolte, Eric J. Michaud, Max Tegmark, and Mike Williams. Towards understanding grokking: An effective theory of representation learning. In Advances in Neural Information Processing Systems, volume 35, pages 34651–34663. Curran Associates, Inc., 2022. URL https://arxiv.org/abs/2205.10343

  12. [12]

    Marchenko and Leonid Andreevich Pastur

    Vladimir A. Marchenko and Leonid Andreevich Pastur. Distribution of eigenvalues for some sets of random matrices. Matematicheskii Sbornik, 72(114)(4):507–536, 1967

  13. [13]

    Charles H. Martin. WeightWatcher: Analyze Deep Learning Models without Training or Data. https://github.com/CalculatedContent/WeightWatcher, 2018-2024. Version 0.7.5.5 used in this study. Accessed May 12, 2025

  14. [14]

    Martin and Christopher Hinrichs

    Charles H. Martin and Christopher Hinrichs. SETOL: A semi-empirical theory of (deep) learning. arXiv preprint arXiv:2507.17912, 2025. URL https://arxiv.org/abs/2507.17912

  15. [15]

    Martin and Mahoney

    Charles H. Martin and Michael W. Mahoney. Implicit self-regularization in deep neural networks: Evidence from random matrix theory and implications for learning. Journal of Machine Learning Research, 22(1):165, January 2021. URL http://jmlr.org/papers/v22/20-410.html

  16. [16]

    Martin, Peng, and Mahoney

    Charles H. Martin, Tian Peng, and Michael W. Mahoney. Predicting trends in the quality of state-of-the-art neural networks without access to training or testing data. Nature Communications, 12:4122, July 2021. doi: 10.1038/s41467-021-24025-8. URL https://doi.org/10.1038/s41467-021-24025-8

  17. [17]

    Progress measures for grokking via mechanistic interpretability

    Neel Nanda, Lawrence Chan, Tom Lieberum, Jess Smith, and Jacob Steinhardt. Progress measures for grokking via mechanistic interpretability, 2023. URL https://arxiv.org/abs/2301.05217

  18. [18]

    Introducing gpt-oss

    OpenAI. Introducing gpt-oss. https://openai.com/index/introducing-gpt-oss/, August 2025. Accessed: 2026-03-28

  19. [19]

    gpt-oss-120b & gpt-oss-20b model card

    OpenAI. gpt-oss-120b & gpt-oss-20b model card. https://openai.com/index/gpt-oss-model-card/, August 2025. Accessed: 2026-03-28

  20. [20]

    C. E. Porter and R. G. Thomas. Fluctuations of nuclear reaction widths. Physical Review, 104(2):483–491, 1956

  21. [21]

    Cambridge University Press, 2021

    Marc Potters and Jean-Philippe Bouchaud. A First Course in Random Matrix Theory: For Physicists, Engineers and Data Scientists. Cambridge University Press, 2021. ISBN 9781108768900

  22. [22]

    Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets

    Alethea Power, Yuri Burda, Harri Edwards, Igor Babuschkin, and Vedant Misra. Grokking: Generalization beyond overfitting on small algorithmic datasets, 2022. URL https://arxiv.org/abs/2201.02177

  23. [23]

    Sebastian Seung, Haim Sompolinsky, and Naftali Tishby

    H. Sebastian Seung, Haim Sompolinsky, and Naftali Tishby. Statistical mechanics of learning from examples. Physical Review A, 45(8):6056–6091, 1992. doi: 10.1103/PhysRevA.45.6056

  24. [24]

    Grokked transformers are implicit reasoners: A mechanistic journey to the edge of generalization

    Boshi Wang, Xiang Yue, Yu Su, and Huan Sun. Grokked transformers are implicit reasoners: A mechanistic journey to the edge of generalization. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. URL https://arxiv.org/pdf/2405.15071

  25. [25]

    Weiss

    Pierre Weiss. L'hypothèse du champ moléculaire et la propriété ferromagnétique. Journal de Physique Théorique et Appliquée, 6(1):661–690, 1907. doi: 10.1051/jphystap:019070060066100