DNNs, Dataset Statistics, and Correlation Functions

James F. Woodward; Robert W. Batterman

arXiv: 2511.21715 · v2 · submitted 2025-11-18 · ⚛️ physics.hist-ph · cond-mat.stat-mech · cs.LG


Pith reviewed 2026-05-17 20:29 UTC · model grok-4.3

classification: ⚛️ physics.hist-ph · cond-mat.stat-mech · cs.LG

keywords: deep neural networks · image classification · high-order correlation functions · dataset structure · generalization · condensed matter physics · mesoscale correlations


The pith

Deep neural networks succeed in image classification by discovering high-order correlation functions in their datasets.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper argues that the key to deep neural networks' success in tasks like image recognition lies in the correlational structure of the training datasets rather than solely in the networks' architecture. It draws a parallel to methods in condensed matter physics, where researchers analyze mesoscale correlation structures that bridge atomic and larger scales. Specifically, the authors claim that effective DNNs must identify high-order correlation functions within these datasets. This perspective offers a way to understand why DNNs generalize well even when they appear to contradict principles from statistical learning theory. Readers might care because it reframes the source of machine learning performance as a property of the data itself.

Core claim

DNNs that are successful in image classification must be discovering high order correlation functions. This approach mirrors a common methodology in condensed matter physics and materials science that emphasizes mesoscale correlation structures existing between fundamental atomic scales and continuum scales. The discussion addresses how this accounts for the puzzle of DNNs generalizing successfully despite apparent violations of standard statistical learning theory.

What carries the argument

High-order correlation functions that describe mesoscale structures in image datasets, analogous to those studied in condensed matter physics.
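As background for readers coming from machine learning: in the heterogeneous-materials literature (e.g. Torquato [26]) the $n$-point probability functions of a two-phase medium are usually defined through an indicator $I(\mathbf{x})$ that equals 1 in the phase of interest and 0 otherwise. The textbook definition below is included for orientation only; the paper's own notation may differ.

```latex
S_n(\mathbf{x}_1, \ldots, \mathbf{x}_n) = \big\langle I(\mathbf{x}_1)\, I(\mathbf{x}_2) \cdots I(\mathbf{x}_n) \big\rangle
```

Here $\langle \cdot \rangle$ is an ensemble average (for a single large, statistically homogeneous sample, a spatial average is the usual stand-in), so $S_n$ is the probability that $n$ points at the given positions all fall in that phase. $S_1$ is the volume fraction; in a homogeneous medium $S_2$ depends only on the separation $\mathbf{x}_2 - \mathbf{x}_1$. "High-order" refers to $n \ge 3$, which can encode shape and connectivity information that one- and two-point statistics alone generally do not fix.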

If this is right

DNN success in image tasks depends critically on the presence of these high-order correlations in the data.
The generalization ability of DNNs stems from their capacity to capture these dataset-specific statistical structures.
Insights from physics on correlation functions can guide improvements in training data selection or network design.
Standard statistical learning theory may need revision to account for the role of these higher-order structures.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Examining the learned features in DNN layers could reveal explicit representations of these correlation functions.
Creating synthetic datasets that control for high-order correlations would allow direct tests of their necessity for classification accuracy.
This view might extend to other domains like audio or text processing if similar correlational structures exist in those datasets.
Future work could explore whether altering these correlations in datasets predictably affects DNN performance across different architectures.

Load-bearing premise

That the correlational structure in real-world image datasets is the primary driver of DNN success and corresponds directly to the high-order correlation functions from condensed matter physics.

What would settle it

Training a DNN on a modified image dataset that preserves low-order statistics but removes high-order correlations, and checking if classification accuracy drops significantly compared to the original dataset.
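One standard way to build such a modified dataset, offered as a sketch rather than as the paper's procedure, is Fourier phase randomization: keep each image's Fourier amplitudes (hence its power spectrum and periodic two-point autocorrelation) and replace the phases with those of white noise, which scrambles higher-order structure. The function name `phase_randomize` is hypothetical.

```python
import numpy as np


def phase_randomize(img: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Surrogate with the same Fourier amplitudes as `img` but random phases.

    Keeping |F(k)| keeps the power spectrum, i.e. the (periodic) two-point
    autocorrelation; replacing the phases scrambles the phase alignments
    that carry higher-order structure such as edges and contours.
    """
    img = np.asarray(img, dtype=float)
    spectrum = np.fft.fft2(img)
    amplitude = np.abs(spectrum)
    # Phases of the FFT of a *real* noise field satisfy phi(-k) = -phi(k),
    # so amplitude * exp(i * phi) is Hermitian and its inverse FFT is real.
    phase = np.angle(np.fft.fft2(rng.standard_normal(img.shape)))
    # Keep the original zero-frequency phase so the image mean is unchanged.
    phase[0, 0] = np.angle(spectrum[0, 0])
    # .real discards only floating-point round-off in the imaginary part.
    return np.fft.ifft2(amplitude * np.exp(1j * phase)).real


# Toy example: a bright square on a dark background.
rng = np.random.default_rng(0)
img = np.zeros((32, 32))
img[8:24, 8:24] = 1.0
surrogate = phase_randomize(img, rng)
```

Applied image by image to a training set, a large accuracy drop after this transformation would be consistent with (though not proof of) a role for higher-order correlations. The surrogate keeps the mean and variance but generally not the pixel-value histogram (a binary image becomes non-binary), so a careful version of the test would also control one-point statistics.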

Figures

Figures reproduced from arXiv: 2511.21715 by James F. Woodward, Robert W. Batterman.

**Figure 1.** Stream, Trees, Rocks. This scaling result means, essentially, that if one forms block pixels (in analogy with block spins in a real-space renormalization scheme [14]), we would see the same statistical structure in the pixel-blocked images after …

**Figure 4.** Making blocks. In this illustration a two-dimensional Ising model containing 81 spins is broken into blocks, each containing 9 spins. Each one of those blocks is assigned … Figure 2: Blocking and averaging to yield a new (coarse-grained) effective …
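To make the blocking step concrete, here is a minimal sketch that coarse-grains a 2-D array by averaging non-overlapping b × b blocks, the "blocking and averaging" operation the caption describes; with the 9 × 9 = 81-spin example and b = 3 it yields nine block values. The names `block_average` and `majority_rule` are ours. Plain averaging is only one choice: real-space treatments of the Ising model often use a majority rule instead so that block spins stay ±1, and both are shown.

```python
import numpy as np


def block_average(field: np.ndarray, b: int) -> np.ndarray:
    """Coarse-grain a 2-D array by averaging non-overlapping b x b blocks.

    Trailing rows/columns that do not fill a complete block are dropped.
    """
    h, w = field.shape
    h, w = h - h % b, w - w % b
    trimmed = np.asarray(field[:h, :w], dtype=float)
    # Reshape to (block_row, row_in_block, block_col, col_in_block) and
    # average over the two within-block axes.
    return trimmed.reshape(h // b, b, w // b, b).mean(axis=(1, 3))


def majority_rule(spins: np.ndarray, b: int) -> np.ndarray:
    """Block-spin map for +/-1 spins: sign of the block sum (odd b avoids ties)."""
    return np.sign(block_average(spins, b)).astype(int)


rng = np.random.default_rng(1)
spins = rng.choice([-1, 1], size=(9, 9))  # 81 spins
blocks = block_average(spins, 3)          # 3 x 3 grid of 9-spin block means
block_spins = majority_rule(spins, 3)     # 3 x 3 grid of +/-1 block spins
```

Applied repeatedly to a natural image (e.g. with b = 2), `block_average` is the pixel-blocking that the Figure 1 text compares to block spins; approximate scale invariance would show up as statistics that change little from one blocking level to the next.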

**Figure 3.** Scaling of Contrast Distribution [22]. … a new data set, yet the statistics were virtually unchanged. The exponent η “changed slightly from η = 0.19 to η′ = 0.20. Given the drastic nature of the recalibration procedure, this change is surprisingly small.” [21, p. 3389] This example is meant to demonstrate the robustness of scaling in natural images by pushing an extreme limit of recalibration. Of course it ca…
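The exponent η quoted above parameterizes a power-law spectrum. Assuming the form used in the Ruderman-Bialek work, $S(k) \propto k^{-(2-\eta)}$, the sketch below synthesizes an image with a chosen η, radially averages its power spectrum, and recovers η from a log-log fit. The helper names, the square-image restriction, and the fitting range (`kmin`, `kmax`) are illustrative choices, not the procedure of [21] or [22].

```python
import numpy as np


def _freq_radius(n):
    """|k| on the n x n FFT grid, in units of integer frequency indices."""
    f = np.fft.fftfreq(n) * n
    kx, ky = np.meshgrid(f, f, indexing="ij")
    return np.hypot(kx, ky)


def radial_power_spectrum(img):
    """Return (k, S(k)): power |F|^2 averaged over integer-radius shells."""
    n = img.shape[0]  # assumes a square image
    power = np.abs(np.fft.fft2(img - img.mean())) ** 2
    k = np.rint(_freq_radius(n)).astype(int)
    sums = np.bincount(k.ravel(), weights=power.ravel())
    counts = np.bincount(k.ravel())
    radii = np.arange(len(sums))
    keep = (radii > 0) & (counts > 0)
    return radii[keep], sums[keep] / counts[keep]


def synth_image(n, eta, rng):
    """Hypothetical test image whose power spectrum is exactly k^-(2 - eta)."""
    k = _freq_radius(n)
    amp = np.zeros_like(k)
    amp[k > 0] = k[k > 0] ** (-(2.0 - eta) / 2.0)
    noise = np.fft.fft2(rng.standard_normal((n, n)))  # Hermitian random phases
    return np.fft.ifft2(amp * noise / np.abs(noise)).real


def fit_eta(img, kmin=4, kmax=None):
    """Fit log S vs log k; with S ~ k^-(2 - eta) the slope is eta - 2."""
    k, s = radial_power_spectrum(img)
    kmax = kmax or img.shape[0] // 2
    sel = (k >= kmin) & (k <= kmax)
    slope, _ = np.polyfit(np.log(k[sel]), np.log(s[sel]), 1)
    return 2.0 + slope


rng = np.random.default_rng(4)
eta_hat = fit_eta(synth_image(256, 0.19, rng))  # should land close to 0.19
```

On real photographs the spectrum is noisy and only approximately a power law, so the recovered η depends on the fitting range; the synthetic image here only checks that the estimator is wired correctly.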

**Figure 4.** Throwing Line Segments on the Plane [10, p. 3393]. (a) yields …

**Figure 5.** Conductor/Insulator Composite. Dark bands are the insulators.

**Figure 6.** Throwing Lines, Triangles … to Determine …
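The "throwing" procedure in Figures 4 and 6 estimates $n$-point probability functions by Monte Carlo: drop a rigid template (a segment for $n = 2$, a triangle for $n = 3$, and so on) at random positions and record how often all of its vertices land in the phase of interest. The sketch below does this on a binary image. The function name `throw_template`, the periodic wrap-around at the border, and random placement without rotation are simplifying assumptions; an isotropic estimate would also randomize the orientation.

```python
import numpy as np


def throw_template(image, offsets, n_throws, rng):
    """Monte Carlo estimate of S_n for phase 1 of a binary image.

    `offsets` lists the template's vertices relative to a reference point,
    e.g. [(0, 0), (0, r)] for a segment of length r (n = 2), or three
    vertices for a triangle (n = 3). Each throw picks a random reference
    pixel; the estimate is the fraction of throws with every vertex in
    phase 1. Coordinates wrap around (periodic boundary).
    """
    image = np.asarray(image, dtype=bool)
    h, w = image.shape
    rows = rng.integers(0, h, size=n_throws)
    cols = rng.integers(0, w, size=n_throws)
    hit = np.ones(n_throws, dtype=bool)
    for dr, dc in offsets:
        hit &= image[(rows + dr) % h, (cols + dc) % w]
    return hit.mean()


# Toy two-phase medium: vertical bands of width 4 with period 8
# (loosely in the spirit of the banded composite in Figure 5).
rng = np.random.default_rng(2)
cols = np.arange(64)
stripes = np.tile((cols % 8) < 4, (64, 1))
s2_across = throw_template(stripes, [(0, 0), (0, 4)], 100_000, rng)  # exactly 0 here
s2_along = throw_template(stripes, [(0, 0), (5, 0)], 100_000, rng)   # about 0.5
```

The two segments have the same length but different orientations, and $S_2$ differs sharply between them; a triangle template adds a third vertex and so probes three-point structure in the same way.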

**Figure 7.** Scree Plots: Scaling Behavior of $\Sigma_M$ for Various Datasets [17, p. 4]. … where $X \in \mathbb{R}^{d \times M}$, $d$ is the dimension of the image vectors, and $M$ is the number of samples. The matrix $\Sigma_M$ is an empirical covariance (Gram) matrix. Their empirical investigations show that the spectrum of $\Sigma_M$ for various datasets can be separated into a set of large eigenvalues ($O(10)$), a bulk of eigenvalues which decay as a power law $\lambda_i$ …
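A scree plot of this kind can be sketched as follows: form the empirical covariance of a data matrix $X \in \mathbb{R}^{d \times M}$ (columns are samples), sort its eigenvalues in descending order, and fit a power law over a chosen rank range. The excerpt cuts off before the exact definition, so the feature-wise centering and the $1/M$ normalization, $\Sigma_M = \tilde{X}\tilde{X}^{\top}/M$ with $\tilde{X}$ the mean-centered data, are assumptions; so is the generic fit $\lambda_i \propto i^{-s}$ in the hypothetical helper `bulk_power_law_exponent`, since the parameterization used in [17] is truncated here.

```python
import numpy as np


def covariance_spectrum(X: np.ndarray) -> np.ndarray:
    """Eigenvalues (descending) of the empirical covariance of X (d x M, columns = samples)."""
    d, M = X.shape
    Xc = X - X.mean(axis=1, keepdims=True)  # center each feature (assumed)
    sigma = Xc @ Xc.T / M                   # d x d; 1/M normalization assumed
    return np.linalg.eigvalsh(sigma)[::-1]  # eigvalsh returns ascending order


def bulk_power_law_exponent(eigs: np.ndarray, i_min: int, i_max: int) -> float:
    """Fit lambda_i ~ i^(-s) over 1-based ranks i_min..i_max; return s."""
    ranks = np.arange(i_min, i_max + 1)
    slope, _ = np.polyfit(np.log(ranks), np.log(eigs[ranks - 1]), 1)
    return -slope
```

For image data, `X` would hold flattened images as columns (d = number of pixels). Plotting `np.log(eigs)` against `np.log(np.arange(1, d + 1))` gives the scree plot; the rank range passed to the fit should skip the few large leading eigenvalues and stop before the tail where sampling noise dominates.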

**Figure 8.** Marčenko-Pastur (MP) Distributions [19, p. 14].
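As a reference curve for Figure 8: when $X$ is $d \times M$ with i.i.d. zero-mean entries of variance $\sigma^2$ and $q = d/M \le 1$, the eigenvalues of $\frac{1}{M} X X^{\top}$ follow, as $d, M \to \infty$, the Marčenko-Pastur density $\rho(\lambda) = \frac{\sqrt{(\lambda_+ - \lambda)(\lambda - \lambda_-)}}{2\pi \sigma^2 q \lambda}$ on $[\lambda_-, \lambda_+]$, with $\lambda_\pm = \sigma^2 (1 \pm \sqrt{q})^2$. The sketch below evaluates this textbook density and compares it with a pure-noise matrix; it is not a reconstruction of the plots in [19], and the function name is ours.

```python
import numpy as np


def marchenko_pastur_pdf(lam, q, sigma2=1.0):
    """MP density for eigenvalues of (1/M) X X^T.

    X is d x M with i.i.d. zero-mean entries of variance sigma2; q = d / M <= 1.
    """
    lam = np.asarray(lam, dtype=float)
    lo = sigma2 * (1.0 - np.sqrt(q)) ** 2
    hi = sigma2 * (1.0 + np.sqrt(q)) ** 2
    pdf = np.zeros_like(lam)
    inside = (lam > lo) & (lam < hi)
    x = lam[inside]
    pdf[inside] = np.sqrt((hi - x) * (x - lo)) / (2.0 * np.pi * sigma2 * q * x)
    return pdf


# Compare with the spectrum of a pure-noise matrix.
rng = np.random.default_rng(3)
d, M = 400, 1600  # q = 0.25, so the bulk should sit in about [0.25, 2.25]
X = rng.standard_normal((d, M))
eigs = np.linalg.eigvalsh(X @ X.T / M)
grid = np.linspace(0.0, 3.0, 3001)
mass = marchenko_pastur_pdf(grid, d / M).sum() * (grid[1] - grid[0])  # close to 1
```

Eigenvalues of a trained layer's correlation matrix that fall well outside $[\lambda_-, \lambda_+]$ are one simple signal of learned structure beyond noise; [19] develops a much finer taxonomy along these lines.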

**Figure 9.** Taxonomy of Trained Models. Changing RMT statistics for Weight …

**Figure 10.** 2-Point Probability Plots. To yield the results displayed in figures 10 and 11, the MNIST data were changed from grayscale to black and white, with white pixels given the value 1 and black pixels the value 0. Each image is a square of $28^2$ pixels labelled by $x_{i,j}$. We first seek the probability that two pixel values $x_{i,j}$ and $x_{i+\mathrm{shift}_x,\, j+\mathrm{shift}_y}$ are both 1 (white) for some fixed shift $(\mathrm{shift}_x, \mathrm{shift}_y)$ …
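The procedure just described can be sketched as follows: binarize each grayscale image, then, for a fixed shift $(\mathrm{shift}_x, \mathrm{shift}_y)$, compute the fraction of pixel pairs $x_{i,j}$, $x_{i+\mathrm{shift}_x,\, j+\mathrm{shift}_y}$ that are both 1, pooled over the images of one class. The threshold of 0.5 on intensities scaled to [0, 1], the restriction to non-negative shifts, and counting only pairs that lie inside the image (no wrap-around) are assumptions; the excerpt does not specify them. Function names are hypothetical.

```python
import numpy as np


def binarize(images, threshold=0.5):
    """Map grayscale intensities in [0, 1] to {0, 1} (1 = white)."""
    return (np.asarray(images, dtype=float) > threshold).astype(np.uint8)


def two_point_probability(binary_images, shift_x, shift_y):
    """P(x[i, j] == 1 and x[i + shift_x, j + shift_y] == 1), pooled over images.

    `binary_images` has shape (N, H, W). Only pixel pairs with both members
    inside the image are counted. Shifts must be >= 0 in this sketch.
    """
    b = np.asarray(binary_images)
    _, h, w = b.shape
    first = b[:, : h - shift_x, : w - shift_y]   # x[i, j]
    second = b[:, shift_x:, shift_y:]            # x[i + shift_x, j + shift_y]
    return float(np.mean(first & second))


# Toy example: one 28 x 28 "image" with a horizontal stroke of 10 white pixels.
toy = np.zeros((1, 28, 28))
toy[0, 10, 5:15] = 1.0
b = binarize(toy)
p_right = two_point_probability(b, 0, 1)  # 9 of the 28 * 27 horizontal pairs
p_down = two_point_probability(b, 1, 0)   # 0: the stroke is one pixel thick
```

Sweeping the shift over a grid and comparing the resulting maps across digit classes gives plots of the Figure 10 kind; a 3-point version would require a third pixel at a second shift to be white as well.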

**Figure 11.** 3-Point Probability Plots. … higher order correlation functions of the kind discussed briefly in section 3, are sufficient for distinguishing distinct classes of numerals in the MNIST dataset. It is reasonable, we believe, to expect similar results from the other datasets listed in section 4. However, the second question remains: Can one show that, as a matter of fact, image recognition DNNs are …

**Figure 12.** Rectangular Data and Decision Boundaries [20, p. 4].

**Figure 12.** … figure 12. At this zeroth order, the decision boundary moves to a line (yellow) …

Original abstract

This paper argues that dataset structure is important in image recognition tasks (among other tasks). Specifically, we focus on the nature and genesis of correlational structure in the actual datasets upon which DNNs are trained. We argue that DNNs are implementing a widespread methodology in condensed matter physics and materials science that focuses on mesoscale correlation structures that live between fundamental atomic/molecular scales and continuum scales. Specifically, we argue that DNNs that are successful in image classification must be discovering high order correlation functions. It is well-known that DNNs successfully generalize in apparent contravention of standard statistical learning theory. We consider the implications of our discussion for this puzzle.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper argues that the success of DNNs in image classification tasks stems from their discovery of mesoscale high-order correlation functions present in real-world image datasets, drawing an analogy to methods in condensed matter physics and materials science. It posits that this correlational structure explains DNN generalization in apparent violation of standard statistical learning theory and considers the broader implications of this view.

Significance. If the proposed mapping between DNN internals and high-order correlation functions could be made explicit and tested, the work would offer a valuable interdisciplinary bridge between machine learning and physics, potentially reframing the generalization puzzle in terms of dataset mesoscale statistics. The manuscript receives credit for emphasizing dataset structure over purely architectural or optimization-based explanations and for situating DNN performance within established physics methodologies, though these strengths remain at the level of conceptual suggestion rather than demonstrated result.

major comments (2)

[Abstract] The central claim that 'DNNs that are successful in image classification must be discovering high order correlation functions' is asserted without any explicit operator-level correspondence (for example, showing how a convolutional layer or attention mechanism computes a specific n-point function) or any quantitative test that would distinguish this from lower-order moments or other representational strategies.
[Implications for generalization] Section on implications for the generalization puzzle: the argument that mesoscale correlation structures explain DNN success beyond statistical learning theory proceeds by equating the presence of correlational structure in datasets with the network's discovery of high-order functions, without ruling out or comparing against alternative drivers such as hierarchical compositionality or inductive biases in the training dynamics.

minor comments (2)

[Abstract] The abstract contains several long sentences that combine multiple distinct ideas; breaking them into shorter statements would improve readability.
Notation for correlation functions and mesoscale scales is introduced conceptually but would benefit from a brief clarifying paragraph or diagram to distinguish n-point functions from standard two-point statistics.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive report and for recognizing the potential value of situating DNN performance within mesoscale physics methodologies. We address the major comments below with point-by-point responses and indicate planned revisions.

Point-by-point responses

Referee: [Abstract] The central claim that 'DNNs that are successful in image classification must be discovering high order correlation functions' is asserted without any explicit operator-level correspondence (for example, showing how a convolutional layer or attention mechanism computes a specific n-point function) or any quantitative test that would distinguish this from lower-order moments or other representational strategies.

Authors: We agree that the manuscript advances this claim at a conceptual level without providing explicit operator-level mappings between DNN layers and specific n-point functions or quantitative tests that isolate high-order correlations from lower-order moments. The work is framed as an interdisciplinary analogy to condensed-matter methods rather than a technical derivation. In the revised manuscript we will temper the abstract language to present the claim as a hypothesis motivated by dataset statistics, and we will add a brief discussion of possible future directions for establishing such correspondences. revision: yes
Referee: [Implications for generalization] Section on implications for the generalization puzzle: the argument that mesoscale correlation structures explain DNN success beyond statistical learning theory proceeds by equating the presence of correlational structure in datasets with the network's discovery of high-order functions, without ruling out or comparing against alternative drivers such as hierarchical compositionality or inductive biases in the training dynamics.

Authors: The referee is correct that the current text does not explicitly compare or rule out alternative explanations such as hierarchical compositionality or inductive biases in training dynamics. Our emphasis is on dataset mesoscale structure as a complementary factor that has received less attention, not as an exclusive account. In revision we will expand the relevant section to acknowledge these alternatives, clarify that our perspective is intended to coexist with rather than supplant them, and note how dataset statistics might interact with architectural and optimization biases. revision: yes

Circularity Check

0 steps flagged

No significant circularity; interpretive argument remains self-contained

Full rationale

The paper advances a conceptual claim: by focusing on mesoscale dataset structure, it argues that successful image-classification DNNs discover high-order correlation functions of the kind studied in condensed-matter physics. The abstract and surrounding context present this as an interpretive link to the generalization puzzle rather than a closed mathematical derivation. No equations, fitted parameters renamed as predictions, self-citations that bear the central load, or definitional reductions (e.g., X defined via Y, then Y derived from X) are exhibited. The argument relies on an analogy between known correlational structure in images and physics quantities, without reducing the necessity claim to its own inputs by construction. External benchmarks or explicit operator mappings would be needed to test the claim, but their absence does not create circularity within the given derivation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on interpretive assumptions about DNN internal mechanisms and dataset statistics without independent verification or formal derivation.

axioms (2)

domain assumption: Successful DNNs discover high-order correlation functions in datasets.
Stated directly in the abstract as the key mechanism without derivation or evidence.
ad hoc to paper: Mesoscale correlation structures explain DNN generalization beyond statistical learning theory.
Invoked to resolve the generalization puzzle but not justified quantitatively.

pith-pipeline@v0.9.0 · 5403 in / 1198 out tokens · 30347 ms · 2026-05-17T20:29:50.123616+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

DNNs that are successful in image classification must be discovering high order correlation functions... mesoscale correlation structures that live between fundamental atomic/molecular scales and continuum scales.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

31 extracted references · 31 canonical work pages · 2 internal anchors

[1] M. Baniassadi, S. Ahzi, H. Garmestani, D. Ruch, and Y. Remond. New approximate solution for n-point correlation functions for heterogeneous materials. Journal of the Mechanics and Physics of Solids, 60:104–199, 2012.

[2] Robert W. Batterman. A Middle Way: A Non-Fundamental Approach to Many-Body Physics. Oxford University Press, 2021.

[3] Mikhail Belkin, Daniel Hsu, Siyuan Ma, and Soumik Mandal. Reconciling modern machine-learning practice and the classical bias-variance trade-off. PNAS, 116(32):15849–15854, 2019.

[4] Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer, New York, 2006.

[5] Florian J. Boge. Two dimensions of opacity and the deep learning predicament. Minds and Machines, 32:43–75, 2022.

[6] Harold W. Chalkley, Jerome Cornfield, and Helen Park. A method for estimating volume-surface ratios. Science, 110:295–298, 1949.

[7] Kathleen Creel. Transparency in complex computational systems. Philosophy of Science, 87(4):568–589, 2020.

[8] P. Debye, H. R. Anderson Jr., and H. Brumberger. Scattering by an inhomogeneous solid. II. The correlation function and its application. Journal of Applied Physics, 28:679–683, 1957.

[9] Eamon Duede. The representational status of deep learning models. arXiv:2303.12032v2, 2025.

[10] Dieter Forster. Hydrodynamic Fluctuations, Broken Symmetry, and Correlation Functions. Advanced Book Classics. Perseus Books, 1990.

[11] Nigel Goldenfeld. Lectures on Phase Transitions and the Renormalization Group. Number 85 in Frontiers in Physics. Addison-Wesley, Reading, Massachusetts, 1992.

[12] Sebastian Goldt, Marc Mézard, Florent Krzakala, and Lenka Zdeborová. Modelling the influence of data structure on learning in neural networks: The hidden manifold model. arXiv:1909.11500v4, 2020.

[13] A. N. Gorban and I. Y. Tyukin. Blessing of dimensionality: Mathematical foundations of the statistical physics of data. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 376(20170237), 2018.

[14] Leo P. Kadanoff. Theories of matter: Infinities and renormalization. In Robert W. Batterman, editor, The Oxford Handbook of Philosophy of Physics, chapter 4, pages 141–188. Oxford University Press, 2013.

[15] Leo P. Kadanoff and Paul C. Martin. Hydrodynamic equations and correlation functions. Annals of Physics, 24:419–469, 1963.

[16] Kenji Kawaguchi, Leslie Pack Kaelbling, and Yoshua Bengio. Generalization in deep learning. arXiv:1710.05468, 2023.

[17] Noam Levi and Yaron Oz. The underlying scaling laws and universal structure of complex datasets. arXiv:2306.14975v3, 2024.

[18] Henry W. Lin, Max Tegmark, and David Rolnick. Why does deep and cheap learning work so well? Journal of Statistical Physics, 168:1223–1247, 2017.

[19] Charles H. Martin and Michael W. Mahoney. Implicit self-regularization in deep neural networks: Evidence from random matrix theory and implications for learning. CoRR, abs/1810.01075, 2018.

[20] Maria Refinetti, Alessandro Ingrosso, and Sebastian Goldt. Neural networks trained with SGD learn distributions of increasing complexity. arXiv:2211.11567v2, 2023.

[21] Daniel L. Ruderman. Origins of scaling in natural images. Vision Research, 37(23):3385–3398, 1997.

[22] Daniel L. Ruderman and William Bialek. Statistics of natural images: Scaling in the woods. Physical Review Letters, 73(6):814–817, 1994.

[23] Emily Sullivan. Understanding from machine learning models. The British Journal for the Philosophy of Science, 73(1):109–133, 2022.

[24] Emily Sullivan. Do machine learning models represent their targets? Philosophy of Science, 91(5):1445–1455, 2024.

[25] Damien Teney, Liangze Jian, Florin Gogianu, and Ehsan Abbasnejad. Do we always need the simplicity bias? Looking for optimal inductive biases in the wild. arXiv:2503.10065v1, 2025.

[26] Salvatore Torquato. Random Heterogeneous Materials: Microstructure and Macroscopic Properties. Springer, New York, 2002.

[27] Andrew Gordon Wilson. Deep learning is not so mysterious or different. Proceedings of the 42nd International Conference on Machine Learning, 2025.

[28] Kenneth G. Wilson and J. Kogut. The renormalization group and the ε expansion. Physics Reports, 12(2):75–199, 1974.

[29] D. H. Wolpert and W. G. Macready. No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation, 1:67–82, 1997.

[30] Zitong Yang, Yaodong Yu, Chong You, Jacob Steinhardt, and Yi Ma. Rethinking bias-variance trade-off for generalization of neural networks. Proceedings of the 37th International Conference on Machine Learning, 2020.

[31] Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, and Oriol Vinyals. Understanding deep learning requires rethinking generalization. arXiv:1611.03530v2, 2017.
