DNNs, Dataset Statistics, and Correlation Functions
Pith reviewed 2026-05-17 20:29 UTC · model grok-4.3
The pith
Deep neural networks succeed in image classification by discovering high-order correlation functions in their datasets.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
DNNs that are successful in image classification must be discovering high-order correlation functions. In doing so they implement a common methodology in condensed matter physics and materials science, one that emphasizes mesoscale correlation structures lying between fundamental atomic scales and continuum scales. The paper then considers what this implies for the puzzle of DNNs generalizing successfully in apparent contravention of standard statistical learning theory.
What carries the argument
High-order correlation functions that describe mesoscale structures in image datasets, analogous to those studied in condensed matter physics.
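To make the term concrete, the following is a minimal sketch of how two-point and three-point correlation functions might be estimated from a single grayscale image. It is an illustration only, not the paper's procedure: the function names `two_point` and `three_point`, the periodic boundary treatment, and the random test image are all assumptions introduced here.

```python
import numpy as np

def two_point(img, dx, dy):
    """Estimate S2(dx, dy) = <f(x) f(x + r)> for the mean-subtracted image f.

    Periodic boundaries are assumed (an illustrative simplification).
    """
    f = img - img.mean()
    # np.roll with a negative shift gives shifted[i, j] = f[i + dy, j + dx].
    shifted = np.roll(f, shift=(-dy, -dx), axis=(0, 1))
    return float(np.mean(f * shifted))

def three_point(img, r1, r2):
    """Estimate S3(r1, r2) = <f(x) f(x + r1) f(x + r2)>, with r = (dx, dy)."""
    f = img - img.mean()
    f1 = np.roll(f, shift=(-r1[1], -r1[0]), axis=(0, 1))
    f2 = np.roll(f, shift=(-r2[1], -r2[0]), axis=(0, 1))
    return float(np.mean(f * f1 * f2))

# Hypothetical input: a random 32x32 image standing in for real data.
rng = np.random.default_rng(0)
img = rng.random((32, 32))
s2 = two_point(img, dx=1, dy=0)
s3 = three_point(img, r1=(1, 0), r2=(0, 1))
```

At zero offset the two estimators reduce to the pixel variance and the third central moment, which is a quick sanity check. Higher orders follow the same pattern, with one extra shifted factor per additional point.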
If this is right
- DNN success in image tasks depends critically on the presence of these high-order correlations in the data.
- The generalization ability of DNNs stems from their capacity to capture these dataset-specific statistical structures.
- Insights from physics on correlation functions can guide improvements in training data selection or network design.
- Standard statistical learning theory may need revision to account for the role of these higher-order structures.
Where Pith is reading between the lines
- Examining the learned features in DNN layers could reveal explicit representations of these correlation functions.
- Creating synthetic datasets that control for high-order correlations would allow direct tests of their necessity for classification accuracy.
- This view might extend to other domains like audio or text processing if similar correlational structures exist in those datasets.
- Future work could explore whether altering these correlations in datasets predictably affects DNN performance across different architectures.
Load-bearing premise
That the correlational structure in real-world image datasets is the primary driver of DNN success and corresponds directly to the high-order correlation functions from condensed matter physics.
What would settle it
Training a DNN on a modified image dataset that preserves low-order statistics but removes high-order correlations, and checking if classification accuracy drops significantly compared to the original dataset.
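One standard way to build such a control is Fourier phase randomization. It keeps the amplitude spectrum, and hence the two-point autocorrelation, while scrambling the phase structure that carries higher-order correlations. The sketch below is a hedged illustration, not the paper's method: the function name `phase_randomize` and the toy square image are assumptions made here. The surrogate does not preserve the pixel histogram, so values may fall outside the original range.

```python
import numpy as np

def phase_randomize(img, rng):
    """Return a surrogate with the Fourier amplitudes of img and random phases."""
    spectrum = np.fft.fft2(img)
    # Phases taken from the FFT of real white noise are antisymmetric in
    # frequency, so the inverse transform stays real up to rounding error.
    noise_phase = np.angle(np.fft.fft2(rng.standard_normal(img.shape)))
    # Keep the original zero-frequency phase so the image mean is unchanged.
    noise_phase[0, 0] = np.angle(spectrum[0, 0])
    surrogate = np.fft.ifft2(np.abs(spectrum) * np.exp(1j * noise_phase))
    return surrogate.real

# Hypothetical input: a bright square on a dark background.
rng = np.random.default_rng(0)
img = np.zeros((32, 32))
img[8:24, 8:24] = 1.0
surrogate = phase_randomize(img, rng)
```

Applying the same transform to every image of a dataset and retraining would give one version of the proposed test. A large accuracy drop would be consistent with the claim, though it would not by itself identify which higher-order statistics matter.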
Original abstract
This paper argues that dataset structure is important in image recognition tasks (among other tasks). Specifically, we focus on the nature and genesis of correlational structure in the actual datasets upon which DNNs are trained. We argue that DNNs are implementing a widespread methodology in condensed matter physics and materials science that focuses on mesoscale correlation structures that live between fundamental atomic/molecular scales and continuum scales. Specifically, we argue that DNNs that are successful in image classification must be discovering high order correlation functions. It is well-known that DNNs successfully generalize in apparent contravention of standard statistical learning theory. We consider the implications of our discussion for this puzzle.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper argues that the success of DNNs in image classification tasks stems from their discovery of mesoscale high-order correlation functions present in real-world image datasets, drawing an analogy to methods in condensed matter physics and materials science. It posits that this correlational structure explains DNN generalization in apparent violation of standard statistical learning theory and considers the broader implications of this view.
Significance. If the proposed mapping between DNN internals and high-order correlation functions could be made explicit and tested, the work would offer a valuable interdisciplinary bridge between machine learning and physics, potentially reframing the generalization puzzle in terms of dataset mesoscale statistics. The manuscript receives credit for emphasizing dataset structure over purely architectural or optimization-based explanations and for situating DNN performance within established physics methodologies, though these strengths remain at the level of conceptual suggestion rather than demonstrated result.
major comments (2)
- [Abstract] Abstract: the central claim that 'DNNs that are successful in image classification must be discovering high order correlation functions' is asserted without any explicit operator-level correspondence (for example, showing how a convolutional layer or attention mechanism computes a specific n-point function) or any quantitative test that would distinguish this from lower-order moments or other representational strategies.
- [Implications for generalization] Section on implications for the generalization puzzle: the argument that mesoscale correlation structures explain DNN success beyond statistical learning theory proceeds by equating the presence of correlational structure in datasets with the network's discovery of high-order functions, without ruling out or comparing against alternative drivers such as hierarchical compositionality or inductive biases in the training dynamics.
minor comments (2)
- [Abstract] The abstract contains several long sentences that combine multiple distinct ideas; breaking them into shorter statements would improve readability.
- Notation for correlation functions and mesoscale scales is introduced conceptually but would benefit from a brief clarifying paragraph or diagram to distinguish n-point functions from standard two-point statistics.
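The distinction requested in the last comment can be sketched as follows. The notation is an assumption introduced for illustration and is not taken from the paper: I(x) denotes pixel intensity at position x, δI its deviation from the mean, and the angle brackets an average over positions and over images, assuming translation invariance.

```latex
% Assumed notation (not from the paper):
%   I(\mathbf{x})  pixel intensity at position \mathbf{x}
%   \delta I = I - \langle I \rangle
%   \langle \cdot \rangle  average over positions and over images
\[
S_n(\mathbf{r}_1,\dots,\mathbf{r}_{n-1})
  = \big\langle \delta I(\mathbf{x})\,\delta I(\mathbf{x}+\mathbf{r}_1)\cdots
      \delta I(\mathbf{x}+\mathbf{r}_{n-1}) \big\rangle
\]
\[
S_2(\mathbf{r})
  = \big\langle \delta I(\mathbf{x})\,\delta I(\mathbf{x}+\mathbf{r}) \big\rangle,
\qquad
S_3(\mathbf{r}_1,\mathbf{r}_2)
  = \big\langle \delta I(\mathbf{x})\,\delta I(\mathbf{x}+\mathbf{r}_1)\,
      \delta I(\mathbf{x}+\mathbf{r}_2) \big\rangle
\]
```

Here S_2 is the familiar autocorrelation, whose Fourier transform is the power spectrum. For a Gaussian random field every S_n is fully determined by the mean and S_2, so "high-order" in this sense refers to structure in S_n with n ≥ 3 that S_2 does not fix.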
Simulated Author's Rebuttal
We thank the referee for their constructive report and for recognizing the potential value of situating DNN performance within mesoscale physics methodologies. We address the major comments below with point-by-point responses and indicate planned revisions.
- Referee: [Abstract] Abstract: the central claim that 'DNNs that are successful in image classification must be discovering high order correlation functions' is asserted without any explicit operator-level correspondence (for example, showing how a convolutional layer or attention mechanism computes a specific n-point function) or any quantitative test that would distinguish this from lower-order moments or other representational strategies.
  Authors: We agree that the manuscript advances this claim at a conceptual level without providing explicit operator-level mappings between DNN layers and specific n-point functions or quantitative tests that isolate high-order correlations from lower-order moments. The work is framed as an interdisciplinary analogy to condensed-matter methods rather than a technical derivation. In the revised manuscript we will temper the abstract language to present the claim as a hypothesis motivated by dataset statistics, and we will add a brief discussion of possible future directions for establishing such correspondences.
  Revision: yes
- Referee: [Implications for generalization] Section on implications for the generalization puzzle: the argument that mesoscale correlation structures explain DNN success beyond statistical learning theory proceeds by equating the presence of correlational structure in datasets with the network's discovery of high-order functions, without ruling out or comparing against alternative drivers such as hierarchical compositionality or inductive biases in the training dynamics.
  Authors: The referee is correct that the current text does not explicitly compare or rule out alternative explanations such as hierarchical compositionality or inductive biases in training dynamics. Our emphasis is on dataset mesoscale structure as a complementary factor that has received less attention, not as an exclusive account. In revision we will expand the relevant section to acknowledge these alternatives, clarify that our perspective is intended to coexist with rather than supplant them, and note how dataset statistics might interact with architectural and optimization biases.
  Revision: yes
Circularity Check
No significant circularity; interpretive argument remains self-contained
The paper advances a conceptual claim that successful image-classification DNNs discover high-order correlation functions from condensed-matter physics by focusing on mesoscale dataset structure. The abstract and surrounding context present this as an interpretive link to the generalization puzzle rather than a closed mathematical derivation. No equations, fitted parameters renamed as predictions, self-citations that bear the central load, or definitional reductions (e.g., X defined via Y then Y derived from X) are exhibited. The argument relies on analogy between known correlational structure in images and physics quantities without reducing the necessity claim to its own inputs by construction. External benchmarks or explicit operator mappings would be needed to test the claim, but their absence does not create circularity within the given derivation chain.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Successful DNNs discover high-order correlation functions in datasets
- ad hoc to paper: Mesoscale correlation structures explain DNN generalization beyond statistical learning theory
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "DNNs that are successful in image classification must be discovering high order correlation functions... mesoscale correlation structures that live between fundamental atomic/molecular scales and continuum scales."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.