
arXiv: 2511.21715 · v2 · submitted 2025-11-18 · ⚛️ physics.hist-ph · cond-mat.stat-mech · cs.LG

DNNs, Dataset Statistics, and Correlation Functions

Pith reviewed 2026-05-17 20:29 UTC · model grok-4.3

classification ⚛️ physics.hist-ph · cond-mat.stat-mech · cs.LG
keywords deep neural networks · image classification · high-order correlation functions · dataset structure · generalization · condensed matter physics · mesoscale correlations

The pith

Deep neural networks succeed in image classification by discovering high-order correlation functions in their datasets.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper argues that the key to deep neural networks' success in tasks like image recognition lies in the correlational structure of the training datasets rather than solely in the networks' architecture. It draws a parallel to methods in condensed matter physics, where researchers analyze mesoscale correlation structures that bridge atomic and larger scales. Specifically, the authors claim that effective DNNs must identify high-order correlation functions within these datasets. This perspective offers a way to understand why DNNs generalize well even though that success appears to contradict principles from statistical learning theory. Readers might care because it reframes the source of machine learning performance as a property of the data itself.

Core claim

DNNs that are successful in image classification must be discovering high-order correlation functions. This approach mirrors a common methodology in condensed matter physics and materials science that emphasizes mesoscale correlation structures lying between fundamental atomic scales and continuum scales. The paper also discusses how this accounts for the puzzle of DNNs generalizing successfully despite apparent violations of standard statistical learning theory.
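The paper's Figure 2 illustrates this scale-bridging methodology with real-space blocking: a two-dimensional Ising model of 81 spins is broken into blocks of 9 spins, each replaced by one coarse-grained effective spin, and the text around Figure 1 applies the same idea to "block pixels" in natural images. A minimal sketch of the blocking step, assuming a square array whose side is divisible by the block size; averaging followed by a majority rule is one common convention (Kadanoff-style block spins), and the function names are hypothetical rather than the authors':

```python
import numpy as np

def block_average(field: np.ndarray, b: int) -> np.ndarray:
    """Average non-overlapping b x b blocks of a 2-D array (block pixels / block spins)."""
    n, m = field.shape
    if n % b or m % b:
        raise ValueError("array sides must be divisible by the block size")
    # Reshape to (n/b, b, m/b, b): axes 1 and 3 index positions inside each block.
    return field.reshape(n // b, b, m // b, b).mean(axis=(1, 3))

def majority_rule(spins: np.ndarray, b: int) -> np.ndarray:
    """Block spin = sign of the block average (an odd b avoids ties for +/-1 spins)."""
    return np.sign(block_average(spins, b)).astype(int)

rng = np.random.default_rng(0)
spins = rng.choice([-1, 1], size=(9, 9))  # 81 spins, as in the paper's Figure 2
coarse = majority_rule(spins, 3)          # 9 blocks of 9 spins -> 3 x 3 effective spins
```

Applied to a grayscale image instead of spins, `block_average` alone yields the pixel-blocked image; comparing statistics of an image with those of its blocked version is the kind of scaling check the paper describes for natural images.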

What carries the argument

High-order correlation functions that describe mesoscale structures in image datasets, analogous to those studied in condensed matter physics.
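For binary (black-and-white) images these objects can be written explicitly. A sketch in the notation used for random heterogeneous materials (Torquato [26]), assuming a statistically homogeneous field I(x) ∈ {0, 1} (1 = white) and writing ⟨·⟩ for an average over positions x and over the images of a class; the symbols S_n are labels chosen here, not necessarily the paper's:

```latex
S_1 = \langle I(\mathbf{x}) \rangle, \qquad
S_2(\mathbf{r}) = \langle I(\mathbf{x})\, I(\mathbf{x}+\mathbf{r}) \rangle, \qquad
S_n(\mathbf{r}_1,\dots,\mathbf{r}_{n-1}) = \Big\langle I(\mathbf{x}) \prod_{k=1}^{n-1} I(\mathbf{x}+\mathbf{r}_k) \Big\rangle .
```

Each S_n is the probability that n pixels in a given relative arrangement are all white. S_2 is the autocorrelation, whose Fourier transform is the power spectrum (Wiener-Khinchin theorem), so "high-order" here means n ≥ 3: functions that can register how features such as strokes, edges, and corners are arranged, which pair statistics alone may miss.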

If this is right

  • DNN success in image tasks depends critically on the presence of these high-order correlations in the data.
  • The generalization ability of DNNs stems from their capacity to capture these dataset-specific statistical structures.
  • Insights from physics on correlation functions can guide improvements in training data selection or network design.
  • Standard statistical learning theory may need revision to account for the role of these higher-order structures.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Examining the learned features in DNN layers could reveal explicit representations of these correlation functions.
  • Creating synthetic datasets that control for high-order correlations would allow direct tests of their necessity for classification accuracy.
  • This view might extend to other domains like audio or text processing if similar correlational structures exist in those datasets.
  • Future work could explore whether altering these correlations in datasets predictably affects DNN performance across different architectures.

Load-bearing premise

That the correlational structure in real-world image datasets is the primary driver of DNN success and corresponds directly to the high-order correlation functions from condensed matter physics.

What would settle it

Training a DNN on a modified image dataset that preserves low-order statistics but removes high-order correlations, and checking if classification accuracy drops significantly compared to the original dataset.
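One standard way to build such a control dataset (a suggestion here, not a procedure from the paper) is Fourier phase randomization: keep each image's Fourier amplitudes, and hence its mean, power spectrum, and two-point autocorrelation, but replace the phases with random ones, which scrambles higher-order structure such as edges. A minimal NumPy sketch, assuming grayscale images stored as 2-D float arrays; `phase_randomize` and `autocorrelation` are hypothetical helper names:

```python
import numpy as np

def phase_randomize(img: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Surrogate with the same Fourier amplitudes (power spectrum) as img but random phases."""
    F = np.fft.fft2(img)
    # Phases of the FFT of real white noise satisfy phi(-k) = -phi(k), so the new
    # spectrum stays Hermitian and its inverse FFT is real up to rounding error.
    phases = np.angle(np.fft.fft2(rng.standard_normal(img.shape)))
    G = np.abs(F) * np.exp(1j * phases)
    G[0, 0] = F[0, 0]  # keep the zero-frequency term, i.e. the mean intensity
    return np.fft.ifft2(G).real

def autocorrelation(img: np.ndarray) -> np.ndarray:
    """Circular two-point autocorrelation <I(x) I(x + r)>, via the Wiener-Khinchin theorem."""
    F = np.fft.fft2(img)
    return np.fft.ifft2(np.abs(F) ** 2).real / img.size

rng = np.random.default_rng(1)
img = np.zeros((32, 32))
img[8:24, 8:24] = 1.0  # a filled square: sharp edges carry higher-order structure
surrogate = phase_randomize(img, rng)
```

For the proposed test, one would train the same architecture on label-preserving surrogates and on the originals and compare accuracy. Phase randomization does not preserve the one-point pixel histogram, so a stricter control would also match histograms, for example with iterative amplitude-adjusted Fourier transform (IAAFT) surrogates.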

Figures

Figures reproduced from arXiv: 2511.21715 by James F. Woodward, Robert W. Batterman.

Figure 1. Stream, Trees, Rocks.
Figure 2. Blocking and averaging to yield a new (coarse-grained) effective … (Making blocks: a two-dimensional Ising model containing 81 spins is broken into blocks each containing 9 spins.)
Figure 3. Scaling of Contrast Distribution [22].
Figure 4. Throwing Line Segments on the Plane [10, p. 3393].
Figure 5. Conductor/Insulator Composite. Dark bands are the insulators.
Figure 6. Throwing Lines, Triangles . . . to Determine …
Figure 7. Scree Plots: Scaling Behavior of Σ_M for Various Datasets [17, p. 4].
Figure 8. Marchenko-Pastur (MP) Distributions [19, p. 14].
Figure 9. Taxonomy of Trained Models. Changing RMT statistics for Weight …
Figure 10. 2-Point Probability Plots.
Figure 11. 3-Point Probability Plots.
Figure 12. Rectangular Data and Decision Boundaries [20, p. 4].
Original abstract

This paper argues that dataset structure is important in image recognition tasks (among other tasks). Specifically, we focus on the nature and genesis of correlational structure in the actual datasets upon which DNNs are trained. We argue that DNNs are implementing a widespread methodology in condensed matter physics and materials science that focuses on mesoscale correlation structures that live between fundamental atomic/molecular scales and continuum scales. Specifically, we argue that DNNs that are successful in image classification must be discovering high order correlation functions. It is well-known that DNNs successfully generalize in apparent contravention of standard statistical learning theory. We consider the implications of our discussion for this puzzle.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper argues that the success of DNNs in image classification tasks stems from their discovery of mesoscale high-order correlation functions present in real-world image datasets, drawing an analogy to methods in condensed matter physics and materials science. It posits that this correlational structure explains DNN generalization in apparent violation of standard statistical learning theory and considers the broader implications of this view.

Significance. If the proposed mapping between DNN internals and high-order correlation functions could be made explicit and tested, the work would offer a valuable interdisciplinary bridge between machine learning and physics, potentially reframing the generalization puzzle in terms of dataset mesoscale statistics. The manuscript receives credit for emphasizing dataset structure over purely architectural or optimization-based explanations and for situating DNN performance within established physics methodologies, though these strengths remain at the level of conceptual suggestion rather than demonstrated result.

major comments (2)
  1. [Abstract] Abstract: the central claim that 'DNNs that are successful in image classification must be discovering high order correlation functions' is asserted without any explicit operator-level correspondence (for example, showing how a convolutional layer or attention mechanism computes a specific n-point function) or any quantitative test that would distinguish this from lower-order moments or other representational strategies.
  2. [Implications for generalization] Section on implications for the generalization puzzle: the argument that mesoscale correlation structures explain DNN success beyond statistical learning theory proceeds by equating the presence of correlational structure in datasets with the network's discovery of high-order functions, without ruling out or comparing against alternative drivers such as hierarchical compositionality or inductive biases in the training dynamics.
minor comments (2)
  1. [Abstract] The abstract contains several long sentences that combine multiple distinct ideas; breaking them into shorter statements would improve readability.
  2. Notation for correlation functions and mesoscale scales is introduced conceptually but would benefit from a brief clarifying paragraph or diagram to distinguish n-point functions from standard two-point statistics.
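The distinction asked for in the second minor comment can be made with a short calculation (an illustration here, not taken from the paper). Let I(x) ∈ {0, 1} be a binary image with periodic boundaries, let ⟨·⟩ average over positions x, and let Ĩ(x) = I(−x) be its point reflection (a 180° rotation). The two-point function cannot tell the two images apart, while the three-point function generally can:

```latex
\tilde S_2(\mathbf r) = \langle I(-\mathbf x)\, I(-\mathbf x-\mathbf r)\rangle_{\mathbf x}
  = S_2(-\mathbf r) = S_2(\mathbf r),
\qquad
\tilde S_3(\mathbf r_1,\mathbf r_2) = S_3(-\mathbf r_1,-\mathbf r_2) \neq S_3(\mathbf r_1,\mathbf r_2)\ \text{in general}.
```

The step S_2(−r) = S_2(r) follows by substituting x → x + r inside the average. For example, the periodic 1 × 8 image I = (1, 1, 0, 1, 0, 0, 0, 0) has S_3 = 1/8 at shifts (1, 3), while its reflection (1, 0, 0, 0, 0, 1, 0, 1) has S_3 = 0 there, although their S_2 agree at every shift. An idealized "6" and its 180° rotation (a "9"-like glyph) are the image-classification version of the same point.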

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive report and for recognizing the potential value of situating DNN performance within mesoscale physics methodologies. We address the major comments below with point-by-point responses and indicate planned revisions.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the central claim that 'DNNs that are successful in image classification must be discovering high order correlation functions' is asserted without any explicit operator-level correspondence (for example, showing how a convolutional layer or attention mechanism computes a specific n-point function) or any quantitative test that would distinguish this from lower-order moments or other representational strategies.

    Authors: We agree that the manuscript advances this claim at a conceptual level without providing explicit operator-level mappings between DNN layers and specific n-point functions or quantitative tests that isolate high-order correlations from lower-order moments. The work is framed as an interdisciplinary analogy to condensed-matter methods rather than a technical derivation. In the revised manuscript we will temper the abstract language to present the claim as a hypothesis motivated by dataset statistics, and we will add a brief discussion of possible future directions for establishing such correspondences. revision: yes

  2. Referee: [Implications for generalization] Section on implications for the generalization puzzle: the argument that mesoscale correlation structures explain DNN success beyond statistical learning theory proceeds by equating the presence of correlational structure in datasets with the network's discovery of high-order functions, without ruling out or comparing against alternative drivers such as hierarchical compositionality or inductive biases in the training dynamics.

    Authors: The referee is correct that the current text does not explicitly compare or rule out alternative explanations such as hierarchical compositionality or inductive biases in training dynamics. Our emphasis is on dataset mesoscale structure as a complementary factor that has received less attention, not as an exclusive account. In revision we will expand the relevant section to acknowledge these alternatives, clarify that our perspective is intended to coexist with rather than supplant them, and note how dataset statistics might interact with architectural and optimization biases. revision: yes
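As one illustration of the kind of operator-level correspondence raised in the first comment (an example here, not a result of the paper): a single linear convolution followed by squaring and global average pooling computes a fixed linear combination of two-point functions, ⟨y²⟩ = Σ_{a,b} w_a w_b C_2(b − a), where y(x) = Σ_a w_a I(x + a) and C_2(r) = ⟨I(x) I(x + r)⟩; a cubic nonlinearity would likewise weight three-point functions. The NumPy sketch below checks the quadratic identity numerically, assuming periodic boundaries; the filter, the random test image, and the helper names are hypothetical:

```python
import numpy as np

def shift(img: np.ndarray, r) -> np.ndarray:
    """Periodic shift: shift(img, r)[x] == img[x + r]."""
    return np.roll(img, shift=(-r[0], -r[1]), axis=(0, 1))

def conv_square_pool(img: np.ndarray, w: np.ndarray) -> float:
    """Global average of the squared response of a (cross-correlation style) convolution."""
    y = np.zeros_like(img, dtype=float)
    for (a0, a1), wa in np.ndenumerate(w):
        y += wa * shift(img, (a0, a1))
    return float(np.mean(y ** 2))

def two_point(img: np.ndarray, r) -> float:
    """C_2(r) = <I(x) I(x + r)>, averaged over all positions with periodic boundaries."""
    return float(np.mean(img * shift(img, r)))

def via_two_point(img: np.ndarray, w: np.ndarray) -> float:
    """The same quantity written as sum_{a,b} w_a w_b C_2(b - a)."""
    offsets = list(np.ndindex(w.shape))
    return sum(w[a] * w[b] * two_point(img, (b[0] - a[0], b[1] - a[1]))
               for a in offsets for b in offsets)

rng = np.random.default_rng(2)
img = (rng.random((16, 16)) > 0.5).astype(float)  # random binary "image"
w = rng.standard_normal((3, 3))                     # hypothetical 3 x 3 filter
lhs = conv_square_pool(img, w)  # network-style computation
rhs = via_two_point(img, w)     # correlation-function form; equal up to rounding
```

This is only a caricature: real networks stack many nonlinear layers, so whether their features line up with particular n-point functions remains the empirical question the report raises.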

Circularity Check

0 steps flagged

No significant circularity; interpretive argument remains self-contained

full rationale

The paper advances a conceptual claim: by attending to mesoscale dataset structure, as condensed-matter physics does, one finds that successful image-classification DNNs must discover high-order correlation functions. The abstract and surrounding context present this as an interpretive link to the generalization puzzle rather than a closed mathematical derivation. No equations, fitted parameters renamed as predictions, self-citations that bear the central load, or definitional reductions (e.g., X defined via Y and then Y derived from X) are exhibited. The argument relies on an analogy between known correlational structure in images and quantities from physics, without reducing the necessity claim to its own inputs by construction. External benchmarks or explicit operator mappings would be needed to test the claim, but their absence does not create circularity within the given derivation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on interpretive assumptions about DNN internal mechanisms and dataset statistics without independent verification or formal derivation.

axioms (2)
  • domain assumption: Successful DNNs discover high-order correlation functions in datasets
    Stated directly in the abstract as the key mechanism without derivation or evidence.
  • ad hoc to paper: Mesoscale correlation structures explain DNN generalization beyond statistical learning theory
    Invoked to resolve the generalization puzzle but not justified quantitatively.

pith-pipeline@v0.9.0 · 5403 in / 1198 out tokens · 30347 ms · 2026-05-17T20:29:50.123616+00:00 · methodology



Reference graph

Works this paper leans on

31 extracted references · 31 canonical work pages · 2 internal anchors

  [1] M. Baniassadi, S. Ahzi, H. Garmestani, D. Ruch, and Y. Remond. New approximate solution for n-point correlation functions for heterogeneous materials. Journal of the Mechanics and Physics of Solids, 60:104–199, 2012.
  [2] Robert W. Batterman. A Middle Way: A Non-Fundamental Approach to Many-Body Physics. Oxford University Press, 2021.
  [3] Mikhail Belkin, Daniel Hsu, Siyuan Ma, and Soumik Mandal. Reconciling modern machine-learning practice and the classical bias-variance trade-off. PNAS, 116(32):15849–15854, 2019.
  [4] Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer, New York, 2006.
  [5] Florian J. Boge. Two dimensions of opacity and the deep learning predicament. Minds and Machines, 32:43–75, 2022.
  [6] Harold W. Chalkley, Jerome Cornfield, and Helen Park. A method for estimating volume-surface ratios. Science, 110:295–298, 1949.
  [7] Kathleen Creel. Transparency in complex computational systems. Philosophy of Science, 87(4):568–589, 2020.
  [8] P. Debye, H. R. Anderson Jr., and H. Brumberger. Scattering by an inhomogeneous solid. II. The correlation function and its application. Journal of Applied Physics, 28:679–683, 1957.
  [9] Eamon Duede. The representational status of deep learning models. arXiv:2303.12032v2, 2025.
  [10] Dieter Forster. Hydrodynamic Fluctuations, Broken Symmetry, and Correlation Functions. Advanced Book Classics. Perseus Books, 1990.
  [11] Nigel Goldenfeld. Lectures on Phase Transitions and the Renormalization Group. Number 85 in Frontiers in Physics. Addison-Wesley, Reading, Massachusetts, 1992.
  [12] Sebastian Goldt, Marc Mézard, Florent Krzakala, and Lenka Zdeborová. Modelling the influence of data structure on learning in neural networks: The hidden manifold model. arXiv:1909.11500v4, 2020.
  [13] A. N. Gorban and I. Y. Tyukin. Blessing of dimensionality: Mathematical foundations of the statistical physics of data. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 376(20170237), 2018.
  [14] Leo P. Kadanoff. Theories of matter: Infinities and renormalization. In Robert W. Batterman, editor, The Oxford Handbook of Philosophy of Physics, chapter Four, pages 141–188. Oxford University Press, 2013.
  [15] Leo P. Kadanoff and Paul C. Martin. Hydrodynamic equations and correlation functions. Annals of Physics, 24:419–469, 1963.
  [16] Kenji Kawaguchi, Leslie Pack Kaelbling, and Yoshua Bengio. Generalization in deep learning. arXiv:1710.05468, 2023.
  [17] Noam Levi and Yaron Oz. The underlying scaling laws and universal structure of complex datasets. arXiv:2306.14975v3, 2024.
  [18] Henry W. Lin, Max Tegmark, and David Rolnick. Why does deep and cheap learning work so well? Journal of Statistical Physics, 168:1223–1247, 2017.
  [19] Charles H. Martin and Michael W. Mahoney. Implicit self-regularization in deep neural networks: Evidence from random matrix theory and implications for learning. CoRR, abs/1810.01075, 2018.
  [20] Maria Refinetti, Alessandro Ingrosso, and Sebastian Goldt. Neural networks trained with SGD learn distributions of increasing complexity. arXiv:2211.11567v2, 2023.
  [21] Daniel L. Ruderman. Origins of scaling in natural images. Vision Research, 37(23):3385–3398, 1997.
  [22] Daniel L. Ruderman and William Bialek. Statistics of natural images: Scaling in the woods. Physical Review Letters, 73(6):814–817, 1994.
  [23] Emily Sullivan. Understanding from machine learning models. The British Journal for the Philosophy of Science, 73(1):109–133, 2022.
  [24] Emily Sullivan. Do machine learning models represent their targets? Philosophy of Science, 91(5):1445–1455, 2024.
  [25] Damien Teney, Liangze Jiang, Florin Gogianu, and Ehsan Abbasnejad. Do we always need the simplicity bias? Looking for optimal inductive biases in the wild. arXiv:2503.10065v1, 2025.
  [26] Salvatore Torquato. Random Heterogeneous Materials: Microstructure and Macroscopic Properties. Springer, New York, 2002.
  [27] Andrew Gordon Wilson. Deep learning is not so mysterious or different. Proceedings of the 42nd International Conference on Machine Learning, 2025.
  [28] Kenneth G. Wilson and J. Kogut. The renormalization group and the ε expansion. Physics Reports, 12(2):75–199, 1974.
  [29] D. H. Wolpert and W. G. Macready. No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation, 1:67–82, 1997.
  [30] Zitong Yang, Yaodong Yu, Chong You, Jacob Steinhardt, and Yi Ma. Rethinking bias-variance trade-off for generalization of neural networks. Proceedings of the 37th International Conference on Machine Learning, 2020.
  [31] Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, and Oriol Vinyals. Understanding deep learning requires rethinking generalization. arXiv:1611.03530v2, 2017.