arxiv: 2606.20299 · v1 · pith:LJZAJHXJnew · submitted 2026-06-18 · 📊 stat.ML · cs.LG · hep-ph · physics.data-an

Statistical Properties of Training & Generalization

Pith reviewed 2026-06-26 15:34 UTC · model grok-4.3

classification 📊 stat.ML · cs.LG · hep-ph · physics.data-an
keywords deep learning · neural scaling laws · physics-informed machine learning · generalization · inductive biases · training dynamics · statistical properties

The pith

A physics-informed lens explains how neural scaling laws interact with constraints to shape deep learning training and generalization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reviews why deep learning succeeds despite classical statistical intuitions by examining its key statistical features through a physics perspective. It centers on neural scaling laws, which describe performance gains with model size and data, and shows how these laws interact with physical constraints and inductive biases in applications to physics problems. The authors discuss and justify common choices in building deep learning models while highlighting surprises in training dynamics and generalization. A reader would care because this framing can guide model construction for scientific tasks where domain knowledge is available.

Core claim

Deep learning evades numerous intuitions from classical statistics to achieve high performance; neural scaling laws interplay with the constraints and inductive biases present when applying machine learning to problems in physics, and a physics-informed perspective can justify many model choices while revealing key statistical features of training and generalization.

What carries the argument

Neural scaling laws, which capture power-law improvements in performance with model size, data volume, and compute, modulated by physics-derived constraints and inductive biases.
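As a rough illustration of what such a power law looks like in practice, the sketch below fits a curve of the form loss = a · N^(−α) to synthetic loss values by ordinary least squares in log-log space. The data, the constants `a = 5.0` and `alpha = 0.3`, and the helper name `fit_power_law` are assumptions made purely for illustration; none of them come from the paper.

```python
import math

def fit_power_law(sizes, losses):
    """Fit loss = a * size**(-alpha) by least squares on log-log values.

    Returns (a, alpha). Assumes every size and loss is strictly positive.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(l) for l in losses]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # slope of log(loss) vs log(size)
    intercept = mean_y - slope * mean_x    # log(a)
    return math.exp(intercept), -slope

# Synthetic, noise-free data generated from the assumed values a=5.0, alpha=0.3.
sizes = [10 ** k for k in range(3, 9)]
losses = [5.0 * n ** -0.3 for n in sizes]

a_hat, alpha_hat = fit_power_law(sizes, losses)

# Extrapolate one decade beyond the largest size in the fitted range.
forecast = a_hat * (10 ** 9) ** -alpha_hat
```

Measured loss curves usually carry noise and often an irreducible offset, so a pure power-law fit of this kind is only a first approximation, and any extrapolation beyond the fitted range should be treated with caution.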

If this is right

  • Scaling laws can be used to forecast performance improvements when physics constraints are added to models.
  • Inductive biases from physics reduce the effective data requirements for generalization compared to generic deep learning.
  • Training dynamics in physics applications exhibit statistical regularities that classical statistics alone cannot predict.
  • Model architecture choices become justifiable when they respect physical symmetries or conservation laws.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Hybrid models that embed physics equations directly may exhibit distinct scaling regimes not captured by standard neural scaling laws.
  • The same perspective could be tested on non-physics domains by substituting domain-specific constraints for physical ones.
  • If the interplay holds, it predicts that removing physics biases from a trained model would degrade scaling behavior predictably.

Load-bearing premise

A physics-informed perspective can meaningfully justify choices in deep learning models and reveal key statistical features of training and generalization.

What would settle it

A controlled comparison showing that scaling exponents and generalization curves in physics tasks remain unchanged when all physical constraints and biases are removed would falsify the claimed interplay.
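One way such a comparison might be set up is sketched below: fit a scaling exponent per training seed for each configuration, then ask whether the gap between the two mean exponents is large compared with its seed-to-seed standard error. Everything here is hypothetical: the synthetic curves stand in for real training runs, the exponents 0.35 and 0.25 and the noise level are arbitrary, and the helper names are not from the paper.

```python
import math
import random
import statistics

def fit_exponent(sizes, losses):
    """Return alpha from a least-squares fit of log(loss) = c - alpha * log(size)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(l) for l in losses]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return -sxy / sxx

def synthetic_curve(alpha, sizes, rng, noise=0.02):
    """Stand-in for one training sweep: a power law with log-normal noise."""
    return [n ** -alpha * math.exp(rng.gauss(0.0, noise)) for n in sizes]

def exponent_gap(alpha_a, alpha_b, sizes, seeds=20):
    """Mean exponent difference between two setups and its standard error."""
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    fits_a = [fit_exponent(sizes, synthetic_curve(alpha_a, sizes, rng))
              for _ in range(seeds)]
    fits_b = [fit_exponent(sizes, synthetic_curve(alpha_b, sizes, rng))
              for _ in range(seeds)]
    gap = statistics.fmean(fits_a) - statistics.fmean(fits_b)
    stderr = math.sqrt(statistics.variance(fits_a) / seeds
                       + statistics.variance(fits_b) / seeds)
    return gap, stderr

sizes = [10 ** k for k in range(2, 7)]

# Hypothetical case 1: constrained and unconstrained models scale differently.
gap_diff, err_diff = exponent_gap(0.35, 0.25, sizes)

# Hypothetical case 2: removing the constraints leaves the exponent unchanged.
gap_same, err_same = exponent_gap(0.30, 0.30, sizes)
```

In this framing, a gap that stays within a few standard errors of zero across tasks would point toward the falsifying outcome described above, while a gap many standard errors from zero would be consistent with the claimed interplay.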

Figures

Figures reproduced from arXiv: 2606.20299 by Itay Lavie, Noam Levi, Yonatan Kahn.

Figure 1. Bias-variance trade-off and double descent. The test error at first decreases …

Figure 2. An illustration of benign overfitting, reproduced from …

Figure 3. Left: Test error summed over three different target functions as a function of the polynomial degree p. Colors indicate different inductive bias parameters k. The under-parametrized regime is highly sensitive to the parameter count, while the over-parametrized regime is largely insensitive to it. Right: Minimal (w.r.t. p) test error summed over three different target functions achieved in the under parame…
Original abstract

Deep learning has managed to evade numerous intuitions from classical statistics to achieve unprecedented performance on a number of real-world tasks. In this article, we investigate the key features and surprises of deep learning from a physics-informed perspective, taking care to point out and justify where possible the many choices inherent in constructing a deep learning model. In particular, we review the phenomenon of neural scaling laws and discuss their interplay with the constraints and inductive biases which may be present when applying machine learning to problems in physics.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript investigates key features and surprises of deep learning from a physics-informed perspective. It reviews the phenomenon of neural scaling laws and discusses their interplay with constraints and inductive biases in physics applications, while taking care to justify choices in constructing deep learning models and noting how DL evades classical statistical intuitions.

Significance. If substantiated, the work could bridge statistical ML theory with physics applications by linking scaling laws to inductive biases and constraints, potentially aiding model design in scientific domains. The abstract signals an intent for careful discussion of modeling choices, which is a positive feature for a review-style contribution in stat.ML.

major comments (1)
  1. [Abstract] Abstract: the central claim that a physics-informed perspective meaningfully justifies DL modeling choices and reveals key statistical features of training/generalization is stated without specifying the validation method or evidence used; this is load-bearing for the paper's contribution and cannot be assessed from the provided abstract alone.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their comments. We address the single major comment below.

  1. Referee: [Abstract] Abstract: the central claim that a physics-informed perspective meaningfully justifies DL modeling choices and reveals key statistical features of training/generalization is stated without specifying the validation method or evidence used; this is load-bearing for the paper's contribution and cannot be assessed from the provided abstract alone.

    Authors: The manuscript is a review-style contribution whose central claims are substantiated by synthesis and citation of the existing literature on neural scaling laws, inductive biases, and physics-constrained ML applications, as developed in the body of the paper. We agree that the abstract does not explicitly identify the review-based nature of the evidence or point to the specific bodies of work being synthesized. We will revise the abstract to state that the discussion draws on a review of the relevant empirical and theoretical literature. revision: yes

Circularity Check

0 steps flagged

No significant circularity; review perspective with no load-bearing derivations or self-referential fits

The supplied abstract and context describe a review paper examining neural scaling laws and physics-informed choices in deep learning, without presenting equations, fitted parameters, predictions, or uniqueness theorems. No derivation chain is exhibited that reduces to its own inputs by construction, self-citation, or renaming. The central claim is a perspective on interplay between scaling laws and inductive biases, which remains independent of any internal fitting or self-citation load-bearing step. This matches the default expectation of a non-circular review format.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities stated. Ledger remains empty pending full text.
