Recognition: unknown
Optimal Architecture and Fundamental Bounds in Neural Network Field Theory
Pith reviewed 2026-05-07 09:05 UTC · model grok-4.3
The pith
For a massive scalar field, setting the architectural parameter α to zero in neural network field theory minimizes finite-width variance and removes infrared-sensitive corrections.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The parameter α parametrizes how the propagator is split between the distribution of neuron momenta and the scaling of neuron amplitudes. This choice leaves the infinite-width theory exactly the same for any α. At finite width, however, the variance and bias of correlation functions depend strongly on α. For a free massive scalar field, only α=0, where neuron momenta are propagator-weighted and neuron amplitudes are constant, eliminates the infrared divergence in the variance at large separations. In the interacting theory, α=0 is the unique value that removes the leading infrared-sensitive finite-width corrections to the effective potential.
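One way to make this split concrete is sketched below. The conventions here (random-Fourier-feature neurons, and the propagator divided between the momentum measure and the squared amplitude as $G = G^{1-\alpha}\cdot G^{\alpha}$) are assumptions chosen to reproduce the abstract's description of the α=0 case, not the paper's exact definitions.

$$
\phi(x) = \sqrt{\tfrac{2}{N}} \sum_{i=1}^{N} a_i \cos\!\big(p_i \cdot x + \theta_i\big),
\qquad
p_i \sim \frac{G(p)^{1-\alpha}}{Z_\alpha},
\qquad
a_i^2 = Z_\alpha\, G(p_i)^{\alpha},
\qquad
Z_\alpha = \int \frac{d^d p}{(2\pi)^d}\, G(p)^{1-\alpha},
$$

with $\theta_i$ uniform on $[0, 2\pi)$ and $G(p) = 1/(p^2 + m^2)$. Averaging over the phases gives $\mathbb{E}_\theta\!\left[2\cos(p\cdot x + \theta)\cos(p\cdot y + \theta)\right] = \cos\!\big(p\cdot(x-y)\big)$, so

$$
\lim_{N\to\infty} \langle \phi(x)\,\phi(y) \rangle
= \int \frac{d^d p}{(2\pi)^d}\, \frac{G(p)^{1-\alpha}}{Z_\alpha}\; Z_\alpha\, G(p)^{\alpha}\, \cos\!\big(p\cdot(x-y)\big)
= \int \frac{d^d p}{(2\pi)^d}\, G(p)\, \cos\!\big(p\cdot(x-y)\big),
$$

independent of α: at α=0 the momenta are propagator-weighted and the amplitudes are constant, while any other α simply shuffles the same weight between the measure and the amplitude. The cancellation holds only in expectation; at finite N the fluctuations of the estimator retain the α dependence, which is where the α-dependent variance and bias enter.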
What carries the argument
The architectural freedom α, which controls how strongly the propagator weights the distribution of neuron momenta (with neuron amplitudes rescaled to compensate) while keeping the infinite-width theory fixed.
If this is right
- Finite-width calculations of correlation functions become more accurate by choosing α=0.
- The bias in finite-width results vanishes upon extrapolation to infinite network width (a fitting sketch follows this list).
- Variance remains and imposes a fundamental bound on the signal-to-noise ratio achievable in NNFT, analogous to lattice field theory.
- NNFT can be developed into a practical numerical method for studying field theories by using this optimal architecture.
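A minimal sketch of the infinite-width extrapolation invoked in this list, assuming the leading finite-width correction to a generic observable scales as $1/N$, as is typical for i.i.d.-neuron constructions; the function name `extrapolate_infinite_width`, the fit ansatz, and the toy inputs are illustrative, not taken from the paper.

```python
import numpy as np

def extrapolate_infinite_width(widths, estimates, errors):
    """Weighted least-squares fit of estimates(N) = y_inf + c / N.

    Assumes the leading finite-width correction is O(1/N); the paper may use
    a different ansatz or keep higher orders in 1/N."""
    N = np.asarray(widths, dtype=float)
    y = np.asarray(estimates, dtype=float)
    w = 1.0 / np.asarray(errors, dtype=float) ** 2        # inverse-variance weights
    X = np.column_stack([np.ones_like(N), 1.0 / N])       # columns: [1, 1/N]
    WX = X * w[:, None]
    y_inf, c = np.linalg.solve(WX.T @ X, WX.T @ y)        # weighted normal equations
    return y_inf, c

# Toy usage on synthetic inputs (placeholders, not numbers from the paper):
rng = np.random.default_rng(1)
widths = np.array([16, 32, 64, 128, 256])
errors = np.full(widths.size, 0.002)
y_true, c_true = 0.50, -1.2                               # made-up values for the demo
estimates = y_true + c_true / widths + rng.normal(0.0, errors)
print(extrapolate_infinite_width(widths, estimates, errors))  # ~ (0.50, -1.2)
```

The inverse-variance weighting keeps noisier small-width points from dominating the fit; the actual choice of widths, error model, and fit order used in the paper is among the details the referee report below asks to be spelled out.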
Where Pith is reading between the lines
- The optimality of α=0 might extend to other field theories or to higher-order correlation functions.
- Other techniques could be combined with the α=0 choice to further reduce the variance bound.
- This framework suggests NNFT could offer an alternative to traditional lattice simulations for certain observables if the exponential error growth can be managed.
Load-bearing premise
The infinite-width theory stays exactly the same no matter what value α takes, while the size of finite-width errors depends strongly on α.
What would settle it
Measure the variance of the two-point correlation function at distances much larger than the correlation length for several different values of α; the variance should be smallest and free of infrared divergence only at α=0.
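A minimal numerical version of this test in d=1, under the same assumed conventions as the sketch above (cosine random-feature neurons; momenta drawn from $G(p)^{1-\alpha}$; amplitudes fixed so the infinite-width propagator is α-independent). The helper names `sample_network`, `phi`, and `z_alpha` are hypothetical, and only $-1/2 < \alpha < 1/2$ is covered because those momentum measures can be sampled exactly in one dimension without a cutoff; the paper's architecture, α range, and normalizations may differ.

```python
import numpy as np
from math import gamma, pi, sqrt

rng = np.random.default_rng(0)

m = 1.0            # mass, correlation length = 1/m
N = 64             # network width
n_nets = 20000     # independent network draws
x_far = 10.0 / m   # separation well beyond the correlation length

def z_alpha(alpha):
    """Z_alpha = int dp/(2*pi) (p^2+m^2)^(-(1-alpha)), finite for alpha < 1/2."""
    return m ** (2 * alpha - 1) * gamma(0.5 - alpha) / (2 * sqrt(pi) * gamma(1 - alpha))

def sample_network(alpha, N):
    """One width-N network under the assumed convention:
    p_i ~ G(p)^(1-alpha),  a_i^2 = Z_alpha * G(p_i)^alpha,  G(p) = 1/(p^2+m^2).
    In d=1, G^(1-alpha) is a scaled Student-t with nu = 1 - 2*alpha degrees of
    freedom, so exact sampling works for -1/2 < alpha < 1/2 without a cutoff."""
    nu = 1.0 - 2.0 * alpha
    p = m * rng.standard_t(nu, N) / sqrt(nu)
    a = np.sqrt(z_alpha(alpha) * (p ** 2 + m ** 2) ** (-alpha))
    theta = rng.uniform(0.0, 2.0 * pi, N)
    return p, a, theta

def phi(x, p, a, theta):
    """phi(x) = sqrt(2/N) * sum_i a_i cos(p_i x + theta_i)."""
    return sqrt(2.0 / p.size) * np.sum(a * np.cos(p * x + theta))

exact = np.exp(-m * x_far) / (2.0 * m)   # exact 1d propagator: e^(-m|x|) / (2m)

for alpha in (0.0, -0.25, 0.25):
    est = np.empty(n_nets)
    for k in range(n_nets):
        p, a, theta = sample_network(alpha, N)
        est[k] = phi(0.0, p, a, theta) * phi(x_far, p, a, theta)
    print(f"alpha={alpha:+.2f}  mean={est.mean():+.3e}  exact={exact:.3e}  "
          f"variance={est.var():.3e}")
```

At ten correlation lengths the exact propagator is of order $10^{-5}$ while the spread of single-draw estimates is of order one, so the mean is buried in noise for every α shown, illustrating the exponential signal-to-noise problem the abstract describes; the criterion above concerns the variance column, which it predicts to be smallest at α=0.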
Original abstract
Neural network field theory (NNFT) represents fields as neural networks and samples field configurations by drawing network parameters from a probability distribution. We identify a previously unexplored architectural freedom in NNFT, parameterized by $\alpha$, that leaves the infinite-width theory invariant but dramatically affects finite-width errors in the calculation of correlation functions. For a massive scalar field, we show that $\alpha=0$, corresponding to propagator-weighted neuron momenta and constant neuron amplitudes, is optimal: it minimizes finite-width variance and uniquely removes IR-sensitive corrections in the interacting theory. Even at $\alpha=0$, relative errors from both bias and variance grow exponentially with distance beyond the correlation length. The bias can be removed by extrapolating to infinite width, which we demonstrate numerically, while the variance imposes a fundamental bound on the achievable signal-to-noise ratio as in lattice field theory. These results chart a path toward developing NNFT into a practical tool for the numerical study of field theories.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces an architectural parameter α in Neural Network Field Theory (NNFT), where fields are represented as neural networks with parameters drawn from a probability distribution. It claims that α leaves the infinite-width theory invariant while controlling finite-width errors in correlation functions. For a massive scalar field, α=0 (propagator-weighted neuron momenta and constant amplitudes) is shown to be optimal: it minimizes finite-width variance and uniquely eliminates IR-sensitive corrections in the interacting theory. Numerical extrapolation to infinite width removes bias, while variance imposes a fundamental SNR bound analogous to lattice field theory.
Significance. If the central claims hold, this work provides a concrete optimization for NNFT architectures that reduces finite-width artifacts without altering the infinite-width limit, strengthening NNFT as a potential numerical tool for field theories. The analytic identification of IR-sensitive terms and the numerical demonstration of bias removal via N→∞ extrapolation are notable strengths, as is the clear analogy to lattice SNR bounds. These elements chart a practical path forward if the derivations and numerics are robust.
minor comments (3)
- [Interacting theory section] The abstract states that α=0 'uniquely removes IR-sensitive corrections in the interacting theory,' but the main text should include an explicit equation or derivation (e.g., in the section on the interacting theory) showing how this uniqueness follows from the α dependence rather than from a specific choice of measure or cutoff.
- [Numerical results] The numerical demonstration of bias removal by infinite-width extrapolation is mentioned but lacks detail on the fitting procedure, number of widths sampled, and error bars; adding a dedicated subsection or table with these would strengthen verifiability without altering the central claim.
- [Architecture definition] Notation for neuron momenta and amplitudes should be defined once at first use (likely near the definition of α) to avoid ambiguity when comparing α=0 to other values.
Simulated Author's Rebuttal
We thank the referee for their positive and accurate summary of our work, which correctly captures the role of the architectural parameter α in NNFT, the optimality of α=0 for minimizing finite-width variance and removing IR-sensitive corrections, and the analogy to lattice SNR bounds. The referee's assessment of the analytic and numerical strengths is appreciated. Since the report contains no specific major comments or requested changes, we have no points requiring rebuttal or revision.
Circularity Check
No significant circularity; derivation is self-contained
full rationale
The paper introduces an architectural parameter α shown to leave the infinite-width NNFT invariant while controlling finite-width correlation errors. Optimality of α=0 for the massive scalar is derived by explicit minimization of variance and removal of IR-sensitive terms in the interacting theory, rather than by redefinition or fitting the quantity being optimized. No self-definitional steps, fitted inputs renamed as predictions, or load-bearing self-citations appear in the provided abstract or description. The bias/variance decomposition and SNR bound are presented as independent consequences of the finite-width expansion, with numerical extrapolation demonstrated separately. The chain therefore contains independent content and does not reduce to its inputs by construction.
Forward citations
Cited by 1 Pith paper
- Anomalies in Neural Network Field Theory: derives Schwinger-Dyson equations and Ward identities in NN-FT to study anomalies in QFTs via a conserved parameter-space current, yielding a new perspective on symmetries.
Reference graph
Works this paper leans on
- [1] J. Halverson, A. Maiti, and K. Stoner, Neural Networks and Quantum Field Theory, Mach. Learn. Sci. Tech. 2, 035002 (2021), arXiv:2008.08601 [cs.LG]
- [2] J. Halverson, Building Quantum Field Theories Out of Neurons (2021), arXiv:2112.04527 [hep-th]
- [3] M. Demirtas, J. Halverson, A. Maiti, M. D. Schwartz, and K. Stoner, Neural network field theories: non-Gaussianity, actions, and locality, Mach. Learn. Sci. Tech. 5, 015002 (2024), arXiv:2307.03223 [hep-th]
- [4] J. Halverson, J. Naskar, and J. Tian, Conformal fields from neural networks, JHEP 10, 039, arXiv:2409.12222 [hep-th]
- [5] P. Capuozzo, B. Robinson, and B. Suzzoni, Conformal Defects in Neural Network Field Theories (2025), arXiv:2512.07946 [hep-th]
- [6] B. Robinson, Virasoro Symmetry in Neural Network Field Theories (2025), arXiv:2512.24420 [hep-th]
- [7] G. Huang and K. Zhou, The neural networks with tensor weights and emergent fermionic Wick rules in the large-width limit, Phys. Lett. B 873, 140146 (2026), arXiv:2507.05303 [hep-th]
- [8]
- [9] S. Frank and J. Halverson, String Theory from Infinite Width Neural Networks (2026), arXiv:2601.06249 [hep-th]
- [10]
- [11] H. Erbin, V. Lahoche, and D. O. Samary, Non-perturbative renormalization for the neural network-QFT correspondence, Mach. Learn. Sci. Tech. 3, 015027 (2022), arXiv:2108.01403 [hep-th]
- [12]
- [13]
- [14] J. Halverson, TASI Lectures on Physics for Machine Learning (2024), arXiv:2408.00082 [hep-th]
- [15] C. Ferko and J. Halverson, Quantum mechanics and neural networks, Mach. Learn. Sci. Tech. 7, 015002 (2026), arXiv:2504.05462 [hep-th]
- [16]
- [17]
- [18] C. Ferko, J. Halverson, V. Jejjala, and B. Robinson, Topological Effects in Neural Network Field Theory (2026), arXiv:2604.02313 [hep-th]
- [19]
- [20] R. M. Neal, Bayesian Learning for Neural Networks, Lecture Notes in Statistics, Vol. 118 (Springer, 1996)
- [21] C. K. I. Williams, Computing with infinite networks, in Advances in Neural Information Processing Systems, Vol. 9, edited by M. Mozer, M. Jordan, and T. Petsche (MIT Press, 1996)
- [22] J. Lee, Y. Bahri, R. Novak, S. S. Schoenholz, J. Pennington, and J. Sohl-Dickstein, Deep Neural Networks as Gaussian Processes (2017), arXiv:1711.00165 [stat.ML]
- [23]
- [24] G. Yang, Tensor Programs I: Wide feedforward or recurrent neural networks of any architecture are Gaussian processes (2019), arXiv:1910.12478 [cs.NE]
- [25] B. Hanin, Random neural networks in the infinite width limit as Gaussian processes (2021), arXiv:2107.01562 [math.PR]
- [26]
- [27] A. Rahimi and B. Recht, Random features for large-scale kernel machines, in Advances in Neural Information Processing Systems, Vol. 20, edited by J. Platt, D. Koller, Y. Singer, and S. Roweis (Curran Associates, Inc., 2007), pp. 1177–1184
- [28] This freedom was noted in passing in Ref. [9] in the context of the 2d free boson
- [29] Sampling $|a_i|$ from a Gaussian as in Refs. [2, 3] instead gives $\langle |a_i|^{2n} \rangle = (2n-1)!!\,\langle |a_i|^2 \rangle^{n}$, which amplifies the bias and variance by numerical prefactors but does not change the parametric dependence on $\alpha$
- [30] G. Parisi, The Strategy for Computing the Hadronic Mass Spectrum, Phys. Rept. 103, 203 (1984)
- [31] G. P. Lepage, The Analysis of Algorithms for Lattice Field Theory, in Theoretical Advanced Study Institute in Elementary Particle Physics (1989)