Recognition: unknown
Optimal Architecture and Fundamental Bounds in Neural Network Field Theory
Pith reviewed 2026-05-07 09:05 UTC · model grok-4.3
The pith
For a massive scalar field, setting the architectural parameter α to zero in neural network field theory minimizes finite-width variance and removes infrared-sensitive corrections.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The parameter α parametrizes how the propagator is split between the distribution of neuron momenta and the scaling of neuron amplitudes. This choice leaves the infinite-width theory exactly the same for any α. At finite width, however, the variance and bias of correlation functions depend strongly on α. For a free massive scalar field, only α=0, where neuron momenta are propagator-weighted and neuron amplitudes are constant, eliminates the infrared divergence in the variance at large separations. In the interacting theory, α=0 is the unique value that removes the leading infrared-sensitive finite-width corrections to the effective potential.
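One way to make this split concrete is sketched below. The conventions here (random-Fourier-feature neurons, and the propagator divided between the momentum measure and the squared amplitude as $G = G^{1-\alpha}\cdot G^{\alpha}$) are assumptions chosen to reproduce the abstract's description of the α=0 case, not the paper's exact definitions.

$$
\phi(x) = \sqrt{\tfrac{2}{N}} \sum_{i=1}^{N} a_i \cos\!\big(p_i \cdot x + \theta_i\big),
\qquad
p_i \sim \frac{G(p)^{1-\alpha}}{Z_\alpha},
\qquad
a_i^2 = Z_\alpha\, G(p_i)^{\alpha},
\qquad
Z_\alpha = \int \frac{d^d p}{(2\pi)^d}\, G(p)^{1-\alpha},
$$

with $\theta_i$ uniform on $[0, 2\pi)$ and $G(p) = 1/(p^2 + m^2)$. Averaging over the phases gives $\mathbb{E}_\theta\!\left[2\cos(p\cdot x + \theta)\cos(p\cdot y + \theta)\right] = \cos\!\big(p\cdot(x-y)\big)$, so

$$
\lim_{N\to\infty} \langle \phi(x)\,\phi(y) \rangle
= \int \frac{d^d p}{(2\pi)^d}\, \frac{G(p)^{1-\alpha}}{Z_\alpha}\; Z_\alpha\, G(p)^{\alpha}\, \cos\!\big(p\cdot(x-y)\big)
= \int \frac{d^d p}{(2\pi)^d}\, G(p)\, \cos\!\big(p\cdot(x-y)\big),
$$

independent of α: at α=0 the momenta are propagator-weighted and the amplitudes are constant, while any other α simply shuffles the same weight between the measure and the amplitude. The cancellation holds only in expectation; at finite N the fluctuations of the estimator retain the α dependence, which is where the α-dependent variance and bias enter.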
What carries the argument
The architectural freedom α, which controls how strongly the propagator weights the distribution of neuron momenta (with neuron amplitudes rescaled to compensate) while keeping the infinite-width theory fixed.
If this is right
- Finite-width calculations of correlation functions become more accurate by choosing α=0.
- The bias in finite-width results vanishes upon extrapolation to infinite network width (a fitting sketch follows this list).
- Variance remains and imposes a fundamental bound on the signal-to-noise ratio achievable in NNFT, analogous to lattice field theory.
- NNFT can be developed into a practical numerical method for studying field theories by using this optimal architecture.
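A minimal sketch of the infinite-width extrapolation invoked in this list, assuming the leading finite-width correction to a generic observable scales as $1/N$, as is typical for i.i.d.-neuron constructions; the function name `extrapolate_infinite_width`, the fit ansatz, and the toy inputs are illustrative, not taken from the paper.

```python
import numpy as np

def extrapolate_infinite_width(widths, estimates, errors):
    """Weighted least-squares fit of estimates(N) = y_inf + c / N.

    Assumes the leading finite-width correction is O(1/N); the paper may use
    a different ansatz or keep higher orders in 1/N."""
    N = np.asarray(widths, dtype=float)
    y = np.asarray(estimates, dtype=float)
    w = 1.0 / np.asarray(errors, dtype=float) ** 2        # inverse-variance weights
    X = np.column_stack([np.ones_like(N), 1.0 / N])       # columns: [1, 1/N]
    WX = X * w[:, None]
    y_inf, c = np.linalg.solve(WX.T @ X, WX.T @ y)        # weighted normal equations
    return y_inf, c

# Toy usage on synthetic inputs (placeholders, not numbers from the paper):
rng = np.random.default_rng(1)
widths = np.array([16, 32, 64, 128, 256])
errors = np.full(widths.size, 0.002)
y_true, c_true = 0.50, -1.2                               # made-up values for the demo
estimates = y_true + c_true / widths + rng.normal(0.0, errors)
print(extrapolate_infinite_width(widths, estimates, errors))  # ~ (0.50, -1.2)
```

The inverse-variance weighting keeps noisier small-width points from dominating the fit; the actual choice of widths, error model, and fit order used in the paper is among the details the referee report below asks to be spelled out.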
Where Pith is reading between the lines
- The optimality of α=0 might extend to other field theories or to higher-order correlation functions.
- Other techniques could be combined with the α=0 choice to further reduce the variance bound.
- This framework suggests NNFT could offer an alternative to traditional lattice simulations for certain observables if the exponential error growth can be managed.
Load-bearing premise
The infinite-width theory stays exactly the same no matter what value α takes, while the size of finite-width errors depends strongly on α.
What would settle it
Measure the variance of the two-point correlation function at distances much larger than the correlation length for several different values of α; the variance should be smallest and free of infrared divergence only at α=0.
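A minimal numerical version of this test in d=1, under the same assumed conventions as the sketch above (cosine random-feature neurons; momenta drawn from $G(p)^{1-\alpha}$; amplitudes fixed so the infinite-width propagator is α-independent). The helper names `sample_network`, `phi`, and `z_alpha` are hypothetical, and only $-1/2 < \alpha < 1/2$ is covered because those momentum measures can be sampled exactly in one dimension without a cutoff; the paper's architecture, α range, and normalizations may differ.

```python
import numpy as np
from math import gamma, pi, sqrt

rng = np.random.default_rng(0)

m = 1.0            # mass, correlation length = 1/m
N = 64             # network width
n_nets = 20000     # independent network draws
x_far = 10.0 / m   # separation well beyond the correlation length

def z_alpha(alpha):
    """Z_alpha = int dp/(2*pi) (p^2+m^2)^(-(1-alpha)), finite for alpha < 1/2."""
    return m ** (2 * alpha - 1) * gamma(0.5 - alpha) / (2 * sqrt(pi) * gamma(1 - alpha))

def sample_network(alpha, N):
    """One width-N network under the assumed convention:
    p_i ~ G(p)^(1-alpha),  a_i^2 = Z_alpha * G(p_i)^alpha,  G(p) = 1/(p^2+m^2).
    In d=1, G^(1-alpha) is a scaled Student-t with nu = 1 - 2*alpha degrees of
    freedom, so exact sampling works for -1/2 < alpha < 1/2 without a cutoff."""
    nu = 1.0 - 2.0 * alpha
    p = m * rng.standard_t(nu, N) / sqrt(nu)
    a = np.sqrt(z_alpha(alpha) * (p ** 2 + m ** 2) ** (-alpha))
    theta = rng.uniform(0.0, 2.0 * pi, N)
    return p, a, theta

def phi(x, p, a, theta):
    """phi(x) = sqrt(2/N) * sum_i a_i cos(p_i x + theta_i)."""
    return sqrt(2.0 / p.size) * np.sum(a * np.cos(p * x + theta))

exact = np.exp(-m * x_far) / (2.0 * m)   # exact 1d propagator: e^(-m|x|) / (2m)

for alpha in (0.0, -0.25, 0.25):
    est = np.empty(n_nets)
    for k in range(n_nets):
        p, a, theta = sample_network(alpha, N)
        est[k] = phi(0.0, p, a, theta) * phi(x_far, p, a, theta)
    print(f"alpha={alpha:+.2f}  mean={est.mean():+.3e}  exact={exact:.3e}  "
          f"variance={est.var():.3e}")
```

At ten correlation lengths the exact propagator is of order $10^{-5}$ while the spread of single-draw estimates is of order one, so the mean is buried in noise for every α shown, illustrating the exponential signal-to-noise problem the abstract describes; the criterion above concerns the variance column, which it predicts to be smallest at α=0.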
Original abstract
Neural network field theory (NNFT) represents fields as neural networks and samples field configurations by drawing network parameters from a probability distribution. We identify a previously unexplored architectural freedom in NNFT, parameterized by $\alpha$, that leaves the infinite-width theory invariant but dramatically affects finite-width errors in the calculation of correlation functions. For a massive scalar field, we show that $\alpha=0$, corresponding to propagator-weighted neuron momenta and constant neuron amplitudes, is optimal: it minimizes finite-width variance and uniquely removes IR-sensitive corrections in the interacting theory. Even at $\alpha=0$, relative errors from both bias and variance grow exponentially with distance beyond the correlation length. The bias can be removed by extrapolating to infinite width, which we demonstrate numerically, while the variance imposes a fundamental bound on the achievable signal-to-noise ratio as in lattice field theory. These results chart a path toward developing NNFT into a practical tool for the numerical study of field theories.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces an architectural parameter α in Neural Network Field Theory (NNFT), where fields are represented as neural networks with parameters drawn from a probability distribution. It claims that α leaves the infinite-width theory invariant while controlling finite-width errors in correlation functions. For a massive scalar field, α=0 (propagator-weighted neuron momenta and constant amplitudes) is shown to be optimal: it minimizes finite-width variance and uniquely eliminates IR-sensitive corrections in the interacting theory. Numerical extrapolation to infinite width removes bias, while variance imposes a fundamental SNR bound analogous to lattice field theory.
Significance. If the central claims hold, this work provides a concrete optimization for NNFT architectures that reduces finite-width artifacts without altering the infinite-width limit, strengthening NNFT as a potential numerical tool for field theories. The analytic identification of IR-sensitive terms and the numerical demonstration of bias removal via N→∞ extrapolation are notable strengths, as is the clear analogy to lattice SNR bounds. These elements chart a practical path forward if the derivations and numerics are robust.
minor comments (3)
- [Interacting theory section] The abstract states that α=0 'uniquely removes IR-sensitive corrections in the interacting theory,' but the main text should include an explicit equation or derivation (e.g., in the section on the interacting theory) showing how this uniqueness follows from the α dependence rather than from a specific choice of measure or cutoff.
- [Numerical results] The numerical demonstration of bias removal by infinite-width extrapolation is mentioned but lacks detail on the fitting procedure, number of widths sampled, and error bars; adding a dedicated subsection or table with these would strengthen verifiability without altering the central claim.
- [Architecture definition] Notation for neuron momenta and amplitudes should be defined once at first use (likely near the definition of α) to avoid ambiguity when comparing α=0 to other values.
Simulated Author's Rebuttal
We thank the referee for their positive and accurate summary of our work, which correctly captures the role of the architectural parameter α in NNFT, the optimality of α=0 for minimizing finite-width variance and removing IR-sensitive corrections, and the analogy to lattice SNR bounds. The referee's assessment of the analytic and numerical strengths is appreciated. Since the report contains no specific major comments or requested changes, we have no points requiring rebuttal or revision.
Circularity Check
No significant circularity; derivation is self-contained
full rationale
The paper introduces an architectural parameter α shown to leave the infinite-width NNFT invariant while controlling finite-width correlation errors. Optimality of α=0 for the massive scalar is derived by explicit minimization of variance and removal of IR-sensitive terms in the interacting theory, rather than by redefinition or fitting the quantity being optimized. No self-definitional steps, fitted inputs renamed as predictions, or load-bearing self-citations appear in the provided abstract or description. The bias/variance decomposition and SNR bound are presented as independent consequences of the finite-width expansion, with numerical extrapolation demonstrated separately. The chain therefore contains independent content and does not reduce to its inputs by construction.
Forward citations
Cited by 1 Pith paper
- Anomalies in Neural Network Field Theory: derives Schwinger-Dyson equations and Ward identities in NN-FT to study anomalies in QFTs via a conserved parameter-space current, yielding a new perspective on symmetries.
Reference graph
Works this paper leans on
- [1] J. Halverson, A. Maiti, and K. Stoner, Neural Networks and Quantum Field Theory, Mach. Learn. Sci. Tech. 2, 035002 (2021), arXiv:2008.08601 [cs.LG]
- [2] J. Halverson, Building Quantum Field Theories Out of Neurons (2021), arXiv:2112.04527 [hep-th]
- [3] M. Demirtas, J. Halverson, A. Maiti, M. D. Schwartz, and K. Stoner, Neural network field theories: non-Gaussianity, actions, and locality, Mach. Learn. Sci. Tech. 5, 015002 (2024), arXiv:2307.03223 [hep-th]
- [4] J. Halverson, J. Naskar, and J. Tian, Conformal fields from neural networks, JHEP 10, 039, arXiv:2409.12222 [hep-th]
- [5] P. Capuozzo, B. Robinson, and B. Suzzoni, Conformal Defects in Neural Network Field Theories (2025), arXiv:2512.07946 [hep-th]
- [6] B. Robinson, Virasoro Symmetry in Neural Network Field Theories (2025), arXiv:2512.24420 [hep-th]
- [7] G. Huang and K. Zhou, The neural networks with tensor weights and emergent fermionic Wick rules in the large-width limit, Phys. Lett. B 873, 140146 (2026), arXiv:2507.05303 [hep-th]
- [8]
- [9] S. Frank and J. Halverson, String Theory from Infinite Width Neural Networks (2026), arXiv:2601.06249 [hep-th]
- [10]
- [11] H. Erbin, V. Lahoche, and D. O. Samary, Non-perturbative renormalization for the neural network-QFT correspondence, Mach. Learn. Sci. Tech. 3, 015027 (2022), arXiv:2108.01403 [hep-th]
- [12]
- [13]
- [14] J. Halverson, TASI Lectures on Physics for Machine Learning (2024), arXiv:2408.00082 [hep-th]
- [15] C. Ferko and J. Halverson, Quantum mechanics and neural networks, Mach. Learn. Sci. Tech. 7, 015002 (2026), arXiv:2504.05462 [hep-th]
- [16]
- [17]
- [18] C. Ferko, J. Halverson, V. Jejjala, and B. Robinson, Topological Effects in Neural Network Field Theory (2026), arXiv:2604.02313 [hep-th]
- [19]
- [20] R. M. Neal, Bayesian Learning for Neural Networks, Lecture Notes in Statistics, Vol. 118 (Springer, 1996)
- [21] C. K. I. Williams, Computing with infinite networks, in Advances in Neural Information Processing Systems, Vol. 9, edited by M. Mozer, M. Jordan, and T. Petsche (MIT Press, 1996)
- [22] J. Lee, Y. Bahri, R. Novak, S. S. Schoenholz, J. Pennington, and J. Sohl-Dickstein, Deep Neural Networks as Gaussian Processes (2017), arXiv:1711.00165 [stat.ML]
- [23]
- [24] G. Yang, Tensor Programs I: Wide feedforward or recurrent neural networks of any architecture are Gaussian processes (2019), arXiv:1910.12478 [cs.NE]
- [25] B. Hanin, Random neural networks in the infinite width limit as Gaussian processes (2021), arXiv:2107.01562 [math.PR]
- [26]
- [27] A. Rahimi and B. Recht, Random features for large-scale kernel machines, in Advances in Neural Information Processing Systems, Vol. 20, edited by J. Platt, D. Koller, Y. Singer, and S. Roweis (Curran Associates, Inc., 2007), pp. 1177–1184
- [28] This freedom was noted in passing in Ref. [9] in the context of the 2d free boson
- [29] Sampling $|a_i|$ from a Gaussian as in Refs. [2, 3] instead gives $\langle |a_i|^{2n} \rangle = (2n-1)!!\,\langle |a_i|^2 \rangle^{n}$, which amplifies the bias and variance by numerical prefactors but does not change the parametric dependence on $\alpha$
- [30] G. Parisi, The Strategy for Computing the Hadronic Mass Spectrum, Phys. Rept. 103, 203 (1984)
- [31] G. P. Lepage, The Analysis of Algorithms for Lattice Field Theory, in Theoretical Advanced Study Institute in Elementary Particle Physics (1989)