arXiv preprint arXiv:1804.11271 · 2018 · stat.ML

6 Pith papers cite this work. Polarity classification is still being indexed.

abstract

Whilst deep neural networks have shown great empirical success, there is still much work to be done to understand their theoretical properties. In this paper, we study the relationship between random, wide, fully connected, feedforward networks with more than one hidden layer and Gaussian processes with a recursive kernel definition. We show that, under broad conditions, as we make the architecture increasingly wide, the implied random function converges in distribution to a Gaussian process, formalising and extending existing results by Neal (1996) to deep networks. To evaluate convergence rates empirically, we use maximum mean discrepancy. We then compare finite Bayesian deep networks from the literature to Gaussian processes in terms of the key predictive quantities of interest, finding that in some cases the agreement can be very close. We discuss the desirability of Gaussian process behaviour and review non-Gaussian alternative models from the literature.
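The two ingredients named in the abstract, a recursively defined limiting kernel and a maximum mean discrepancy (MMD) comparison between finite networks and their Gaussian process limit, can be illustrated with a short NumPy sketch. It assumes ReLU activations, weights drawn as N(0, σ_w²/fan_in) and biases as N(0, σ_b²) with σ_w² = 2 and σ_b² = 0.1, and an RBF kernel with a trace-based bandwidth for the MMD. These settings and all function names (`nngp_kernel`, `sample_network_outputs`, `mmd2_unbiased`) are hypothetical illustrations, not the paper's code or experimental configuration. The ReLU expectation uses the standard closed-form arc-cosine expression.

```python
import numpy as np


def relu_expectation(k11, k22, k12):
    """E[relu(u) * relu(v)] for (u, v) ~ N(0, [[k11, k12], [k12, k22]]) (arc-cosine formula)."""
    norm = np.sqrt(k11 * k22)
    cos_t = np.clip(k12 / norm, -1.0, 1.0)  # guard against rounding outside [-1, 1]
    theta = np.arccos(cos_t)
    return norm * (np.sin(theta) + (np.pi - theta) * cos_t) / (2.0 * np.pi)


def nngp_kernel(X, n_hidden_layers, sw2=2.0, sb2=0.1):
    """Hypothetical helper: limiting GP kernel of a wide ReLU net, built layer by layer."""
    K = sb2 + sw2 * (X @ X.T) / X.shape[1]  # covariance of first-layer pre-activations
    for _ in range(n_hidden_layers):
        d = np.diag(K)
        # Each hidden layer applies ReLU, then an independent random affine map.
        K = sb2 + sw2 * relu_expectation(d[:, None], d[None, :], K)
    return K


def sample_network_outputs(X, n_hidden_layers, width, n_samples, sw2=2.0, sb2=0.1, seed=0):
    """Hypothetical helper: scalar outputs of independently initialised finite ReLU nets at rows of X."""
    rng = np.random.default_rng(seed)
    outs = np.empty((n_samples, X.shape[0]))
    for s in range(n_samples):
        h, fan_in = X, X.shape[1]
        for _ in range(n_hidden_layers):
            W = rng.normal(0.0, np.sqrt(sw2 / fan_in), size=(fan_in, width))
            b = rng.normal(0.0, np.sqrt(sb2), size=width)
            h, fan_in = np.maximum(h @ W + b, 0.0), width
        w_out = rng.normal(0.0, np.sqrt(sw2 / fan_in), size=fan_in)
        outs[s] = h @ w_out + rng.normal(0.0, np.sqrt(sb2))
    return outs


def mmd2_unbiased(A, B, bandwidth):
    """Unbiased estimate of squared MMD between sample sets A and B under an RBF kernel."""
    def rbf(P, Q):
        sq = np.sum(P**2, axis=1)[:, None] + np.sum(Q**2, axis=1)[None, :] - 2.0 * P @ Q.T
        return np.exp(-sq / (2.0 * bandwidth**2))

    m, n = len(A), len(B)
    Kaa, Kbb, Kab = rbf(A, A), rbf(B, B), rbf(A, B)
    return ((Kaa.sum() - np.trace(Kaa)) / (m * (m - 1))
            + (Kbb.sum() - np.trace(Kbb)) / (n * (n - 1))
            - 2.0 * Kab.mean())


# Usage: 4 random 3-dimensional inputs, 3 hidden layers of width 128.
X = np.random.default_rng(1).normal(size=(4, 3))
K = nngp_kernel(X, n_hidden_layers=3)
net = sample_network_outputs(X, n_hidden_layers=3, width=128, n_samples=1000)
max_cov_err = np.max(np.abs(net.T @ net / len(net) - K))  # outputs have zero mean

gp = np.random.default_rng(2).multivariate_normal(np.zeros(len(X)), K, size=1000)
bw = np.sqrt(np.trace(K))  # assumed bandwidth heuristic
mmd_same = mmd2_unbiased(net, gp, bw)        # finite networks vs their limiting GP
mmd_diff = mmd2_unbiased(net, 2.0 * gp, bw)  # finite networks vs a deliberately wrong Gaussian
print(max_cov_err, mmd_same, mmd_diff)
```

Under these assumptions, `mmd_same` should sit near zero and shrink further as width and sample count grow, while `mmd_diff` stays clearly positive because the rescaled Gaussian has the wrong covariance.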

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

How Long Does Infinite Width Last? Signal Propagation in Long-Range Linear Recurrences

cs.LG · 2026-05-06 · unverdicted · novelty 7.0

In linear recurrent models, infinite-width signal propagation remains accurate only for depths t much smaller than sqrt(width n), with a critical regime at t ~ c sqrt(n) where finite-width effects emerge and dominate for larger t.

Stochastic Scaling Limits and Synchronization by Noise in Deep Transformer Models

math.PR · 2026-04-29 · unverdicted · novelty 7.0

Transformers converge pathwise to a stochastic particle system and SPDE in the scaling limit, exhibiting synchronization by noise and exponential energy dissipation when common noise is coercive relative to self-attention drift.

On the (In-)Security of the Shuffling Defense in the Transformer Secure Inference

cs.CR · 2026-05-06 · conditional · novelty 6.0

An attack aligns differently shuffled intermediate activations from secure Transformer inference queries to recover model weights with low error using roughly one dollar of queries.

Optimal Architecture and Fundamental Bounds in Neural Network Field Theory

hep-th · 2026-04-29 · unverdicted · novelty 6.0

The α=0 architecture in neural network field theory (NNFT) minimizes finite-width variance, removes IR corrections, and sets a fundamental SNR bound for correlation functions in scalar field theory.

Viability of perturbative expansion for quantum field theories on neurons

hep-th · 2025-08-05 · unverdicted · novelty 5.0

The work tests perturbative viability of single-layer neural networks for local QFTs at finite neuron number N in phi^4 theory, finding UV-cutoff-sensitive O(1/N) corrections with weak convergence and proposing a modification for better scaling.

The Neural Tangent Kernel for Classification

cs.LG · 2026-05-17

citing papers explorer

How Long Does Infinite Width Last? Signal Propagation in Long-Range Linear Recurrences cs.LG · 2026-05-06 · unverdicted · none · ref 28
Stochastic Scaling Limits and Synchronization by Noise in Deep Transformer Models math.PR · 2026-04-29 · unverdicted · none · ref 17
On the (In-)Security of the Shuffling Defense in the Transformer Secure Inference cs.CR · 2026-05-06 · conditional · none · ref 59
Optimal Architecture and Fundamental Bounds in Neural Network Field Theory hep-th · 2026-04-29 · unverdicted · none · ref 23
Viability of perturbative expansion for quantum field theories on neurons hep-th · 2025-08-05 · unverdicted · none · ref 26
The Neural Tangent Kernel for Classification cs.LG · 2026-05-17 · unreviewed · ref 5
