Deep Convolutional Networks as shallow Gaussian Processes

Adrià Garriga-Alonso; Carl Edward Rasmussen; Laurence Aitchison

arxiv: 1808.05587 · v2 · pith:APTFPWDTnew · submitted 2018-08-16 · 📊 stat.ML · cs.LG

keywords: kernel, convolutional, computed, deep, equivalent, gaussian, layer, networks

Abstract

We show that the output of a (residual) convolutional neural network (CNN) with an appropriate prior over the weights and biases is a Gaussian process (GP) in the limit of infinitely many convolutional filters, extending similar results for dense networks. For a CNN, the equivalent kernel can be computed exactly and, unlike "deep kernels", has very few parameters: only the hyperparameters of the original CNN. Further, we show that this kernel has two properties that allow it to be computed efficiently; the cost of evaluating the kernel for a pair of images is similar to a single forward pass through the original CNN with only one filter per layer. The kernel equivalent to a 32-layer ResNet obtains 0.84% classification error on MNIST, a new record for GPs with a comparable number of parameters.
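The abstract says the kernel for a pair of images costs about as much as one forward pass through the CNN with a single filter per layer. The following is a minimal sketch of how that can work, written under our own assumptions rather than taken from the paper: 1-D single-channel inputs instead of images, ReLU activations, "same" zero padding, no residual connections, weight variance σ_w² divided by the fan-in, and a dense readout that averages over spatial locations. The names `relu_expectation`, `patch_mean` and `convnet_gp_kernel`, and all default hyperparameters, are hypothetical. In this sketch only one covariance per spatial location is propagated (no cross-location terms), and each layer reduces to a single-channel convolution with a constant filter, which is why the cost resembles a one-filter forward pass.

```python
import math


def relu_expectation(k11, k22, k12):
    """E[relu(u) * relu(v)] for (u, v) ~ N(0, [[k11, k12], [k12, k22]]).

    Closed form of the degree-1 arc-cosine kernel (Cho & Saul, 2009).
    """
    norm = math.sqrt(k11 * k22)
    if norm == 0.0:
        return 0.0
    cos_t = max(-1.0, min(1.0, k12 / norm))  # clamp against round-off
    theta = math.acos(cos_t)
    return norm / (2.0 * math.pi) * (math.sin(theta) + (math.pi - theta) * cos_t)


def patch_mean(values, width):
    """Average of `values` over a centred window of odd `width` at each
    location, counting out-of-range positions as zero ("same" zero padding).
    This is a 1-channel convolution with a constant filter."""
    half = width // 2
    n = len(values)
    return [
        sum(values[j] for j in range(max(0, i - half), min(n, i + half + 1))) / width
        for i in range(n)
    ]


def convnet_gp_kernel(x, y, n_conv=3, width=3, sigma_w2=2.0, sigma_b2=0.1):
    """Hypothetical 1-D sketch of an infinite-width ReLU CNN kernel between
    two single-channel signals `x` and `y` of equal length."""
    if len(x) != len(y):
        raise ValueError("signals must have the same length")

    def layer(vxx, vyy, vxy):
        # One pre-activation covariance per spatial location and per pair
        # (x,x), (y,y), (x,y); cross-location covariances are never formed.
        return tuple(
            [sigma_b2 + sigma_w2 * m for m in patch_mean(v, width)]
            for v in (vxx, vyy, vxy)
        )

    # First convolutional layer acts directly on the raw inputs.
    kxx, kyy, kxy = layer([a * a for a in x], [b * b for b in y],
                          [a * b for a, b in zip(x, y)])
    for _ in range(n_conv - 1):
        # Post-ReLU covariances depend only on the 2x2 covariance at the
        # same location, so each location is handled independently.
        vxx = [relu_expectation(k, k, k) for k in kxx]
        vyy = [relu_expectation(k, k, k) for k in kyy]
        vxy = [relu_expectation(p, q, r) for p, q, r in zip(kxx, kyy, kxy)]
        kxx, kyy, kxy = layer(vxx, vyy, vxy)
    # Dense readout (assumed): average post-ReLU cross-covariance over locations.
    vxy = [relu_expectation(p, q, r) for p, q, r in zip(kxx, kyy, kxy)]
    return sigma_b2 + sigma_w2 * sum(vxy) / len(vxy)
```

For example, `convnet_gp_kernel([0.0, 1.0, 2.0], [1.0, 0.0, -1.0])` returns a single prior covariance; evaluating it for every pair in a dataset fills the GP Gram matrix. Since `kxx` and `kyy` each depend on only one input, a practical version could cache them per input and recompute only the cross terms for each pair.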

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Neural Operator: Graph Kernel Network for Partial Differential Equations
cs.LG 2020-03 unverdicted novelty 7.0

Graph Kernel Networks learn PDE solution operators that generalize across discretization methods and grid resolutions using graph-based kernel integration.

Viability of perturbative expansion for quantum field theories on neurons
hep-th 2025-08 unverdicted novelty 5.0

The work tests perturbative viability of single-layer neural networks for local QFTs at finite neuron number N in phi^4 theory, finding UV-cutoff-sensitive O(1/N) corrections with weak convergence and proposing a modi...