An exact mapping between the Variational Renormalization Group and Deep Learning
Deep learning is a broad set of techniques that uses multiple layers of representation to automatically learn relevant features directly from structured data. Recently, such techniques have yielded record-breaking results on a diverse set of difficult machine learning tasks in computer vision, speech recognition, and natural language processing. Despite the enormous success of deep learning, relatively little is understood theoretically about why these techniques are so successful at feature learning and compression. Here, we show that deep learning is intimately related to one of the most important and successful techniques in theoretical physics, the renormalization group (RG). RG is an iterative coarse-graining scheme that allows for the extraction of relevant features (i.e. operators) as a physical system is examined at different length scales. We construct an exact mapping between the variational renormalization group, first introduced by Kadanoff, and deep learning architectures based on Restricted Boltzmann Machines (RBMs). We illustrate these ideas using the nearest-neighbor Ising model in one and two dimensions. Our results suggest that deep learning algorithms may be employing a generalized RG-like scheme to learn relevant features from data.
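To make the phrase "iterative coarse-graining scheme" concrete, here is a minimal sketch of the textbook decimation step for the zero-field nearest-neighbor Ising chain in one dimension. It is not the variational RG construction or the RBM mapping from the paper, only the standard exact result that summing out every other spin replaces the coupling K with K' = (1/2) ln cosh(2K). The function names `decimate_1d_ising` and `rg_flow` are illustrative and do not come from the paper.

```python
import math


def decimate_1d_ising(K: float) -> float:
    """One decimation step for the zero-field 1D Ising chain.

    Summing out every other spin gives an effective nearest-neighbor
    coupling K' = 0.5 * ln(cosh(2K)), equivalently tanh(K') = tanh(K)**2.
    """
    return 0.5 * math.log(math.cosh(2.0 * K))


def rg_flow(K0: float, steps: int) -> list[float]:
    """Iterate the decimation map, returning [K0, K1, ..., K_steps]."""
    couplings = [K0]
    for _ in range(steps):
        couplings.append(decimate_1d_ising(couplings[-1]))
    return couplings


if __name__ == "__main__":
    # Each step doubles the lattice spacing; the coupling shrinks toward
    # the trivial fixed point K = 0 (no finite-temperature transition in 1D).
    for step, K in enumerate(rg_flow(1.0, 6)):
        print(f"step {step}: K = {K:.6f}")
```

Starting from any finite K, the coupling decreases at every step, which is the one-dimensional picture of irrelevant short-distance detail being discarded as the system is viewed at longer length scales.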
Forward citations
Cited by 7 Pith papers
- Scaling Laws from Sequential Feature Recovery: A Solvable Hierarchical Model
  A solvable hierarchical model with power-law feature strengths yields explicit power-law scaling of prediction error through sequential recovery of latent directions by a layer-wise spectral algorithm.
- Dreaming up scale invariance via inverse renormalization group
  Small neural networks invert the RG coarse-graining in the 2D Ising model to probabilistically generate critical configurations that reproduce scaling observables and nontrivial RG eigenvalues.
- Borrowed Geometry: Cross-Distribution Head-Importance Fingerprints of Frozen Pretrained Gemma 4 31B
  Frozen text-pretrained transformer weights transfer across modalities through a thin interface, achieving SOTA on a robotic task and parity on decision-making with far fewer trainable parameters.
- Renormalization group for spectral collapse in random matrices with power-law variance profiles
  A renormalization group scheme with running normalization collapses eigenvalue spectra of Wigner and Wishart matrices modified by power-law variance profiles, confirmed via fixed-point equations and simulations.
- Group Convolutional Neural Network for the Low-Energy Spectrum in the Quantum Dimer Model
  GCNN variational states optimized with directed-loop sampling yield a 4-fold degenerate ground state for V ≤ 0.4 in the quantum dimer model, with benchmarks matching ED and QMC up to L=32.
- Borrowed Geometry: Cross-Distribution Head-Importance Fingerprints of Frozen Pretrained Gemma 4 31B
  Four heads (L26.28, L27.28, L27.2, L27.3) in frozen Gemma 4 31B exhibit joint high importance on text and non-text tasks with hypergeometric significance (P=0.0013) and causal validation on a cube task.
- Lecture Notes on Statistical Physics and Neural Networks
  Lecture notes that treat statistical physics as probability theory and connect Ising models, spin glasses, and renormalization group ideas to Hopfield networks, restricted Boltzmann machines, and large language models.