An exact mapping between the Variational Renormalization Group and Deep Learning

David J. Schwab; Pankaj Mehta

arXiv: 1410.3831 · v1 · pith:FT4B6HEWnew · submitted 2014-10-14 · stat.ML · cond-mat.stat-mech · cs.LG · cs.NE

keywords: learning, deep, techniques, features, group, relevant, renormalization, data

Abstract

Deep learning is a broad set of techniques that uses multiple layers of representation to automatically learn relevant features directly from structured data. Recently, such techniques have yielded record-breaking results on a diverse set of difficult machine learning tasks in computer vision, speech recognition, and natural language processing. Despite the enormous success of deep learning, relatively little is understood theoretically about why these techniques are so successful at feature learning and compression. Here, we show that deep learning is intimately related to one of the most important and successful techniques in theoretical physics, the renormalization group (RG). RG is an iterative coarse-graining scheme that allows for the extraction of relevant features (i.e. operators) as a physical system is examined at different length scales. We construct an exact mapping between the variational renormalization group, first introduced by Kadanoff, and deep learning architectures based on Restricted Boltzmann Machines (RBMs). We illustrate these ideas using the nearest-neighbor Ising model in one and two dimensions. Our results suggest that deep learning algorithms may be employing a generalized RG-like scheme to learn relevant features from data.
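To make the idea of iterative coarse-graining concrete, here is a minimal sketch of the textbook decimation step for the one-dimensional nearest-neighbor Ising chain. It is an illustration under stated assumptions, not the paper's variational scheme or its RBM construction: the dimensionless coupling `K`, the open 5-spin chain used for the check, and all function names (`decimate_coupling`, `marginal_weight`, `check_decimation`, `rg_flow`) are choices made here for the example. Summing out every other spin yields a chain of the same form with a new coupling `K' = 0.5 * ln(cosh(2K))`, and the brute-force check confirms that the marginal weight of the kept spins matches the coarse-grained weight up to a constant factor.

```python
import math
from itertools import product


def decimate_coupling(K):
    """One decimation step for the 1D nearest-neighbor Ising chain.

    Summing out every other spin maps the dimensionless coupling K to K',
    with K' = 0.5 * ln(cosh(2K)), equivalently tanh(K') = tanh(K)**2.
    """
    return 0.5 * math.log(math.cosh(2.0 * K))


def marginal_weight(K, kept):
    """Brute-force sum over the two decimated spins of a 5-spin open chain."""
    s1, s3, s5 = kept
    total = 0.0
    for s2, s4 in product((-1, 1), repeat=2):
        total += math.exp(K * (s1 * s2 + s2 * s3 + s3 * s4 + s4 * s5))
    return total


def check_decimation(K):
    """Spread of (marginal weight / coarse-grained weight) over kept configs.

    A spread of zero (up to rounding) means the two weights differ only by
    a configuration-independent constant, so the mapping is exact.
    """
    K_new = decimate_coupling(K)
    ratios = []
    for kept in product((-1, 1), repeat=3):
        s1, s3, s5 = kept
        coarse = math.exp(K_new * (s1 * s3 + s3 * s5))
        ratios.append(marginal_weight(K, kept) / coarse)
    return max(ratios) - min(ratios)


def rg_flow(K0, steps):
    """Iterate the decimation map and return the sequence of couplings."""
    flow = [K0]
    for _ in range(steps):
        flow.append(decimate_coupling(flow[-1]))
    return flow


if __name__ == "__main__":
    print("spread of ratios:", check_decimation(1.0))
    print("coupling flow:", rg_flow(1.0, 5))
```

Iterating the map from any finite starting coupling drives `K` toward zero, which is the standard statement that the one-dimensional chain has no finite-temperature phase transition. In this picture each decimation step plays the role of one layer of coarse-graining.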

Forward citations

Cited by 7 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Scaling Laws from Sequential Feature Recovery: A Solvable Hierarchical Model
stat.ML 2026-05 accept novelty 7.0

A solvable hierarchical model with power-law feature strengths yields explicit power-law scaling of prediction error through sequential recovery of latent directions by a layer-wise spectral algorithm.

Dreaming up scale invariance via inverse renormalization group
cond-mat.stat-mech 2025-06 conditional novelty 7.0

Small neural networks invert the RG coarse-graining in the 2D Ising model to probabilistically generate critical configurations that reproduce scaling observables and nontrivial RG eigenvalues.

Borrowed Geometry: Cross-Distribution Head-Importance Fingerprints of Frozen Pretrained Gemma 4 31B
cs.LG 2026-05 unverdicted novelty 6.0

Frozen text-pretrained transformer weights transfer across modalities through a thin interface, achieving SOTA on a robotic task and parity on decision-making with far fewer trainable parameters.

Renormalization group for spectral collapse in random matrices with power-law variance profiles
cond-mat.stat-mech 2025-12 unverdicted novelty 6.0

A renormalization group scheme with running normalization collapses eigenvalue spectra of Wigner and Wishart matrices modified by power-law variance profiles, confirmed via fixed-point equations and simulations.

Group Convolutional Neural Network for the Low-Energy Spectrum in the Quantum Dimer Model
cond-mat.dis-nn 2025-05 conditional novelty 6.0

GCNN variational states optimized with directed-loop sampling yield a 4-fold degenerate ground state for V ≤ 0.4 in the quantum dimer model, with benchmarks matching ED and QMC up to L=32.

Borrowed Geometry: Cross-Distribution Head-Importance Fingerprints of Frozen Pretrained Gemma 4 31B
cs.LG 2026-05 unverdicted novelty 5.0

Four heads (L26.28, L27.28, L27.2, L27.3) in frozen Gemma 4 31B exhibit joint high importance on text and non-text tasks with hypergeometric significance (P=0.0013) and causal validation on a cube task.

Lecture Notes on Statistical Physics and Neural Networks
cond-mat.dis-nn 2026-05 unverdicted novelty 2.0

Lecture notes that treat statistical physics as probability theory and connect Ising models, spin glasses, and renormalization group ideas to Hopfield networks, restricted Boltzmann machines, and large language models.