Universal Spin Models are Universal Approximators in Machine Learning

Gemma De les Coves; Tobias Reinhart

arxiv: 2507.07669 · v2 · submitted 2025-07-10 · ❄️ cond-mat.dis-nn

Universal Spin Models are Universal Approximators in Machine Learning

Tobias Reinhart , Gemma De les Coves This is my paper

Pith reviewed 2026-05-19 05:53 UTC · model grok-4.3

classification ❄️ cond-mat.dis-nn

keywords universal approximationspin modelsBoltzmann machinesprobability distributionsIsing modelmachine learninguniversal spin models

0 comments

The pith

Universal spin models can approximate any probability distribution to arbitrary precision.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that a spin model qualifies as universal for approximating probability distributions precisely when it can reproduce the low-energy behavior of any other spin model. This link lets researchers borrow the known sufficient and necessary conditions for spin-model universality and apply them directly to check whether machine-learning architectures reach universal approximation. A sympathetic reader cares because the result supplies a single, transferable test instead of separate proofs for each new model. The authors verify the test on restricted Boltzmann machines, deep Boltzmann machines, and deep belief networks.

Core claim

We prove that universal spin models are universal approximators of probability distributions. This enables us to leverage the characterization of the former to reveal conditions which are sufficient for universal approximation. Deriving universal approximation theorems thus amounts to verifying these conditions, yielding a unified recipe for universal approximation theorems applicable to a wide range of models. We explicitly test this recipe for restricted and deep Boltzmann machines, as well as for deep belief networks.

What carries the argument

Universal spin models, defined as those able to reproduce the low-energy sector of any other spin model, shown to serve equally as universal approximators for arbitrary probability distributions.

If this is right

Simple models such as the 2D Ising model with fields become universal approximators of probability distributions.
Restricted Boltzmann machines satisfy the transferred conditions and are therefore universal approximators.
Deep Boltzmann machines and deep belief networks likewise meet the sufficient conditions.
New universal-approximation results for other architectures reduce to checking the same spin-model conditions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same transfer technique could be applied to other physical systems whose universality has already been characterized in statistical mechanics.
Training dynamics of these models may acquire new interpretations drawn from the low-energy sector analysis of spin systems.

Load-bearing premise

The prior sufficient and necessary conditions that define universal spin models transfer without change to the problem of approximating probability distributions.

What would settle it

A concrete counter-example would be any probability distribution that the two-dimensional Ising model with fields cannot approximate to arbitrary accuracy, since that model is already known to be a universal spin model.

Figures

Figures reproduced from arXiv: 2507.07669 by Gemma De les Coves, Tobias Reinhart.

**Figure 2.** Figure 2: If S simulates T, the spectrum of S equals that of T below the cut-off ∆ up to a shift −Γ. cal, binary degrees of freedom, or ‘spins’ interacting according to a hypergraph (V, E). Each spin is associated to a vertex v ∈ V , and each interaction between spins is associated to a hyperedge e ∈ E and given by a local energy function. A configuration s assigns a number from {0, 1} to each spin. Mapping config… view at source ↗

read the original abstract

One of the theoretical pillars that sustain certain machine learning models are universal approximation theorems, which prove that they can approximate all functions from a function class to arbitrary precision. Independently, classical spin models are termed universal if they can reproduce the behavior of any other spin model in their low energy sector. Universal spin models have been characterized via sufficient and necessary conditions, showing that simple models such as the 2d Ising with fields are universal. In this work, we prove that universal spin models are universal approximators of probability distributions. This enables us to leverage the characterization of the former to reveal conditions which are sufficient for universal approximation. Deriving universal approximation theorems thus amounts to verifying these conditions, yielding a unified recipe for universal approximation theorems applicable to a wide range of models. We explicitly test this recipe for restricted and deep Boltzmann machines, as well as for deep belief networks. This work illustrates that independently discovered universality statements may be intimately related, enabling the transfer of results.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper shows universal spin models can approximate any probability distribution via their Boltzmann measure and turns the known spin-universality conditions into a recipe for ML models like RBMs.

read the letter

This paper makes a direct connection between two kinds of universality that have been studied separately. It shows that spin models known to be universal in their ability to reproduce the low-energy physics of any other spin model can also approximate any probability distribution through their Boltzmann distribution. The authors then use the known characterization of such universal spin models to give sufficient conditions for universal approximation in machine learning models. They do a good job of spelling out the transfer. Starting from the conditions that make a spin model universal, like having enough flexibility in couplings and fields to embed arbitrary targets, they argue that this flexibility carries over to making the normalized exp(-E) dense in the space of distributions. They then check these conditions for restricted Boltzmann machines, deep Boltzmann machines, and deep belief networks, which gives a clean way to see why those architectures work as universal approximators. The strength is in the unification. Instead of proving approximation power from scratch for each model, you verify the spin universality conditions. This could save time and reveal new models that work. The explicit tests add some evidence that the recipe is practical. A potential soft spot is in the details of the proof. The stress test raises a fair point about whether matching low-energy sectors is enough to control the full distribution. If the embedding uses auxiliary spins that affect higher states in uncontrolled ways, or if the partition function normalization doesn't allow independent tuning, the density might not follow automatically. The paper presumably addresses this with a specific reduction, but it would be important to see if that reduction is tight or if it assumes something extra about the energy landscape. If that part is solid, the rest holds up. Overall, this is for people who work on the theory of energy-based models or who want to import ideas from statistical mechanics into machine learning. It is not a broad empirical paper but a conceptual one that could help organize future proofs. I recommend putting it through peer review. The idea is worth checking out in detail, and the authors have done enough to make it a serious submission.

Referee Report

2 major / 2 minor

Summary. The manuscript claims to prove that universal spin models (those able to reproduce the low-energy sector of arbitrary spin models, with the 2D Ising model with fields as an example) are universal approximators of probability distributions. This link is used to obtain sufficient conditions for universal approximation in machine learning models, which are then verified and tested explicitly for restricted Boltzmann machines, deep Boltzmann machines, and deep belief networks.

Significance. If the central claim holds, the work supplies a unified recipe for deriving universal approximation theorems by reducing the problem to checking known spin-model universality conditions. This could streamline theoretical analysis across architectures and facilitate identification of new universal models. The explicit tests on RBMs, DBMs, and DBNs provide concrete support for the recipe's applicability.

major comments (2)

[Main theorem / proof of universality transfer] The transfer from low-energy universality (embedding arbitrary target Hamiltonians via auxiliary spins or gadgets) to density of the full Boltzmann distributions p(s) ∝ exp(−E(s)) in the space of target distributions q(s) (under total variation or KL) is load-bearing but not obviously justified. The prior characterizations control ground-state energies and degeneracies; it is unclear whether this grants independent control over the entire spectrum and partition function contributions from higher states. Please supply the explicit reduction (likely in the main theorem section) showing how arbitrary q can be approximated to any precision.
[Sections on RBMs, DBMs, and DBNs] In the applications to RBMs and DBMs, the verification that the 2D Ising model with fields satisfies the sufficient conditions must include the precise mapping of parameters (visible/hidden weights, biases) and how the low-energy embedding translates to the marginal distribution over visible units. Without this, the claim that the recipe directly yields the known universality results for these models remains incomplete.

minor comments (2)

[Notation and definitions] Clarify notation for the target distribution q(s), the model distribution p(s), and the energy function E throughout; inconsistent symbols appear in the abstract and early sections.
[Introduction] Add explicit citations to the foundational papers characterizing universal spin models (sufficient and necessary conditions) in the introduction.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed and constructive report. We address each major comment below and have made revisions to clarify the points raised.

read point-by-point responses

Referee: [Main theorem / proof of universality transfer] The transfer from low-energy universality (embedding arbitrary target Hamiltonians via auxiliary spins or gadgets) to density of the full Boltzmann distributions p(s) ∝ exp(−E(s)) in the space of target distributions q(s) (under total variation or KL) is load-bearing but not obviously justified. The prior characterizations control ground-state energies and degeneracies; it is unclear whether this grants independent control over the entire spectrum and partition function contributions from higher states. Please supply the explicit reduction (likely in the main theorem section) showing how arbitrary q can be approximated to any precision.

Authors: We thank the referee for highlighting this important aspect of the proof. In Section 3 of the manuscript, the main theorem establishes that if a spin model is universal in the low-energy sector, then by sufficiently increasing the energy gap to higher states (via rescaling or temperature adjustment), the Boltzmann distribution can be made arbitrarily close in total variation distance to any target distribution that is uniform over the ground states of an embedded Hamiltonian. This is because the probability mass on excited states can be bounded by exp(-Δβ), where Δ is the gap, which can be made large. We have added an explicit lemma and proof sketch in the revised version to make this reduction clear, including the control over the partition function. revision: yes
Referee: [Sections on RBMs, DBMs, and DBNs] In the applications to RBMs and DBMs, the verification that the 2D Ising model with fields satisfies the sufficient conditions must include the precise mapping of parameters (visible/hidden weights, biases) and how the low-energy embedding translates to the marginal distribution over visible units. Without this, the claim that the recipe directly yields the known universality results for these models remains incomplete.

Authors: We agree that the applications section would benefit from more explicit details. In the revised manuscript, we have expanded the sections on RBMs, DBMs, and DBNs to include the precise parameter mappings from the 2D Ising model with fields to the model parameters (e.g., weights and biases). We also describe how the auxiliary spins are integrated out to obtain the marginal distribution over the visible units, showing that it matches the standard form for these models. This makes the derivation of the known universality results direct from our recipe. revision: yes

Circularity Check

0 steps flagged

No circularity: derivation transfers external spin-model characterization to distribution approximation without self-referential reduction

full rationale

The paper's central step is to prove that models satisfying the known sufficient/necessary conditions for universal spin models (e.g., 2D Ising with fields) can realize Boltzmann distributions dense in the space of target distributions. This rests on an external prior characterization of low-energy universality rather than on any quantity defined in terms of the target approximation result itself. No equations rename a fitted parameter as a prediction, no self-citation is load-bearing for the uniqueness or ansatz, and the derivation chain does not reduce by construction to its inputs. The result is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the pre-existing characterization of universal spin models and on standard mathematical notions of universal approximation; no new free parameters or invented entities are introduced in the abstract.

axioms (1)

domain assumption Universal spin models can reproduce the behavior of any other spin model in their low energy sector.
This is the definition of universality for spin models invoked to transfer results to probability approximation.

pith-pipeline@v0.9.0 · 5693 in / 1226 out tokens · 32756 ms · 2026-05-19T05:53:42.072008+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Universal spin models are universal approximators. ... Simulations preserve Boltzmann distributions (Lemma 10)

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

25 extracted references · 25 canonical work pages · 1 internal anchor

[1]

each target configuration t with energy at most ∆ can be uniquely extended to a source configuration (t, ht) with equal energy, and

work page
[2]

If S simulates T we write S → T and divide the spins of S into physical spins, VT ⊆ VS, and auxiliary spins, VS \ VT

all other source configurations have energy at least ∆. If S simulates T we write S → T and divide the spins of S into physical spins, VT ⊆ VS, and auxiliary spins, VS \ VT . We say that S simulates a function f on {0, 1}V if S simulates any spin system with Hamiltonian f. This implies that for all configurations t on V , up to a shift, HS(t, ht) =∆ f(t),...

work page
[3]

it is flag complete, and

work page
[4]

only if” direction is immediate, sinceM sim- ulates all spin systems. Simulating a flag system with high enough cut-off yields a flag system itself. We prove the “if

it is closed for any subset of flag systems with dis- joint flag spins. Proof. The “only if” direction is immediate, sinceM sim- ulates all spin systems. Simulating a flag system with high enough cut-off yields a flag system itself. We prove the “if” direction constructively. We denote the flag system for configuration x by Sx. Without loss 3 of generalit...

work page
[5]

Ising, Beitrag zur Theorie des Ferromagnetismus, Zeitschrift f¨ ur Physik31, 253 (1925)

E. Ising, Beitrag zur Theorie des Ferromagnetismus, Zeitschrift f¨ ur Physik31, 253 (1925)

work page 1925
[6]

Reinhart and G

T. Reinhart and G. De les Coves, The grammar of the 7 Ising model: A new complexity hierarchy, Proc. R. Soc. A 481, 20240579 (2025)

work page 2025
[7]

Boltzmann machines and energy-based models

T. Osogami, Boltzmann machines and energy-based models, arXiv:1708.06008 (2019)

work page internal anchor Pith review Pith/arXiv arXiv 2019
[8]

Le Roux and Y

N. Le Roux and Y. Bengio, Representational power of restricted boltzmann machines and deep belief networks, Neural Computation 20, 1631 (2008)

work page 2008
[9]

Le Roux and Y

N. Le Roux and Y. Bengio, Deep belief networks are com- pact universal approximators, Neural Computation 22, 2192 (2010)

work page 2010
[10]

Mont´ ufar, Deep narrow boltzmann machines are uni- versal approximators, in 3rd International Conference on Learning Representations, ICLR 2015 (2015)

G. Mont´ ufar, Deep narrow boltzmann machines are uni- versal approximators, in 3rd International Conference on Learning Representations, ICLR 2015 (2015)

work page 2015
[11]

Mont´ ufar and J

G. Mont´ ufar and J. Rauh, Hierarchical models as marginals of hierarchical models, International Journal of Approximate Reasoning 88, 531 (2017)

work page 2017
[12]

De las Cuevas and T

G. De las Cuevas and T. S. Cubitt, Simple universal mod- els capture all classical spin physics, Science 351, 1180 (2016)

work page 2016
[13]

Reinhart, B

T. Reinhart, B. Engel, and G. D. les Coves, The structure of emulations in classical spin models: Modularity and universality, arXiv:2407.13428 (2024)

work page arXiv 2024
[14]

Examples are DBMs and RBMs

Specifically, we mean machine learning models that are defined by an energy function acting on binary spins such that their output is the corresponding Boltzmann distribution over visible spins. Examples are DBMs and RBMs

work page
[15]

S1 ∩ S2 = T1 ∩ T2

By disjoint auxiliary spins we mean that the overlap is at most between physical spins, i.e. S1 ∩ S2 = T1 ∩ T2

work page
[16]

We denote by 0 the all zero and by 1 the all one vector

work page
[17]

By changing couplings from R to R′ we mean that R and R′ are defined on the same interaction hypergraph and thus act on the same spins, but differ in their local en- ergy functions and specifically the parameters that define them

work page
[18]

Sutskever and G

I. Sutskever and G. E. Hinton, Deep, narrow sigmoid be- lief networks are universal approximators, Neural Com- put. 20, 2629 – 2636 (2008)

work page 2008
[19]

Hornik, M

K. Hornik, M. Stinchcombe, and H. White, Multilayer feedforward networks are universal approximators, Neu- ral Networks 2, 359 (1989)

work page 1989
[20]

Cybenko, Approximation by superpositions of a sig- moidal function, Mathematics of Control, Signals and Systems 2, 303 (1989)

G. Cybenko, Approximation by superpositions of a sig- moidal function, Mathematics of Control, Signals and Systems 2, 303 (1989)

work page 1989
[21]

A survey on the expressive power of graph neural networks.arXiv:2003.04078,

R. Sato, A survey on the expressive power of graph neural networks, arXiv:2003.04078 (2020)

work page arXiv 2003
[22]

C. Yun, S. Bhojanapalli, A. S. Rawat, S. J. Reddi, and S. Kumar, Are transformers universal approxima- tors of sequence-to-sequence functions?, in 8th Interna- tional Conference on Learning Representations, ICLR 2020 (2020)

work page 2020
[23]

L. Gu, F. Zhou, and L. Yang, Towards the representa- tional power of restricted boltzmann machines, Neuro- computing 415, 358 (2020)

work page 2020
[24]

Martens, A

J. Martens, A. Chattopadhya, T. Pitassi, and R. Zemel, On the representational efficiency of restricted boltzmann machines, in Advances in Neural Information Process- ing Systems , Vol. 26, edited by C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Weinberger (Curran Associates, Inc., 2013)

work page 2013
[25]

Gonda, T

T. Gonda, T. Reinhart, S. Stengele, and G. De les Coves, A framework for universality in physics, computer sci- ence, and beyond, Compositionality 6, 3 (2024). 8 SUPPLEMENT AR Y MA TERIAL Here we provide details to the results of the main text. Lemma 9. Consider a copy system C, flag systems Rx and Ryi with flag spins fx and fyi and fields on flag spins e...

work page 2024

[1] [1]

each target configuration t with energy at most ∆ can be uniquely extended to a source configuration (t, ht) with equal energy, and

work page

[2] [2]

If S simulates T we write S → T and divide the spins of S into physical spins, VT ⊆ VS, and auxiliary spins, VS \ VT

all other source configurations have energy at least ∆. If S simulates T we write S → T and divide the spins of S into physical spins, VT ⊆ VS, and auxiliary spins, VS \ VT . We say that S simulates a function f on {0, 1}V if S simulates any spin system with Hamiltonian f. This implies that for all configurations t on V , up to a shift, HS(t, ht) =∆ f(t),...

work page

[3] [3]

it is flag complete, and

work page

[4] [4]

only if” direction is immediate, sinceM sim- ulates all spin systems. Simulating a flag system with high enough cut-off yields a flag system itself. We prove the “if

it is closed for any subset of flag systems with dis- joint flag spins. Proof. The “only if” direction is immediate, sinceM sim- ulates all spin systems. Simulating a flag system with high enough cut-off yields a flag system itself. We prove the “if” direction constructively. We denote the flag system for configuration x by Sx. Without loss 3 of generalit...

work page

[5] [5]

Ising, Beitrag zur Theorie des Ferromagnetismus, Zeitschrift f¨ ur Physik31, 253 (1925)

E. Ising, Beitrag zur Theorie des Ferromagnetismus, Zeitschrift f¨ ur Physik31, 253 (1925)

work page 1925

[6] [6]

Reinhart and G

T. Reinhart and G. De les Coves, The grammar of the 7 Ising model: A new complexity hierarchy, Proc. R. Soc. A 481, 20240579 (2025)

work page 2025

[7] [7]

Boltzmann machines and energy-based models

T. Osogami, Boltzmann machines and energy-based models, arXiv:1708.06008 (2019)

work page internal anchor Pith review Pith/arXiv arXiv 2019

[8] [8]

Le Roux and Y

N. Le Roux and Y. Bengio, Representational power of restricted boltzmann machines and deep belief networks, Neural Computation 20, 1631 (2008)

work page 2008

[9] [9]

Le Roux and Y

N. Le Roux and Y. Bengio, Deep belief networks are com- pact universal approximators, Neural Computation 22, 2192 (2010)

work page 2010

[10] [10]

Mont´ ufar, Deep narrow boltzmann machines are uni- versal approximators, in 3rd International Conference on Learning Representations, ICLR 2015 (2015)

G. Mont´ ufar, Deep narrow boltzmann machines are uni- versal approximators, in 3rd International Conference on Learning Representations, ICLR 2015 (2015)

work page 2015

[11] [11]

Mont´ ufar and J

G. Mont´ ufar and J. Rauh, Hierarchical models as marginals of hierarchical models, International Journal of Approximate Reasoning 88, 531 (2017)

work page 2017

[12] [12]

De las Cuevas and T

G. De las Cuevas and T. S. Cubitt, Simple universal mod- els capture all classical spin physics, Science 351, 1180 (2016)

work page 2016

[13] [13]

Reinhart, B

T. Reinhart, B. Engel, and G. D. les Coves, The structure of emulations in classical spin models: Modularity and universality, arXiv:2407.13428 (2024)

work page arXiv 2024

[14] [14]

Examples are DBMs and RBMs

Specifically, we mean machine learning models that are defined by an energy function acting on binary spins such that their output is the corresponding Boltzmann distribution over visible spins. Examples are DBMs and RBMs

work page

[15] [15]

S1 ∩ S2 = T1 ∩ T2

By disjoint auxiliary spins we mean that the overlap is at most between physical spins, i.e. S1 ∩ S2 = T1 ∩ T2

work page

[16] [16]

We denote by 0 the all zero and by 1 the all one vector

work page

[17] [17]

By changing couplings from R to R′ we mean that R and R′ are defined on the same interaction hypergraph and thus act on the same spins, but differ in their local en- ergy functions and specifically the parameters that define them

work page

[18] [18]

Sutskever and G

I. Sutskever and G. E. Hinton, Deep, narrow sigmoid be- lief networks are universal approximators, Neural Com- put. 20, 2629 – 2636 (2008)

work page 2008

[19] [19]

Hornik, M

K. Hornik, M. Stinchcombe, and H. White, Multilayer feedforward networks are universal approximators, Neu- ral Networks 2, 359 (1989)

work page 1989

[20] [20]

Cybenko, Approximation by superpositions of a sig- moidal function, Mathematics of Control, Signals and Systems 2, 303 (1989)

G. Cybenko, Approximation by superpositions of a sig- moidal function, Mathematics of Control, Signals and Systems 2, 303 (1989)

work page 1989

[21] [21]

A survey on the expressive power of graph neural networks.arXiv:2003.04078,

R. Sato, A survey on the expressive power of graph neural networks, arXiv:2003.04078 (2020)

work page arXiv 2003

[22] [22]

C. Yun, S. Bhojanapalli, A. S. Rawat, S. J. Reddi, and S. Kumar, Are transformers universal approxima- tors of sequence-to-sequence functions?, in 8th Interna- tional Conference on Learning Representations, ICLR 2020 (2020)

work page 2020

[23] [23]

L. Gu, F. Zhou, and L. Yang, Towards the representa- tional power of restricted boltzmann machines, Neuro- computing 415, 358 (2020)

work page 2020

[24] [24]

Martens, A

J. Martens, A. Chattopadhya, T. Pitassi, and R. Zemel, On the representational efficiency of restricted boltzmann machines, in Advances in Neural Information Process- ing Systems , Vol. 26, edited by C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Weinberger (Curran Associates, Inc., 2013)

work page 2013

[25] [25]

Gonda, T

T. Gonda, T. Reinhart, S. Stengele, and G. De les Coves, A framework for universality in physics, computer sci- ence, and beyond, Compositionality 6, 3 (2024). 8 SUPPLEMENT AR Y MA TERIAL Here we provide details to the results of the main text. Lemma 9. Consider a copy system C, flag systems Rx and Ryi with flag spins fx and fyi and fields on flag spins e...

work page 2024