Universal Spin Models are Universal Approximators in Machine Learning
Pith reviewed 2026-05-19 05:53 UTC · model grok-4.3
The pith
Universal spin models can approximate any probability distribution to arbitrary precision.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We prove that universal spin models are universal approximators of probability distributions. This enables us to leverage the characterization of the former to reveal conditions which are sufficient for universal approximation. Deriving universal approximation theorems thus amounts to verifying these conditions, yielding a unified recipe for universal approximation theorems applicable to a wide range of models. We explicitly test this recipe for restricted and deep Boltzmann machines, as well as for deep belief networks.
What carries the argument
Universal spin models, defined as those able to reproduce the low-energy sector of any other spin model. The paper shows that these models also serve as universal approximators of arbitrary probability distributions.
If this is right
- Simple models such as the 2D Ising model with fields become universal approximators of probability distributions.
- Restricted Boltzmann machines satisfy the transferred conditions and are therefore universal approximators.
- Deep Boltzmann machines and deep belief networks likewise meet the sufficient conditions.
- New universal-approximation results for other architectures reduce to checking the same spin-model conditions.
Where Pith is reading between the lines
- The same transfer technique could be applied to other physical systems whose universality has already been characterized in statistical mechanics.
- Training dynamics of these models may acquire new interpretations drawn from the low-energy sector analysis of spin systems.
Load-bearing premise
The prior sufficient and necessary conditions that define universal spin models transfer without change to the problem of approximating probability distributions.
What would settle it
A concrete counter-example would be any probability distribution that the two-dimensional Ising model with fields cannot approximate to arbitrary accuracy, since that model is already known to be a universal spin model.
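One way to write down the statement such a counter-example would have to violate is sketched below. The symbols are assumptions made for illustration, not the paper's notation: $q$ is the target distribution over physical spins $s$, $h$ denotes auxiliary spins, $\theta$ collects all admissible model parameters (including system size), and total variation distance is taken as the measure of accuracy, although the paper may use a different one.

```latex
\[
  p_\theta(s) = \frac{1}{Z_\theta} \sum_{h} e^{-E_\theta(s,h)}, \qquad
  Z_\theta = \sum_{s',h} e^{-E_\theta(s',h)}, \qquad
  d_{\mathrm{TV}}(p, q) = \frac{1}{2} \sum_{s} \lvert p(s) - q(s) \rvert .
\]
\[
  \text{Universal approximation:}\quad
  \forall q,\ \forall \varepsilon > 0,\ \exists \theta:\quad
  d_{\mathrm{TV}}(p_\theta, q) < \varepsilon .
\]
\[
  \text{Counter-example:}\quad
  \exists q,\ \exists \varepsilon > 0,\ \forall \theta:\quad
  d_{\mathrm{TV}}(p_\theta, q) \ge \varepsilon .
\]
```

The second line is the claim and the third is its logical negation, so exhibiting a single pair $(q, \varepsilon)$ of the third kind for the 2D Ising model with fields would refute the result under these assumptions.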
Original abstract
One of the theoretical pillars that sustain certain machine learning models are universal approximation theorems, which prove that they can approximate all functions from a function class to arbitrary precision. Independently, classical spin models are termed universal if they can reproduce the behavior of any other spin model in their low energy sector. Universal spin models have been characterized via sufficient and necessary conditions, showing that simple models such as the 2d Ising with fields are universal. In this work, we prove that universal spin models are universal approximators of probability distributions. This enables us to leverage the characterization of the former to reveal conditions which are sufficient for universal approximation. Deriving universal approximation theorems thus amounts to verifying these conditions, yielding a unified recipe for universal approximation theorems applicable to a wide range of models. We explicitly test this recipe for restricted and deep Boltzmann machines, as well as for deep belief networks. This work illustrates that independently discovered universality statements may be intimately related, enabling the transfer of results.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims to prove that universal spin models (those able to reproduce the low-energy sector of arbitrary spin models, with the 2D Ising model with fields as an example) are universal approximators of probability distributions. This link is used to obtain sufficient conditions for universal approximation in machine learning models, which are then verified and tested explicitly for restricted Boltzmann machines, deep Boltzmann machines, and deep belief networks.
Significance. If the central claim holds, the work supplies a unified recipe for deriving universal approximation theorems by reducing the problem to checking known spin-model universality conditions. This could streamline theoretical analysis across architectures and facilitate identification of new universal models. The explicit tests on RBMs, DBMs, and DBNs provide concrete support for the recipe's applicability.
major comments (2)
- [Main theorem / proof of universality transfer] The transfer from low-energy universality (embedding arbitrary target Hamiltonians via auxiliary spins or gadgets) to density of the full Boltzmann distributions p(s) ∝ exp(−E(s)) in the space of target distributions q(s) (under total variation or KL) is load-bearing but not obviously justified. The prior characterizations control ground-state energies and degeneracies; it is unclear whether this grants independent control over the entire spectrum and partition function contributions from higher states. Please supply the explicit reduction (likely in the main theorem section) showing how arbitrary q can be approximated to any precision.
- [Sections on RBMs, DBMs, and DBNs] In the applications to RBMs and DBMs, the verification that the 2D Ising model with fields satisfies the sufficient conditions must include the precise mapping of parameters (visible/hidden weights, biases) and how the low-energy embedding translates to the marginal distribution over visible units. Without this, the claim that the recipe directly yields the known universality results for these models remains incomplete.
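The marginal distribution over visible units that the second comment asks about has a standard closed form for an RBM with binary (0/1) units. The sketch below checks that closed form against explicit summation over the hidden units. It assumes the textbook energy E(v, h) = −b·v − c·h − vᵀWh; the paper's spin convention and parameter mapping may differ, and all function names and parameter values here are hypothetical.

```python
import math
from itertools import product

def rbm_energy(v, h, W, b, c):
    """E(v, h) = -b.v - c.h - v^T W h for binary (0/1) visible and hidden units."""
    visible_term = sum(b_i * v_i for b_i, v_i in zip(b, v))
    hidden_term = sum(c_j * h_j for c_j, h_j in zip(c, h))
    coupling_term = sum(v[i] * W[i][j] * h[j]
                        for i in range(len(v)) for j in range(len(h)))
    return -(visible_term + hidden_term + coupling_term)

def normalise(weights):
    """Divide unnormalised weights by their sum (the partition function)."""
    z = sum(weights.values())
    return {s: w / z for s, w in weights.items()}

def marginal_by_summation(W, b, c):
    """Marginal over visible units, summing exp(-E) over every hidden configuration."""
    weights = {}
    for v in product((0, 1), repeat=len(b)):
        weights[v] = sum(math.exp(-rbm_energy(v, h, W, b, c))
                         for h in product((0, 1), repeat=len(c)))
    return normalise(weights)

def marginal_closed_form(W, b, c):
    """Same marginal via p(v) ∝ exp(b.v) * prod_j (1 + exp(c_j + sum_i v_i W_ij))."""
    weights = {}
    for v in product((0, 1), repeat=len(b)):
        w = math.exp(sum(b_i * v_i for b_i, v_i in zip(b, v)))
        for j in range(len(c)):
            w *= 1.0 + math.exp(c[j] + sum(v[i] * W[i][j] for i in range(len(b))))
        weights[v] = w
    return normalise(weights)

# Hypothetical parameters: 2 visible units, 3 hidden units.
W = [[0.5, -1.0, 0.3], [-0.7, 0.2, 1.1]]
b = [0.1, -0.4]
c = [0.0, 0.6, -0.2]
p_sum = marginal_by_summation(W, b, c)
p_closed = marginal_closed_form(W, b, c)
print(max(abs(p_sum[v] - p_closed[v]) for v in p_sum))
```

The two routes agree because the hidden units are conditionally independent given the visible ones, so the sum over hidden configurations factorises into one two-term sum per hidden unit. A full answer to the referee's comment would still need the map from the low-energy embedding to W, b, and c, which this sketch does not attempt.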
minor comments (2)
- [Notation and definitions] Clarify notation for the target distribution q(s), the model distribution p(s), and the energy function E throughout; inconsistent symbols appear in the abstract and early sections.
- [Introduction] Add explicit citations to the foundational papers characterizing universal spin models (sufficient and necessary conditions) in the introduction.
Simulated Author's Rebuttal
We thank the referee for their detailed and constructive report. We address each major comment below and have made revisions to clarify the points raised.
-
Referee: [Main theorem / proof of universality transfer] The transfer from low-energy universality (embedding arbitrary target Hamiltonians via auxiliary spins or gadgets) to density of the full Boltzmann distributions p(s) ∝ exp(−E(s)) in the space of target distributions q(s) (under total variation or KL) is load-bearing but not obviously justified. The prior characterizations control ground-state energies and degeneracies; it is unclear whether this grants independent control over the entire spectrum and partition function contributions from higher states. Please supply the explicit reduction (likely in the main theorem section) showing how arbitrary q can be approximated to any precision.
Authors: We thank the referee for highlighting this important aspect of the proof. In Section 3 of the manuscript, the main theorem establishes that if a spin model is universal in the low-energy sector, then by sufficiently increasing the energy gap to higher states (via rescaling or temperature adjustment), the Boltzmann distribution can be made arbitrarily close in total variation distance to any target distribution that is uniform over the ground states of an embedded Hamiltonian. This is because the weight of each excited state relative to a ground state is at most exp(−βΔ), where Δ is the gap and β the inverse temperature, so the total probability mass on excited states vanishes as βΔ grows. We have added an explicit lemma and proof sketch in the revised version to make this reduction clear, including the control over the partition function. revision: yes
-
Referee: [Sections on RBMs, DBMs, and DBNs] In the applications to RBMs and DBMs, the verification that the 2D Ising model with fields satisfies the sufficient conditions must include the precise mapping of parameters (visible/hidden weights, biases) and how the low-energy embedding translates to the marginal distribution over visible units. Without this, the claim that the recipe directly yields the known universality results for these models remains incomplete.
Authors: We agree that the applications section would benefit from more explicit details. In the revised manuscript, we have expanded the sections on RBMs, DBMs, and DBNs to include the precise parameter mappings from the 2D Ising model with fields to the model parameters (e.g., weights and biases). We also describe how the auxiliary spins are integrated out to obtain the marginal distribution over the visible units, showing that it matches the standard form for these models. This makes the derivation of the known universality results direct from our recipe. revision: yes
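The gap argument in the first response can be checked numerically on a toy example. The sketch below uses a hypothetical three-spin chain (not a model from the paper) whose two ground states are separated from all other configurations by a gap of 1, and measures the total variation distance between the Boltzmann distribution and the uniform distribution over ground states as β grows. It illustrates only the case the response describes, namely targets that are uniform over ground states.

```python
import math
from itertools import product

def chain_energy(spins, gap=1.0):
    """Toy energy: `gap` per pair of unequal neighbouring spins on an open chain."""
    return gap * sum(a != b for a, b in zip(spins, spins[1:]))

def boltzmann(energies, beta):
    """p(s) = exp(-beta * E(s)) / Z for a dict mapping configuration -> energy."""
    e_min = min(energies.values())  # shift so the largest weight is exp(0) = 1
    weights = {s: math.exp(-beta * (e - e_min)) for s, e in energies.items()}
    z = sum(weights.values())
    return {s: w / z for s, w in weights.items()}

def tv_to_uniform_ground(energies, beta):
    """Total variation distance between the Boltzmann distribution and the
    uniform distribution over the minimum-energy configurations."""
    e_min = min(energies.values())
    ground = [s for s, e in energies.items() if e == e_min]
    p = boltzmann(energies, beta)
    target = {s: (1.0 / len(ground) if s in ground else 0.0) for s in energies}
    return 0.5 * sum(abs(p[s] - target[s]) for s in energies)

# All 8 configurations of three binary spins; ground states are 000 and 111.
energies = {s: chain_energy(s) for s in product((0, 1), repeat=3)}
for beta in (1.0, 5.0, 10.0):
    print(beta, tv_to_uniform_ground(energies, beta))
```

In this toy case the distance equals the probability mass on excited states, which is at most (number of excited states / number of ground states) · exp(−βΔ) = 3·exp(−β). The counting prefactor matters when the number of excited states grows with system size, which is one reason the referee asks for explicit control over the partition function.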
Circularity Check
No circularity: derivation transfers external spin-model characterization to distribution approximation without self-referential reduction
The paper's central step is to prove that models satisfying the known sufficient/necessary conditions for universal spin models (e.g., 2D Ising with fields) can realize Boltzmann distributions dense in the space of target distributions. This rests on an external prior characterization of low-energy universality rather than on any quantity defined in terms of the target approximation result itself. No equations rename a fitted parameter as a prediction, no self-citation is load-bearing for the uniqueness or ansatz, and the derivation chain does not reduce by construction to its inputs. The result is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Universal spin models can reproduce the behavior of any other spin model in their low energy sector.
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Universal spin models are universal approximators. ... Simulations preserve Boltzmann distributions (Lemma 10)
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
each target configuration t with energy at most ∆ can be uniquely extended to a source configuration (t, ht) with equal energy, and
-
[2]
all other source configurations have energy at least ∆. If S simulates T we write S → T and divide the spins of S into physical spins, VT ⊆ VS, and auxiliary spins, VS \ VT . We say that S simulates a function f on {0, 1}V if S simulates any spin system with Hamiltonian f. This implies that for all configurations t on V , up to a shift, HS(t, ht) =∆ f(t),...
-
[3]
it is flag complete, and
-
[4]
it is closed for any subset of flag systems with disjoint flag spins. Proof. The “only if” direction is immediate, since M simulates all spin systems. Simulating a flag system with high enough cut-off yields a flag system itself. We prove the “if” direction constructively. We denote the flag system for configuration x by Sx. Without loss of generalit...
-
[5]
E. Ising, Beitrag zur Theorie des Ferromagnetismus, Zeitschrift für Physik 31, 253 (1925)
-
[6]
T. Reinhart and G. De les Coves, The grammar of the Ising model: A new complexity hierarchy, Proc. R. Soc. A 481, 20240579 (2025)
-
[7]
T. Osogami, Boltzmann machines and energy-based models, arXiv:1708.06008 (2019)
-
[8]
N. Le Roux and Y. Bengio, Representational power of restricted boltzmann machines and deep belief networks, Neural Computation 20, 1631 (2008)
-
[9]
N. Le Roux and Y. Bengio, Deep belief networks are compact universal approximators, Neural Computation 22, 2192 (2010)
-
[10]
G. Montúfar, Deep narrow boltzmann machines are universal approximators, in 3rd International Conference on Learning Representations, ICLR 2015 (2015)
-
[11]
G. Montúfar and J. Rauh, Hierarchical models as marginals of hierarchical models, International Journal of Approximate Reasoning 88, 531 (2017)
-
[12]
G. De las Cuevas and T. S. Cubitt, Simple universal models capture all classical spin physics, Science 351, 1180 (2016)
-
[13]
T. Reinhart, B. Engel, and G. De les Coves, The structure of emulations in classical spin models: Modularity and universality, arXiv:2407.13428 (2024)
-
[14]
Specifically, we mean machine learning models that are defined by an energy function acting on binary spins such that their output is the corresponding Boltzmann distribution over visible spins. Examples are DBMs and RBMs
-
[15]
By disjoint auxiliary spins we mean that the overlap is at most between physical spins, i.e. S1 ∩ S2 = T1 ∩ T2
-
[16]
We denote by 0 the all zero and by 1 the all one vector
-
[17]
By changing couplings from R to R′ we mean that R and R′ are defined on the same interaction hypergraph and thus act on the same spins, but differ in their local energy functions and specifically the parameters that define them
-
[18]
I. Sutskever and G. E. Hinton, Deep, narrow sigmoid belief networks are universal approximators, Neural Comput. 20, 2629–2636 (2008)
- [19]
-
[20]
G. Cybenko, Approximation by superpositions of a sigmoidal function, Mathematics of Control, Signals and Systems 2, 303 (1989)
-
[21]
R. Sato, A survey on the expressive power of graph neural networks, arXiv:2003.04078 (2020)
-
[22]
C. Yun, S. Bhojanapalli, A. S. Rawat, S. J. Reddi, and S. Kumar, Are transformers universal approximators of sequence-to-sequence functions?, in 8th International Conference on Learning Representations, ICLR 2020 (2020)
-
[23]
L. Gu, F. Zhou, and L. Yang, Towards the representational power of restricted boltzmann machines, Neurocomputing 415, 358 (2020)
-
[24]
J. Martens, A. Chattopadhya, T. Pitassi, and R. Zemel, On the representational efficiency of restricted boltzmann machines, in Advances in Neural Information Processing Systems, Vol. 26, edited by C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Weinberger (Curran Associates, Inc., 2013)
-
[25]
T. Gonda, T. Reinhart, S. Stengele, and G. De les Coves, A framework for universality in physics, computer science, and beyond, Compositionality 6, 3 (2024).
SUPPLEMENTARY MATERIAL
Here we provide details to the results of the main text. Lemma 9. Consider a copy system C, flag systems Rx and Ryi with flag spins fx and fyi and fields on flag spins e...