
arxiv: 2507.07669 · v2 · submitted 2025-07-10 · ❄️ cond-mat.dis-nn

Universal Spin Models are Universal Approximators in Machine Learning

Pith reviewed 2026-05-19 05:53 UTC · model grok-4.3

classification ❄️ cond-mat.dis-nn
keywords universal approximation · spin models · Boltzmann machines · probability distributions · Ising model · machine learning · universal spin models

The pith

Universal spin models can approximate any probability distribution to arbitrary precision.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that a spin model qualifies as universal for approximating probability distributions precisely when it can reproduce the low-energy behavior of any other spin model. This link lets researchers borrow the known sufficient and necessary conditions for spin-model universality and apply them directly to check whether machine-learning architectures reach universal approximation. A sympathetic reader cares because the result supplies a single, transferable test instead of separate proofs for each new model. The authors verify the test on restricted Boltzmann machines, deep Boltzmann machines, and deep belief networks.

Core claim

We prove that universal spin models are universal approximators of probability distributions. This enables us to leverage the characterization of the former to reveal conditions which are sufficient for universal approximation. Deriving universal approximation theorems thus amounts to verifying these conditions, yielding a unified recipe for universal approximation theorems applicable to a wide range of models. We explicitly test this recipe for restricted and deep Boltzmann machines, as well as for deep belief networks.

What carries the argument

Universal spin models, defined as those able to reproduce the low-energy sector of any other spin model, are shown to serve equally as universal approximators of arbitrary probability distributions.
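The idea of reproducing a low-energy sector can be made concrete on a toy case. The sketch below checks, by exhaustive enumeration, whether a source energy function matches a target below a cut-off while every other source configuration sits at or above that cut-off. The names `simulates`, `make_source`, and `target`, the strict inequality at the cut-off, and the convention that `shift` is subtracted from source energies are all assumptions made for illustration. The example gadget is a standard quadratic penalty that forces one auxiliary spin to equal the product of two physical spins; it is not taken from the paper.

```python
from itertools import product

def target(t):
    """Hypothetical target: a single three-body term on spins in {0, 1}."""
    t1, t2, t3 = t
    return -t1 * t2 * t3

def make_source(penalty):
    """Pairwise source on (t1, t2, t3, h), where h is one auxiliary spin."""
    def source(s):
        t1, t2, t3, h = s
        # Penalty term: zero iff h == t1 * t2, at least 1 otherwise.
        gadget = t1 * t2 - 2 * (t1 + t2) * h + 3 * h
        return -h * t3 + penalty * gadget
    return source

def simulates(source, target, n_phys, n_aux, cutoff, shift=0.0, tol=1e-9):
    """Check the low-energy matching condition by brute force.

    For every physical configuration t with target energy below `cutoff`,
    exactly one auxiliary extension must reproduce that energy, and every
    other source configuration must have energy at least `cutoff`.
    """
    for t in product((0, 1), repeat=n_phys):
        e_target = target(t)
        e_source = [source(t + h) - shift for h in product((0, 1), repeat=n_aux)]
        matches = [e for e in e_source if abs(e - e_target) <= tol]
        others = [e for e in e_source if abs(e - e_target) > tol]
        if e_target < cutoff:
            if len(matches) != 1 or any(e < cutoff - tol for e in others):
                return False
        elif any(e < cutoff - tol for e in e_source):
            return False
    return True

print(simulates(make_source(3), target, n_phys=3, n_aux=1, cutoff=2))  # strong penalty
print(simulates(make_source(1), target, n_phys=3, n_aux=1, cutoff=2))  # weak penalty
```

With a penalty of 3 the check passes, and with a penalty of 1 it fails because a mismatched auxiliary spin is no longer pushed above the cut-off.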

If this is right

  • Simple models such as the 2D Ising model with fields become universal approximators of probability distributions.
  • Restricted Boltzmann machines satisfy the transferred conditions and are therefore universal approximators.
  • Deep Boltzmann machines and deep belief networks likewise meet the sufficient conditions.
  • New universal-approximation results for other architectures reduce to checking the same spin-model conditions.
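To make the objects in the list above tangible, here is a minimal sketch of the Boltzmann distribution of a small 2D Ising model with fields, computed by enumerating all configurations. Spins take values in {0, 1}. The energy parametrization (a coupling per grid edge plus a field per spin), the inverse temperature `beta`, and all function names are assumptions for illustration rather than the paper's definitions.

```python
import math
from itertools import product

def grid_edges(rows, cols):
    """Nearest-neighbour edges of a rows x cols grid with open boundaries."""
    edges = []
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:
                edges.append((i, i + 1))
            if r + 1 < rows:
                edges.append((i, i + cols))
    return edges

def ising_energy(s, edges, couplings, fields):
    """E(s) = sum over edges of J_ij * s_i * s_j + sum over spins of h_i * s_i."""
    pair = sum(J * s[i] * s[j] for (i, j), J in zip(edges, couplings))
    local = sum(h * si for h, si in zip(fields, s))
    return pair + local

def boltzmann(energy_fn, n_spins, beta=1.0):
    """Exact Boltzmann distribution; only feasible for a handful of spins."""
    configs = list(product((0, 1), repeat=n_spins))
    weights = [math.exp(-beta * energy_fn(s)) for s in configs]
    z = sum(weights)
    return {s: w / z for s, w in zip(configs, weights)}

edges = grid_edges(2, 2)
couplings = [-1.0] * len(edges)   # hypothetical values
fields = [0.5, 0.5, 0.5, 0.5]     # hypothetical values
p = boltzmann(lambda s: ising_energy(s, edges, couplings, fields), n_spins=4)
print(max(p, key=p.get))
```

Enumeration scales as 2^n, so this is only a way to inspect tiny instances, not a practical sampler.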

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same transfer technique could be applied to other physical systems whose universality has already been characterized in statistical mechanics.
  • Training dynamics of these models may acquire new interpretations drawn from the low-energy sector analysis of spin systems.

Load-bearing premise

The prior sufficient and necessary conditions that define universal spin models transfer without change to the problem of approximating probability distributions.

What would settle it

A concrete counter-example would be any probability distribution that the two-dimensional Ising model with fields cannot approximate to arbitrary accuracy, since that model is already known to be a universal spin model.
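Any such test needs a way to measure how far a model distribution is from a target. A minimal sketch of two common choices, total variation distance and Kullback-Leibler divergence, follows. Distributions are represented as dictionaries from configurations to probabilities; the function names and the toy distributions are hypothetical.

```python
import math

def total_variation(p, q):
    """Half the L1 distance between two distributions given as dicts."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def kl_divergence(q, p):
    """KL(q || p); infinite if q puts mass where p has none."""
    total = 0.0
    for k, qk in q.items():
        if qk == 0.0:
            continue
        pk = p.get(k, 0.0)
        if pk == 0.0:
            return math.inf
        total += qk * math.log(qk / pk)
    return total

uniform = {s: 0.25 for s in [(0, 0), (0, 1), (1, 0), (1, 1)]}
correlated = {(0, 0): 0.5, (1, 1): 0.5}
print(total_variation(uniform, correlated))
print(kl_divergence(correlated, uniform))
```

A counter-example in the sense above would be a target for which this distance stays bounded away from zero over all admissible model parameters, which enumeration on small instances can suggest but not prove.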

Figures

Figures reproduced from arXiv: 2507.07669 by Gemma De les Coves, Tobias Reinhart.

Figure 1: We prove that universal spin models are uni…
Figure 2: If S simulates T, the spectrum of S equals that of T below the cut-off ∆ up to a shift −Γ.
Original abstract

One of the theoretical pillars that sustain certain machine learning models are universal approximation theorems, which prove that they can approximate all functions from a function class to arbitrary precision. Independently, classical spin models are termed universal if they can reproduce the behavior of any other spin model in their low energy sector. Universal spin models have been characterized via sufficient and necessary conditions, showing that simple models such as the 2d Ising with fields are universal. In this work, we prove that universal spin models are universal approximators of probability distributions. This enables us to leverage the characterization of the former to reveal conditions which are sufficient for universal approximation. Deriving universal approximation theorems thus amounts to verifying these conditions, yielding a unified recipe for universal approximation theorems applicable to a wide range of models. We explicitly test this recipe for restricted and deep Boltzmann machines, as well as for deep belief networks. This work illustrates that independently discovered universality statements may be intimately related, enabling the transfer of results.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript claims to prove that universal spin models (those able to reproduce the low-energy sector of arbitrary spin models, with the 2D Ising model with fields as an example) are universal approximators of probability distributions. This link is used to obtain sufficient conditions for universal approximation in machine learning models, which are then verified and tested explicitly for restricted Boltzmann machines, deep Boltzmann machines, and deep belief networks.

Significance. If the central claim holds, the work supplies a unified recipe for deriving universal approximation theorems by reducing the problem to checking known spin-model universality conditions. This could streamline theoretical analysis across architectures and facilitate identification of new universal models. The explicit tests on RBMs, DBMs, and DBNs provide concrete support for the recipe's applicability.

major comments (2)
  1. [Main theorem / proof of universality transfer] The transfer from low-energy universality (embedding arbitrary target Hamiltonians via auxiliary spins or gadgets) to density of the full Boltzmann distributions p(s) ∝ exp(−E(s)) in the space of target distributions q(s) (under total variation or KL) is load-bearing but not obviously justified. The prior characterizations control ground-state energies and degeneracies; it is unclear whether this grants independent control over the entire spectrum and partition function contributions from higher states. Please supply the explicit reduction (likely in the main theorem section) showing how arbitrary q can be approximated to any precision.
  2. [Sections on RBMs, DBMs, and DBNs] In the applications to RBMs and DBMs, the verification that the 2D Ising model with fields satisfies the sufficient conditions must include the precise mapping of parameters (visible/hidden weights, biases) and how the low-energy embedding translates to the marginal distribution over visible units. Without this, the claim that the recipe directly yields the known universality results for these models remains incomplete.
minor comments (2)
  1. [Notation and definitions] Clarify notation for the target distribution q(s), the model distribution p(s), and the energy function E throughout; inconsistent symbols appear in the abstract and early sections.
  2. [Introduction] Add explicit citations to the foundational papers characterizing universal spin models (sufficient and necessary conditions) in the introduction.
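The second major comment turns on how hidden (auxiliary) units are summed out to give a distribution over visible units. As a point of reference, the sketch below computes the visible marginal of a small restricted Boltzmann machine in two ways: by brute-force summation over hidden configurations and by the textbook closed form. It assumes the common sign convention E(v, h) = −v·W·h − b·v − c·h with units in {0, 1}. The parameter values are hypothetical, and the paper's mapping from spin-model parameters is not reproduced here.

```python
import math
from itertools import product

def rbm_energy(v, h, W, b, c):
    """E(v, h) = -sum_ij W[i][j] v_i h_j - sum_i b_i v_i - sum_j c_j h_j."""
    inter = sum(W[i][j] * v[i] * h[j] for i in range(len(v)) for j in range(len(h)))
    return -inter - sum(bi * vi for bi, vi in zip(b, v)) - sum(cj * hj for cj, hj in zip(c, h))

def _normalize(weights):
    z = sum(weights.values())
    return {v: w / z for v, w in weights.items()}

def marginal_brute_force(W, b, c):
    """Sum exp(-E) over every hidden configuration for each visible one."""
    weights = {}
    for v in product((0, 1), repeat=len(b)):
        weights[v] = sum(math.exp(-rbm_energy(v, h, W, b, c))
                         for h in product((0, 1), repeat=len(c)))
    return _normalize(weights)

def marginal_closed_form(W, b, c):
    """p(v) proportional to exp(b.v) * prod_j (1 + exp(c_j + sum_i v_i W[i][j]))."""
    weights = {}
    for v in product((0, 1), repeat=len(b)):
        log_w = sum(bi * vi for bi, vi in zip(b, v))
        for j in range(len(c)):
            pre = c[j] + sum(W[i][j] * v[i] for i in range(len(b)))
            log_w += math.log1p(math.exp(pre))
        weights[v] = math.exp(log_w)
    return _normalize(weights)

W = [[0.5, -1.0], [1.5, 0.2], [-0.3, 0.8]]  # 3 visible x 2 hidden, hypothetical
b = [0.1, -0.4, 0.3]
c = [0.0, 0.7]
brute = marginal_brute_force(W, b, c)
closed = marginal_closed_form(W, b, c)
print(max(abs(brute[v] - closed[v]) for v in brute))
```

The two routes agree up to floating-point error because the hidden units are conditionally independent given the visible ones, which is what makes the sum factorize.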

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed and constructive report. We address each major comment below and have made revisions to clarify the points raised.

  1. Referee: [Main theorem / proof of universality transfer] The transfer from low-energy universality (embedding arbitrary target Hamiltonians via auxiliary spins or gadgets) to density of the full Boltzmann distributions p(s) ∝ exp(−E(s)) in the space of target distributions q(s) (under total variation or KL) is load-bearing but not obviously justified. The prior characterizations control ground-state energies and degeneracies; it is unclear whether this grants independent control over the entire spectrum and partition function contributions from higher states. Please supply the explicit reduction (likely in the main theorem section) showing how arbitrary q can be approximated to any precision.

    Authors: We thank the referee for highlighting this important aspect of the proof. In Section 3 of the manuscript, the main theorem establishes that if a spin model is universal in the low-energy sector, then by sufficiently increasing the energy gap to higher states (via rescaling or temperature adjustment), the Boltzmann distribution can be made arbitrarily close in total variation distance to any target distribution that is uniform over the ground states of an embedded Hamiltonian. This is because the probability mass on excited states can be bounded by exp(-Δβ), where Δ is the gap, which can be made large. We have added an explicit lemma and proof sketch in the revised version to make this reduction clear, including the control over the partition function. revision: yes

  2. Referee: [Sections on RBMs, DBMs, and DBNs] In the applications to RBMs and DBMs, the verification that the 2D Ising model with fields satisfies the sufficient conditions must include the precise mapping of parameters (visible/hidden weights, biases) and how the low-energy embedding translates to the marginal distribution over visible units. Without this, the claim that the recipe directly yields the known universality results for these models remains incomplete.

    Authors: We agree that the applications section would benefit from more explicit details. In the revised manuscript, we have expanded the sections on RBMs, DBMs, and DBNs to include the precise parameter mappings from the 2D Ising model with fields to the model parameters (e.g., weights and biases). We also describe how the auxiliary spins are integrated out to obtain the marginal distribution over the visible units, showing that it matches the standard form for these models. This makes the derivation of the known universality results direct from our recipe. revision: yes
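The bound invoked in the first response can be checked numerically on a toy spectrum. If the lowest energy is taken as zero and every excited state lies at least Δ above it, the partition function is at least the number of ground states, so the excited-state mass is at most (N_excited / N_ground) · exp(−βΔ). For a Boltzmann distribution this mass also equals the total variation distance to the uniform distribution over ground states. The sketch below is an illustration under these assumptions, with hypothetical function names and a hypothetical spectrum; it is not the lemma added in the revised manuscript.

```python
import math

def excited_mass(energies, beta):
    """Boltzmann probability of all states above the minimum energy."""
    e0 = min(energies)
    weights = [math.exp(-beta * (e - e0)) for e in energies]
    excited = sum(w for e, w in zip(energies, weights) if e > e0)
    return excited / sum(weights)

def gap_bound(energies, beta):
    """(N_excited / N_ground) * exp(-beta * gap), an upper bound on excited_mass."""
    e0 = min(energies)
    n_ground = sum(1 for e in energies if e == e0)
    excited = [e for e in energies if e > e0]
    if not excited:
        return 0.0
    gap = min(excited) - e0
    return len(excited) / n_ground * math.exp(-beta * gap)

energies = [0.0, 0.0, 1.0, 2.0]  # hypothetical spectrum: two ground states, gap 1
for beta in (1.0, 5.0, 10.0):
    print(beta, excited_mass(energies, beta), gap_bound(energies, beta))
```

Note that this argument only reaches targets that are uniform over a chosen set of ground states, which is the restriction stated in the response itself.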

Circularity Check

0 steps flagged

No circularity: derivation transfers external spin-model characterization to distribution approximation without self-referential reduction


The paper's central step is to prove that models satisfying the known sufficient/necessary conditions for universal spin models (e.g., 2D Ising with fields) can realize Boltzmann distributions dense in the space of target distributions. This rests on an external prior characterization of low-energy universality rather than on any quantity defined in terms of the target approximation result itself. No equations rename a fitted parameter as a prediction, no self-citation is load-bearing for the uniqueness or ansatz, and the derivation chain does not reduce by construction to its inputs. The result is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the pre-existing characterization of universal spin models and on standard mathematical notions of universal approximation; no new free parameters or invented entities are introduced in the abstract.

axioms (1)
  • domain assumption: Universal spin models can reproduce the behavior of any other spin model in their low energy sector.
    This is the definition of universality for spin models invoked to transfer results to probability approximation.



