pith. sign in

arxiv: 2605.19178 · v1 · pith:IJWRSFZ6new · submitted 2026-05-18 · ❄️ cond-mat.dis-nn · cond-mat.stat-mech· cs.LG· physics.data-an

Activation Functions, Statistics and Learning of Higher-Order Interactions in Restricted Boltzmann Machines

Pith reviewed 2026-05-20 06:56 UTC · model grok-4.3

classification ❄️ cond-mat.dis-nn cond-mat.stat-mechcs.LGphysics.data-an
keywords Restricted Boltzmann MachinesActivation FunctionsHigher-Order InteractionsInduced CouplingsBinary Variable ModelsLearning DynamicsExponential NonlinearityMoment Analysis
0
0 comments X

The pith

RBMs using exponential activation can represent and learn strong higher-order interactions within an analytically determined parameter range.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines how activation functions shape what Restricted Boltzmann Machines can represent when modeling data with interactions beyond pairwise terms among binary variables. Exploiting the known duality to effective spin models, the authors derive the moments of the induced interaction distribution for linear, step, ReLU, and exponential units. They show analytically that large higher-order couplings are difficult for any RBM to capture, yet exponential nonlinearities open a usable window of parameters where such structures become representable. Direct comparison of these moment predictions with the outcome of gradient-based training confirms the analysis holds during learning.

Core claim

The space of models representable by an RBM is fully characterized by the moments of the distribution of interactions induced on the visible variables; for exponential activations this distribution acquires a tail that permits large higher-order terms within a specific, analytically fixed range of the hidden-unit bias and weight scale.

What carries the argument

The duality mapping an RBM ensemble to an effective model of interacting binary variables, with the distribution of induced couplings characterized by its low-order moments.

If this is right

  • Data structures generated by strong higher-order interactions remain hard to represent for linear, step, and ReLU activations at any parameter value.
  • Exponential activation enlarges the representable set precisely when the hidden-unit bias and coupling scale lie inside the derived interval.
  • Quantitative agreement between moment calculations and observed learning trajectories holds across the four activations tested.
  • Optimal parameter choices for exponential units can be read directly from the analytic expressions without numerical search.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar rapidly growing activations could be substituted for the exponential in deeper architectures to improve capture of multi-body correlations.
  • The moment characterization offers a diagnostic for whether a given dataset is likely to be learnable by a given RBM before training begins.
  • The same duality lens may be applied to other energy-based models to predict which activation choices favor higher-order statistics.

Load-bearing premise

The duality between RBMs and models of interacting binary variables fully determines the representable distributions through the moments of the induced interactions.

What would settle it

Training an exponential RBM on synthetic data generated from a model with large three-body or higher couplings and checking whether the recovered effective interactions match the predicted moments only inside the analytically derived bias-and-scale window.

Figures

Figures reproduced from arXiv: 2605.19178 by Giovanni di Sarra, Yasser Roudi.

Figure 1
Figure 1. Figure 1: Bipartite structure of a Restricted Boltzmann Machine. [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Hidden layer marginalization. The joint distribution of an RBM with N = 5 is marginalized with respect to the hidden layer to generate a fully-visible network with arbitrary orders of interaction between nodes. In the Linear RBM case, Eq. (2) corresponds to a Hopfield￾like pairwise model. In the non linear cases, Eq. (2) also includes every higher-order interaction term up to s = N. The three-body interact… view at source ↗
Figure 3
Figure 3. Figure 3: Solutions of ∆Exp s = 1 in the (σ 2 , w0) plane for M−1 0 = 0.1 (left) and M−1 0 = 0.002 (right). Eq. (13) is plotted with a color corresponding to the order of interaction. The black line shows the divergence γ1 = 0, where interaction fluctuations are infinitely larger than the expected value. interaction terms with increasing order s have larger fluctuation-dominated regions. Furthermore, the size of the… view at source ↗
Figure 4
Figure 4. Figure 4: I (s) 0 /M versus w0 from Eq. (15), for s = 1, 2 for the Linear activation function and s = 1, 2, 3 for Exponential, Step and ReLU. Interactions of higher orders are also present for all the activation functions except for Linear. I (s) 0 /M with s > 3 are smaller than I (3) 0 /M and are not shown for visualization purposes. The star indicates the transition point for the Exponential function. The RBM para… view at source ↗
Figure 5
Figure 5. Figure 5: shows a different situation for ReLU and Step, where the input to the hidden units changes the interaction structure in a more complicated way [PITH_FULL_IMAGE:figures/full_fig_p012_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Is from Eq. (23) and Is from Eq. (20) (dashed line) versus w0 for g = 2. The solid line for the Exponential activation shows Eq. (7). Parameters are bi = 0, cµ = 0 ∀i, µ, N = 8, M = 20 [PITH_FULL_IMAGE:figures/full_fig_p013_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Is from Eq. (23) and Is from Eq. (20) (dashed line) versus g for w0 = 0.2. The solid line for the Exponential activation shows Eq. (7). Parameters are bi = 0, cµ = 0 ∀i, µ, N = 8, M = 20. the latter deviates from the analytical expressions. In fact, the γ1 = 1 transition is not captured by the expansion. Figs. 8 and 9 show a similar set of results as in Figs. 6 and 7 but for the variance of the interaction… view at source ↗
Figure 8
Figure 8. Figure 8: σ 2 s from Eq. (25) and Var(Ik1,··· ,ks ) from Eq. (24) (dashed line) versus w0 for g = 1. The solid line for the Exponential activation shows Eq. (9). Parameters are bi = 0 ∀i, cµ = 0 ∀µ, N = 8, M = 20. As in the previous cases, analytical expressions are compared with empirical averages. 13 [PITH_FULL_IMAGE:figures/full_fig_p013_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: σ 2 s from Eq. (25) and Var(Ik1,··· ,ks ) from Eq. (24) (dashed line) versus g for w0 = 0.2. The solid line for the Exponential activation shows Eq. (9). The RBM parameters are bi = 0 ∀i, cµ = 0 ∀µ, N = 8 and M = 20. the order of interaction s. This is well captured by the theoretical expressions and shown both as a function of w0 ( [PITH_FULL_IMAGE:figures/full_fig_p014_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Square root of I 2 s from Eq. (23) (n = 2) and square root of Is,2 from Eq. (22) (dashed line) versus w0 for g = 1. The solid line for the Exponential activation shows the first term in Eq. (9). The RBM parameters are bi = 0 ∀i, cµ = 0 ∀µ, N = 8 and M = 20. moments gives an estimate of the average magnitude of the interaction terms. Then, the figures show how lower order interactions are larger in magnitu… view at source ↗
Figure 11
Figure 11. Figure 11: RBMs with Exponential activation have a regime where different orders of interaction [PITH_FULL_IMAGE:figures/full_fig_p015_11.png] view at source ↗
Figure 12
Figure 12. Figure 12: Fraction of decaying interaction models for Exponential, ReLU and Step activation [PITH_FULL_IMAGE:figures/full_fig_p016_12.png] view at source ↗
Figure 13
Figure 13. Figure 13: Decaying and non decaying ground truth lattice gas models. Ground truth lat￾tice gas models with N = 3 and interactions I gt k1,...,ks ∼ N (I (s) gt , I(s) gt /5). For the decaying interaction model in Eq.(28) (upper left), I (1) gt = 0.9, I (2) gt = 0.3 and I (3) gt = 0.1. For the non decaying interaction model in Eq.(29) (lower left), the interactions are 3-body, I (3) gt = 1. Edges in the networks repr… view at source ↗
Figure 14
Figure 14. Figure 14: Learning a decaying interaction model. A RBM with N = 3 and M = 4, initialized with zero-mean Gaussian weights (σ = 0.01), is trained to match Eq.(28) for different activation functions. The model is trained for 2500 epochs with a learning rate of 0.001. The first panel in each row shows the trajectory of the interactions mapped from the RBM, compared with the ground-truth interactions (dashed lines). The… view at source ↗
Figure 15
Figure 15. Figure 15: shows the training process when the ground-truth model is the three-body interaction model in [PITH_FULL_IMAGE:figures/full_fig_p019_15.png] view at source ↗
Figure 16
Figure 16. Figure 16: Learning interactions from a random RBM. An RBM with N = 3 and M = 4, initialized with zero-mean Gaussian weights (σ = 0.01), is trained to match the probability distri￾bution of a Gaussian random RBM (w0 =0.2 and g =0.2/ √ M) for different activation functions. The model is trained for 300 epoch with a learning rate of 0.02. The first panel for each activa￾tion function shows the training trajectory of t… view at source ↗
Figure 17
Figure 17. Figure 17: shows how one-body interaction models in Eq.(31) are learned by an RBM, for different values of hi and different activation functions. This kind of behavior suggests that the one-body [PITH_FULL_IMAGE:figures/full_fig_p021_17.png] view at source ↗
Figure 18
Figure 18. Figure 18: Learning a pairwise lattice gas model. RBMs with N = 3 and M = 4, initialized with zero-mean Gaussian weights (σ = 0.01), are trained to match the probability distribution of ground-truth models with pairwise interactions only (Eq.(32)) for different values of Jij . Interac￾tions of order 1 and 3 are plotted versus J. Each panel shows the comparison between the pairwise ground truth interaction Jij and th… view at source ↗
Figure 19
Figure 19. Figure 19: Learning a three-body lattice gas model. RBMs with N = 3 and M = 4, initialized with zero-mean Gaussian weights (σ = 0.01), are trained to match the probability distribution of ground truth models with one three-body interaction only (Eq.(33)) for different values of Tijk. Interactions of s < 3 are plotted versus T. Each panel shows the comparison between the three￾body ground truth interaction Tijk and t… view at source ↗
Figure 20
Figure 20. Figure 20: Learning a non decaying lattice gas model with the Exponential activation. An RBM with N = 3 and M = 8, initialized with Gaussian weights (w0 = 0.3 and g = 3), is trained on a ground truth non-decaying model. The left panel shows I gt k1,··· ,ks 2 (in blue) and Ik1,··· ,ks 2 from Eq.(3) for the trained RBM with different activation functions. The lines connect Eq. (23) (n = 2) for s = 2 and s = 3. The rig… view at source ↗
Figure 21
Figure 21. Figure 21: RBMs represent three-body interaction model with T = 0.5. The weights configurations solving the non-linear set of equations given by Eq.(3) for a three-body interaction is shown for each activation function. 33 [PITH_FULL_IMAGE:figures/full_fig_p033_21.png] view at source ↗
Figure 22
Figure 22. Figure 22: Solutions of ∆Lin s = 1 in the (σ 2 , w0) plane for M0 = 0.1 (left) and M0 = 0.002 (right). Eq. (13) is plotted with a color corresponding to the order of interaction. The black line shows the divergence w0 = 0, where interaction fluctuations are infinitely larger than their mean [PITH_FULL_IMAGE:figures/full_fig_p035_22.png] view at source ↗
Figure 23
Figure 23. Figure 23: Square root of I 2 s from Eq. (23) (n = 2) and square root of Is,2 from Eq. (22) (dashed line) versus g for w0 = 0.2. The solid line for the Exponential activation shows the first term in Eq. (9). The RBM parameters are bi = 0 ∀i, cµ = 0 ∀µ, N = 8 and M = 20. 35 [PITH_FULL_IMAGE:figures/full_fig_p035_23.png] view at source ↗
Figure 24
Figure 24. Figure 24: Learning a large independent lattice gas model. RBMs with N = 10 and M = 15, initialized with zero-mean Gaussian weights (σ = 0.01), are trained to match the probability dis￾tribution of ground truth lattice gas models with one body interactions only (Eq.(31)) for different values of hi and for different activation functions. Each panel shows the comparison between the one-body ground truth interaction hi… view at source ↗
Figure 25
Figure 25. Figure 25: Learning a large pairwise lattice gas model. RBMs with N = 10 and M = 15, initialized with zero-mean Gaussian weights (σ = 0.01), are trained to match the probability distribution of ground truth lattice gas models with pairwise interactions only (Eq.(32)) for different values of Jij and for different activation functions. Each panel shows the comparison between the pairwise ground truth interaction Jij a… view at source ↗
Figure 26
Figure 26. Figure 26: Learning a large three-body lattice gas model. RBMs with N = 10 and M = 15, initialized with zero-mean Gaussian weights (σ = 0.01), are trained to match the probability distribution of ground truth lattice gas models with three-body interactions only (Eq.(33)) for different values of Tijk and for different activation functions. Each panel shows the comparison between the three-body ground truth interactio… view at source ↗
Figure 27
Figure 27. Figure 27: Learning a non decaying lattice gas model with the Exponential activation - details. The model is trained for 2500 epochs with a learning rate of 5 × 10−4 . The first panel in each row shows the trajectory of the interactions mapped from the RBM, compared with the ground-truth interactions (dashed lines). The second panel in each row shows the cross-entropy trajectory, where the target is the ground truth… view at source ↗
read the original abstract

The great success of neural networks in recognizing hidden patterns and correlations in complex data lies in the way they take advantage of the large number of parameters and nonlinear single-unit activation, jointly. Restricted Boltzmann Machines (RBMs) provide a simple yet powerful framework for studying the impact of activation nonlinearities on performance and representation. In this work, we exploit the duality between RBMs and models of interacting binary variables to study the statistics of the interactions induced by RBM ensembles with different hidden unit activation functions. We characterize the space of representable models analytically in terms of moments of the distribution of induced interactions for four commonly used activation functions: Linear, Step, ReLU, and Exponential. Quantitative predictions of the analytical calculations on learning show a very good agreement with results of the simulations of the training process. In particular, our analysis shows that there are certain data structures, namely those generated by models of interacting variables with large interaction terms beyond pairwise, that are difficult to represent, and thus to learn, for any RBM. Yet, we find that rapidly increasing nonlinearities, such as the Exponential function, can facilitate the representation and learning of such data structures for a specific range of parameters that is determined analytically.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper exploits the RBM-Ising duality to map activation functions (Linear, Step, ReLU, Exponential) to distributions of induced couplings in an effective model of interacting binary variables. It analytically computes the moments of these distributions to characterize the space of representable models and identifies an analytically determined parameter window for the Exponential activation in which higher-order moments become large enough to represent data structures with strong multi-body interactions. Quantitative predictions from the moment analysis are reported to agree well with direct simulations of the training dynamics.

Significance. If the central mapping holds, the work supplies a concrete analytical handle on how activation nonlinearities control the capacity to encode higher-order statistics, which is a load-bearing issue for understanding representation power in energy-based models. The explicit parameter range for Exponential activations and the reported agreement between analytics and simulations constitute falsifiable, reproducible elements that could inform activation choice in RBMs and related architectures.

major comments (2)
  1. [§3] §3 (moment characterization): The claim that the space of representable models is fully delineated by the first few moments of the induced-interaction distribution assumes that moment matching suffices to guarantee reproduction of arbitrary higher-order statistics. For the Exponential activation, whose induced couplings are expected to be non-Gaussian, it is not shown whether residual correlations or higher cumulants outside the reported moments can still prevent the effective Hamiltonian from capturing the target multi-body terms; a bound or explicit counter-example would be needed to secure this step.
  2. [§4] §4 (simulation validation): The parameter window for Exponential is derived from the same moment calculations used to define the representable set; the reported agreement with training simulations therefore does not constitute an independent test of whether the moment truncation actually enlarges the reachable model space beyond what ReLU or Step functions achieve.
minor comments (2)
  1. [§2] Notation for the induced coupling distribution should be introduced once and used consistently; the transition from the RBM energy to the effective Ising Hamiltonian is described in two places with slightly different symbols.
  2. [Figure 3] Figure 3: the error bars on the learning curves for the Exponential case overlap with the ReLU curves in the reported regime; a statistical test or larger sample size would clarify whether the claimed advantage is significant.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the thoughtful and constructive report. The comments highlight important nuances in our moment-based characterization and the nature of our simulation validation. We address each major comment below, indicating where we will revise the manuscript for greater precision while defending the core contributions of the work.

read point-by-point responses
  1. Referee: [§3] §3 (moment characterization): The claim that the space of representable models is fully delineated by the first few moments of the induced-interaction distribution assumes that moment matching suffices to guarantee reproduction of arbitrary higher-order statistics. For the Exponential activation, whose induced couplings are expected to be non-Gaussian, it is not shown whether residual correlations or higher cumulants outside the reported moments can still prevent the effective Hamiltonian from capturing the target multi-body terms; a bound or explicit counter-example would be needed to secure this step.

    Authors: We appreciate this observation. Our manuscript characterizes the space of representable models through the moments of the induced coupling distribution rather than claiming that matching the first few moments rigorously guarantees exact reproduction of arbitrary higher-order statistics. The moments quantify the expected magnitude of multi-body interactions; for the Exponential activation these higher moments grow rapidly inside the identified parameter window, indicating enhanced capacity for strong higher-order terms. We acknowledge that non-Gaussian features and residual cumulants could affect precise matching and will add a clarifying paragraph in §3 stating that the moment analysis provides a necessary indicator of representational capacity but is not proven sufficient for all target statistics. A rigorous bound or counter-example lies beyond the present scope. revision: partial

  2. Referee: [§4] §4 (simulation validation): The parameter window for Exponential is derived from the same moment calculations used to define the representable set; the reported agreement with training simulations therefore does not constitute an independent test of whether the moment truncation actually enlarges the reachable model space beyond what ReLU or Step functions achieve.

    Authors: We agree that the simulations are guided by the same analytical moment calculations and therefore do not furnish a fully independent test of the truncation's effect on reachable model space. The numerical results instead confirm that the analytically predicted window for the Exponential activation corresponds to measurably better learning of higher-order structures, while the same window yields no advantage for Linear, Step or ReLU activations. We will revise the discussion in §4 to emphasize that the simulations validate the practical utility of the moment-derived window rather than independently proving an enlargement of the model space. revision: partial

standing simulated objections not resolved
  • A rigorous bound or explicit counter-example showing whether higher cumulants or residual correlations can prevent the effective Hamiltonian from capturing target multi-body terms for the Exponential activation.

Circularity Check

0 steps flagged

Moment-based analytic characterization of RBM representable spaces is self-contained and externally validated

full rationale

The paper derives the distribution of induced interactions and their moments directly from the RBM-Ising duality for each activation function (Linear, Step, ReLU, Exponential), obtains closed-form expressions for those moments, and identifies the parameter window for Exponential activation where higher-order moments become large. These analytic results are then compared quantitatively to independent Monte Carlo simulations of the training dynamics on synthetic data generated from models with strong higher-order terms. No equation reduces a prediction to a fitted parameter by construction, no load-bearing premise rests on a self-citation chain, and the duality is used only to map activations to interaction statistics rather than to presuppose the target result. The central claim therefore remains independent of its own outputs.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central claim rests on the standard statistical-mechanics duality between RBMs and interacting binary models plus the assumption that moments of the induced-interaction distribution suffice to characterize representable data structures.

free parameters (1)
  • parameter range for exponential activation
    The specific window of parameters where exponential activation succeeds is determined analytically from the moment calculations and may implicitly depend on data assumptions.
axioms (1)
  • domain assumption Duality between RBMs and models of interacting binary variables
    Invoked to translate activation nonlinearities into statistics of effective interactions.

pith-pipeline@v0.9.0 · 5757 in / 1174 out tokens · 51575 ms · 2026-05-20T06:56:56.344437+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

44 extracted references · 44 canonical work pages · 2 internal anchors

  1. [1]

    Allen-Zhu, Y

    Z. Allen-Zhu, Y. Li, and Y. Liang. Learning and generalization in overparameterized neural networks, going beyond two layers.Advances in neural information processing systems, 32, 2019

  2. [2]

    Arora, S

    S. Arora, S. Du, W. Hu, Z. Li, and R. Wang. Fine-grained analysis of optimization and generalization for overparameterized two-layer neural networks. InInternational conference on machine learning, pages 322–332. PMLR, 2019

  3. [3]

    Allen-Zhu, Y

    Z. Allen-Zhu, Y. Li, and Z. Song. A convergence theory for deep learning via over- parameterization. InInternational conference on machine learning, pages 242–252. PMLR, 2019

  4. [4]

    On the generalization mystery in deep learning.arXiv preprint arXiv:2203.10036, 2022

    S. Chatterjee and P. Zielinski. On the generalization mystery in deep learning.arXiv preprint arXiv:2203.10036, 2022

  5. [5]

    Oymak and M

    S. Oymak and M. Soltanolkotabi. Toward moderate overparameterization: Global conver- gence guarantees for training shallow neural networks.IEEE Journal on Selected Areas in Information Theory, 1(1):84–105, 2020

  6. [6]

    Li and Y

    Y. Li and Y. Liang. Learning overparameterized neural networks via stochastic gradient descent on structured data.Advances in neural information processing systems, 31, 2018

  7. [7]

    di Sarra, B

    G. di Sarra, B. Bravi, and Y. Roudi. The unbearable lightness of restricted boltzmann ma- chines: Theoretical insights and biological applications.Europhysics Letters, 149(2):21002, jan 2025

  8. [8]

    Nair and G

    V. Nair and G. E. Hinton. Rectified linear units improve restricted boltzmann machines. InProceedings of the 27th International Conference on International Conference on Machine Learning, ICML’10, pages 807–814, Madison, WI, USA, 2010. Omnipress. 26

  9. [9]

    Glorot, A

    X. Glorot, A. Bordes, and Y. Bengio. Deep sparse rectifier neural networks. In Geoffrey Gordon, David Dunson, and Miroslav Dud´ ık, editors,Proceedings of the Fourteenth Interna- tional Conference on Artificial Intelligence and Statistics, volume 15 ofProceedings of Machine Learning Research, pages 315–323, Fort Lauderdale, FL, USA, 11–13 Apr 2011. PMLR

  10. [10]

    Krizhevsky, I

    A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks.Advances in neural information processing systems, 25, 2012

  11. [11]

    Searching for Activation Functions

    P. Ramachandran, B. Zoph, and Q. V. Le. Searching for activation functions.arXiv preprint arXiv:1710.05941, 2017

  12. [12]

    Gaussian Error Linear Units (GELUs)

    D. Hendrycks and K. Gimpel. Gaussian error linear units (gelus).arXiv preprint arXiv:1606.08415, 2016

  13. [13]

    Fukai and M

    T. Fukai and M. Shiino. Large suppression of spurious states in neural networks of nonlinear analog neurons.Phys. Rev. A, 42:7459–7466, Dec 1990

  14. [14]

    K¨ uhn, S

    R. K¨ uhn, S. B¨ os, and J. L. van Hemmen. Statistical mechanics for networks of graded-response neurons.Phys. Rev. A, 43:2084–2087, Feb 1991

  15. [15]

    Graded-response neurons and information encodings in autoassociative memories.Phys

    Alessandro Treves. Graded-response neurons and information encodings in autoassociative memories.Phys. Rev. A, 42:2418–2430, Aug 1990

  16. [16]

    Localized activity profiles and storage capacity of rate- based autoassociative networks.Phys

    Yasser Roudi and Alessandro Treves. Localized activity profiles and storage capacity of rate- based autoassociative networks.Phys. Rev. E, 73:061904, Jun 2006

  17. [17]

    Threshold-linear formal neurons in auto-associative nets.Journal of Physics A: Mathematical and General, 23(12):2631–2650, jun 1990

    A Treves. Threshold-linear formal neurons in auto-associative nets.Journal of Physics A: Mathematical and General, 23(12):2631–2650, jun 1990

  18. [18]

    Sch¨ onsberg, Y

    F. Sch¨ onsberg, Y. Roudi, and A. Treves. Efficiency of local learning rules in threshold-linear associative networks.Phys. Rev. Lett., 126:018301, Jan 2021

  19. [19]

    K. He, X. Zhang, S. Ren, and J. Sun. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In2015 IEEE International Conference on Computer Vision (ICCV), pages 1026–1034, Los Alamitos, CA, USA, dec 2015. IEEE Computer Society

  20. [20]

    Kunc and J

    V. Kunc and J. Kl´ ema. Three decades of activations: A comprehensive survey of 400 activation functions for neural networks.arXiv preprint arXiv:2402.09092, 2024

  21. [21]

    Oostwal, M

    E. Oostwal, M. Straat, and M. Biehl. Hidden unit specialization in layered neural net- works: Relu vs. sigmoidal activation.Physica A: Statistical Mechanics and its Applications, 564:125517, 2021

  22. [22]

    Citton, F

    O. Citton, F. Richert, and M. Biehl. Phase transition analysis for shallow neural networks with arbitrary activation functions.Physica A: Statistical Mechanics and its Applications, 660:130356, 2025

  23. [23]

    Nishiyama and M

    S. Nishiyama and M. Ohzeki. Solution space and storage capacity of fully connected two-layer neural networks with generic activation functions.Journal of the Physical Society of Japan, 94(1):014802, 2025

  24. [24]

    Manzan and D

    G. Manzan and D. Tantari. The effect of priors on learning with restricted boltzmann machines. Physica A: Statistical Mechanics and its Applications, 674:130766, 2025. 27

  25. [25]

    Smolensky.Information Processing in Dynamical Systems: Foundations of Harmony The- ory

    P. Smolensky.Information Processing in Dynamical Systems: Foundations of Harmony The- ory. In: Rumelhart, D. E., McClelland, J. S. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, volume 1, pages 194–281. MIT Press, 1986

  26. [26]

    Fischer and C

    A. Fischer and C. Igel. An introduction to restricted boltzmann machines. In L. Alvarez, M. Mejail, L. Gomez, and J. Jacobo, editors,Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications, pages 14–36, Berlin, Heidelberg, 2012. Springer Berlin Heidelberg

  27. [27]

    D. H. Ackley, G. E. Hinton, and T. J. Sejnowski. A learning algorithm for boltzmann machines. Cognitive Science, 9(1):147–169, 1985

  28. [28]

    Le Roux and Y

    N. Le Roux and Y. Bengio. Representational power of restricted boltzmann machines and deep belief networks.Neural computation, 20(6):1631–1649, 2008

  29. [29]

    Decelle and C

    A. Decelle and C. Furtlehner. Restricted boltzmann machine: Recent advances and mean-field theory.Chinese Physics B, 30(4):040202, 2021

  30. [30]

    Marullo and E

    C. Marullo and E. Agliari. Boltzmann machines as generalized hopfield networks: A review of recent results and outlooks.Entropy, 23(1), 2021

  31. [31]

    Bonnaire, G

    T. Bonnaire, G. Catania, A. Decelle, and B. Seoane. On the role of non-linear latent features in bipartite generative neural networks.SciPost Phys., 19:141, 2025

  32. [32]

    Barra, A

    A. Barra, A. Bernacchia, E. Santucci, and P. Contucci. On the equivalence of hopfield networks and boltzmann machines.Neural Netw, 34:1–9, Oct 2012

  33. [33]

    Fachechi, E

    A. Fachechi, E. Agliari, M. Aquaro, A. Coolen, and M. Mulder. Fundamental operating regimes, hyper-parameter fine-tuning and glassiness: towards an interpretable replica-theory for trained restricted boltzmann machines.Journal of Physics A: Mathematical and Theoret- ical, 58(6):065004, 2025

  34. [34]

    Bulso and Y

    N. Bulso and Y. Roudi. Restricted Boltzmann Machines as Models of Interacting Variables. Neural Computation, 33(10):2646–2681, 09 2021

  35. [35]

    Decelle, A

    A. Decelle, A. Navas G´ omez, and B. Seoane. Inferring higher-order couplings with neural networks.Phys. Rev. Lett., 135:207301, Nov 2025

  36. [36]

    Barra, G

    A. Barra, G. Genovese, P. Sollich, and D. Tantari. Phase transitions in restricted boltzmann machines with generic priors.Phys. Rev. E, 96:042156, Oct 2017

  37. [37]

    Barra, G

    A. Barra, G. Genovese, P. Sollich, and D. Tantari. Phase diagram of restricted boltzmann machines and generalized hopfield networks with arbitrary priors.Phys. Rev. E, 97:022310, Feb 2018

  38. [38]

    Tubiana and R

    J. Tubiana and R. Monasson. Emergence of compositional representations in restricted boltz- mann machines.Phys. Rev. Lett., 118:138301, Mar 2017

  39. [39]

    F. E. Leonelli, E. Agliari, L. Albanese, and A. Barra. On the effective initialisation for restricted boltzmann machines via duality with hopfield model.Neural Networks, 143:314–326, 2021

  40. [40]

    Ventura, S

    E. Ventura, S. Cocco, R. Monasson, and Francesco Zamponi. Unlearning regularization for boltzmann machines.Machine Learning: Science and Technology, 5(2):025078, jun 2024. 28

  41. [41]

    H. Shah, K. Tamuly, A. Raghunathan, P. Jain, and P. Netrapalli. The pitfalls of simplicity bias in neural networks.Advances in Neural Information Processing Systems, 33:9573–9585, 2020

  42. [42]

    Rende, F

    R. Rende, F. Gerace, A. Laio, and S. Goldt. A distributional simplicity bias in the learning dynamics of transformers.arXiv preprint arXiv:2410.19637, 2024

  43. [43]

    Refinetti, A

    M. Refinetti, A. Ingrosso, and S. Goldt. Neural networks trained with sgd learn distributions of increasing complexity. InInternational Conference on Machine Learning, pages 28843–28863. PMLR, 2023

  44. [44]

    Jangjoo, G

    F. Jangjoo, G. di Sarra, M. Marsili, and Y. Roudi. Lost in retraining: Closed-loop learning and model collapse in exponential families.Phys. Rev. Lett., 136:197301, May 2026. 29 Appendix The expected interaction in the Linear case Definingn≡s−p ⟨Ik1,···,k s⟩= X µ sX n=1 (−1)s−n X 1≤j1<j2···<jn≤n 1 2    nX l=1 ⟨(wkjl ,µ)2⟩+ nX l̸=l′=1 ⟨wkjl ,µ⟩⟨wkjl′ ,µ...