Explaining the effects of non-convergent sampling in the training of Energy-Based Models

Aurélien Decelle; Beatriz Seoane; Elisabeth Agoritsas; Giovanni Catania

arXiv: 2301.09428 · v2 · submitted 2023-01-23 · cs.LG · cond-mat.dis-nn · cond-mat.stat-mech


Pith reviewed 2026-05-24 10:16 UTC · model grok-4.3

classification cs.LG · cond-mat.dis-nn · cond-mat.stat-mech

keywords Energy-based models · non-convergent sampling · Markov chains · dynamical process · empirical statistics · diffusion models · Boltzmann machine


The pith

EBMs trained with non-persistent short Markov chain runs reproduce empirical data statistics through a precise dynamical process rather than equilibrium convergence.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows analytically that Energy-Based Models trained by estimating gradients with short, non-convergent Markov chains starting from random points can exactly match a set of empirical statistics from the data. This match happens through the specific dynamics created by the incomplete sampling, not by converging to the model's equilibrium distribution. A reader would care because the result explains why short-run sampling strategies produce high-quality samples efficiently in practice. It also supplies the analytical basis for treating EBMs as diffusion-style models.

Core claim

EBMs trained with non-persistent short runs to estimate the gradient can perfectly reproduce a set of empirical statistics of the data, not at the level of the equilibrium measure, but through a precise dynamical process. The authors derive this for generic EBMs, work it out explicitly in two solvable models, and verify the predictions numerically on a ConvNet EBM and a Boltzmann machine.
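The mechanism can be illustrated on a toy case. The sketch below is not taken from the paper: it assumes a one-dimensional Gaussian energy E(x) = a x^2 / 2, a discretized Langevin sampler, and illustrative values for the step size `eps`, the initial variance `var0`, and the learning rate `lr`. Because the sampler is linear, the variance after k steps follows an exact recursion, so no random sampling is needed. Training with the k-step average in place of the equilibrium average drives the k-step variance onto the data variance, while the stationary variance of the trained model ends up elsewhere.

```python
def variance_after_k_steps(a, k, eps, var0):
    """Variance of x after k steps of x <- x - eps*a*x + sqrt(2*eps)*noise,
    starting from x0 ~ N(0, var0). The update is linear, so this is exact."""
    v = var0
    for _ in range(k):
        v = (1.0 - eps * a) ** 2 * v + 2.0 * eps
    return v


def stationary_variance(a, eps):
    """Fixed point of the variance recursion above (the k -> infinity limit)."""
    return 2.0 * eps / (1.0 - (1.0 - eps * a) ** 2)


def train_short_run(var_data, k, eps=0.1, var0=4.0, lr=0.5, n_iters=2000, a_init=1.0):
    """Gradient ascent on the log-likelihood of E(x) = a*x^2/2, with the model
    average <x^2> replaced by its value after k short-run steps (assumed toy setup)."""
    a = a_init
    for _ in range(n_iters):
        # d(log-likelihood)/da = 0.5 * (<x^2>_model - <x^2>_data)
        grad = 0.5 * (variance_after_k_steps(a, k, eps, var0) - var_data)
        a += lr * grad
    return a


a_star = train_short_run(var_data=1.0, k=5)
v_short = variance_after_k_steps(a_star, 5, 0.1, 4.0)
v_eq = stationary_variance(a_star, 0.1)
print(f"a* = {a_star:.4f}, k-step variance = {v_short:.4f}, stationary variance = {v_eq:.4f}")
```

In this toy setting the trained coupling encodes the data variance only when the chain is stopped at the training length k; running the same chain to stationarity gives a narrower distribution.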

What carries the argument

Non-persistent short Markov chain runs that begin from random initial conditions and induce a dynamical process whose statistics are encoded exactly into the trained parameters.

If this is right

EBMs become usable as diffusion models because the dynamical encoding replaces the need for equilibrium sampling.
Short runs from random starts constitute an efficient, high-quality sampling method whose effect is now explained from first principles.
The effect can be computed in closed form for solvable models, giving exact predictions for how parameters shift under non-convergent training.
Numerical checks on neural-network EBMs confirm that the dynamical matching holds beyond the solvable cases.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same dynamical-matching principle could be applied to other approximate-sampling generative models to reduce the cost of training.
Training objectives might be redesigned to optimize the dynamical statistics directly instead of an equilibrium loss.
The link to diffusion models raises the question of whether EBMs inherit convergence or mixing guarantees known for diffusion processes.

Load-bearing premise

The short runs must begin from random initial conditions so that the incomplete sampling dynamics alone, without equilibrium convergence, exactly capture the empirical statistics.

What would settle it

Train an EBM with the described short runs, then compare the statistics of samples produced by the same short-run procedure against the statistics obtained from long equilibrated chains on the same trained model. A match for the short-run samples combined with a mismatch for the equilibrated ones would support the claim.
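A minimal sketch of such a check, under stated assumptions: the model is a toy one-dimensional Gaussian energy E(x) = a x^2 / 2 sampled with discretized Langevin dynamics, and the constants `EPS`, `VAR0`, `K_TRAIN`, and `VAR_DATA` are illustrative choices, not values from the paper. Short-run training is replaced by a bisection that finds the coupling whose k-step variance equals the data variance. The trained model is then sampled for several generation times k' and the error on the second moment is compared.

```python
import numpy as np

EPS, VAR0, K_TRAIN, VAR_DATA = 0.1, 4.0, 5, 1.0  # illustrative constants


def exact_variance(a, k):
    """Exact variance after k linear Langevin steps from x0 ~ N(0, VAR0)."""
    v = VAR0
    for _ in range(k):
        v = (1.0 - EPS * a) ** 2 * v + 2.0 * EPS
    return v


def fit_coupling():
    """Bisection for the coupling a whose K_TRAIN-step variance equals VAR_DATA
    (a stand-in for the fixed point of short-run training in this toy model)."""
    lo, hi = 0.01, 1.0 / EPS
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if exact_variance(mid, K_TRAIN) > VAR_DATA:
            lo = mid  # variance still too large: a stronger coupling is needed
        else:
            hi = mid
    return 0.5 * (lo + hi)


def sample_second_moment(a, k_prime, n_chains=100_000, seed=0):
    """Monte Carlo estimate of <x^2> after k_prime steps from random starts."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, np.sqrt(VAR0), size=n_chains)
    for _ in range(k_prime):
        x = x - EPS * a * x + np.sqrt(2.0 * EPS) * rng.normal(size=n_chains)
    return float(np.mean(x ** 2))


a_star = fit_coupling()
errors = {kp: abs(sample_second_moment(a_star, kp) - VAR_DATA) for kp in (1, 5, 20, 200)}
best = min(errors, key=errors.get)
print(errors, "best generation time:", best)
```

In this toy case the error is smallest when the generation time equals the training length and stays large for long chains, which is the pattern the proposed test looks for.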

Figures

Figures reproduced from arXiv: 2301.09428 by Aurélien Decelle, Beatriz Seoane, Elisabeth Agoritsas, Giovanni Catania.

**Figure 1.** Left: Evolution of $J^{(k)}_\alpha(t)$ for different values of $k$ (different colors) and initial conditions for $J^{(k)}_\alpha(0)$. Right: Convergence for a large training time $t$, where the higher $k$, the faster the convergence; dash-dotted lines are the numerical integration of Eq. (18), full lines show the exponential fit $J^{(k)}_\alpha(t) - J^{(k),\infty}_\alpha \sim A \exp(-t/\tau)$ with $\tau$ given by Eq. (20). Inset: the asymptotic values $J^{(k)}_\alpha$…
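The exponential fit named in the Figure 1 caption can be sketched as a log-linear regression. This is an illustration only: the function name `fit_relaxation_time` and the synthetic trajectory are hypothetical, and the paper's own value of $\tau$ comes from its Eq. (20), which is not reproduced here.

```python
import numpy as np


def fit_relaxation_time(t, j, j_inf):
    """Fit |j(t) - j_inf| ~ A * exp(-t / tau) by linear regression on the log.
    Returns (tau, A); A is the magnitude of the amplitude, its sign is dropped."""
    t = np.asarray(t, dtype=float)
    j = np.asarray(j, dtype=float)
    slope, intercept = np.polyfit(t, np.log(np.abs(j - j_inf)), 1)
    return -1.0 / slope, float(np.exp(intercept))


# Synthetic trajectory with known parameters (assumed values for the demo).
t = np.linspace(0.0, 10.0, 50)
j = 2.0 + 3.0 * np.exp(-t / 4.0)
tau, amplitude = fit_relaxation_time(t, j, j_inf=2.0)
print(tau, amplitude)
```

The fit assumes the asymptotic value `j_inf` is already known, for instance from the late-time plateau of the trajectory.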

**Figure 2.** Left: Numerical resolution of the learning dynamics in the presence of two modes. The dashed lines represent the resolution using convergent dynamics, $k \to \infty$, while the solid ones correspond to $k = 1$. Right: (inset) evolution of $J^{(k)}_\alpha(t)$ for $k = 1$ at various stages of the learning. (Main) The error on the correlation function, $E^{(2)}$…

**Figure 3.** Generation results obtained with a ConvNet EBM trained on CIFAR-10 using $k = 150$ Langevin MCMC sampling steps from random initial conditions. Left: We use the Fréchet Inception Distance Score (FID) to evaluate the generation quality as a function of the sampling time $k'$. The different colors correspond to different training epochs. We see again that the best score is achieved at $k' = k$ (corresponding to…

**Figure 4.** Left: Error over the covariance matrices $E^{(2)} = \sum_{i<j} \left( \langle x_i x_j \rangle_{k',p_0} - \langle x_i x_j \rangle_{p_D} \right)^2 / N^2$ between the training set and the data generated at different learning ages vs the generation time $k'$, for a BM trained with data sampled from a 2D ferromagnetic Ising model with N = 72 at $k = 5$. Similarly to…
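The error measure in the Figure 4 caption can be computed directly from two sample arrays. The sketch below follows the formula as written in the caption (uncentered pair averages over $i<j$, normalized by $N^2$); the function name `second_moment_error` and the array layout (one sample per row) are assumptions made for illustration.

```python
import numpy as np


def second_moment_error(generated, data):
    """E2 = sum_{i<j} (<x_i x_j>_generated - <x_i x_j>_data)^2 / N^2,
    where each input has shape (n_samples, N)."""
    generated = np.asarray(generated, dtype=float)
    data = np.asarray(data, dtype=float)
    n_vars = data.shape[1]
    # Empirical pair averages <x_i x_j> for every pair (i, j)
    c_gen = generated.T @ generated / generated.shape[0]
    c_data = data.T @ data / data.shape[0]
    # Keep only the strictly upper triangle, i.e. pairs with i < j
    upper = np.triu_indices(n_vars, k=1)
    diff = c_gen[upper] - c_data[upper]
    return float(np.sum(diff ** 2) / n_vars ** 2)


# Two spins, perfectly aligned in the data and anti-aligned in the generated set.
data = np.array([[1, 1], [1, 1]])
generated = np.array([[1, -1], [1, -1]])
e2 = second_moment_error(generated, data)
print(e2)
```

Evaluating this quantity on samples drawn after different generation times $k'$ gives a curve of the kind the caption describes.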

**Figure 5.** Error over the eigenvalues of the covariance matrix generated after $k'$ steps of MCMC vs the dataset covariance matrix. The setting is the same as in…

Original abstract

In this paper, we quantify the impact of using non-convergent Markov chains to train Energy-Based models (EBMs). In particular, we show analytically that EBMs trained with non-persistent short runs to estimate the gradient can perfectly reproduce a set of empirical statistics of the data, not at the level of the equilibrium measure, but through a precise dynamical process. Our results provide a first-principles explanation for the observations of recent works proposing the strategy of using short runs starting from random initial conditions as an efficient way to generate high-quality samples in EBMs, and lay the groundwork for using EBMs as diffusion models. After explaining this effect in generic EBMs, we analyze two solvable models in which the effect of the non-convergent sampling in the trained parameters can be described in detail. Finally, we test these predictions numerically on a ConvNet EBM and a Boltzmann machine.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper derives that short non-convergent MCMC runs from random starts let EBMs match data statistics exactly through the finite dynamics rather than equilibrium, with exact solutions in two models and checks on ConvNet and Boltzmann cases.


The core result is an analytical account showing why non-persistent short runs in EBM gradient estimation reproduce chosen empirical statistics via the exact finite-step Markov process, not by converging to the equilibrium measure. This matches the abstract claim and supplies the first-principles mechanism that earlier empirical papers only observed. The two solvable models make the parameter shifts explicit, and the numerical tests on a ConvNet EBM plus a Boltzmann machine confirm the predictions hold in practice. That combination of derivation plus concrete checks is the paper's real contribution.

The generic-EBM section inherits the same starting-point assumption as the solvable cases: each short chain begins from random, data-independent noise. If training instead used data-dependent initials or variable chain lengths, the exact matching would not necessarily follow, and the paper does not add extra justification for relaxing that condition. The numerical section appears to follow the random-init protocol, so the tests are consistent but do not probe the boundary.

No circularity or self-citation issues show up in the abstract or stress-test note. Readers working on MCMC-based EBM training or on turning EBMs into diffusion-style samplers will find the mechanism useful; the rest of the field can treat it as a clarifying note rather than a new algorithm. The work is coherent on its own terms and supplies enough formal plus numerical grounding to merit referee time, even if the generic claim needs tighter scope statements in revision.

Referee Report

2 major / 2 minor

Summary. The manuscript claims that EBMs trained with non-persistent short Markov chain runs (starting from random initial conditions) to estimate gradients can exactly reproduce a chosen set of empirical statistics of the data via the finite-step dynamics of those runs, rather than by matching the equilibrium measure. An analytical derivation is presented for generic EBMs, followed by exact solutions for two solvable models that make the effect explicit, and numerical validation on a ConvNet EBM and a Boltzmann machine. The work positions this as an explanation for the success of short-run sampling strategies and as groundwork for EBMs as diffusion models.

Significance. If the central analytical claim holds under its assumptions, the result supplies a first-principles account of why short non-convergent chains succeed in EBM training and training-for-sampling, together with concrete solvable cases and numerical checks. These elements constitute a genuine contribution that could inform both theory and practice in energy-based and diffusion-style models.

major comments (2)

[Generic EBMs section] The exact analytical result for generic EBMs is stated to follow from the training dynamics of short runs, yet the derivation inherits the assumption that each chain begins from random, data-independent initial conditions without additional justification that this holds when the initial distribution is data-dependent or when chain length varies; this assumption is load-bearing for the generality claim.
[Solvable models sections] The exact solutions are presented for two models, but the manuscript does not report an error analysis or sensitivity study with respect to finite chain length or initialization variance; without this, it is unclear whether the claimed exact reproduction remains robust outside the idealized limits used in the derivations.

minor comments (2)

The abstract should briefly specify which empirical statistics are exactly reproduced by the dynamical process.
[Numerical experiments section] Figure captions and text should state the precise chain lengths, initialization distributions, and number of gradient steps used in the ConvNet and Boltzmann machine experiments to support reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments and the positive assessment of the significance of our work. We address the major comments below and will revise the manuscript accordingly to improve clarity and robustness.


Referee: [Generic EBMs section] The exact analytical result for generic EBMs is stated to follow from the training dynamics of short runs, yet the derivation inherits the assumption that each chain begins from random, data-independent initial conditions without additional justification that this holds when the initial distribution is data-dependent or when chain length varies; this assumption is load-bearing for the generality claim.

Authors: The derivation in the generic EBMs section is developed specifically under the assumption of short runs initialized from random, data-independent initial conditions, which is the setting used in the non-persistent short-run training strategies that the paper seeks to explain. This assumption is stated in the manuscript and is central to the result, as different initializations would lead to different dynamics. We do not claim the result holds for data-dependent initial distributions. To address the concern, we will revise the text to more explicitly state the scope of the assumption and provide a brief justification for focusing on data-independent random initials, namely that this matches the practical strategy whose success we aim to account for. Regarding chain length variation, the result holds for any fixed finite length under the stated conditions.

Revision: partial

Referee: [Solvable models sections] The exact solutions are presented for two models, but the manuscript does not report an error analysis or sensitivity study with respect to finite chain length or initialization variance; without this, it is unclear whether the claimed exact reproduction remains robust outside the idealized limits used in the derivations.

Authors: We acknowledge that while the solvable models allow for exact derivations in specific cases, an analysis of robustness to variations in chain length and initialization would be beneficial. In the revised version, we will add a sensitivity study section or subsection, including both analytical considerations where feasible and numerical experiments to quantify the error or deviation as chain length and initialization variance change. This will help demonstrate the robustness of the effect beyond the idealized limits.

Revision: yes

Circularity Check

0 steps flagged

No circularity: analytical derivation from short-run Markov dynamics is self-contained


The paper derives its central claim analytically from the explicit form of the gradient estimator using finite-length Markov chains initialized at random (data-independent) noise. No step reduces a prediction to a fitted parameter by construction, invokes a self-citation as the sole justification for a uniqueness claim, or renames an input statistic as an output. The assumption on initial conditions is stated explicitly in the abstract and generic-EBM section rather than smuggled in; the subsequent solvable-model sections simply instantiate the same derivation. The result therefore stands or falls on the correctness of the dynamical calculation itself, not on any definitional loop.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, invented entities, or ad-hoc axioms; the derivation rests on standard properties of Markov chains and gradient estimation in EBMs.

axioms (1)

[standard math] Markov chain Monte Carlo sampling reaches a stationary distribution under standard conditions.
Invoked when contrasting convergent vs. non-convergent short runs.




[1] [1]

H., Hinton, G

Ackley, D. H., Hinton, G. E., and Sejnowski, T. J. A learning algorithm for boltzmann machines. Cognitive science, 9 0 (1): 0 147--169, 1985

work page 1985

[2] [2]

Learning a restricted boltzmann machine using biased monte carlo sampling

B \'e reux, N., Decelle, A., Furtlehner, C., and Seoane, B. Learning a restricted boltzmann machine using biased monte carlo sampling. arXiv preprint arXiv:2206.01310, 2022

work page arXiv 2022

[3] [3]

Science , author =

Carleo, G. and Troyer, M. Solving the quantum many-body problem with artificial neural networks. Science, 355 0 (6325): 0 602--606, 2017. doi:10.1126/science.aag2302. URL https://www.science.org/doi/abs/10.1126/science.aag2302

work page doi:10.1126/science.aag2302 2017

[4] [4]

and Furtlehner, C

Decelle, A. and Furtlehner, C. Exact training of restricted boltzmann machines on intrinsically low dimensional data. Phys. Rev. Lett., 127: 0 158303, Oct 2021 a . doi:10.1103/PhysRevLett.127.158303. URL https://link.aps.org/doi/10.1103/PhysRevLett.127.158303

work page doi:10.1103/physrevlett.127.158303 2021

[5] [5]

and Furtlehner, C

Decelle, A. and Furtlehner, C. Restricted boltzmann machine: Recent advances and mean-field theory. Chinese Physics B, 30 0 (4): 0 040202, 2021 b

work page 2021

[6] [6]

Spectral dynamics of learning in restricted boltzmann machines

Decelle, A., Fissore, G., and Furtlehner, C. Spectral dynamics of learning in restricted boltzmann machines. Europhysics Letters, 119 0 (6): 0 60001, nov 2017. doi:10.1209/0295-5075/119/60001. URL https://dx.doi.org/10.1209/0295-5075/119/60001

work page doi:10.1209/0295-5075/119/60001 2017

[7] [7]

Thermodynamics of restricted boltzmann machines and related learning dynamics

Decelle, A., Fissore, G., and Furtlehner, C. Thermodynamics of restricted boltzmann machines and related learning dynamics. Journal of Statistical Physics, 172 0 (6): 0 1576--1608, 2018. doi:https://doi.org/10.1007/s10955-018-2105-y

work page doi:10.1007/s10955-018-2105-y 2018

[8] [8]

Equilibrium and non-equilibrium regimes in the learning of restricted boltzmann machines

Decelle, A., Furtlehner, C., and Seoane, B. Equilibrium and non-equilibrium regimes in the learning of restricted boltzmann machines. In Beygelzimer, A., Dauphin, Y., Liang, P., and Vaughan, J. W. (eds.), Advances in Neural Information Processing Systems, 2021. doi:10.48550/ARXIV.2105.13889. URL https://openreview.net/forum?id=Bq_RoftLEeN

work page doi:10.48550/arxiv.2105.13889 2021

[9] [9]

and Nichol, A

Dhariwal, P. and Nichol, A. Diffusion models beat gans on image synthesis. Advances in Neural Information Processing Systems, 34: 0 8780--8794, 2021

work page 2021

[10] [10]

and Mordatch, I

Du, Y. and Mordatch, I. Implicit generation and modeling with energy based models. Advances in Neural Information Processing Systems, 32, 2019

work page 2019

[11] [11]

Compositional visual generation with energy based models

Du, Y., Li, S., and Mordatch, I. Compositional visual generation with energy based models. Advances in Neural Information Processing Systems, 33: 0 6637--6647, 2020

work page 2020

[12] [12]

Structure and eigenvalues of heat-bath markov chains

Dyer, M., Greenhill, C., and Ullrich, M. Structure and eigenvalues of heat-bath markov chains. Linear Algebra and its Applications, 454: 0 57--71, 2014

work page 2014

[13] [13]

Robust multi-output learning with highly incomplete data via restricted boltzmann machines

Fissore, G., Decelle, A., Furtlehner, C., and Han, Y. Robust multi-output learning with highly incomplete data via restricted boltzmann machines. In Proceedings of the 9th European Starting AI Researchers’ Symposium 2020. arXiv, 2019. doi:10.48550/ARXIV.1912.09382. URL https://arxiv.org/abs/1912.09382

work page doi:10.48550/arxiv.1912.09382 2020

[14]

Gabrié, M., Tramel, E. W., and Krzakala, F. Training restricted Boltzmann machine via the Thouless-Anderson-Palmer free energy. In Advances in Neural Information Processing Systems, pp. 640--648, 2015

[15]

Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. Generative adversarial networks. Communications of the ACM, 63(11):139--144, 2020

[16]

Guo, H., Kara, K., and Zhang, C. Layerwise systematic scan: Deep Boltzmann machines and beyond. In International Conference on Artificial Intelligence and Statistics, pp. 178--187. PMLR, 2018

[17]

Hinton, G. Products of experts. In 1999 Ninth International Conference on Artificial Neural Networks ICANN 99. (Conf. Publ. No. 470), volume 1, pp. 1--6 vol.1, 1999. doi:10.1049/cp:19991075

[18]

Hinton, G. E. Training products of experts by minimizing contrastive divergence. Neural Computation, 14(8):1771--1800, 2002. doi:10.1162/089976602760128018

[19]

Hjelm, R. D., Calhoun, V. D., Salakhutdinov, R., Allen, E. A., Adali, T., and Plis, S. M. Restricted Boltzmann machines for neuroimaging: An application in identifying intrinsic networks. NeuroImage, 96:245--260, 2014. ISSN 1053-8119. doi:10.1016/j.neuroimage.2014.03.048. URL https://www.sciencedirect.com/science/article/pii/S1053811914002080

[20]

Ho, J., Jain, A., and Abbeel, P. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33:6840--6851, 2020

[21]

Kappen, H. J. and Rodríguez, F. B. Efficient learning in Boltzmann machines using linear response theory. Neural Computation, 10(5):1137--1156, 07 1998. ISSN 0899-7667. doi:10.1162/089976698300017386. URL https://doi.org/10.1162/089976698300017386

[22]

Keizer, J. On the solutions and the steady states of a master equation. Journal of Statistical Physics, 6(2):67--72, 1972

[23]

Kim, T. and Bengio, Y. Deep directed generative models with energy-based probability estimation. arXiv preprint arXiv:1606.03439, 2016

[24]

Kingma, D. P. and Dhariwal, P. Glow: Generative flow with invertible 1x1 convolutions. Advances in Neural Information Processing Systems, 31, 2018

[25]

Kumar, R., Ozair, S., Goyal, A., Courville, A., and Bengio, Y. Maximum entropy generators for energy-based models. arXiv preprint arXiv:1901.08508, 2019

[26]

LeCun, Y., Chopra, S., Hadsell, R., Ranzato, M., and Huang, F. A tutorial on energy-based learning. Predicting Structured Data, 1(0), 2006

[27]

Liao, R., Kornblith, S., Ren, M., Fleet, D. J., and Hinton, G. Gaussian-Bernoulli RBMs without tears. arXiv preprint arXiv:2210.10318, 2022

[28]

Melko, R. G., Carleo, G., Carrasquilla, J., and Cirac, J. I. Restricted Boltzmann machines in quantum physics. Nature Physics, 15(9):887--892, 2019

[29]

Morcos, F., Pagnani, A., Lunt, B., Bertolino, A., Marks, D. S., Sander, C., Zecchina, R., Onuchic, J. N., Hwa, T., and Weigt, M. Direct-coupling analysis of residue coevolution captures native contacts across many protein families. Proceedings of the National Academy of Sciences, 108(49):E1293--E1301, 2011

[30]

Muntoni, A. P., Pagnani, A., Weigt, M., and Zamponi, F. adabmDCA: adaptive Boltzmann machine learning for biological sequences. BMC Bioinformatics, 22(1):1--19, 2021

[31]

Nguyen, H. C. and Berg, J. Bethe–Peierls approximation and the inverse Ising problem. Journal of Statistical Mechanics: Theory and Experiment, 2012(03):P03004, mar 2012. doi:10.1088/1742-5468/2012/03/P03004. URL https://dx.doi.org/10.1088/1742-5468/2012/03/P03004

[32]

Nguyen, H. C., Zecchina, R., and Berg, J. Inverse statistical problems: from the inverse Ising problem to data science. Advances in Physics, 66(3):197--261, 2017. doi:10.1080/00018732.2017.1341604. URL https://doi.org/10.1080/00018732.2017.1341604

[33]

Nijkamp, E., Hill, M., Zhu, S.-C., and Wu, Y. N. Learning non-convergent non-persistent short-run MCMC toward energy-based model. In Wallach, H., Larochelle, H., Beygelzimer, A., d'Alché-Buc, F., Fox, E., and Garnett, R. (eds.), Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019. URL https://proceedings.neurips...

[34]

Nijkamp, E., Hill, M., Han, T., Zhu, S.-C., and Wu, Y. N. On the anatomy of MCMC-based maximum likelihood learning of energy-based models. Proceedings of the AAAI Conference on Artificial Intelligence, 34(04):5272--5280, Apr. 2020. doi:10.1609/aaai.v34i04.5973. URL https://ojs.aaai.org/index.php/AAAI/article/view/5973

[35]

Ricci-Tersenghi, F. The Bethe approximation for solving the inverse Ising problem: a comparison with other inference methods. Journal of Statistical Mechanics: Theory and Experiment, 2012(08):P08015, aug 2012. doi:10.1088/1742-5468/2012/08/P08015. URL https://dx.doi.org/10.1088/1742-5468/2012/08/P08015

[36]

Salakhutdinov, R. and Hinton, G. Deep Boltzmann machines. In van Dyk, D. and Welling, M. (eds.), Proceedings of the Twelth International Conference on Artificial Intelligence and Statistics, volume 5 of Proceedings of Machine Learning Research, pp. 448--455, Hilton Clearwater Beach Resort, Clearwater Beach, Florida USA, 16--18 Apr 2009. PMLR. URL https:/...

[37]

Smolensky, P. Information Processing in Dynamical Systems: Foundations of Harmony Theory, volume 6. 1987. ISBN 9780262291408

[38]

Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N., and Ganguli, S. Deep unsupervised learning using nonequilibrium thermodynamics. In International Conference on Machine Learning, pp. 2256--2265. PMLR, 2015

[39]

Suzuki, M. and Kubo, R. Dynamics of the Ising model near the critical point. I. Journal of the Physical Society of Japan, 24(1):51--60, 1968. doi:10.1143/JPSJ.24.51

[40]

Tieleman, T. Training restricted Boltzmann machines using approximations to the likelihood gradient. In Proceedings of the 25th International Conference on Machine Learning, pp. 1064--1071, 2008. doi:10.1145/1390156.1390290

[41]

Tubiana, J., Cocco, S., and Monasson, R. Learning protein constitutive motifs from sequence data. eLife, 8:e39397, 2019

[42]

Yelmen, B., Decelle, A., Ongaro, L., Marnetto, D., Tallec, C., Montinaro, F., Furtlehner, C., Pagani, L., and Jay, F. Creating artificial human genomes using generative neural networks. PLOS Genetics, 17(2):1--22, 02 2021. doi:10.1371/journal.pgen.1009303. URL https://doi.org/10.1371/journal.pgen.1009303
