
arxiv: 2512.11415 · v2 · submitted 2025-12-12 · ❄️ cond-mat.stat-mech · cs.LG

Emergence of Nonequilibrium Latent Cycles in Unsupervised Generative Modeling

Pith reviewed 2026-05-16 23:08 UTC · model grok-4.3

classification ❄️ cond-mat.stat-mech cs.LG
keywords nonequilibrium dynamics · latent cycles · generative modeling · Markov chains · likelihood maximization · entropy production · unsupervised learning · detailed balance

The pith

Likelihood maximization spontaneously generates nonequilibrium latent cycles that enhance generative performance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

A generative model couples visible and hidden variables through two separately parametrized transition matrices, creating an inherently out-of-equilibrium Markov chain. When trained by maximizing the likelihood of observed data, the system evolves toward steady states with finite entropy production and circulating probability currents in the latent space. These cycles arise without explicit design and allow the model to escape the poor performance region typical of nearly reversible dynamics. As a result, the trained models reproduce the empirical distribution of data classes more accurately than equilibrium alternatives such as restricted Boltzmann machines.

Core claim

The central discovery is that nonequilibrium dynamics play a constructive role in unsupervised learning: likelihood maximization drives a Markov chain defined by two independent transition matrices toward steady states characterized by persistent latent cycles, reduced self-transition probabilities, and finite entropy production. These emergent cycles break detailed balance and improve the model's fidelity to the data distribution compared with reversible equilibrium models.

What carries the argument

A Markov chain whose transitions between visible and hidden variables are governed by two independently optimized matrices, and whose nonequilibrium steady state carries probability currents.
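A minimal sketch of this construction, under assumptions that are not taken from the paper: the state spaces are small and discrete, the two matrices are filled with random entries rather than the paper's weight-based parametrization, and the names `A`, `B`, `T`, and `stationary_distribution` are illustrative. It composes the hidden-to-visible and visible-to-hidden steps into one latent-space operator, then reads off the steady state and the visible distribution it generates.

```python
import numpy as np

def random_stochastic(rows, cols, rng):
    """Random row-stochastic matrix (illustrative stand-in for a trained matrix)."""
    m = rng.random((rows, cols))
    return m / m.sum(axis=1, keepdims=True)

def stationary_distribution(T):
    """Left eigenvector of T for eigenvalue 1, normalized to sum to 1."""
    vals, vecs = np.linalg.eig(T.T)
    k = np.argmin(np.abs(vals - 1.0))
    pi = np.real(vecs[:, k])
    return pi / pi.sum()

rng = np.random.default_rng(0)
n_hidden, n_visible = 4, 6  # hypothetical sizes

A = random_stochastic(n_hidden, n_visible, rng)  # A[z, x] = P(x | z)
B = random_stochastic(n_visible, n_hidden, rng)  # B[x, z] = P(z | x)

# One full step z -> x -> z' of the alternating chain, seen from latent space.
T = A @ B                       # T[z, z'] = sum_x P(x | z) P(z' | x)
pi = stationary_distribution(T) # steady state over hidden states
p_x = pi @ A                    # visible distribution generated at steady state

print("steady state over hidden states:", pi)
print("generated visible distribution:", p_x)
```

Because `A` and `B` are drawn independently here, nothing ties them to a common joint distribution, which is the sense in which the chain is generically out of equilibrium.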

If this is right

  • The model develops persistent probability currents in the latent space.
  • Self-transition probabilities decrease during training.
  • The dynamics avoid the low-log-likelihood regime associated with reversible models.
  • The log-likelihood gradient depends explicitly on the last two steps of the Markov chain.
  • Generative fidelity to empirical data distributions exceeds that of equilibrium models.
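The first two indicators in this list can be measured on any finite Markov chain. The sketch below is a generic illustration, not the paper's code: it uses the standard steady-state definitions of probability current and entropy production rate, and the two 3-state matrices are hypothetical examples chosen so that one carries a cycle and the other is reversible.

```python
import numpy as np

def stationary_distribution(T):
    """Left eigenvector of T for eigenvalue 1, normalized to sum to 1."""
    vals, vecs = np.linalg.eig(T.T)
    k = np.argmin(np.abs(vals - 1.0))
    pi = np.real(vecs[:, k])
    return pi / pi.sum()

def probability_currents(T, pi):
    """J[i, j] = pi_i T_ij - pi_j T_ji, the net steady-state flow from i to j."""
    flux = pi[:, None] * T
    return flux - flux.T

def entropy_production_rate(T, pi):
    """sigma = 1/2 sum_ij J_ij ln(pi_i T_ij / (pi_j T_ji)), in nats per step."""
    flux = pi[:, None] * T
    back = flux.T
    mask = (flux > 0) & (back > 0)  # skip pairs with a forbidden direction
    return 0.5 * float(np.sum((flux - back)[mask] * np.log(flux[mask] / back[mask])))

# Hypothetical example 1: a biased 3-state cycle with small self-transitions.
cycle = np.array([[0.1, 0.8, 0.1],
                  [0.1, 0.1, 0.8],
                  [0.8, 0.1, 0.1]])
# Hypothetical example 2: a symmetric (reversible) chain with large self-transitions.
symmetric = np.array([[0.50, 0.25, 0.25],
                      [0.25, 0.50, 0.25],
                      [0.25, 0.25, 0.50]])

pi_cycle = stationary_distribution(cycle)
pi_sym = stationary_distribution(symmetric)
J_cycle = probability_currents(cycle, pi_cycle)
sigma_cycle = entropy_production_rate(cycle, pi_cycle)
sigma_sym = entropy_production_rate(symmetric, pi_sym)

print("cycle: sigma =", sigma_cycle, " mean self-transition =", np.diag(cycle).mean())
print("symmetric: sigma =", sigma_sym, " mean self-transition =", np.diag(symmetric).mean())
```

The biased cycle has a nonzero current around the loop and positive entropy production, while the symmetric chain has neither; the diagonal of each matrix gives the self-transition probabilities.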

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Irreversibility may improve learning in other latent-variable architectures beyond this specific Markov setup.
  • The framework links training dynamics to physical nonequilibrium processes that organize states.
  • Similar cycles could be searched for in contrastive or energy-based training methods.
  • Deeper versions of the model might produce hierarchical cycle structures across layers.

Load-bearing premise

Optimizing two independent transition matrices through likelihood maximization will consistently produce nonequilibrium cycles rather than collapsing to reversible dynamics.

What would settle it

Train the model on data, then check two things: whether the resulting steady state shows zero entropy production, and whether its reproduction of class distributions is no better than that of a reversible baseline model. Either outcome would undercut the claim.
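The class-distribution half of this test can be scored with a Kullback-Leibler divergence between empirical and generated class frequencies. The sketch below assumes that generated samples have already been assigned class labels by some external procedure; the label arrays and the helper names `class_frequencies` and `kl_divergence` are hypothetical and not taken from the paper.

```python
import numpy as np

def class_frequencies(labels, n_classes):
    """Normalized histogram of integer class labels."""
    counts = np.bincount(labels, minlength=n_classes)
    return counts / counts.sum()

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats; q is floored at eps to avoid division by zero."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    mask = p > 0  # terms with p = 0 contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / np.maximum(q[mask], eps))))

# Hypothetical labels: data classes and classes assigned to two models' samples.
data_labels = np.array([0, 0, 1, 1, 2, 2, 3, 3])
model_a_labels = np.array([0, 1, 1, 2, 2, 3, 3, 0])  # covers every class evenly
model_b_labels = np.array([0, 0, 0, 0, 0, 1, 2, 3])  # over-produces class 0

p_data = class_frequencies(data_labels, 4)
kl_a = kl_divergence(p_data, class_frequencies(model_a_labels, 4))
kl_b = kl_divergence(p_data, class_frequencies(model_b_labels, 4))

print("KL(data || model A) =", kl_a)
print("KL(data || model B) =", kl_b)
```

A model that is no better than the reversible baseline would show a divergence at least as large as the baseline's on the same data.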

Figures

Figures reproduced from arXiv: 2512.11415 by Alberto Rosso, Marco Baiesi.

Figure 1. Markov chain dynamics alternating visible states …
Figure 2. (a) Log-likelihood as a function of the epochs, for the 36 models trained independently. The color code follows the final …
Figure 3. … shows two representative runs: one that reaches both large σ and large L ( …
Figure 4. For the 36 trained models, log-linear plots of several indicators as a function of the entropy production rate: (a) KL …
original abstract

We show that nonequilibrium dynamics can play a constructive role in unsupervised machine learning by inducing the spontaneous emergence of latent-state cycles. We introduce a model in which visible and hidden variables interact through two independently parametrized transition matrices, defining a Markov chain whose steady state is intrinsically out of equilibrium. Likelihood maximization drives this system toward nonequilibrium steady states with finite entropy production, reduced self-transition probabilities, and persistent probability currents in the latent space. These cycles are not imposed by the architecture but arise from training, and models that develop them avoid the low-log-likelihood regime associated with nearly reversible dynamics while more faithfully reproducing the empirical distribution of data classes. Compared with equilibrium approaches such as restricted Boltzmann machines, our model breaks the detailed balance between the forward and backward conditional transitions and relies on a log-likelihood gradient that depends explicitly on the last two steps of the Markov chain. Hence, this exploration of the interface between nonequilibrium statistical physics and modern machine learning suggests that introducing irreversibility into latent-variable models can enhance generative performance.
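The quantities named in the abstract have standard definitions for a discrete Markov chain, written out below for the latent dynamics. The notation is an assumption made for illustration: A(x|z) stands for the hidden-to-visible matrix and B(z|x) for the visible-to-hidden matrix, and the paper may use different symbols or define the entropy production on the full alternating chain rather than on the latent one.

```latex
% Assumed notation: A(x|z) hidden-to-visible, B(z|x) visible-to-hidden.
\begin{align}
  T(z'|z) &= \sum_{x} B(z'|x)\, A(x|z), \\
  \pi(z') &= \sum_{z} T(z'|z)\, \pi(z), \\
  J(z \to z') &= \pi(z)\, T(z'|z) - \pi(z')\, T(z|z'), \\
  \sigma &= \frac{1}{2} \sum_{z,z'} J(z \to z')\,
            \ln \frac{\pi(z)\, T(z'|z)}{\pi(z')\, T(z|z')} \;\ge\; 0 .
\end{align}
% Detailed balance means J(z -> z') = 0 for every pair, hence sigma = 0.
% It holds whenever A and B are the two conditionals of one joint p(x, z),
% i.e. A(x|z) p(z) = B(z|x) p(x), as in a restricted Boltzmann machine.
```

Each term in the sum for σ is non-negative, so σ vanishes only when every current vanishes; a persistent latent cycle therefore implies finite entropy production.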

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces a latent-variable generative model in which visible and hidden states interact through two independently parametrized transition matrices, forming a Markov chain whose steady state is out of equilibrium by construction. Likelihood maximization is claimed to drive the system spontaneously to nonequilibrium steady states featuring finite entropy production, reduced self-transition probabilities, and persistent probability currents in latent space. These cycles are asserted to emerge from training rather than architecture, to avoid the low-log-likelihood regime of nearly reversible dynamics, and to yield more faithful reproduction of empirical data-class distributions than equilibrium models such as restricted Boltzmann machines. The log-likelihood gradient is stated to depend explicitly on the last two steps of the chain because detailed balance is broken between forward and backward transitions.

Significance. If the central claims are substantiated, the work establishes a constructive role for nonequilibrium dynamics in unsupervised learning: irreversibility in the latent chain can be harnessed to improve generative fidelity without explicit architectural constraints. It supplies a concrete interface between entropy production, probability currents, and modern generative modeling, potentially motivating new classes of latent-variable models that exploit rather than suppress nonequilibrium effects.

major comments (2)
  1. Abstract: the assertion that 'likelihood maximization drives this system toward nonequilibrium steady states with finite entropy production' is presented without a derivation or argument showing that the likelihood surface has no critical points (or has only lower-likelihood critical points) on the reversible manifold where the two transition matrices satisfy detailed balance. Without this, the emergence of cycles remains compatible with convergence to a reversible fixed point, as the two-step gradient dependence vanishes exactly on that manifold.
  2. Abstract: the claim that models developing cycles 'more faithfully reproducing the empirical distribution of data classes' is unsupported by any reported quantitative evidence (log-likelihood values, KL divergences, class-conditional statistics, or direct RBM baselines) in the provided text, rendering the performance advantage impossible to evaluate.
minor comments (1)
  1. The abstract would benefit from a single sentence stating the datasets or data types on which the model was tested.
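The reversible manifold in the first major comment can be made concrete with a small numerical check. This is a sketch under stated assumptions, not the paper's model: the matrices are random rather than trained, the sizes are hypothetical, and the check concerns probability currents only, not the log-likelihood gradient. When both conditionals are derived from one joint distribution the latent currents vanish; replacing one of them with an independent matrix generically produces nonzero currents.

```python
import numpy as np

def stationary_distribution(T):
    """Left eigenvector of T for eigenvalue 1, normalized to sum to 1."""
    vals, vecs = np.linalg.eig(T.T)
    k = np.argmin(np.abs(vals - 1.0))
    pi = np.real(vecs[:, k])
    return pi / pi.sum()

def max_current(T, pi):
    """Largest |pi_i T_ij - pi_j T_ji| over all pairs of hidden states."""
    flux = pi[:, None] * T
    return float(np.max(np.abs(flux - flux.T)))

rng = np.random.default_rng(1)
n_hidden, n_visible = 4, 6  # hypothetical sizes

# Reversible case: both conditionals come from a single joint p(z, x).
joint = rng.random((n_hidden, n_visible))
joint /= joint.sum()
p_z = joint.sum(axis=1)
p_x = joint.sum(axis=0)
A_rev = joint / p_z[:, None]        # A_rev[z, x] = p(x | z)
B_rev = (joint / p_x[None, :]).T    # B_rev[x, z] = p(z | x)
T_rev = A_rev @ B_rev               # latent operator; p_z is its steady state

# Independent case: keep A, draw the visible-to-hidden matrix separately.
B_free = rng.random((n_visible, n_hidden))
B_free /= B_free.sum(axis=1, keepdims=True)
T_free = A_rev @ B_free
pi_free = stationary_distribution(T_free)

rev_current = max_current(T_rev, p_z)
free_current = max_current(T_free, pi_free)

print("max current, shared joint:", rev_current)
print("max current, independent matrices:", free_current)
```

The check shows only that independent matrices leave the manifold generically; whether likelihood maximization keeps a trained model away from it is the open question the referee raises.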

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the careful reading and constructive criticism of our manuscript. We address each major comment below and have revised the manuscript to strengthen the supporting arguments and evidence where possible.

point-by-point responses
  1. Referee: Abstract: the assertion that 'likelihood maximization drives this system toward nonequilibrium steady states with finite entropy production' is presented without a derivation or argument showing that the likelihood surface has no critical points (or has only lower-likelihood critical points) on the reversible manifold where the two transition matrices satisfy detailed balance. Without this, the emergence of cycles remains compatible with convergence to a reversible fixed point, as the two-step gradient dependence vanishes exactly on that manifold.

    Authors: We agree that the original presentation would benefit from a clearer argument on this point. The manuscript relies on the explicit structure of the log-likelihood gradient, which includes a two-step term that vanishes identically on the reversible manifold. In the revised version we have added a dedicated paragraph in Section II.C explaining that any critical point on the reversible manifold must satisfy a reduced gradient condition equivalent to that of an equilibrium model, and we provide numerical evidence that such points correspond to lower-likelihood attractors than the nonequilibrium states reached from generic initializations. While an exhaustive analytical proof that no higher-likelihood reversible critical points exist in the full parameter space is beyond the present scope, the combination of the gradient structure and the observed training dynamics supports the claim that likelihood maximization drives the system away from the reversible manifold. revision: partial

  2. Referee: Abstract: the claim that models developing cycles 'more faithfully reproducing the empirical distribution of data classes' is unsupported by any reported quantitative evidence (log-likelihood values, KL divergences, class-conditional statistics, or direct RBM baselines) in the provided text, rendering the performance advantage impossible to evaluate.

    Authors: The full manuscript contains quantitative comparisons with RBM baselines, including log-likelihood values and class-distribution statistics, primarily in Figures 3–5 and the associated supplementary tables. These were not referenced explicitly enough in the abstract or the opening paragraphs of the results. In the revision we have updated the abstract to cite the specific performance metrics and added a concise summary paragraph in Section III.B that directly reports the log-likelihood improvement, the reduction in KL divergence to the empirical class distribution, and the RBM baseline values. This makes the quantitative advantage immediately evaluable. revision: yes

standing simulated objections not resolved
  • A complete analytical demonstration that the likelihood surface contains no reversible critical points of higher likelihood than the observed nonequilibrium attractors.

Circularity Check

0 steps flagged

No significant circularity; emergence of cycles is an empirical outcome of likelihood optimization on an explicitly irreversible two-matrix chain

full rationale

The paper constructs the generative model by defining visible-hidden interactions via two independently parametrized transition matrices, which explicitly breaks detailed balance and yields a composite Markov operator whose steady state can carry probability currents. Likelihood maximization is then performed on this chain, and the resulting finite entropy production and latent cycles are reported as outcomes of the training dynamics rather than imposed by fiat. No step equates a claimed prediction to its own fitted parameters by construction, no load-bearing uniqueness theorem is imported from self-citations, and no ansatz is smuggled in via prior work. The central result therefore remains an independent consequence of the objective applied to the defined irreversible architecture, not a tautological renaming of the inputs.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that a Markov chain defined by two independent transition matrices possesses an intrinsically nonequilibrium steady state whose properties are altered by likelihood training; no free parameters are explicitly named beyond the transition matrices themselves, and no new entities are postulated.

free parameters (1)
  • parameters of the two transition matrices
    Independently parametrized forward and backward matrices whose values are adjusted during likelihood maximization.
axioms (1)
  • domain assumption: The steady state of the Markov chain is intrinsically out of equilibrium when the two transition matrices are independently parametrized.
    Stated directly in the abstract as the foundation of the model.



