pith. machine review for the scientific record.

arxiv: 2601.01253 · v2 · submitted 2026-01-03 · ❄️ cond-mat.stat-mech

Recognition: no theorem link

Stochastic Thermodynamics of Associative Memory

Authors on Pith: no claims yet

Pith reviewed 2026-05-16 17:30 UTC · model grok-4.3

classification ❄️ cond-mat.stat-mech
keywords dense associative memory · stochastic thermodynamics · dynamical mean field theory · entropy production · memory retrieval · out-of-equilibrium dynamics · Hopfield networks · corrupted inputs

The pith

Dense associative memory networks incur thermodynamic entropy production that trades off retrieval accuracy against operating speed when driven by corrupted inputs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper brings stochastic thermodynamics to dense associative memory models, which generalize Hopfield networks and link to transformers and diffusion models through shared energy-landscape dynamics. It applies dynamical mean-field theory to polynomial variants at large system size and intermediate memory load, tracking how the system responds when fed noisy or corrupted memories instead of clean ones. This yields explicit expressions for the work required to complete retrieval, the time needed for state transitions, and the rate of entropy production along the way. The analysis also uncovers a failure mode of higher-order networks that appears only at finite temperature. A central result is a set of quantitative tradeoffs: higher retrieval accuracy or faster operation requires higher entropy production.
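
The bookkeeping behind these quantities is stated in the paper's relaxation-thermodynamics section: over a finite interval [t_0, t_f], the irreversible entropy produced is the work performed on the network minus the change in free energy (the paper's Eq. 35), with intensive densities taken as N → ∞:

    \Delta S_{\mathrm{tot}} = \beta \left( W_{t_0 \to t_f} - \Delta F \right),
    \qquad
    \Delta s_{\mathrm{tot}} = \frac{1}{N}\,\Delta S_{\mathrm{tot}}, \quad
    w = \frac{1}{N}\, W .

For pure relaxation with no external driving the work term vanishes and the only thermodynamic cost is the free-energy change; driving the network with corrupted inputs is what makes W nonzero.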

Core claim

By defining thermodynamic entropy production for the stochastic dynamics of polynomial DenseAMs, the work shows that, in the mean-field limit at large system sizes and intermediate or low memory loads, driving the network with corrupted inputs produces measurable work costs and finite transition times whose values depend on the degree of corruption. The same framework identifies a temperature-dependent failure mode absent from the zero-temperature limit and supplies a practical method for computing average power in the mean-field regime.

What carries the argument

Dynamical mean-field theory applied to the stochastic Langevin dynamics of polynomial dense associative memories, used to track entropy production and work when the system is driven by corrupted memory patterns.
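
The analysis itself is analytic, but the microscopic process being coarse-grained is easy to sketch. Below is a minimal numerical sketch, not the authors' code: heat-bath (Glauber) dynamics on the standard Krotov–Hopfield polynomial energy E(σ) = −N Σ_μ (ξ^μ · σ / N)^n, cued with a corrupted copy of one stored pattern. The system size, load, degree, temperature, and corruption level are all illustrative choices.

    import numpy as np

    rng = np.random.default_rng(0)

    N, P, n = 500, 5, 3      # neurons, stored patterns, polynomial degree
    beta = 2.0               # inverse temperature (illustrative)
    flip_frac = 0.2          # fraction of cue bits corrupted

    xi = rng.choice([-1, 1], size=(P, N))   # random binary memories

    # Cue: pattern 0 with a fraction of spins flipped ("corrupted memory")
    sigma = xi[0].copy()
    sigma[rng.random(N) < flip_frac] *= -1

    # Track all P overlaps m_mu = xi_mu . sigma / N incrementally
    m = xi @ sigma / N
    for sweep in range(50):
        for i in rng.permutation(N):
            dm = -2 * sigma[i] * xi[:, i] / N   # overlap shift if spin i flips
            dE = -N * (np.sum((m + dm) ** n) - np.sum(m ** n))
            # Heat-bath rule: flip with probability 1 / (1 + exp(beta * dE))
            if rng.random() < 1.0 / (1.0 + np.exp(beta * dE)):
                sigma[i] *= -1
                m += dm

    print("final overlap with the cued memory:", m[0])

Dynamical mean-field theory replaces this spin-by-spin update with closed equations for order parameters such as the overlaps m_μ, together with correlation and response functions; the questions raised below are about how well that closure survives strong out-of-equilibrium driving.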

If this is right

  • Work and power requirements for memory retrieval can be computed explicitly in the mean-field limit for any degree of input corruption.
  • Higher-order polynomial networks exhibit an out-of-equilibrium failure mode that is invisible in the zero-temperature deterministic limit.
  • Entropy production increases when the network is required to achieve higher retrieval accuracy or to operate faster (a trajectory-level estimator of the dissipated part is sketched after this list).
  • Memory transition times grow with the severity of input corruption and can be predicted from the mean-field equations.
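
On the entropy-production side of these tradeoffs, the dissipated (bath) part is directly measurable on simulated trajectories: for dynamics obeying detailed balance, each accepted flip contributes the log-ratio of forward to backward jump rates, which for Glauber updates is exactly −βΔE. A minimal sketch under the same illustrative single-pattern polynomial energy as above (the system-entropy contribution, which requires ensemble statistics, is omitted):

    import numpy as np

    rng = np.random.default_rng(1)

    def retrieve(N, n, beta, flip_frac, sweeps=30):
        """Glauber retrieval of one stored memory.

        Returns (final overlap, bath entropy per spin). Each accepted flip
        adds -beta * dE, the log-ratio of forward to backward Glauber rates.
        """
        xi = rng.choice([-1, 1], size=N)          # single stored pattern
        sigma = xi.copy()
        sigma[rng.random(N) < flip_frac] *= -1    # corrupted cue
        m = xi @ sigma / N                        # overlap with the memory
        s_bath = 0.0
        for _ in range(sweeps):
            for i in rng.permutation(N):
                dm = -2 * sigma[i] * xi[i] / N
                dE = -N * ((m + dm) ** n - m ** n)   # E = -N * m**n
                if rng.random() < 1.0 / (1.0 + np.exp(beta * dE)):
                    sigma[i] *= -1
                    m += dm
                    s_bath += -beta * dE
        return m, s_bath / N

    for beta in (1.0, 2.0, 4.0):
        m, s = retrieve(1000, 3, beta, flip_frac=0.2)
        print(f"beta={beta}: overlap={m:.3f}, bath entropy per spin={s:.3f}")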

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same mean-field machinery could be used to compare the thermodynamic efficiency of different polynomial orders for a given accuracy target.
  • Hardware implementations of associative memory might exploit the identified tradeoffs to minimize energy per retrieval at acceptable error rates.
  • The failure mode at finite temperature suggests that noise can destabilize retrieval in ways that deterministic analyses miss, which may matter for biological or analog implementations.

Load-bearing premise

Dynamical mean-field theory remains quantitatively accurate for polynomial dense associative memories at intermediate loads even when the driving inputs are corrupted and the system is far from equilibrium.
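
Even a naive single-pattern closure, far cruder than the paper's full DMFT, shows why this premise is delicate for higher-order networks. With one condensed overlap m and schematic energy density −m^n, a Suzuki–Kubo-style mean-field equation reads

    \dot{m} = -m + \tanh\!\left(\beta\, n\, m^{\,n-1}\right).

For n ≥ 3 the effective field vanishes at least quadratically as m → 0, so at any finite β a sufficiently corrupted cue starts below an unstable fixed point and decays to m = 0, whereas the zero-temperature update σ_i = sign(n m^{n−1} ξ_i) recovers the pattern from any m > 0. This is only the qualitative shape of the finite-temperature failure mode; the paper's DMFT version is quantitative.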

What would settle it

A direct stochastic simulation of a polynomial DenseAM with several thousand neurons, intermediate load, and systematically corrupted inputs that produces entropy-production or transition-time values differing by more than a few percent from the mean-field formulas would falsify the claimed characterization.
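
Such a test is mechanical to set up. A minimal harness, again assuming the illustrative single-pattern polynomial energy from the sketches above, with sizes reduced for speed; the paper's DMFT formulas, against which the measured times would be compared, are not reproduced here:

    import numpy as np

    rng = np.random.default_rng(2)

    def transition_time(N, n, beta, flip_frac, m_target=0.9, max_sweeps=50):
        """Sweeps until the overlap with the cued memory first reaches m_target.

        Returns np.inf if retrieval never completes; for strongly corrupted
        cues this is itself a signature of the finite-temperature failure mode.
        """
        xi = rng.choice([-1, 1], size=N)
        sigma = xi.copy()
        sigma[rng.random(N) < flip_frac] *= -1
        m = xi @ sigma / N
        for sweep in range(max_sweeps):
            for i in rng.permutation(N):
                dm = -2 * sigma[i] * xi[i] / N
                dE = -N * ((m + dm) ** n - m ** n)   # E = -N * m**n
                if rng.random() < 1.0 / (1.0 + np.exp(beta * dE)):
                    sigma[i] *= -1
                    m += dm
            if m >= m_target:
                return sweep + 1
        return np.inf

    # Mean transition time versus corruption level; these numbers are what
    # would be compared against the paper's DMFT predictions (not reproduced).
    for flip_frac in (0.1, 0.2, 0.3, 0.4):
        times = [transition_time(1000, 3, 2.0, flip_frac) for _ in range(10)]
        print(f"corruption {flip_frac:.1f}: mean sweeps {np.mean(times)}")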

Figures

Figures reproduced from arXiv: 2601.01253 by David Wolpert, Dmitry Krotov, Spencer Rooke, Vijay Balasubramanian.

Figure 1. Memories … (view at source ↗)
Figure 2. The free energy landscape of single memory polynomial DenseAM network as a function of memory alignment … (view at source ↗)
Figure 3. (view at source ↗)
Figure 4. The change in free energy density … (view at source ↗)
Figure 5. Numerical demonstration of mean field theory … (view at source ↗)
Figure 6. Recovery performance and work cost for DenseAM networks. (view at source ↗)
Original abstract

Dense Associative Memory networks (DenseAMs) unify several popular paradigms in Artificial Intelligence (AI), such as Hopfield Networks, transformers, and diffusion models, while casting their computational properties into the language of dynamical systems and energy landscapes. This formulation provides a natural setting for studying thermodynamics and computation in neural systems, because DenseAMs are simultaneously simple enough to admit analytic treatment and rich enough to implement nontrivial computational function. Aspects of these networks have been studied at equilibrium and at zero temperature, but the thermodynamic costs associated with their operation out of equilibrium are largely unexplored. Here, we define the thermodynamic entropy production associated with the operation of such networks, and study polynomial DenseAMs at intermediate memory load. At large system sizes and intermediate and low load, we use dynamical mean field theory to characterize out-of-equilibrium properties, work requirements, and memory transition times when driving the system with corrupted memories. We characterize a failure mode of higher order networks not observed at zero temperature. Further, we develop a method for calculating work and power costs in the mean field limit. Finally, we find tradeoffs between entropy production, memory retrieval accuracy, and operation speed.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript applies stochastic thermodynamics to Dense Associative Memory (DenseAM) networks, which unify Hopfield networks, transformers, and diffusion models. At large system sizes and intermediate-to-low memory loads, dynamical mean-field theory (DMFT) is used to characterize out-of-equilibrium entropy production, work and power costs, memory transition times, and retrieval accuracy when the system is driven by corrupted memories. A failure mode specific to higher-order networks is identified, and tradeoffs among entropy production, accuracy, and speed are reported.

Significance. If the DMFT results hold, the work supplies a concrete thermodynamic accounting of computational costs in a broad class of associative-memory models, including explicit expressions for work in the mean-field limit and quantitative tradeoffs that could guide energy-efficient implementations. The analytic treatment of non-equilibrium driving and the identification of a higher-order failure mode not visible at zero temperature constitute the main advances.

major comments (2)
  1. [DMFT section (out-of-equilibrium characterization)] The central quantitative claims on entropy-production–accuracy–speed tradeoffs rest on the applicability of DMFT to polynomial DenseAMs driven out of equilibrium by corrupted inputs at intermediate loads. No explicit closure check, fluctuation analysis, or comparison to finite-N simulations is provided to confirm that higher-order correlations remain negligible under strong driving; this is load-bearing for the reported tradeoffs.
  2. [Work and power costs subsection] The method for calculating work and power costs in the mean-field limit is introduced without a derivation that starts from the microscopic stochastic dynamics and arrives at the mean-field expression; the connection between the entropy-production definition and the work functional therefore remains formal rather than explicit (the standard bookkeeping such a derivation would start from is sketched after the minor comments).
minor comments (2)
  1. [Abstract] The abstract states that DMFT is applied “at large system sizes and intermediate and low load,” yet the precise range of loads for which the closure is claimed to remain valid is not quantified in the text.
  2. [Results] Notation for the polynomial order of the DenseAM and for the corruption level of the driving patterns should be introduced once and used consistently; several symbols appear without prior definition in the results paragraphs.
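
For reference, the standard overdamped-Langevin bookkeeping such a derivation would start from (Sekimoto/Seifert conventions; a generic template, not the paper's specific definitions):

    \mathrm{d}x_t = -\nabla_x E(x_t, \lambda_t)\,\mathrm{d}t + \sqrt{2T}\,\mathrm{d}W_t,
    \qquad
    \mathrm{d}E = \underbrace{\partial_\lambda E\,\mathrm{d}\lambda}_{\text{work } \mathrm{d}W}
    \;+\; \underbrace{\nabla_x E \circ \mathrm{d}x_t}_{\text{heat } \mathrm{d}Q_{\mathrm{in}}} .

Entropy production then decomposes as ΔS_tot = ΔS_sys − βQ_in, which between equilibrium endpoints reduces to the β(W − ΔF) form quoted in the pith above; making the work-and-power method explicit would amount to evaluating these functionals on the DMFT order-parameter trajectories.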

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading and constructive comments, which help clarify the presentation of our results. We address each major point below and will revise the manuscript to strengthen the validation and derivations as suggested.

Point-by-point responses
  1. Referee: The central quantitative claims on entropy-production–accuracy–speed tradeoffs rest on the applicability of DMFT to polynomial DenseAMs driven out of equilibrium by corrupted inputs at intermediate loads. No explicit closure check, fluctuation analysis, or comparison to finite-N simulations is provided to confirm that higher-order correlations remain negligible under strong driving; this is load-bearing for the reported tradeoffs.

    Authors: We agree that explicit checks on the DMFT closure under out-of-equilibrium driving would strengthen the central claims. Although the standard DMFT framework for associative memory models assumes higher-order correlations vanish at large N and moderate loads, we will add in the revised manuscript direct comparisons between DMFT predictions and finite-N Monte Carlo simulations for representative loads and driving strengths. These will include checks on fluctuation spectra and residual higher-order correlations to confirm the regime of validity for the reported tradeoffs. revision: yes

  2. Referee: The method for calculating work and power costs in the mean-field limit is introduced without a derivation that starts from the microscopic stochastic dynamics and arrives at the mean-field expression; the connection between the entropy-production definition and the work functional therefore remains formal rather than explicit.

    Authors: We acknowledge that the derivation of the mean-field work and power expressions was not expanded from the microscopic level in the original text. In the revised manuscript we will provide a detailed derivation (in the main text or a dedicated appendix) that begins from the underlying stochastic differential equations or master equation for the network dynamics, explicitly obtains the mean-field work functional, and demonstrates its direct connection to the entropy-production rate defined in the paper. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper applies standard dynamical mean-field theory to derive out-of-equilibrium thermodynamic quantities (entropy production, work costs, transition times) for polynomial DenseAMs under corrupted-memory driving. No load-bearing step reduces a claimed prediction to a fitted parameter, self-citation chain, or definitional tautology; the DMFT closure is invoked as an external technique whose validity is stated as an assumption rather than derived from the target observables. Thermodynamic relations are obtained within the mean-field limit without renaming known empirical patterns or smuggling ansatzes via prior self-citations. The derivation chain remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claims rest on the applicability of dynamical mean-field theory to finite-temperature polynomial networks and on the standard definitions of stochastic thermodynamic entropy production.

axioms (1)
  • domain assumption: Dynamical mean-field theory accurately captures the large-N limit of polynomial DenseAMs driven by corrupted memories
    Invoked to characterize work requirements and transition times at large system sizes.

pith-pipeline@v0.9.0 · 5503 in / 1142 out tokens · 28520 ms · 2026-05-16T17:30:32.243858+00:00 · methodology

discussion (0)


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Stochastic Thermodynamics for Autoregressive Generative Models: A Non-Markovian Perspective

    cond-mat.stat-mech 2026-04 unverdicted novelty 7.0

    A stochastic thermodynamics framework quantifies entropy production in non-Markovian autoregressive generative models, with efficient estimation from trajectories and exact decomposition into non-negative retrospectiv...

  2. Geometric Entropy and Retrieval Phase Transitions in Continuous Thermal Dense Associative Memory

    cond-mat.dis-nn 2026-04 unverdicted novelty 6.0

    Geometric entropy on the N-sphere sets retrieval phase boundaries in continuous thermal dense associative memories, achieving maximum capacity α=0.5 at zero temperature with kernel-dependent critical lines separating ...

Reference graph

Works this paper leans on

60 extracted references · 60 canonical work pages · cited by 2 Pith papers · 2 internal anchors
