pith. machine review for the scientific record.

arxiv: 2601.01253 · v2 · submitted 2026-01-03 · ❄️ cond-mat.stat-mech

Recognition: no theorem link

Stochastic Thermodynamics of Associative Memory

Authors on Pith: no claims yet

Pith reviewed 2026-05-16 17:30 UTC · model grok-4.3

classification ❄️ cond-mat.stat-mech
keywords dense associative memory · stochastic thermodynamics · dynamical mean field theory · entropy production · memory retrieval · out-of-equilibrium dynamics · Hopfield networks · corrupted inputs

The pith

Dense associative memory networks incur thermodynamic entropy production that trades off retrieval accuracy against operating speed when driven by corrupted inputs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper brings stochastic thermodynamics to dense associative memory models, which generalize Hopfield networks and link to transformers and diffusion models through shared energy-landscape dynamics. It applies dynamical mean-field theory to polynomial variants at large system size and intermediate memory load, tracking how the system responds when fed noisy or corrupted memories instead of clean ones. This yields explicit expressions for the work required to complete retrieval, the time needed for state transitions, and the rate of entropy production along the way. The analysis also uncovers a failure mode of higher-order networks that appears only at finite temperature. A central result is a set of quantitative tradeoffs: higher retrieval accuracy or faster operation requires higher entropy production.
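
The bookkeeping behind these quantities is stated in the paper's relaxation-thermodynamics section: over a finite interval [t_0, t_f], the irreversible entropy produced is the work performed on the network minus the change in free energy (the paper's Eq. 35), with intensive densities taken as N → ∞:

    \Delta S_{\mathrm{tot}} = \beta \left( W_{t_0 \to t_f} - \Delta F \right),
    \qquad
    \Delta s_{\mathrm{tot}} = \frac{1}{N}\,\Delta S_{\mathrm{tot}}, \quad
    w = \frac{1}{N}\, W .

For pure relaxation with no external driving the work term vanishes and the only thermodynamic cost is the free-energy change; driving the network with corrupted inputs is what makes W nonzero.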

Core claim

By defining thermodynamic entropy production for the stochastic dynamics of polynomial DenseAMs, the work shows that, in the mean-field limit at large system sizes and intermediate or low memory loads, driving the network with corrupted inputs produces measurable work costs and finite transition times whose values depend on the degree of corruption. The same framework identifies a temperature-dependent failure mode absent from the zero-temperature limit and supplies a practical method for computing average power in the mean-field regime.

What carries the argument

Dynamical mean-field theory applied to the stochastic Langevin dynamics of polynomial dense associative memories, used to track entropy production and work when the system is driven by corrupted memory patterns.
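
The analysis itself is analytic, but the microscopic process being coarse-grained is easy to sketch. Below is a minimal numerical sketch, not the authors' code: heat-bath (Glauber) dynamics on the standard Krotov–Hopfield polynomial energy E(σ) = −N Σ_μ (ξ^μ · σ / N)^n, cued with a corrupted copy of one stored pattern. The system size, load, degree, temperature, and corruption level are all illustrative choices.

    import numpy as np

    rng = np.random.default_rng(0)

    N, P, n = 500, 5, 3      # neurons, stored patterns, polynomial degree
    beta = 2.0               # inverse temperature (illustrative)
    flip_frac = 0.2          # fraction of cue bits corrupted

    xi = rng.choice([-1, 1], size=(P, N))   # random binary memories

    # Cue: pattern 0 with a fraction of spins flipped ("corrupted memory")
    sigma = xi[0].copy()
    sigma[rng.random(N) < flip_frac] *= -1

    # Track all P overlaps m_mu = xi_mu . sigma / N incrementally
    m = xi @ sigma / N
    for sweep in range(50):
        for i in rng.permutation(N):
            dm = -2 * sigma[i] * xi[:, i] / N   # overlap shift if spin i flips
            dE = -N * (np.sum((m + dm) ** n) - np.sum(m ** n))
            # Heat-bath rule: flip with probability 1 / (1 + exp(beta * dE))
            if rng.random() < 1.0 / (1.0 + np.exp(beta * dE)):
                sigma[i] *= -1
                m += dm

    print("final overlap with the cued memory:", m[0])

Dynamical mean-field theory replaces this spin-by-spin update with closed equations for order parameters such as the overlaps m_μ, together with correlation and response functions; the questions raised below are about how well that closure survives strong out-of-equilibrium driving.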

If this is right

  • Work and power requirements for memory retrieval can be computed explicitly in the mean-field limit for any degree of input corruption.
  • Higher-order polynomial networks exhibit an out-of-equilibrium failure mode that is invisible in the zero-temperature deterministic limit.
  • Entropy production increases when the network is required to achieve higher retrieval accuracy or to operate faster (a trajectory-level estimator of the dissipated part is sketched after this list).
  • Memory transition times grow with the severity of input corruption and can be predicted from the mean-field equations.
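
On the entropy-production side of these tradeoffs, the dissipated (bath) part is directly measurable on simulated trajectories: for dynamics obeying detailed balance, each accepted flip contributes the log-ratio of forward to backward jump rates, which for Glauber updates is exactly −βΔE. A minimal sketch under the same illustrative single-pattern polynomial energy as above (the system-entropy contribution, which requires ensemble statistics, is omitted):

    import numpy as np

    rng = np.random.default_rng(1)

    def retrieve(N, n, beta, flip_frac, sweeps=30):
        """Glauber retrieval of one stored memory.

        Returns (final overlap, bath entropy per spin). Each accepted flip
        adds -beta * dE, the log-ratio of forward to backward Glauber rates.
        """
        xi = rng.choice([-1, 1], size=N)          # single stored pattern
        sigma = xi.copy()
        sigma[rng.random(N) < flip_frac] *= -1    # corrupted cue
        m = xi @ sigma / N                        # overlap with the memory
        s_bath = 0.0
        for _ in range(sweeps):
            for i in rng.permutation(N):
                dm = -2 * sigma[i] * xi[i] / N
                dE = -N * ((m + dm) ** n - m ** n)   # E = -N * m**n
                if rng.random() < 1.0 / (1.0 + np.exp(beta * dE)):
                    sigma[i] *= -1
                    m += dm
                    s_bath += -beta * dE
        return m, s_bath / N

    for beta in (1.0, 2.0, 4.0):
        m, s = retrieve(1000, 3, beta, flip_frac=0.2)
        print(f"beta={beta}: overlap={m:.3f}, bath entropy per spin={s:.3f}")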

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same mean-field machinery could be used to compare the thermodynamic efficiency of different polynomial orders for a given accuracy target.
  • Hardware implementations of associative memory might exploit the identified tradeoffs to minimize energy per retrieval at acceptable error rates.
  • The failure mode at finite temperature suggests that noise can destabilize retrieval in ways that deterministic analyses miss, which may matter for biological or analog implementations.

Load-bearing premise

Dynamical mean-field theory remains quantitatively accurate for polynomial dense associative memories at intermediate loads even when the driving inputs are corrupted and the system is far from equilibrium.
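
Even a naive single-pattern closure, far cruder than the paper's full DMFT, shows why this premise is delicate for higher-order networks. With one condensed overlap m and schematic energy density −m^n, a Suzuki–Kubo-style mean-field equation reads

    \dot{m} = -m + \tanh\!\left(\beta\, n\, m^{\,n-1}\right).

For n ≥ 3 the effective field vanishes at least quadratically as m → 0, so at any finite β a sufficiently corrupted cue starts below an unstable fixed point and decays to m = 0, whereas the zero-temperature update σ_i = sign(n m^{n−1} ξ_i) recovers the pattern from any m > 0. This is only the qualitative shape of the finite-temperature failure mode; the paper's DMFT version is quantitative.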

What would settle it

A direct stochastic simulation of a polynomial DenseAM with several thousand neurons, intermediate load, and systematically corrupted inputs that produces entropy-production or transition-time values differing by more than a few percent from the mean-field formulas would falsify the claimed characterization.
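
Such a test is mechanical to set up. A minimal harness, again assuming the illustrative single-pattern polynomial energy from the sketches above, with sizes reduced for speed; the paper's DMFT formulas, against which the measured times would be compared, are not reproduced here:

    import numpy as np

    rng = np.random.default_rng(2)

    def transition_time(N, n, beta, flip_frac, m_target=0.9, max_sweeps=50):
        """Sweeps until the overlap with the cued memory first reaches m_target.

        Returns np.inf if retrieval never completes; for strongly corrupted
        cues this is itself a signature of the finite-temperature failure mode.
        """
        xi = rng.choice([-1, 1], size=N)
        sigma = xi.copy()
        sigma[rng.random(N) < flip_frac] *= -1
        m = xi @ sigma / N
        for sweep in range(max_sweeps):
            for i in rng.permutation(N):
                dm = -2 * sigma[i] * xi[i] / N
                dE = -N * ((m + dm) ** n - m ** n)   # E = -N * m**n
                if rng.random() < 1.0 / (1.0 + np.exp(beta * dE)):
                    sigma[i] *= -1
                    m += dm
            if m >= m_target:
                return sweep + 1
        return np.inf

    # Mean transition time versus corruption level; these numbers are what
    # would be compared against the paper's DMFT predictions (not reproduced).
    for flip_frac in (0.1, 0.2, 0.3, 0.4):
        times = [transition_time(1000, 3, 2.0, flip_frac) for _ in range(10)]
        print(f"corruption {flip_frac:.1f}: mean sweeps {np.mean(times)}")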

Figures

Figures reproduced from arXiv: 2601.01253 by David Wolpert, Dmitry Krotov, Spencer Rooke, Vijay Balasubramanian.

Figure 1. Memories … (view at source ↗)
Figure 2. The free energy landscape of single memory polynomial DenseAM network as a function of memory alignment … (view at source ↗)
Figure 3. (view at source ↗)
Figure 4. The change in free energy density … (view at source ↗)
Figure 5. Numerical demonstration of mean field theory … (view at source ↗)
Figure 6. Recovery performance and work cost for DenseAM networks. (view at source ↗)
Original abstract

Dense Associative Memory networks (DenseAMs) unify several popular paradigms in Artificial Intelligence (AI), such as Hopfield Networks, transformers, and diffusion models, while casting their computational properties into the language of dynamical systems and energy landscapes. This formulation provides a natural setting for studying thermodynamics and computation in neural systems, because DenseAMs are simultaneously simple enough to admit analytic treatment and rich enough to implement nontrivial computational function. Aspects of these networks have been studied at equilibrium and at zero temperature, but the thermodynamic costs associated with their operation out of equilibrium are largely unexplored. Here, we define the thermodynamic entropy production associated with the operation of such networks, and study polynomial DenseAMs at intermediate memory load. At large system sizes and intermediate and low load, we use dynamical mean field theory to characterize out-of-equilibrium properties, work requirements, and memory transition times when driving the system with corrupted memories. We characterize a failure mode of higher order networks not observed at zero temperature. Further, we develop a method for calculating work and power costs in the mean field limit. Finally, we find tradeoffs between entropy production, memory retrieval accuracy, and operation speed.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript applies stochastic thermodynamics to Dense Associative Memory (DenseAM) networks, which unify Hopfield networks, transformers, and diffusion models. At large system sizes and intermediate-to-low memory loads, dynamical mean-field theory (DMFT) is used to characterize out-of-equilibrium entropy production, work and power costs, memory transition times, and retrieval accuracy when the system is driven by corrupted memories. A failure mode specific to higher-order networks is identified, and tradeoffs among entropy production, accuracy, and speed are reported.

Significance. If the DMFT results hold, the work supplies a concrete thermodynamic accounting of computational costs in a broad class of associative-memory models, including explicit expressions for work in the mean-field limit and quantitative tradeoffs that could guide energy-efficient implementations. The analytic treatment of non-equilibrium driving and the identification of a higher-order failure mode not visible at zero temperature constitute the main advances.

major comments (2)
  1. [DMFT section (out-of-equilibrium characterization)] The central quantitative claims on entropy-production–accuracy–speed tradeoffs rest on the applicability of DMFT to polynomial DenseAMs driven out of equilibrium by corrupted inputs at intermediate loads. No explicit closure check, fluctuation analysis, or comparison to finite-N simulations is provided to confirm that higher-order correlations remain negligible under strong driving; this is load-bearing for the reported tradeoffs.
  2. [Work and power costs subsection] The method for calculating work and power costs in the mean-field limit is introduced without a derivation that starts from the microscopic stochastic dynamics and arrives at the mean-field expression; the connection between the entropy-production definition and the work functional therefore remains formal rather than explicit (the standard bookkeeping such a derivation would start from is sketched after the minor comments).
minor comments (2)
  1. [Abstract] The abstract states that DMFT is applied “at large system sizes and intermediate and low load,” yet the precise range of loads for which the closure is claimed to remain valid is not quantified in the text.
  2. [Results] Notation for the polynomial order of the DenseAM and for the corruption level of the driving patterns should be introduced once and used consistently; several symbols appear without prior definition in the results paragraphs.
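
For reference, the standard overdamped-Langevin bookkeeping such a derivation would start from (Sekimoto/Seifert conventions; a generic template, not the paper's specific definitions):

    \mathrm{d}x_t = -\nabla_x E(x_t, \lambda_t)\,\mathrm{d}t + \sqrt{2T}\,\mathrm{d}W_t,
    \qquad
    \mathrm{d}E = \underbrace{\partial_\lambda E\,\mathrm{d}\lambda}_{\text{work } \mathrm{d}W}
    \;+\; \underbrace{\nabla_x E \circ \mathrm{d}x_t}_{\text{heat } \mathrm{d}Q_{\mathrm{in}}} .

Entropy production then decomposes as ΔS_tot = ΔS_sys − βQ_in, which between equilibrium endpoints reduces to the β(W − ΔF) form quoted in the pith above; making the work-and-power method explicit would amount to evaluating these functionals on the DMFT order-parameter trajectories.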

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading and constructive comments, which help clarify the presentation of our results. We address each major point below and will revise the manuscript to strengthen the validation and derivations as suggested.

Point-by-point responses
  1. Referee: The central quantitative claims on entropy-production–accuracy–speed tradeoffs rest on the applicability of DMFT to polynomial DenseAMs driven out of equilibrium by corrupted inputs at intermediate loads. No explicit closure check, fluctuation analysis, or comparison to finite-N simulations is provided to confirm that higher-order correlations remain negligible under strong driving; this is load-bearing for the reported tradeoffs.

    Authors: We agree that explicit checks on the DMFT closure under out-of-equilibrium driving would strengthen the central claims. Although the standard DMFT framework for associative memory models assumes higher-order correlations vanish at large N and moderate loads, we will add in the revised manuscript direct comparisons between DMFT predictions and finite-N Monte Carlo simulations for representative loads and driving strengths. These will include checks on fluctuation spectra and residual higher-order correlations to confirm the regime of validity for the reported tradeoffs. revision: yes

  2. Referee: The method for calculating work and power costs in the mean-field limit is introduced without a derivation that starts from the microscopic stochastic dynamics and arrives at the mean-field expression; the connection between the entropy-production definition and the work functional therefore remains formal rather than explicit.

    Authors: We acknowledge that the derivation of the mean-field work and power expressions was not expanded from the microscopic level in the original text. In the revised manuscript we will provide a detailed derivation (in the main text or a dedicated appendix) that begins from the underlying stochastic differential equations or master equation for the network dynamics, explicitly obtains the mean-field work functional, and demonstrates its direct connection to the entropy-production rate defined in the paper. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper applies standard dynamical mean-field theory to derive out-of-equilibrium thermodynamic quantities (entropy production, work costs, transition times) for polynomial DenseAMs under corrupted-memory driving. No load-bearing step reduces a claimed prediction to a fitted parameter, self-citation chain, or definitional tautology; the DMFT closure is invoked as an external technique whose validity is stated as an assumption rather than derived from the target observables. Thermodynamic relations are obtained within the mean-field limit without renaming known empirical patterns or smuggling ansatzes via prior self-citations. The derivation chain remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claims rest on the applicability of dynamical mean-field theory to finite-temperature polynomial networks and on the standard definitions of stochastic thermodynamic entropy production.

axioms (1)
  • domain assumption: Dynamical mean-field theory accurately captures the large-N limit of polynomial DenseAMs driven by corrupted memories
    Invoked to characterize work requirements and transition times at large system sizes.

pith-pipeline@v0.9.0 · 5503 in / 1142 out tokens · 28520 ms · 2026-05-16T17:30:32.243858+00:00 · methodology

discussion (0)


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Stochastic Thermodynamics for Autoregressive Generative Models: A Non-Markovian Perspective

    cond-mat.stat-mech 2026-04 unverdicted novelty 7.0

    A stochastic thermodynamics framework quantifies entropy production in non-Markovian autoregressive generative models, with efficient estimation from trajectories and exact decomposition into non-negative retrospectiv...

  2. Geometric Entropy and Retrieval Phase Transitions in Continuous Thermal Dense Associative Memory

    cond-mat.dis-nn 2026-04 unverdicted novelty 6.0

    Geometric entropy on the N-sphere sets retrieval phase boundaries in continuous thermal dense associative memories, achieving maximum capacity α=0.5 at zero temperature with kernel-dependent critical lines separating ...

Reference graph

Works this paper leans on

60 extracted references · 60 canonical work pages · cited by 2 Pith papers · 2 internal anchors
