Entropy production bounds for systems running computer programs
Pith reviewed 2026-05-23 17:03 UTC · model grok-4.3
The pith
Mismatch cost provides a lower bound on entropy production that scales at least linearly with heat flow and increases when time intervals are subdivided.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Mismatch cost is a universally applicable lower bound on the entropy production of any fixed physical process across a given time interval. For physical systems that implement modern digital computers, this bound equals the minimal entropy production associated with running a given program and can be computed via a general framework that maps program steps to thermodynamic costs. The framework is applied to show that bubble sort and bucket sort have different mismatch costs that depend on input size and structure, and it extends to programs calling subroutines.
What carries the argument
Mismatch cost (MMC), a universally applicable lower bound on entropy production for any fixed physical process across a given time interval.
If this is right
- Mismatch cost scales at least linearly with the total heat flow in the worst case over initial distributions.
- The mismatch cost lower bound over a given time interval never decreases if the interval is subdivided into sub-intervals, and often increases.
- The minimal entropy production for running a program depends on input size and structure such as the presence of repeated entries.
- Different algorithms incur different mismatch costs, as shown by direct comparison of bubble sort and bucket sort.
- The framework extends directly to programs that call subroutines.
Where Pith is reading between the lines
- Finer time resolution in the analysis of program execution would generally produce stricter lower bounds on thermodynamic cost.
- The framework could be used to rank algorithms by their minimal energy requirements for large inputs without running them on hardware.
- If the mapping from digital programs to physical systems holds, the same approach might quantify costs for programs that include conditional branches or loops.
Load-bearing premise
Any physical system that implements a modern digital computer can be mapped to the general framework for computing minimal entropy production associated with running a given program.
What would settle it
A measurement on a physical computer running bubble sort on a known input where total entropy production falls below the mismatch cost value computed by the framework for that program and input.
Figures
read the original abstract
Mismatch cost (MMC) is a universally applicable lower bound on the entropy production (EP) of any fixed physical process across a given time interval. In the first part of the paper, we establish results concerning MMC to prove that it scales at least linearly with the total heat flow in the worst case over initial distributions. We also prove that the MMC lower bound over a given time interval never decreases if the time interval is subdivided into a sequence of sub-intervals, and that the bound often increases. In the second part of the paper, we introduce a general framework for computing the minimal EP (i.e., the MMC) associated with running a computer program on any physical system that implements a modern digital computer. We apply this general framework to compare MMC of running two canonical sorting algorithms, bubble sort and bucket sort. The framework enables us to investigate how thermodynamic cost depends on features like input size and structure (e.g., with or without repeated entries). Finally, we extend the framework to programs that call subroutines.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that mismatch cost (MMC) is a lower bound on entropy production that scales at least linearly with total heat flow in the worst case over initial distributions, and that the MMC bound over an interval is non-decreasing (and often strictly increasing) under subdivision into sub-intervals. It then introduces a general framework for computing MMC when running a computer program on any physical system implementing a modern digital computer, applies the framework to compare the MMC of bubble sort versus bucket sort (including dependence on input size and repeated entries), and extends the framework to programs that call subroutines.
Significance. The MMC scaling and subdivision results, if rigorously derived, are general mathematical statements about a lower bound on entropy production that could apply across non-equilibrium thermodynamics. The computational framework, if the required mapping from arbitrary physical dynamics to the MMC stochastic process is established, would enable algorithm-specific thermodynamic bounds that depend on program structure rather than implementation details, offering a route to compare the minimal EP of different algorithms such as sorting routines.
major comments (2)
- [Second part (framework introduction and sorting applications)] The central claim of the second part—that MMC can be computed for running programs like bubble sort versus bucket sort on any physical implementation—rests on the existence of a general mapping from arbitrary physical dynamics (including those with continuous degrees of freedom or non-Markovian effects) to the MMC framework that preserves initial distributions, heat flows, and the stochastic process over computational states. The abstract asserts this mapping without indicating an explicit construction or proof of validity and tightness, which is load-bearing for all applications to concrete algorithms.
- [First part (MMC properties) and its linkage to second part] The MMC scaling result (linear in worst-case heat flow) and the subdivision monotonicity result are presented as independent of the computational mapping, but their utility for the paper's overall conclusions on program thermodynamics requires that the mapping preserve the quantities appearing in those bounds; no verification is indicated that the mapping does so for the sorting examples.
minor comments (1)
- [Abstract] The abstract refers to 'the MMC lower bound' without a forward reference to the precise definition or equation that will be used in the framework; adding an explicit pointer would improve readability.
Simulated Author's Rebuttal
We thank the referee for their thoughtful review and for highlighting the need for greater clarity on the mapping from physical dynamics to the MMC framework. We address each major comment below, indicating revisions where appropriate to strengthen the manuscript.
read point-by-point responses
-
Referee: The central claim of the second part—that MMC can be computed for running programs like bubble sort versus bucket sort on any physical implementation—rests on the existence of a general mapping from arbitrary physical dynamics (including those with continuous degrees of freedom or non-Markovian effects) to the MMC framework that preserves initial distributions, heat flows, and the stochastic process over computational states. The abstract asserts this mapping without indicating an explicit construction or proof of validity and tightness, which is load-bearing for all applications to concrete algorithms.
Authors: We agree that the manuscript introduces the framework at a high level without a fully explicit construction or proof of the mapping for arbitrary physical systems (including continuous degrees of freedom or non-Markovian dynamics). The framework relies on coarse-graining any physical implementation of a digital computer to the discrete computational states while preserving the relevant thermodynamic quantities by definition of MMC. In the revision we will add an explicit section constructing the mapping, discussing its validity under the stated assumptions, and addressing tightness for the sorting applications. revision: yes
-
Referee: The MMC scaling result (linear in worst-case heat flow) and the subdivision monotonicity result are presented as independent of the computational mapping, but their utility for the paper's overall conclusions on program thermodynamics requires that the mapping preserve the quantities appearing in those bounds; no verification is indicated that the mapping does so for the sorting examples.
Authors: The MMC scaling and subdivision results are general mathematical statements derived independently of any specific mapping. In the computational framework, the mapping is constructed precisely so that initial distributions, heat flows, and the stochastic process over computational states are preserved; thus the bounds apply directly. We will add explicit verification of this preservation for the bubble-sort and bucket-sort examples in the revised manuscript to make the linkage clear. revision: partial
Circularity Check
No significant circularity detected in derivation chain
full rationale
The paper's first part establishes new mathematical results on MMC scaling linearly with heat flow and non-decreasing under subdivision; these are presented as independent proofs without reference to fitted parameters or self-referential definitions. The second part introduces a new general framework for mapping physical computer implementations to MMC calculations and applies it to sorting algorithms and subroutines. No equations, self-citations, or ansatzes are quoted that reduce claims to inputs by construction. The framework and applications constitute original content rather than renaming or self-definition. The mapping assertion is a modeling choice, not a circular reduction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Mismatch cost is a universally applicable lower bound on the entropy production of any fixed physical process across a given time interval.
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Mismatch cost (MMC) … MC(pX0) = D(pX0 ∥ qX0) − D(GpX0 ∥ GqX0)
-
IndisputableMonolith/Foundation/RealityFromDistinction.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
RASP model … adjacency matrix G … pXi+1 = G pXi
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
-
The Thermodynamic Costs of Simple Linear Regression
Thermodynamic lower bounds are approximated for exact and SGD linear regression, producing energy-aware scaling laws for optimal training dataset size given a target generalization error.
Reference graph
Works this paper leans on
-
[1]
Heaviside program Consider the heaviside program introduced in Fig.3, whosestatespaceisdepictedinFig.4,generatedthroughsim- ulation of the corresponding low-level RASP code. The state of the program is given by the joint value of variablesx, c, a, pc , where x and c are the input variables,a is the con- ditional flag variable, andpc is the program counter...
-
[2]
Bubble sort program We now turn to a sorting programs for integer arrays of length n. Fig. 6 depicts the source code forBubbleSortpro- gram and the corresponding low level RASP like code. The state of the program consists of an arrayarr of length n, boolean flagswapped, loop counteri, and a temporary vari- able temp. The full state of the program at any s...
-
[3]
Stochastic thermodynamics, fluctuation theorems and molecular machines
Udo Seifert. Stochastic thermodynamics, fluctuation theorems and molecular machines. Reports on Progress in Physics , 75(12):126001, 2012
work page 2012
-
[4]
Juan MR Parrondo, Jordan M Horowitz, and Takahiro Sagawa. Thermodynamics of information. Nature Physics, 11(2):131– 139, 2015
work page 2015
-
[5]
The stochastic thermodynamics of compu- tation
David H Wolpert. The stochastic thermodynamics of compu- tation. Journal of Physics A: Mathematical and Theoretical , 52(19):193001, 2019. See arXiv:1905.05669 for updated ver- sion
-
[6]
Artemy Kolchinsky and David H Wolpert. Dependence of in- tegrated, instantaneous, and fluctuating entropy production on the initial state in quantum and classical processes.Physical Review E, 104(5):054107, 2021
work page 2021
-
[7]
On the decrease of entropy in a thermodynamic system by the intervention of intelligent beings
Leo Szilard. On the decrease of entropy in a thermodynamic system by the intervention of intelligent beings. Behavioral Science, 9(4):301–310, 1964
work page 1964
-
[8]
Irreversibility and heat generation in the com- puting process
Rolf Landauer. Irreversibility and heat generation in the com- puting process. IBM Journal of Research and Development , 5(3):183–191, 1961
work page 1961
-
[9]
The thermodynamics of computation—a review
Charles H Bennett. The thermodynamics of computation—a review. International Journal of Theoretical Physics , 21:905– 940, 1982
work page 1982
-
[10]
Thermodynamic and logical reversibilities revisited
Takahiro Sagawa. Thermodynamic and logical reversibilities revisited. Journal of Statistical Mechanics: Theory and Exper- iment, 2014(3):P03025, 2014
work page 2014
-
[11]
ChristianVandenBroeckandMassimilianoEsposito.Ensemble and trajectory thermodynamics: a brief introduction.Physica A, 418:6–16, 2015
work page 2015
-
[12]
Minimal entropy production rate of interact- ing systems.New Journal of Physics, 22(11):113013, 2020
David H Wolpert. Minimal entropy production rate of interact- ing systems.New Journal of Physics, 22(11):113013, 2020
work page 2020
-
[13]
Dependenceofdissi- pationontheinitialdistributionoverstates
ArtemyKolchinskyandDavidHWolpert. Dependenceofdissi- pationontheinitialdistributionoverstates. Journal of Statistical Mechanics: Theory and Experiment , 2017(8):083202, 2017
work page 2017
-
[14]
Thermodynamics of computingwithcircuits
David H Wolpert and Artemy Kolchinsky. Thermodynamics of computingwithcircuits. New Journal of Physics,22(6):063047, 2020
work page 2020
-
[15]
David H Wolpert. Uncertainty relations and fluctuation theo- rems for bayes nets.Physical Review Letters, 125(20):200602, 2020
work page 2020
-
[16]
New Journal of Physics, 25(12):123013, 2023
ThomasEOuldridgeandDavidHWolpert.Thermodynamicsof deterministicfiniteautomataoperatinglocallyandperiodically. New Journal of Physics, 25(12):123013, 2023
work page 2023
-
[17]
Andreas Dechant. Minimum entropy production, detailed bal- ance and wasserstein distance for continuous-time markov pro- cesses. Journal of Physics A: Mathematical and Theoretical , 55(9):094001, 2022
work page 2022
-
[18]
Thermodynamic uncertainty relation for biomolecular processes
Andre C Barato and Udo Seifert. Thermodynamic uncertainty relation for biomolecular processes. Physical review letters , 114(15):158101, 2015
work page 2015
-
[19]
Dissipation bounds all steady-state current fluctuations
Todd R Gingrich, Jordan M Horowitz, Nikolay Perunov, and Jeremy L England. Dissipation bounds all steady-state current fluctuations. Physical review letters, 116(12):120601, 2016
work page 2016
-
[20]
Jordan M Horowitz and Todd R Gingrich. Thermodynamic un- certainty relations constrain non-equilibrium fluctuations.Na- ture Physics, 16(1):15–20, 2020
work page 2020
-
[21]
A stochastic model of mathematicsandscience
David H Wolpert and David B Kinney. A stochastic model of mathematicsandscience. Foundations of Physics,2024. Invited Contribution
work page 2024
-
[22]
Lance Fortnow. Michael sipser. introduction to the theory of computation. pws publishing company, boston etc. 1997, xv + 396 pp. Journal of Symbolic Logic , 64(1):403–403, March 15 1999
work page 1997
-
[23]
Cambridge University Press, Cambridge, 2009
Sanjeev Arora and Boaz Barak.Computational Complexity: A Modern Approach. Cambridge University Press, Cambridge, 2009
work page 2009
-
[24]
Lecturenotesoninformation theory
YuryPolyanskiyandYihongWu. Lecturenotesoninformation theory. Lecture Notes for ECE563 (UIUC) and,6(2012-2016):7, 2014
work page 2012
-
[25]
arXiv preprint arXiv:2208.06895, 2022
ThomasEOuldridgeandDavidHWolpert.Thermodynamicsof deterministicfiniteautomataoperatinglocallyandperiodically. arXiv preprint arXiv:2208.06895, 2022
-
[26]
Speed limit for classical stochastic processes
Naoto Shiraishi, Ken Funo, and Keiji Saito. Speed limit for classical stochastic processes. Physical review letters , 121(7):070601, 2018
work page 2018
-
[27]
Physical Review E, 102(6):062132, 2020
VanTuanVo,TanVanVu,andYoshihikoHasegawa.Unifiedap- proach to classical speed limit and thermodynamic uncertainty relation. Physical Review E, 102(6):062132, 2020
work page 2020
-
[28]
Unified thermodynamic–kineticuncertaintyrelation
Tan Van Vu, Yoshihiko Hasegawa, et al. Unified thermodynamic–kineticuncertaintyrelation. Journal of Physics A: Mathematical and Theoretical, 55(40):405004, 2022
work page 2022
-
[29]
Ken Hiura and Shin-ichi Sasa. Kinetic uncertainty relation on first-passage time for accumulated current.Physical Review E, 103(5):L050103, 2021. Appendix A: Methods of estimation of prior distribution Considerasystemwithdiscretespace X andlet ∆X denotethesimplexonthestatespace. Supposethataninitialdistribution pX0 ∈ ∆X transforms to a final distributionp...
work page 2021
-
[30]
2 subject to the constrain thatP x∈X pX0(x) = 1 , we use the method of Lagrange multipliers
By using an iterative method In order to find the minimum ofC(pX0) in Eq. 2 subject to the constrain thatP x∈X pX0(x) = 1 , we use the method of Lagrange multipliers. Evaluating the partial derivative ofC(pX0) with respect topX0(x′) for anx′ ∈ X; ∂ ∂pX0(x′) C(pX0) = f(x′) − X x∈X G(x|x′) lnGpX0(x) + lnpX0(x′) (A7) The normalization constraintg(pX0) = P x ...
-
[31]
Start with a randomly initialized probability vector q ∈ ∆X
By using Monte Carlo method To minimize the cost function A2, we employ a simulated annealing algorithm that iteratively refines a candidate distribution over the simplex∆X while gradually reducing randomness to ensure convergence. Start with a randomly initialized probability vector q ∈ ∆X. Set the initial temperature T0, cooling rate α < 1, and interpol...
-
[32]
Sample a new random distributionr ∈ ∆X
-
[33]
Form a candidate distribution: q′ = (1 − s)r + sq, and normalize if necessary
-
[34]
Evaluate the cost: C(q′) = S(Gq′) − S(q′) + ⟨f ⟩q′
-
[35]
Accept q′ with probability: P = ( 1, if C(q′) < C(q), exp − C(q′)−C(q) T , otherwise. 17 Commands used in the RASP Symbol Definition LOAD Rn, value Load a constant or memory value into registerRn STORE Rn, addr Store value in registerRnto memory addressaddr READ Rn, addr Read memory value ataddrinto registerRn ADD R1, R2, R3 R3 ← R1 + R2 SUB R1, R2, R3 R3...
-
[36]
If accepted, setq ← q′ and update the best cost
-
[37]
Update the temperature:T ← αT, and reduces gradually. The algorithm terminates after a fixed number of iterations or when successive updates no longer improve the cost. The final distribution q approximates the minimizer ofC(pX0). This annealing strategy ensures global exploration early on and local refinement as the temperature cools, making it suitable ...
-
[38]
In this setting, the prior distributionqX0 is unique
Lower bound on the worst-case mismatch cost To derive a lower bound on the MMC as a function of the state-dependent termf, we focus on the case where the mapG induces a single island decomposition. In this setting, the prior distributionqX0 is unique. The prior distributionqX0 satisfies Eq. A9, derived using the method of Lagrange multipliers: ln qX0(x) +...
-
[39]
Proof that prior is always in the interior of the simplex Consideraconditionaldistribution P (y|x)thatspecifiestheprobabilityof“output” y ∈ Y given“input”x ∈ X,where X and Y are finite. Given some Z ⊆ X , the island decompositionLZ(P ) of P, and anyp ∈ ∆X, let p(c) = P x∈c p(x) indicate the total probability within islandc, and pc(x) := ( p(x) p(c) if x ∈...
-
[40]
Define ˜f : Z → R via ˜f(x) = f(x) for x ∈ Z
-
[41]
Define the conditional distribution˜P (y|x) for y ∈ Y , x ∈ Z via ˜P (y|x) = P (y|x) for ally ∈ Y , x ∈ Z. In addition, for any distributionp ∈ ∆X with supp p ⊆ Z, let ˜p be a distribution overZ defined via˜p(x) = p(x) for x ∈ Z. Now, by inspection, it can be verified that for anyp ∈ ∆X with supp p ⊆ Z, C(p) = S( ˜P ˜p) − S(˜p) + E˜p[ ˜f] =: ˜C(˜p) (B33) ...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.