
arxiv: 2605.31063 · v1 · pith:643FVPRVnew · submitted 2026-05-29 · 📊 stat.ML · cs.LG · physics.chem-ph · physics.comp-ph

Free Energy Estimation on Any State Space

Pith reviewed 2026-06-28 21:13 UTC · model grok-4.3

classification: 📊 stat.ML · cs.LG · physics.chem-ph · physics.comp-ph
keywords: free energy estimation · neural transport · arbitrary state spaces · discrete spaces · multimodal distributions · Doob h-transform · time reversal · dihedral group

The pith

Neural transports generalize free energy estimation to arbitrary state spaces including discrete and multimodal domains.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tries to establish that a neural transport learning method for accelerating free energy estimation, originally developed for continuous spaces, extends directly to any state space. A sympathetic reader would care because free energy calculations underpin work in physics, statistics, and machine learning, yet classical techniques remain slow in non-continuous settings. The authors introduce a generalized transport approach and test it on discrete, multimodal, and autoregressive problems. They further derive algebraic identities that connect infinitesimal time reversal with generalized Doob's h-transforms. These identities show that compositions of the two operations form a generalized dihedral group.

Core claim

The central claim is that the neural transport framework for free energy estimation generalizes to arbitrary state spaces, with experiments confirming effectiveness and efficiency on discrete, multimodal, and autoregressive settings. Beyond estimation, algebraic identities are established that link infinitesimal time reversal and generalized Doob's h-transforms, with their compositions forming a generalized dihedral group.

What carries the argument

A generalized neural transport learning approach that accelerates finite-time free energy estimation, together with algebraic identities that reveal a group-theoretic structure: compositions of time reversal and h-transforms form a generalized dihedral group.

If this is right

  • The method delivers efficiency gains on discrete and multimodal spaces.
  • Performance extends to autoregressive settings without change to the core procedure.
  • Algebraic identities hold between the transport operations.
  • Compositions of the operations form a generalized dihedral group.
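For orientation, the textbook definition of a generalized dihedral group is recalled below. This is standard group theory rather than the paper's derivation, and reading time reversal as the involution $s$ and the h-transforms as elements of the abelian group $A$ is an assumption based on the summary above.

```latex
\[
\mathrm{Dih}(A) = A \rtimes \mathbb{Z}_2, \qquad
s^2 = e, \qquad
s\,a\,s^{-1} = a^{-1} \quad \text{for all } a \in A .
\]
```

Here $A$ is any abelian group and $s$ generates $\mathbb{Z}_2$; choosing $A$ cyclic recovers the ordinary dihedral group.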

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The group structure may let researchers generate new transport estimators by applying group operations rather than deriving them from scratch.
  • Success on non-continuous spaces suggests the same learning procedure could be applied to combinatorial problems that admit a free-energy formulation.
  • Links between time reversal and h-transforms may connect the framework to symmetry-based methods in statistical mechanics.

Load-bearing premise

Neural networks can learn effective transport maps on discrete and multimodal state spaces with the same efficiency gains seen in the continuous case.

What would settle it

An experiment on a discrete state space in which the proposed transport method shows no efficiency improvement over standard estimators, or a calculation showing that the claimed compositions of time reversal and h-transforms fail to satisfy the relations of a generalized dihedral group.
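A ground-truth reference for a test of this kind is cheap to obtain on a small discrete system. The sketch below is not taken from the paper: it enumerates every configuration of a 3×3 periodic Ising lattice and computes a dimensionless free energy difference between the two inverse temperatures named in the Figure 7 caption (β = 0.2 and 0.4). The function name `ising_log_z`, the lattice size, the coupling J = 1, and the sign convention ΔF = −log(Z_b / Z_a) are all assumptions made for illustration.

```python
import itertools
import numpy as np

def ising_log_z(beta, size=3, coupling=1.0):
    """log Z of a size x size periodic Ising lattice by brute-force enumeration.

    Assumes size >= 3 so that each nearest-neighbour bond is counted exactly once.
    """
    energies = []
    for spins in itertools.product([-1, 1], repeat=size * size):
        s = np.array(spins).reshape(size, size)
        # Bonds to the neighbour one step along each axis, with periodic wrap-around.
        bonds = np.sum(s * np.roll(s, 1, axis=0)) + np.sum(s * np.roll(s, 1, axis=1))
        energies.append(-coupling * bonds)
    a = -beta * np.array(energies, dtype=float)
    m = a.max()  # subtract the maximum for a stable log-sum-exp
    return m + np.log(np.sum(np.exp(a - m)))

# Assumed convention: dimensionless difference between beta = 0.2 and beta = 0.4.
delta_f = -(ising_log_z(0.4) - ising_log_z(0.2))
print(delta_f)
```

Enumeration scales as 2^(size²), so it only serves as a check on very small lattices; any estimator under test should reproduce this number within its error bars.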

Figures

Figures reproduced from arXiv: 2605.31063 by Carles Domingo-Enrich, Francisco Vargas, Jiajun He, José Miguel Hernández-Lobato, Yingzhen Li, Yuanqi Du, Zijing Ou.

Figure 1
Figure 1: Pipeline of learning the transport. Adjacent text from the paper: Theorem 3.6 (Generalized Doob's h-transform and marginal prescription for it). Let $(L_t)_{t\in[0,1]}$ be a family of time-inhomogeneous Markov generators, with adjoints $(L_t^\dagger)_{t\in[0,1]}$. Let $h_t : E \to (0, \infty)$ be sufficiently regular. The generalized Doob's h-transform generator is given by $L_t^h f = h_t^{-1} L_t(h_t f) - h_t^{-1} f\, L_t h_t$ (18). Let $\pi_t$ be a strictly positive function and de…
Figure 4
Figure 4: Comparing standard SDE and transport with momentum for free energy of 40D GMMs. Adjacent text from the paper: … hypothesize that this is because AR training provides a particularly strong learning signal, allowing the model to more easily capture dependencies across positions in the target distribution. Besides these standard settings, we further demonstrate the flexibility of our framework. In particular, we show that it is compatible w…
Figure 5
Figure 5: Generated samples from CTMC FEAT variants vs. ground truth on Ising model transport
Figure 6
Figure 6: Generated samples from CTMC FEAT variants vs. ground truth on Ising model transport
Figure 7
Figure 7: Free energy difference estimates ($\widehat{\Delta F}_{F}$, $\widehat{\Delta F}_{B}$, $\widehat{\Delta F}_{\mathrm{BAR}}$) over training iterations for CTMC FEAT variants on Ising model transport (β = 0.2 ↔ 0.4) across lattice sizes. The dashed red line indicates the reference value.
Figure 8
Figure 8: Free energy difference estimates ($\widehat{\Delta F}_{F}$, $\widehat{\Delta F}_{B}$, $\widehat{\Delta F}_{\mathrm{BAR}}$) over training iterations for CTMC FEAT variants on Ising model transport (β = 0.2 ↔ 0.6) across lattice sizes. The dashed red line indicates the reference value
Figure 9
Figure 9: Samples generated by AR model at different …
Figure 10
Figure 10: Visualization of 55 particles of Ising fluid (first row: …
Figure 11
Figure 11: Average spin and pairwise distances for different values of …
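The generalized Doob's h-transform generator quoted alongside Figure 1 (Eq. 18) has a concrete matrix form when the state space is finite. The sketch below is an illustration under that assumption, not the authors' implementation: the generator is taken to be a rate matrix `Q`, functions are vectors, and the names `doob_h_transform`, `Q`, `h`, and `f` are hypothetical.

```python
import numpy as np

def doob_h_transform(Q, h):
    """Matrix form of L^h f = h^{-1} L(h f) - h^{-1} f (L h) on a finite state space.

    Q : (n, n) rate matrix (off-diagonals >= 0, rows sum to zero).
    h : (n,) strictly positive vector.
    """
    Q = np.asarray(Q, dtype=float)
    h = np.asarray(h, dtype=float)
    conjugated = Q * h[None, :] / h[:, None]      # entry (x, y): Q(x, y) h(y) / h(x)
    return conjugated - np.diag((Q @ h) / h)      # subtract h^{-1} (Q h) on the diagonal

# Hypothetical 3-state example.
Q = np.array([[-1.0, 0.7, 0.3],
              [0.2, -0.5, 0.3],
              [0.5, 0.5, -1.0]])
h = np.array([1.0, 2.0, 0.5])
f = np.array([0.3, -1.2, 2.0])

Qh = doob_h_transform(Q, h)
lhs = Qh @ f                                      # transformed generator applied to f
rhs = (Q @ (h * f)) / h - f * (Q @ h) / h         # Eq. (18) evaluated directly
print(np.allclose(lhs, rhs), Qh.sum(axis=1))
```

In this finite setting the transformed matrix is again a valid rate matrix: its off-diagonal entries stay non-negative and its rows sum to zero, whatever positive `h` is used.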
Abstract

Free energy estimation is a fundamental yet challenging problem, from physics to statistics. Classical approaches rely on thermodynamic transformations, ranging from direct estimation, quasistatic integration, to finite-time averaging. Recent work [He and Du et al., 2025] learns neural transports to significantly accelerate the efficiency in the finite-time regime. In this paper, we generalize this framework to arbitrary state spaces. Building on this view, we develop a generalized neural transport learning approach for efficient estimation. Experiments validate the effectiveness and efficiency of the proposed method beyond continuous settings, extending to discrete and multimodal spaces as well as autoregressive settings. Beyond free energy estimation, we establish algebraic identities and reveal a group-theoretic structure linking infinitesimal time reversal and generalized Doob's $h$-transforms, showing that their compositions form a generalized dihedral group.
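The finite-time averaging estimators that the abstract refers to can be sketched from work samples alone. The code below is a generic illustration based on the Jarzynski equality [27], the Crooks relation [11], and Bennett's acceptance ratio [3]; it is not the paper's learned-transport method. It assumes that the F, B, and BAR labels in the Figure 7 and 8 captions denote forward, backward, and Bennett estimates. All function names and the Gaussian toy data are hypothetical.

```python
import numpy as np

def log_mean_exp(a):
    a = np.asarray(a, dtype=float)
    m = a.max()
    return m + np.log(np.mean(np.exp(a - m)))

def forward_estimate(w_f, beta=1.0):
    """Jarzynski estimate from forward work: dF = -(1/beta) log E[exp(-beta W_F)]."""
    return -log_mean_exp(-beta * np.asarray(w_f)) / beta

def backward_estimate(w_b, beta=1.0):
    """Same identity applied to the reverse protocol: dF = +(1/beta) log E[exp(-beta W_B)]."""
    return log_mean_exp(-beta * np.asarray(w_b)) / beta

def fermi(x):
    # 1 / (1 + exp(x)), written with tanh to avoid overflow.
    return 0.5 * (1.0 - np.tanh(0.5 * x))

def bar_estimate(w_f, w_b, beta=1.0, iters=200):
    """Bennett acceptance ratio: solve the self-consistency equation by bisection."""
    w_f = np.asarray(w_f, dtype=float)
    w_b = np.asarray(w_b, dtype=float)
    m = np.log(len(w_f) / len(w_b))

    def gap(df):  # monotonically increasing in df
        return fermi(m + beta * (w_f - df)).sum() - fermi(-m + beta * (w_b + df)).sum()

    lo, hi = -1.0, 1.0
    while gap(lo) > 0:
        lo *= 2.0
    while gap(hi) < 0:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Toy data: Gaussian work distributions consistent with the Crooks relation at beta = 1.
rng = np.random.default_rng(0)
true_df, sigma = 1.5, 1.0
w_f = rng.normal(true_df + 0.5 * sigma**2, sigma, size=20000)
w_b = rng.normal(-true_df + 0.5 * sigma**2, sigma, size=20000)
print(forward_estimate(w_f), backward_estimate(w_b), bar_estimate(w_f, w_b))
```

The one-sided estimators are dominated by rare low-work trajectories, which is why their variance grows quickly with dissipation; BAR combines both directions and is typically the most stable of the three.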

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript generalizes a neural transport framework for finite-time free energy estimation, previously developed for continuous spaces, to arbitrary state spaces including discrete, multimodal, and autoregressive settings. It develops a generalized neural transport learning approach, validates effectiveness via experiments, and derives algebraic identities plus a group-theoretic structure in which infinitesimal time reversal and generalized Doob h-transforms compose to form a generalized dihedral group.

Significance. If the generalization is shown to preserve efficiency gains and the group-theoretic claims are rigorously established independent of state-space topology, the work would meaningfully extend efficient free-energy methods to domains where continuous assumptions fail, such as discrete statistical models and multimodal sampling problems. The algebraic and group-theoretic results could supply reusable tools for analyzing time-reversal operations beyond the immediate estimation task.

major comments (2)
  1. [Abstract and generalized neural transport section] Abstract (generalization paragraph) and the section developing the generalized neural transport: the central claim that the same neural architectures and objectives yield comparable variance reduction on discrete and multimodal spaces rests on the unexamined assumption that the transport map remains well-defined and trainable without extra structure; no theorem or derivation is indicated showing that the Doob h-transform identities survive discretization or that optimization issues specific to disconnected modes are avoided. This assumption is load-bearing for the claim that efficiency gains extend beyond continuous settings.
  2. [Section on algebraic identities and group-theoretic structure] The group-theoretic claim (compositions form a generalized dihedral group) is stated at the same level of generality as the free-energy result; the manuscript must demonstrate that this structure is independent of the topology of the state space, as any dependence on continuity would undermine the assertion that the identities hold for arbitrary spaces.
minor comments (1)
  1. [Abstract] The abstract cites the 2025 prior work but does not delineate which components are carried over versus newly derived for the arbitrary-space case; a short explicit comparison paragraph would improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and will revise the manuscript to strengthen the presentation of the generalization.

Point-by-point responses
  1. Referee: [Abstract and generalized neural transport section] Abstract (generalization paragraph) and the section developing the generalized neural transport: the central claim that the same neural architectures and objectives yield comparable variance reduction on discrete and multimodal spaces rests on the unexamined assumption that the transport map remains well-defined and trainable without extra structure; no theorem or derivation is indicated showing that the Doob h-transform identities survive discretization or that optimization issues specific to disconnected modes are avoided. This assumption is load-bearing for the claim that efficiency gains extend beyond continuous settings.

    Authors: We agree that an explicit derivation is needed to support the claim. The manuscript formulates the transport via pushforwards on general measurable spaces, so the h-transform identities follow from the same change-of-measure algebra used in the continuous case. In the revision we will insert a short theorem in the generalized neural transport section proving that the identities hold verbatim on arbitrary (including discrete) spaces. We will also add a brief discussion of multimodal training, referencing the autoregressive experiments that already demonstrate stable optimization without extra structure. revision: yes

  2. Referee: [Section on algebraic identities and group-theoretic structure] The group-theoretic claim (compositions form a generalized dihedral group) is stated at the same level of generality as the free-energy result; the manuscript must demonstrate that this structure is independent of the topology of the state space, as any dependence on continuity would undermine the assertion that the identities hold for arbitrary spaces.

    Authors: The group structure is obtained solely from the algebraic relations (involution of time reversal and conjugation by the h-transform) and does not invoke continuity or any topological property. In the revision we will add an explicit remark and one-line proof in the algebraic-identities section confirming that the dihedral relations hold on the group of measurable maps for any state space. revision: yes
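The relations named in this response (an involutive time reversal and conjugation by the h-transform) can be checked numerically in a toy finite-state setting. The sketch below assumes that time reversal relative to a fixed positive reference measure `pi` acts on jump rates as Q(x, y) → pi(y) Q(y, x) / pi(x), and that the h-transform acts as Q(x, y) → Q(x, y) h(y) / h(x), with diagonals reset so rows sum to zero. Both formulas are standard for continuous-time Markov chains, but whether they coincide with the paper's operators is an assumption, and every name in the code is hypothetical.

```python
import numpy as np

def with_generator_diagonal(rates):
    """Keep the off-diagonal jump rates and reset the diagonal so rows sum to zero."""
    R = np.array(rates, dtype=float)
    np.fill_diagonal(R, 0.0)
    np.fill_diagonal(R, -R.sum(axis=1))
    return R

def time_reverse(Q, pi):
    # Reversed rates: entry (x, y) is pi(y) Q(y, x) / pi(x).
    return with_generator_diagonal(Q.T * pi[None, :] / pi[:, None])

def h_transform(Q, h):
    # Tilted rates: entry (x, y) is Q(x, y) h(y) / h(x).
    return with_generator_diagonal(Q * h[None, :] / h[:, None])

rng = np.random.default_rng(0)
n = 4
Q = with_generator_diagonal(rng.uniform(0.1, 1.0, size=(n, n)))
pi = rng.uniform(0.5, 2.0, size=n)
pi /= pi.sum()
h = rng.uniform(0.5, 2.0, size=n)
g = rng.uniform(0.5, 2.0, size=n)

# s^2 = e: reversing twice returns the original rates.
involution = np.allclose(time_reverse(time_reverse(Q, pi), pi), Q)
# A is abelian: h-transforms compose by pointwise multiplication.
abelian = np.allclose(h_transform(h_transform(Q, h), g), h_transform(Q, h * g))
# s a s^{-1} = a^{-1}: conjugating by time reversal inverts the h-transform.
conjugation = np.allclose(
    time_reverse(h_transform(time_reverse(Q, pi), h), pi),
    h_transform(Q, 1.0 / h),
)
print(involution, abelian, conjugation)
```

None of the three checks uses continuity or any topology on the state space, which is consistent with the authors' statement; it does not replace the general proof they promise to add.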

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The provided abstract and context cite prior work by overlapping authors only to establish the continuous-case baseline for neural transport learning, then present the generalization to arbitrary state spaces, new experiments on discrete/multimodal/autoregressive settings, and algebraic/group-theoretic identities as independent contributions. No equation, definition, or claim is shown to reduce to its own inputs by construction, no fitted parameter is relabeled as a prediction, and no uniqueness theorem or ansatz is smuggled via self-citation. The derivation chain therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no information on free parameters, axioms, or invented entities.



Reference graph

Works this paper leans on

57 extracted references · 3 canonical work pages

  1. [1]

    M. Albergo, N. M. Boffi, and E. Vanden-Eijnden. Stochastic interpolants: A unifying framework for flows and diffusions. Journal of Machine Learning Research, 26(209):1-80, 2025

  2. [2]

    M. S. Albergo and E. Vanden-Eijnden. NETS: A non-equilibrium transport sampler. arXiv preprint arXiv:2410.02711, 2024

  3. [3]

    C. H. Bennett. Efficient estimation of free energy differences from Monte Carlo data. Journal of Computational Physics, 22(2):245-268, 1976. ISSN 0021-9991. doi:10.1016/0021-9991(76)90078-4. URL https://www.sciencedirect.com/science/article/pii/0021999176900784

  4. [4]

    J. Benton, Y. Shi, V. De Bortoli, G. Deligiannidis, and A. Doucet. From denoising diffusions to denoising Markov models. Journal of the Royal Statistical Society Series B: Statistical Methodology, 86(2):286-301, 2024

  5. [5]

    D. Blessing, J. Berner, L. Richter, and G. Neumann. Underdamped diffusion bridges with applications to sampling. arXiv preprint arXiv:2503.01006, 2025

  6. [6]

    T. Chen, J. Gu, L. Dinh, E. A. Theodorou, J. Susskind, and S. Zhai. Generative modeling with phase stochastic bridges. arXiv preprint arXiv:2310.07805, 2023a

  7. [7]

    T. Chen, G.-h. Liu, M. Tao, and E. A. Theodorou. Deep multi-marginal momentum Schrödinger bridge. In Proceedings of the 37th International Conference on Neural Information Processing Systems, pages 57058-57086, 2023b

  8. [8]

    R. Chetrite and S. Gupta. Two refreshing views of fluctuation theorems through kinematics elements and exponential martingale. Journal of Statistical Physics, 143(3):543-584, 2011

  9. [9]

    R. Chetrite and H. Touchette. Nonequilibrium microcanonical and canonical ensembles and their equivalence. Physical Review Letters, 111(12):120601, 2013

  10. [10]

    R. Chetrite and H. Touchette. Nonequilibrium Markov processes conditioned on large deviations. In Annales Henri Poincaré, volume 16, pages 2005-2057. Springer, 2015

  11. [11]

    G. E. Crooks. Entropy production fluctuation theorem and the nonequilibrium work relation for free energy differences. Physical Review E, 60(3):2721, 1999

  12. [12]

    A. Denker, F. Vargas, S. Padhy, K. Didi, S. Mathis, V. Dutordoir, R. Barbano, E. Mathieu, U. J. Komorowska, and P. Lio. DEFT: Efficient fine-tuning of diffusion models by learning the generalised h-transform. Advances in Neural Information Processing Systems, 37:19636-19682, 2024

  13. [13]

    X. Ding and B. Zhang. DeepBAR: A fast and exact method for binding free energy computation. The Journal of Physical Chemistry Letters, 12(10):2509-2515, 2021. doi:10.1021/acs.jpclett.1c00189. URL https://doi.org/10.1021/acs.jpclett.1c00189. PMID: 33719449

  14. [14]

    T. Dockhorn, A. Vahdat, and K. Kreis. Score-based generative modeling with critically-damped Langevin diffusion. arXiv preprint arXiv:2112.07068, 2021

  15. [15]

    A. Doucet, W. Grathwohl, A. G. Matthews, and H. Strathmann. Score-based diffusion meets annealed importance sampling. Advances in Neural Information Processing Systems, 35:21482-21494, 2022

  16. [16]

    W. Du, H. Zhang, T. Yang, and Y. Du. A flexible diffusion model. In International Conference on Machine Learning, pages 8678-8696. PMLR, 2023

  17. [17]

    A. E. Ferdinand and M. E. Fisher. Bounded and inhomogeneous Ising models. I. Specific-heat anomaly of a finite lattice. Physical Review, 185(2):832, 1969

  18. [18]

    I. Gat, T. Remez, N. Shaul, F. Kreuk, R. T. Chen, G. Synnaeve, Y. Adi, and Y. Lipman. Discrete flow matching. Advances in Neural Information Processing Systems, 37:133345-133385, 2024

  19. [19]

    D. T. Gillespie. Approximate accelerated stochastic simulation of chemically reacting systems. The Journal of Chemical Physics, 115(4):1716-1733, 2001

  20. [20]

    W. Guo, M. Tao, and Y. Chen. Complexity analysis of normalizing constant estimation: from Jarzynski equality to annealed importance sampling and beyond. In International Conference on Learning Representations, 2026

  21. [21]

    A. M. Hahn and H. Then. Characteristic of Bennett's acceptance ratio method. Phys. Rev. E, 80:031111, Sep 2009. doi:10.1103/PhysRevE.80.031111. URL https://link.aps.org/doi/10.1103/PhysRevE.80.031111

  22. [22]

    J. He, Y. Du, F. Vargas, Y. Wang, C. P. Gomes, J. M. Hernández-Lobato, and E. Vanden-Eijnden. FEAT: Free energy estimators with adaptive transport. NeurIPS, 2025

  23. [23]

    J. Heng, V. De Bortoli, A. Doucet, and J. Thornton. Simulating diffusion bridges with score matching. Biometrika, 112(4):asaf048, 2021

  24. [24]

    P. Holderrieth, M. Havasi, J. Yim, N. Shaul, I. Gat, T. Jaakkola, B. Karrer, R. T. Chen, and Y. Lipman. Generator matching: Generative modeling with arbitrary Markov processes. arXiv preprint arXiv:2410.20587, 2024

  25. [25]

    P. Holderrieth, M. S. Albergo, and T. Jaakkola. LEAPS: A discrete neural sampler via locally equivariant networks. arXiv preprint arXiv:2502.10843, 2025

  26. [26]

    A. Hyvärinen and P. Dayan. Estimation of non-normalized statistical models by score matching. Journal of Machine Learning Research, 6(4), 2005

  27. [27]

    C. Jarzynski. Nonequilibrium equality for free energy differences. Physical Review Letters, 78(14):2690, 1997

  28. [28]

    J. Jo, S. Lee, and S. J. Hwang. Score-based generative modeling of graphs via the system of stochastic differential equations. In International Conference on Machine Learning, pages 10362-10383. PMLR, 2022

  29. [29]

    T. Lelièvre, M. Rousset, and G. Stoltz. Free Energy Computations: A Mathematical Perspective. World Scientific, 2010

  30. [30]

    Y. Lipman, R. T. Chen, H. Ben-Hamu, M. Nickel, and M. Le. Flow matching for generative modeling. arXiv preprint arXiv:2210.02747, 2022

  31. [31]

    A. Lou, C. Meng, and S. Ermon. Discrete diffusion modeling by estimating the ratios of the data distribution. arXiv preprint arXiv:2310.16834, 2023

  32. [32]

    B. Máté and F. Fleuret. Learning interpolations between Boltzmann densities. arXiv preprint arXiv:2301.07388, 2023

  33. [33]

    B. Máté, F. Fleuret, and T. Bereau. Neural thermodynamic integration: Free energies from energy-based diffusion models. The Journal of Physical Chemistry Letters, 15(45):11395-11404, 2024a

  34. [34]

    B. Máté, F. Fleuret, and T. Bereau. Solvation free energies from neural thermodynamic integration. arXiv preprint arXiv:2410.15815, 2024b

  35. [35]

    D. D. Minh and J. D. Chodera. Optimal estimators and asymptotic variances for nonequilibrium path-ensemble averages. The Journal of Chemical Physics, 131(13), 2009

  36. [36]

    E. Nelson. Dynamical Theories of Brownian Motion. Princeton University Press, 1967. ISBN 9780691079509

  37. [37]

    I. Omelyan, I. Mryglod, R. Folk, and W. Fenz. Ising fluids in an external magnetic field: An integral equation approach. Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, 69(6):061506, 2004

  38. [38]

    Y. Ren, G. M. Rotskoff, and L. Ying. A unified approach to analysis and design of denoising Markov models. arXiv preprint arXiv:2504.01938, 2025

  39. [39]

    K. Rojas, Y. Zhu, S. Zhu, F. X.-F. Ye, and M. Tao. Diffuse everything: Multimodal diffusion models on arbitrary state spaces. arXiv preprint arXiv:2506.07903, 2025

  40. [40]

    J. L. Rosa-Raíces and D. T. Limmer. Nonadiabatic force matching for alchemical free-energy estimation. Journal of Chemical Theory and Computation, 21(22):11455-11462, 2025

  41. [41]

    M. Schebek, J. He, E. Hoffmann, Y. Du, F. Noé, and J. Rogal. Assessing generative modeling approaches for free energy estimates in condensed matter. arXiv preprint arXiv:2512.23930, 2025

  42. [42]

    J. Shi, K. Han, Z. Wang, A. Doucet, and M. Titsias. Simplified and generalized masked diffusion for discrete data. Advances in Neural Information Processing Systems, 37:103131-103167, 2024

  43. [43]

    M. R. Shirts and J. D. Chodera. Statistically optimal analysis of samples from multiple equilibrium states. The Journal of Chemical Physics, 129(12), 2008

  44. [44]

    M. R. Shirts, E. Bair, G. Hooker, and V. S. Pande. Equilibrium free energies from nonequilibrium measurements using maximum-likelihood methods. Physical Review Letters, 91(14):140601, 2003

  45. [45]

    R. Singhal, M. Goldstein, and R. Ranganath. Where to diffuse, how to diffuse, and how to get back: Automated learning for multivariate diffusions. arXiv preprint arXiv:2302.07261, 2023

  46. [46]

    Y. Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, and B. Poole. Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:2011.13456, 2020

  47. [47]

    P. Theodoropoulos, A. D. Saravanos, E. A. Theodorou, and G.-H. Liu. Momentum multi-marginal Schrödinger bridge matching. arXiv preprint arXiv:2506.10168, 2025

  48. [48]

    M. E. Tuckerman. Statistical Mechanics: Theory and Molecular Simulation. Oxford University Press, 2023

  49. [49]

    S. Vaikuntanathan and C. Jarzynski. Escorted free energy simulations: Improving convergence by reducing dissipation. Physical Review Letters, 100(19):190601, 2008

  50. [50]

    S. Vaikuntanathan and C. Jarzynski. Escorted free energy simulations. The Journal of Chemical Physics, 134(5), 2011

  51. [51]

    A. Van den Oord, N. Kalchbrenner, L. Espeholt, O. Vinyals, A. Graves, et al. Conditional image generation with PixelCNN decoders. Advances in Neural Information Processing Systems, 29, 2016

  52. [52]

    F. Vargas, S. Padhy, D. Blessing, and N. Nüsken. Transport meets variational inference: Controlled Monte Carlo diffusions. The Twelfth International Conference on Learning Representations, 2024

  53. [53]

    P. Vincent. A connection between score matching and denoising autoencoders. Neural Computation, 23(7):1661-1674, 2011

  54. [54]

    P. Wirnsberger, A. J. Ballard, G. Papamakarios, S. Abercrombie, S. Racanière, A. Pritzel, D. Jimenez Rezende, and C. Blundell. Targeted free energy estimation via learned mappings. The Journal of Chemical Physics, 153(14), 2020

  55. [55]

    L. Zhang, P. Potaptchik, J. He, Y. Du, A. Doucet, F. Vargas, H.-D. Dau, and S. Syed. Accelerated parallel tempering via neural transports. arXiv preprint arXiv:2502.10328, 2025

  56. [56]

    L. Zhao and L. Wang. Bounding free energy difference with flow matching. Chinese Physics Letters, 40(12):120201, 2023

  57. [57]

    A. Zhong, B. Kuznets-Speck, and M. R. DeWeese. Time-asymmetric fluctuation theorem and efficient free-energy estimation. Physical Review E, 110(3):034121, 2024