A Transport-Based Geometry of Belief-Cost
Pith reviewed 2026-06-30 10:33 UTC · model grok-4.3
The pith
Belief revision costs are given by the Wasserstein metric conformally reweighted by Fisher information when one nat of knowledge has uniform metric cost everywhere.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Under the postulate that revision cost is a scalar price on optimal transport (beliefs live in Wasserstein space) and the postulate that one nat of knowledge costs the same metric length everywhere (eikonal condition), the cost metric is the conformal reweighting $\tilde g_{e,U}=2(e+U)\,g_{W_2}$ of the Wasserstein metric by Fisher information, and among continuous reliefs uniform pricing holds if and only if the relief is proportional to the Fisher information. Certainty therefore sits at infinite cost-distance once the relief dominates the Fisher information, so any well-posed inference carries a cost floor that diverges at certainty.
What carries the argument
The conformal reweighting $\tilde g_{e,U}=2(e+U)\,g_{W_2}$ of the Wasserstein metric by a relief function $U$, which enforces uniform pricing of knowledge across all beliefs.
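As an illustration only, the reweighting can be written out on a one-dimensional location-scale family. The sketch assumes a base density with zero mean and unit variance, coordinates $(m,s)$ for location and scale, the standard restriction of $g_{W_2}$ to such a family, and that $J$ is the location Fisher information $J=J_0/s^2$, with $J_0$ the Fisher information of the base density. These conventions are assumptions of this example and are not confirmed by the summary above.

```latex
\begin{align*}
g_{W_2}\big|_{(m,s)} &= dm^2 + ds^2, \qquad J(m,s) = \frac{J_0}{s^2},\\
\tilde g_{0,cJ} &= 2\,cJ\,g_{W_2} = \frac{2cJ_0}{s^2}\,\bigl(dm^2 + ds^2\bigr),\\
K &= -\frac{1}{2\lambda}\,\Delta \ln\lambda = -\frac{1}{2cJ_0},
\qquad \lambda = \frac{2cJ_0}{s^2}.
\end{align*}
```

Under these assumptions the metric is a rescaled upper half-plane metric with constant negative curvature. Since $J_0\ge 1$ at unit variance, with equality for the Gaussian, $|K|$ is largest in the Gaussian case, which is consistent with the hyperbolic and most-curved claims at $e=0$.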
If this is right
- Certainty lies at infinite cost-distance once the relief dominates the Fisher information.
- On location-scale families the induced geometry is hyperbolic and the Gaussian is the most curved case at zero background energy.
- Any well-posed inference therefore possesses a cost floor that diverges as certainty is approached.
- Via the Landauer relation the cost floor becomes an energy floor for reaching certainty.
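The last two consequences can be made concrete with a small numerical sketch. It assumes a Gaussian belief whose scale shrinks from `s_start` to `s_end`, background energy $e=0$, relief $U=c/s^2$, and the identification of knowledge gained with the drop in differential entropy; the function names are hypothetical. Only the conversion of one nat to $k_BT$ is taken from the abstract.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)

def cost_length_to_scale(s_start, s_end, c=1.0):
    """Length of the pure-scale path under the metric (2*c/s**2) ds**2.

    Assumes e = 0 and relief U = c/s**2 (illustrative conventions).
    """
    return math.sqrt(2.0 * c) * abs(math.log(s_start / s_end))

def nats_gained(s_start, s_end):
    """Differential-entropy drop (in nats) of a Gaussian whose scale shrinks."""
    return math.log(s_start / s_end)

def landauer_energy_joules(nats, temperature_kelvin=300.0):
    """Energy equivalent when one nat is worth k_B * T."""
    return nats * K_B * temperature_kelvin

# Both the cost length and the energy equivalent grow without bound as s_end -> 0.
for s_end in (1e-1, 1e-3, 1e-6):
    n = nats_gained(1.0, s_end)
    print(s_end, cost_length_to_scale(1.0, s_end), landauer_energy_joules(n))
```

The growth is only logarithmic in `1/s_end`, so the floor diverges slowly, but no finite budget reaches `s_end = 0` in this toy setting.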
Where Pith is reading between the lines
- The same geometry could supply explicit lower bounds on the physical energy required for any inference task that approaches certainty.
- Empirical tests could check whether measured costs of belief revision follow the predicted pointwise scaling by Fisher information.
- Relaxing the uniform-pricing postulate would leave open other conformal factors and might connect the framework to alternative information geometries.
Load-bearing premise
One nat of knowledge costs the same metric length at every belief.
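A minimal sketch of what this premise demands, under stated assumptions: beliefs are Gaussians of scale `s`, one nat corresponds to shrinking the scale by a factor of $e$, the background energy is zero, and the relief is the hypothetical power law $U(s)=c\,s^{-p}$. The helper names are invented for the example. The price per nat is constant across scales only for $p=2$, the Fisher-type relief in this setting.

```python
import math

def path_length(s_from, s_to, p, c=1.0):
    """Length of the scale path s_from -> s_to (s_from > s_to > 0).

    Integrates sqrt(2*U) ds in closed form for the hypothetical relief
    U(s) = c * s**(-p), with background energy e = 0.
    """
    k = 1.0 - p / 2.0
    if abs(k) < 1e-12:  # p = 2: Fisher-type relief, logarithmic length
        return math.sqrt(2.0 * c) * math.log(s_from / s_to)
    return math.sqrt(2.0 * c) * (s_from**k - s_to**k) / k

def price_per_nat(s, p, c=1.0):
    """Length paid for one nat, i.e. for shrinking the scale from s to s/e."""
    return path_length(s, s / math.e, p, c)

# Compare the price of one nat at three different scales for each exponent.
for p in (1.0, 2.0, 3.0):
    prices = [price_per_nat(s, p) for s in (1.0, 0.1, 0.01)]
    print(p, [round(x, 4) for x in prices])
```

For `p = 1` the nat gets cheaper as the belief sharpens, and for `p = 3` it gets more expensive; both violate uniform pricing in this toy family.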
What would settle it
A concrete measurement or calculation showing finite cost to reach certainty when the relief grows faster than Fisher information would falsify the infinite-distance claim.
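The kind of calculation meant here can be sketched numerically. The example assumes a Gaussian scale path from $s=1$ down to $s=\varepsilon$, zero background energy, and a hypothetical power-law relief $U(s)=c\,s^{-p}$, and it reads "dominates the Fisher information" as $p\ge 2$; all of these are assumptions of the sketch, not statements from the paper.

```python
import math

def cost_to_scale(eps, p, c=1.0, steps=20000):
    """Trapezoid estimate of the path length from s = 1 down to s = eps.

    With s = exp(-t) the length element sqrt(2*c*s**(-p)) ds becomes
    sqrt(2*c) * exp(-t * (1 - p/2)) dt on 0 <= t <= log(1/eps).
    Hypothetical setup: e = 0 and relief U(s) = c * s**(-p).
    """
    t_max = math.log(1.0 / eps)
    h = t_max / steps

    def integrand(t):
        return math.exp(-t * (1.0 - p / 2.0))

    interior = sum(integrand(i * h) for i in range(1, steps))
    total = 0.5 * (integrand(0.0) + integrand(t_max)) + interior
    return math.sqrt(2.0 * c) * h * total

# p < 2 levels off at a finite value; p >= 2 keeps growing as eps shrinks.
for p in (1.0, 2.0, 3.0):
    print(p, [round(cost_to_scale(eps, p), 3) for eps in (1e-2, 1e-4, 1e-8)])
```

In this toy family the cost stays bounded for `p = 1`, so a finite cost with `p >= 2` is what would contradict the claim.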
Original abstract
A finite agent, a machine's digital twin or any bounded reasoner, infers a fixed and noisy world through finite sensors, so its coherent output is a belief: a probability density over states (the Bayes posterior). Such an agent stops short of certainty, and revising a belief carries a cost. We propose an axiomatic framework for transport-based belief costs, motivated by these facts. We pose two postulates. P0 (the arena): a revision cost is a scalar price on optimal transport, so beliefs live in Wasserstein space. P1 (uniform pricing): one nat of knowledge costs the same metric length everywhere, the eikonal condition. Among conceivable pricing rules we study this one. Under P0 and P1 the cost metric is optimal transport conformally reweighted by Fisher information, $\tilde g_{e,U}=2(e+U)\,g_{W_2}$, and the Fisher family is a characterization: among continuous reliefs, uniform pricing is equivalent to $U=cJ$. Two consequences follow on the conformal class. Certainty sits at infinite cost-distance once the relief dominates the Fisher information, so a well-posed inference has a cost floor diverging at certainty (necessity conjectural beyond power laws). On location-scale leaves the geometry is hyperbolic, and the Stam bound places the Gaussian as the most curved one (at $e=0$). The results are geometric, in nats. Via Landauer (one nat worth $k_BT$) the cost floor becomes an energy floor: revising toward certainty would demand unbounded energy. Physics anchors the unit and enters no theorem. Removing either postulate leaves the selection open.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes an axiomatic framework for belief revision costs. Under P0 (beliefs live in Wasserstein space with revision cost as a scalar price on optimal transport) and P1 (uniform pricing via the eikonal condition that one nat costs the same metric length everywhere), the cost metric is the Wasserstein metric conformally reweighted by Fisher information: $\tilde g_{e,U}=2(e+U)\,g_{W_2}$. Among continuous reliefs, uniform pricing is equivalent to $U=cJ$ (Fisher family characterization). Consequences include infinite cost-distance to certainty when the relief dominates Fisher information (conjectural beyond power laws), hyperbolic geometry on location-scale leaves, and the Gaussian as most curved at $e=0$ via the Stam bound. Results are in nats; Landauer links to energy but enters no theorem.
Significance. If the derivation holds, the work supplies a geometric model linking optimal transport, information geometry, and belief costs with no free parameters or fitted entities. Credit is due for the axiomatic construction from two postulates, the explicit characterization of the Fisher family, and the concrete geometric consequences (hyperbolic leaves, curvature bound). The framework is falsifiable in principle via the conjectural infinite-distance claim and offers a transport-based alternative to standard information measures.
major comments (2)
- [derivation of conformal metric] The central derivation of the conformal factor $\tilde g_{e,U}=2(e+U)\,g_{W_2}$ from P0 and P1 is presented as following directly, but the manuscript must supply the explicit steps showing how the eikonal condition uniquely selects the Fisher reweighting among pricing rules (abstract and § on derivation).
- [consequences on conformal class] The infinite cost-distance claim at certainty is qualified as conjectural beyond power laws; this qualification should be elevated to a precise statement of the domain of the result, as it is load-bearing for the cost-floor consequence (abstract, consequences paragraph).
minor comments (2)
- [abstract and introduction] Notation for g_{W_2} and the relief U should be defined at first use with a brief reminder of their relation to the Wasserstein metric and Fisher information.
- [final paragraph] The Landauer link is correctly noted as anchoring the unit without entering theorems, but a short clarifying sentence would prevent misreading as a physical derivation.
Simulated Author's Rebuttal
We thank the referee for the careful reading, positive assessment of significance, and constructive suggestions for minor revision. We address each major comment below.
- Referee: [derivation of conformal metric] The central derivation of the conformal factor $\tilde g_{e,U}=2(e+U)\,g_{W_2}$ from P0 and P1 is presented as following directly, but the manuscript must supply the explicit steps showing how the eikonal condition uniquely selects the Fisher reweighting among pricing rules (abstract and § on derivation).
  Authors: We agree that the steps from the eikonal condition to the unique selection of the Fisher reweighting should be fully explicit rather than summarized. In the revised manuscript we will expand the derivation section with the complete sequence: starting from P0 (scalar pricing on $W_2$ transport), imposing P1 (eikonal: unit length per nat independent of location), deriving the conformal factor $2(e+U)$, and showing that among continuous reliefs this forces $U$ proportional to the Fisher information $J$.
  Revision: yes
- Referee: [consequences on conformal class] The infinite cost-distance claim at certainty is qualified as conjectural beyond power laws; this qualification should be elevated to a precise statement of the domain of the result, as it is load-bearing for the cost-floor consequence (abstract, consequences paragraph).
  Authors: We will revise both the abstract and the consequences paragraph to state the domain precisely: the infinite cost-distance to certainty holds whenever the relief dominates the Fisher information, with necessity proven for power-law families and left as a conjecture for general continuous reliefs beyond that class. This makes the load-bearing status of the claim fully transparent.
  Revision: yes
Circularity Check
No significant circularity
The paper is an axiomatic construction: it explicitly posits P0 (beliefs in Wasserstein space with revision cost as scalar price on OT) and P1 (eikonal/uniform pricing condition), then derives the conformal reweighting $\tilde g_{e,U}=2(e+U)\,g_{W_2}$ and the $U=cJ$ characterization directly from those postulates. No step reduces a claimed prediction or uniqueness result to a fitted parameter, a self-citation chain, or a definition that presupposes the output. No self-citations appear in the load-bearing derivation. The geometry follows from the chosen axioms by construction, which is the intended structure rather than circularity. The infinite-distance consequence is already flagged as conjectural.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption P0: a revision cost is a scalar price on optimal transport, so beliefs live in Wasserstein space.
- domain assumption P1: one nat of knowledge costs the same metric length everywhere, the eikonal condition.