
arXiv: 2606.21585 · v4 · pith:ABNFNEYYnew · submitted 2026-06-19 · cs.LG · cs.IT · math.DG · math.IT · math.ST · stat.TH

A Transport-Based Geometry of Belief-Cost

Pith reviewed 2026-06-30 10:33 UTC · model grok-4.3

classification cs.LG · cs.IT · math.DG · math.IT · math.ST · stat.TH
keywords belief revision cost · optimal transport · Wasserstein space · Fisher information · conformal metric · uniform pricing · eikonal condition

The pith

Belief revision costs are given by the Wasserstein metric conformally reweighted by Fisher information when one nat of knowledge has uniform metric cost everywhere.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

A finite agent revises its beliefs at a cost modeled as a scalar price on optimal transport between probability densities in Wasserstein space. The paper imposes a second condition that one nat of knowledge must cost the same metric length at every belief. These two rules together force the cost metric to be the Wasserstein metric scaled pointwise by twice the sum of a background energy and a relief function, and they single out the Fisher information as the unique continuous relief that satisfies the uniform-price rule. Certainty then lies at infinite distance from any interior belief once the relief grows at least as fast as the Fisher information, producing a cost floor that diverges at the boundary.

Core claim

Under the postulate that revision cost is a scalar price on optimal transport (beliefs live in Wasserstein space) and the postulate that one nat of knowledge costs the same metric length everywhere (eikonal condition), the cost metric is the conformal reweighting $\tilde g_{e,U}=2(e+U)\,g_{W_2}$ of the Wasserstein metric by Fisher information, and among continuous reliefs uniform pricing holds if and only if the relief is proportional to the Fisher information. Certainty therefore sits at infinite cost-distance once the relief dominates the Fisher information, so any well-posed inference carries a cost floor that diverges at certainty.

What carries the argument

The conformal reweighting $\tilde g_{e,U}=2(e+U)\,g_{W_2}$ of the Wasserstein metric by a relief function $U$, which enforces uniform pricing of knowledge across all beliefs.
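
As a worked illustration (a sketch under stated assumptions, not the paper's own derivation), consider a one-dimensional location-scale leaf $\rho_{\mu,\sigma}(x)=\sigma^{-1}f((x-\mu)/\sigma)$ whose shape $f$ has zero mean and unit variance. We assume here that $J$ is the location Fisher information and that the relief is $U=cJ$; the constant $c>0$ and the coordinates $(\mu,\sigma)$ are notation introduced for this example.

```latex
\begin{align*}
g_{W_2}\big|_{\text{leaf}} &= d\mu^2 + d\sigma^2,
  \qquad J(\rho_{\mu,\sigma}) = \frac{J(f)}{\sigma^2},\\
\tilde g_{e,cJ}\big|_{\text{leaf}} &= 2\Big(e + \frac{c\,J(f)}{\sigma^2}\Big)\,\big(d\mu^2 + d\sigma^2\big),\\
\tilde g_{0,cJ}\big|_{\text{leaf}} &= 2c\,J(f)\,\frac{d\mu^2 + d\sigma^2}{\sigma^2},
  \qquad K = -\frac{1}{2c\,J(f)}.
\end{align*}
```

At $e=0$ this is the upper half-plane metric rescaled by $2cJ(f)$, so the leaf has constant negative curvature. For the Gaussian shape $J(f)=1$ and $K=-1/(2c)$. Because $J(f)\ge 1$ for any unit-variance shape, with equality only for the Gaussian, $|K|$ is largest in the Gaussian case under these assumptions.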

If this is right

  • Certainty lies at infinite cost-distance once the relief dominates the Fisher information.
  • On location-scale families the induced geometry is hyperbolic and the Gaussian is the most curved case at zero background energy.
  • Any well-posed inference therefore possesses a cost floor that diverges as certainty is approached.
  • Via the Landauer relation the cost floor becomes an energy floor for reaching certainty.
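
To make the divergence concrete, here is a minimal numerical sketch. It assumes the Gaussian leaf, on which $g_{W_2}=d\mu^2+d\sigma^2$ and $J=1/\sigma^2$, and a relief $U=cJ$. The function names `cost_length` and `landauer_energy`, the constant `c`, and the choice of the path $\mu=\text{const}$ are illustrative assumptions and do not come from the paper. Reading the length as a number of nats follows the abstract's Landauer remark.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)


def cost_length(sigma_start, sigma_end, e=0.0, c=1.0, steps=20000):
    """Length of the path mu = const from sigma_start down to sigma_end.

    Assumed line element (Gaussian leaf, U = c / sigma^2):
        ds = sqrt(2 * (e + c / sigma^2)) d sigma
    Integrated with the midpoint rule in t = log(sigma).
    """
    t_lo, t_hi = math.log(sigma_end), math.log(sigma_start)
    h = (t_hi - t_lo) / steps
    total = 0.0
    for i in range(steps):
        sigma = math.exp(t_lo + (i + 0.5) * h)
        # d sigma = sigma dt, so the integrand picks up a factor sigma
        total += math.sqrt(2.0 * (e + c / sigma**2)) * sigma * h
    return total


def landauer_energy(nats, temperature_kelvin=300.0):
    """Energy equivalent of a cost in nats, at k_B * T per nat."""
    return nats * K_B * temperature_kelvin


if __name__ == "__main__":
    for sigma_end in (1e-1, 1e-2, 1e-3, 1e-6):
        length = cost_length(1.0, sigma_end)
        print(sigma_end, length, landauer_energy(length))
```

With `e = 0` and `c = 1`, narrowing σ from 1 to 10⁻³ costs √2·ln(10³) ≈ 9.77, and every further factor of ten adds the same √2·ln 10 ≈ 3.26, so the total grows without bound as σ → 0. At 300 K one nat corresponds to about 4.14 × 10⁻²¹ J.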

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same geometry could supply explicit lower bounds on the physical energy required for any inference task that approaches certainty.
  • Empirical tests could check whether measured costs of belief revision follow the predicted pointwise scaling by Fisher information.
  • Relaxing the uniform-pricing postulate would leave open other conformal factors and might connect the framework to alternative information geometries.

Load-bearing premise

One nat of knowledge costs the same metric length at every belief.

What would settle it

A concrete measurement or calculation showing finite cost to reach certainty when the relief grows faster than Fisher information would falsify the infinite-distance claim.
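
A calculation of this kind can be sketched on a toy case. The example assumes the Gaussian leaf with $e=0$ and a hypothetical power-law relief $U=c\,\sigma^{-p}$, so that $p=2$ is the Fisher rate. The helper name `power_law_length` and the restriction to the path $\mu=\text{const}$ inside the leaf are assumptions made for illustration, and say nothing about paths that leave the leaf.

```python
import math


def power_law_length(eps, p, c=1.0, sigma_start=1.0):
    """Closed-form length of mu = const from sigma_start down to eps.

    Assumed line element: ds = sqrt(2 * c) * sigma**(-p / 2) d sigma,
    i.e. relief U = c * sigma**(-p) and background energy e = 0.
    """
    a = 1.0 - p / 2.0
    if abs(a) < 1e-12:
        # p = 2 (Fisher rate): logarithmic divergence
        return math.sqrt(2.0 * c) * math.log(sigma_start / eps)
    return math.sqrt(2.0 * c) * (sigma_start**a - eps**a) / a


if __name__ == "__main__":
    for p in (1.0, 2.0, 3.0):
        print(p, [power_law_length(eps, p) for eps in (1e-3, 1e-6, 1e-12)])
```

In this toy setting the length stays below 2√2 for `p = 1` however small `eps` becomes, while it grows without bound for `p = 2` and `p = 3`. That matches the threshold described above: certainty is at finite cost along this path when the relief grows more slowly than the Fisher information, and at infinite cost once it grows at least as fast.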

Figures

Figures reproduced from arXiv: 2606.21585 by Laurent Caraffa.

Figure 1: The inference chain: from a noisy world to a belief.
Figure 2: The cost geometry of belief. (a) The location-scale leaf (a single location-scale family, a 2-D submanifold of P2) in the Poincaré disk (the conformally equivalent rendering of the {σ > 0} half-plane used in the proofs), drawn at the eikonal limit e = 0, where uniform pricing (Postulate 1) makes the leaf exactly hyperbolic. The faint backdrop is the modular tiling: at e = 0 the constant-curvature metric makes its tiles congruent (equal cost-area), a uniform cost grid; t…
Figure 3: The universe of cost geometries. The nested rings F ⊂ C ⊂ W ⊂ U: the wall W (theorem 4.10), the well-posed class C, the honest Fisher family F (theorem 4.15), and at their center the Gaussian, the most hyperbolic location-scale belief (Stam rigidity, theorem 4.19). The physics P motivates the boundaries (dashed) without entering any proof (theorem B.1). τ being the narrow convergence. To each U ∈ U and eac…
Original abstract

A finite agent, a machine's digital twin or any bounded reasoner, infers a fixed and noisy world through finite sensors, so its coherent output is a belief: a probability density over states (the Bayes posterior). Such an agent stops short of certainty, and revising a belief carries a cost. We propose an axiomatic framework for transport-based belief costs, motivated by these facts. We pose two postulates. P0 (the arena): a revision cost is a scalar price on optimal transport, so beliefs live in Wasserstein space. P1 (uniform pricing): one nat of knowledge costs the same metric length everywhere, the eikonal condition. Among conceivable pricing rules we study this one. Under P0 and P1 the cost metric is optimal transport conformally reweighted by Fisher information, $\tilde g_{e,U}=2(e+U)\,g_{W_2}$, and the Fisher family is a characterization: among continuous reliefs, uniform pricing is equivalent to $U=cJ$. Two consequences follow on the conformal class. Certainty sits at infinite cost-distance once the relief dominates the Fisher information, so a well-posed inference has a cost floor diverging at certainty (necessity conjectural beyond power laws). On location-scale leaves the geometry is hyperbolic, and the Stam bound places the Gaussian as the most curved one (at $e=0$). The results are geometric, in nats. Via Landauer (one nat worth $k_BT$) the cost floor becomes an energy floor: revising toward certainty would demand unbounded energy. Physics anchors the unit and enters no theorem. Removing either postulate leaves the selection open.
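
The curvature statement for location-scale leaves can be illustrated with a small comparison. This is a sketch under assumptions: it takes $e=0$, $U=cJ$, unit-variance shapes $f$, and the resulting leaf metric $2c\,J(f)\,(d\mu^2+d\sigma^2)/\sigma^2$ with curvature $K=-1/(2c\,J(f))$. The table `FISHER_UNIT_VARIANCE` and the function `leaf_curvature` are names introduced here, and the logistic and Laplace shapes are examples chosen for illustration, not cases discussed in the abstract.

```python
import math

# Location Fisher information J(f) of unit-variance shapes (standard closed forms).
FISHER_UNIT_VARIANCE = {
    "gaussian": 1.0,               # J = 1 / sigma^2, variance = sigma^2
    "logistic": math.pi**2 / 9.0,  # J = 1 / (3 s^2), variance = s^2 * pi^2 / 3
    "laplace": 2.0,                # J = 1 / b^2, variance = 2 b^2
}


def leaf_curvature(fisher, c=1.0):
    """Curvature of the metric 2 c J(f) (dmu^2 + dsigma^2) / sigma^2 (assumed form, e = 0)."""
    return -1.0 / (2.0 * c * fisher)


if __name__ == "__main__":
    for name, fisher in FISHER_UNIT_VARIANCE.items():
        print(name, fisher, leaf_curvature(fisher))
```

Under these assumptions the Gaussian gives `K = -0.5`, the logistic about `-0.456`, and the Laplace `-0.25`, so the Gaussian leaf is the most negatively curved of the three.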

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes an axiomatic framework for belief revision costs. Under P0 (beliefs live in Wasserstein space with revision cost as a scalar price on optimal transport) and P1 (uniform pricing via the eikonal condition that one nat costs the same metric length everywhere), the cost metric is the Wasserstein metric conformally reweighted by Fisher information: $\tilde g_{e,U}=2(e+U)\,g_{W_2}$. Among continuous reliefs, uniform pricing is equivalent to $U=cJ$ (Fisher family characterization). Consequences include infinite cost-distance to certainty when the relief dominates Fisher information (conjectural beyond power laws), hyperbolic geometry on location-scale leaves, and the Gaussian as most curved at $e=0$ via the Stam bound. Results are in nats; Landauer links to energy but enters no theorem.

Significance. If the derivation holds, the work supplies a geometric model linking optimal transport, information geometry, and belief costs with no free parameters or fitted entities. Credit is due for the axiomatic construction from two postulates, the explicit characterization of the Fisher family, and the concrete geometric consequences (hyperbolic leaves, curvature bound). The framework is falsifiable in principle via the conjectural infinite-distance claim and offers a transport-based alternative to standard information measures.

major comments (2)
  1. [derivation of conformal metric] The central derivation of the conformal factor $\tilde g_{e,U}=2(e+U)\,g_{W_2}$ from P0 and P1 is presented as following directly, but the manuscript must supply the explicit steps showing how the eikonal condition uniquely selects the Fisher reweighting among pricing rules (abstract and § on derivation).
  2. [consequences on conformal class] The infinite cost-distance claim at certainty is qualified as conjectural beyond power laws; this qualification should be elevated to a precise statement of the domain of the result, as it is load-bearing for the cost-floor consequence (abstract, consequences paragraph).
minor comments (2)
  1. [abstract and introduction] Notation for g_{W_2} and the relief U should be defined at first use with a brief reminder of their relation to the Wasserstein metric and Fisher information.
  2. [final paragraph] The Landauer link is correctly noted as anchoring the unit without entering theorems, but a short clarifying sentence would prevent misreading as a physical derivation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading, positive assessment of significance, and constructive suggestions for minor revision. We address each major comment below.

Point-by-point responses
  1. Referee: [derivation of conformal metric] The central derivation of the conformal factor $\tilde g_{e,U}=2(e+U)\,g_{W_2}$ from P0 and P1 is presented as following directly, but the manuscript must supply the explicit steps showing how the eikonal condition uniquely selects the Fisher reweighting among pricing rules (abstract and § on derivation).

    Authors: We agree that the steps from the eikonal condition to the unique selection of the Fisher reweighting should be fully explicit rather than summarized. In the revised manuscript we will expand the derivation section with the complete sequence: starting from P0 (scalar pricing on W_2 transport), imposing P1 (eikonal: unit length per nat independent of location), deriving the conformal factor 2(e+U), and showing that among continuous reliefs this forces U proportional to Fisher information J. revision: yes

  2. Referee: [consequences on conformal class] The infinite cost-distance claim at certainty is qualified as conjectural beyond power laws; this qualification should be elevated to a precise statement of the domain of the result, as it is load-bearing for the cost-floor consequence (abstract, consequences paragraph).

    Authors: We will revise both the abstract and the consequences paragraph to state the domain precisely: the infinite cost-distance to certainty holds whenever the relief dominates the Fisher information, with necessity proven for power-law families and left as a conjecture for general continuous reliefs beyond that class. This makes the load-bearing status of the claim fully transparent. revision: yes

Circularity Check

0 steps flagged

No significant circularity

Full rationale

The paper is an axiomatic construction: it explicitly posits P0 (beliefs in Wasserstein space with revision cost as scalar price on OT) and P1 (eikonal/uniform pricing condition), then derives the conformal reweighting $\tilde g_{e,U}=2(e+U)\,g_{W_2}$ and the $U=cJ$ characterization directly from those postulates. No step reduces a claimed prediction or uniqueness result to a fitted parameter, a self-citation chain, or a definition that presupposes the output. No self-citations appear in the load-bearing derivation. The geometry follows from the chosen axioms by construction, which is the intended structure rather than circularity. The infinite-distance consequence is already flagged as conjectural.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The framework rests on two explicitly stated postulates; no free parameters, invented entities, or additional axioms are mentioned in the abstract.

axioms (2)
  • domain assumption P0: a revision cost is a scalar price on optimal transport, so beliefs live in Wasserstein space.
    Stated as the first postulate in the abstract.
  • domain assumption P1: one nat of knowledge costs the same metric length everywhere, the eikonal condition.
    Stated as the second postulate in the abstract.

pith-pipeline@v0.9.1-grok · 5830 in / 1412 out tokens · 47501 ms · 2026-06-30T10:33:16.282385+00:00 · methodology

