The Geometry of Statistical Data and Information: A Large Deviation Perspective
Pith reviewed 2026-05-23 06:20 UTC · model grok-4.3
The pith
The information projection from divergence minimization coincides with the projection in Kolmogorov probability theory under both i.i.d. and Markov assumptions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The information projection defined in information geometry as divergence minimization coincides with the information projection in Kolmogorov's probability theory under both i.i.d. and Markovian assumptions.
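A small numerical sketch of the i.i.d. case, under stated assumptions: the "Kolmogorov" projection is read here as the conditional law of one sample given the empirical mean (equivalently, by exchangeability, the conditional expectation of the empirical frequencies), and the divergence-minimizing projection is computed by exponential tilting. The alphabet, probabilities, sample size, and function names below are illustrative choices, not taken from the paper.

```python
import math

def i_projection(p, x, target, lo=-50.0, hi=50.0, iters=200):
    """Minimize sum(nu_i * log(nu_i / p_i)) subject to sum(nu_i * x_i) = target.

    The minimizer is an exponential tilt nu_i ~ p_i * exp(theta * x_i);
    theta is found by bisection because the tilted mean increases with theta.
    """
    def tilt(theta):
        w = [pi * math.exp(theta * xi) for pi, xi in zip(p, x)]
        z = sum(w)
        return [wi / z for wi in w]

    def mean(nu):
        return sum(ni * xi for ni, xi in zip(nu, x))

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean(tilt(mid)) < target:
            lo = mid
        else:
            hi = mid
    return tilt(0.5 * (lo + hi))

def conditional_law(p, x, n, s):
    """Exact law of X_1 given X_1 + ... + X_n = s for i.i.d. draws.

    Assumes the values in x are non-negative integers, so the law of the
    sum of the other n - 1 draws can be built by repeated convolution.
    """
    max_sum = max(x) * (n - 1)
    dist = [0.0] * (max_sum + 1)
    dist[0] = 1.0
    for _ in range(n - 1):
        new = [0.0] * (max_sum + 1)
        for t, pt in enumerate(dist):
            if pt == 0.0:
                continue
            for pi, xi in zip(p, x):
                if t + xi <= max_sum:
                    new[t + xi] += pt * pi
        dist = new
    # P(X_1 = x_i | sum = s) is proportional to p_i * P(sum of the rest = s - x_i)
    w = [pi * (dist[s - xi] if 0 <= s - xi <= max_sum else 0.0)
         for pi, xi in zip(p, x)]
    z = sum(w)
    return [wi / z for wi in w]

# Illustrative setup (hypothetical numbers): three symbols, target mean 1.2
p = [0.5, 0.3, 0.2]
x = [0, 1, 2]
n, s = 300, 360

nu_star = i_projection(p, x, s / n)
nu_cond = conditional_law(p, x, n, s)
max_gap = max(abs(a - b) for a, b in zip(nu_star, nu_cond))
print(nu_star)
print(nu_cond)
print(max_gap)
```

The two vectors agree only approximately at finite n; the gap is expected to shrink as n grows, which is the sense in which the two projections are compared in this sketch.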
What carries the argument
Large-deviation rate functions (entropy functions) that equip the manifold of empirical means with a Riemannian geometry.
If this is right
- Fisher-Rao spherical geometry appears only for singleton frequencies in the i.i.d. case; it fails for pairwise statistics.
- The governing probability measure itself curves the space of empirical data.
- The identification places information geometry inside the measure-theoretic foundations of probability.
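For readers who want the spherical picture in concrete terms, here is a minimal sketch of the classical construction on the probability simplex: the square-root map sends a distribution to a point on a sphere of radius 2, and the Fisher-Rao distance becomes an arc length. The function names and test distributions are illustrative assumptions; the last comparison uses the standard fact that twice the Kullback-Leibler divergence matches the squared Fisher-Rao distance to second order for nearby distributions.

```python
import math

def sphere_embed(p):
    """Map a probability vector to the point 2*sqrt(p) in Euclidean space."""
    return [2.0 * math.sqrt(pi) for pi in p]

def fisher_rao_distance(p, q):
    """Great-circle distance on the radius-2 sphere between the embedded points."""
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))  # Bhattacharyya coefficient
    return 2.0 * math.acos(min(1.0, bc))  # clamp guards against rounding above 1

def kl(q, p):
    """Kullback-Leibler divergence sum(q_i * log(q_i / p_i))."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0.0)

p = [0.5, 0.3, 0.2]
q = [0.501, 0.299, 0.2]  # a small perturbation of p

radius = math.sqrt(sum(c * c for c in sphere_embed(p)))
d2 = fisher_rao_distance(p, q) ** 2
two_kl = 2.0 * kl(q, p)
print(radius)       # the embedded point sits on the sphere of radius 2
print(d2, two_kl)   # close for nearby distributions
```

This only illustrates the i.i.d. singleton-frequency setting; it says nothing about how the geometry changes for pairwise statistics.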
Where Pith is reading between the lines
- The same entropy-based geometry could be examined for stationary processes that are neither i.i.d. nor Markov.
- If the projections remain identical, the construction supplies a concrete Riemannian metric on empirical-mean manifolds for any ergodic source.
Load-bearing premise
Large-deviation rate functions can be interpreted directly as defining a Riemannian geometry on the manifold of empirical means without extra regularity conditions on the probability measure.
What would settle it
An explicit pair of distributions and a Markov chain for which the divergence-minimizing projection differs from the Kolmogorov information projection.
Original abstract
The manifold of empirical mean values of statistical data ad infinitum has a geometric shape that depends on the probability measure that governs the generating model. Large deviation theory produces entropy functions that depend on both the probability measure and the statistical data; we use entropy to study the geometry of the data space rather than that of the space of probability distributions. It is well known, since Rao's work, that the Fisher-Rao metric makes the probability simplex into a sphere. From our perspective, that result translates to the space of empirical singleton counting frequencies under an i.i.d. assumption. Following our ideas and going beyond i.i.d., the choice of measure curves the space. When we study the pairwise statistics, the spherical geometry breaks down entirely. We show that the information projection, defined in information geometry as divergence minimization, coincides with the information projection in Kolmogorov's probability theory. This identification holds under both i.i.d. and Markovian assumptions and connects information geometry to the foundations of probability theory.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that the manifold of empirical mean values of statistical data has a geometric shape determined by the governing probability measure. Large deviation theory yields entropy functions depending on both the measure and data, which are used to study the geometry of the data space. It asserts that the information projection defined via divergence minimization in information geometry coincides with the information projection in Kolmogorov's probability theory, holding under both i.i.d. and Markovian assumptions. This connects information geometry to probability foundations, with the Fisher-Rao metric yielding spherical geometry for i.i.d. singleton frequencies but breaking down for pairwise statistics under Markov assumptions.
Significance. If the identification holds, the manuscript bridges information geometry and the foundations of probability by reinterpreting large-deviation rate functions as defining geometry on empirical data manifolds rather than probability distributions. The Markovian extension, showing breakdown of spherical geometry, is a substantive step beyond standard i.i.d. results such as Sanov's theorem. The work explicitly builds on Rao's Fisher-Rao metric without introducing free parameters, ad-hoc axioms, or invented entities. The potential concern about regularity conditions on the probability measure for a Riemannian interpretation does not appear to undermine the central claim, as the large-deviation principle is taken as given and the identification remains internally consistent.
minor comments (3)
- [Abstract] The phrase 'ad infinitum' is imprecise; replace with a clearer description of the limiting regime for empirical means.
- [Abstract] The claim that spherical geometry 'breaks down entirely' for pairwise statistics would be strengthened by a brief concrete indication of the deviation (e.g., a specific rate-function property) in the Markov case.
- The connection to Kolmogorov's probability theory would benefit from citing one or two specific foundational results or theorems rather than a general reference.
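As background for the second comment, the sketch below evaluates the textbook large-deviation rate function for the pair empirical measure of a finite Markov chain, I(ν) = Σ ν_ij log(ν_ij / (ν_i P_ij)), where ν_i is the first marginal of ν. This is the standard form found in large-deviation references and is not quoted from the manuscript; the transition matrix, the test measures, and the function names are hypothetical.

```python
import math

def stationary(P, iters=2000):
    """Stationary distribution by power iteration (assumes an ergodic chain)."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

def pair_rate(nu, P):
    """Rate function of the pair empirical measure nu (an n-by-n matrix).

    Assumes nu is shift-invariant (equal row and column marginals) and that
    P[i][j] > 0 wherever nu[i][j] > 0.
    """
    n = len(P)
    row = [sum(nu[i]) for i in range(n)]  # first marginal of nu
    total = 0.0
    for i in range(n):
        for j in range(n):
            if nu[i][j] > 0.0:
                total += nu[i][j] * math.log(nu[i][j] / (row[i] * P[i][j]))
    return total

# Hypothetical two-state chain
P = [[0.9, 0.1],
     [0.2, 0.8]]
pi = stationary(P)

# Typical pair measure pi_i * P_ij: the rate function vanishes here
nu_typical = [[pi[i] * P[i][j] for j in range(2)] for i in range(2)]
# An atypical but shift-invariant pair measure: strictly positive rate
nu_uniform = [[0.25, 0.25],
              [0.25, 0.25]]

rate_typical = pair_rate(nu_typical, P)
rate_uniform = pair_rate(nu_uniform, P)
print(rate_typical, rate_uniform)
```

Unlike the i.i.d. relative entropy, this function depends on ν through both the pair frequencies and their marginal, which is one concrete place to look for the deviation from the spherical picture.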
Simulated Author's Rebuttal
We thank the referee for their careful reading and positive evaluation of the manuscript. We are pleased that the significance of connecting large-deviation rate functions to the geometry of empirical data manifolds, and the identification of information projections under both i.i.d. and Markov assumptions, has been recognized. We will incorporate any minor revisions as appropriate.
Circularity Check
No significant circularity
The paper's core claim equates the I-projection from information geometry (KL minimization) with a Kolmogorov-style projection via large-deviation rate functions. This identification for the i.i.d. case follows directly from the standard Sanov theorem, an external result. The Markovian extension applies the same rate-function minimization to pairwise empirical measures once the LDP for the pair measure is granted. No quoted equations reduce a prediction to a fitted parameter, no self-citation chain bears the central load, and no ansatz is smuggled in. The derivation is self-contained against external benchmarks.
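The i.i.d. step can be written out as a short worked derivation. The block below is a sketch using the standard Lagrange-multiplier argument; the multipliers θ and λ are notation introduced here, x is assumed to lie strictly between the smallest and largest x_i, and S and F follow the formulas quoted elsewhere on this page.

```latex
\begin{aligned}
&\min_{\nu}\; S(\nu|p) = \sum_i \nu_i \log\frac{\nu_i}{p_i}
\quad \text{subject to} \quad \sum_i \nu_i x_i = x, \;\; \sum_i \nu_i = 1, \\[4pt]
&\frac{\partial}{\partial \nu_i}\Big[ S(\nu|p) - \theta \sum_j \nu_j x_j - \lambda \sum_j \nu_j \Big] = 0
\;\;\Longrightarrow\;\;
\nu_i^{*} = \frac{p_i\, e^{\theta x_i}}{\sum_j p_j\, e^{\theta x_j}}, \\[4pt]
&S(\nu^{*}|p) = \theta x - \log \sum_j p_j\, e^{\theta x_j}
= \sup_{\theta'} \Big\{ \theta' x - \log \sum_j p_j\, e^{\theta' x_j} \Big\}.
\end{aligned}
```

Here θ is fixed by the mean constraint, and the log-sum term is the free energy F(μ|p) = log Σ p_i e^{μ_i} evaluated at μ_i = θ' x_i, so the minimal divergence is a Legendre-Fenchel transform of the free energy along that direction.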
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: A large deviation principle holds for the empirical measures under the stated i.i.d. and Markov assumptions.
- standard math: Under the Fisher-Rao metric, the probability simplex is isometric to a portion of a sphere (Rao).
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (echoes)
  Echoes: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
  Passage: Theorem 7 (Information Projection). ... $E[\nu \mid F_x] = \arg\inf_{\nu \in \mathrm{ri}(\Delta_n)} \{ S(\nu|p) : \sum_i \nu_i x_i = x \}$
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean, embed_injective (echoes)
  Echoes: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
  Passage: $S(\nu|p) = \sum_i \nu_i \log(\nu_i / p_i)$ ... rate function of Sanov's theorem
- IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean, costAlphaLog_high_calibrated_iff (refines)
  Refines: relation between the paper passage and the cited Recognition theorem.
  Passage: Legendre-Fenchel transform ... free energy $F(\mu|p) = \log \sum_i p_i e^{\mu_i}$
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.