arxiv: 2508.21466 · v2 · submitted 2025-08-29 · 💻 cs.LG · cs.IT · math.IT

Normalized Maximum Likelihood Code-Length on Riemannian Data Spaces

Pith reviewed 2026-05-18 20:49 UTC · model grok-4.3

classification 💻 cs.LG · cs.IT · math.IT
keywords Normalized Maximum Likelihood · Riemannian Manifolds · Hyperbolic Space · Coordinate Invariance · Model Selection · Volume Form · Symmetric Spaces · Regret Minimization

The pith

A coordinate-invariant normalized maximum likelihood can be defined on Riemannian manifolds by using the volume form induced by the metric.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper defines a new version of normalized maximum likelihood, called Rm-NML, that respects the geometry of any Riemannian manifold instead of being tied to Euclidean coordinates. The central move is to replace the ordinary Lebesgue measure in the normalizing integral with the Riemannian volume element, which automatically makes the resulting code length unchanged when the coordinate system is altered. This new quantity reduces exactly to the classical NML when the manifold is flat Euclidean space equipped with its standard coordinates. The authors also extend existing approximation methods and obtain simplified expressions on symmetric spaces, with an explicit formula worked out for normal distributions on hyperbolic space.

Core claim

We define the Riemannian manifold NML (Rm-NML) by replacing the Lebesgue measure in the conventional NML normalizing constant with the Riemannian volume form. This ensures invariance under coordinate transformations and agreement with standard NML in Euclidean space. We extend computational techniques and derive simplifications for Riemannian symmetric spaces, including explicit computation for normal distributions on hyperbolic spaces.
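
In symbols, the substitution can be sketched as follows. The notation ($C_n$, $C_n^{g}$, $\hat\theta$, $f$, $dV_g$) is assumed here for illustration and is not taken from the paper; $f$ stands for a density with respect to the volume measure, and $d$ is the dimension of the data manifold $M$.

```latex
% Conventional NML on R^d: the normalizer integrates against Lebesgue measure.
p_{\mathrm{NML}}(x^n) = \frac{p(x^n;\hat\theta(x^n))}{C_n},
\qquad
C_n = \int p(y^n;\hat\theta(y^n))\, dy^n .

% Rm-NML sketch on (M, g): the same ratio, with the Riemannian volume form.
p_{\mathrm{Rm\text{-}NML}}(x^n) = \frac{f(x^n;\hat\theta(x^n))}{C_n^{g}},
\qquad
C_n^{g} = \int_{M^n} f(y^n;\hat\theta(y^n)) \prod_{i=1}^{n} dV_g(y_i),
\qquad
dV_g(y) = \sqrt{\det g(y)}\; dy^1 \cdots dy^d .
```

With the flat metric in standard coordinates, $g$ is the identity matrix, so $\sqrt{\det g} = 1$, $dV_g$ is Lebesgue measure, and $C_n^{g} = C_n$. This is the Euclidean agreement stated in the claim.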

What carries the argument

Rm-NML integrates the maximum likelihood density against the Riemannian volume measure rather than Lebesgue measure; this change of measure is what carries the invariance property.

If this is right

  • Model selection and regret minimization can now be performed directly on manifold-valued data without coordinate artifacts.
  • On hyperbolic spaces the method supplies a concrete code length for hierarchical graph models.
  • Existing NML approximation algorithms extend to the manifold setting once the volume form is substituted.
  • The same construction applies to any Riemannian symmetric space arising in data analysis.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The approach suggests trying Rm-NML on sphere-valued or Grassmannian data to see whether it improves compression over Euclidean approximations.
  • One could compare regret achieved by Rm-NML versus Euclidean NML when data are known to lie on a curved manifold.
  • Closed-form expressions might be derived for other common distributions on hyperbolic space beyond Gaussians.

Load-bearing premise

A Riemannian metric and its associated volume measure must exist on the data space so the normalizing integral can be written in a coordinate-free manner.

What would settle it

Evaluate the Rm-NML value for the same distribution on the sphere or hyperbolic plane in two different coordinate systems and check whether the results agree to within numerical integration error.
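
A reduced version of this check can be run without the paper's formulas. The sketch below does not compute Rm-NML itself. It integrates an assumed Gaussian-type kernel exp(-d(x, o)^2 / (2 sigma^2)) against the volume form of the hyperbolic plane (curvature -1) in two coordinate systems, geodesic polar coordinates and the Poincaré disk, and compares both with the closed form obtained by completing the square. The kernel and the function names (midpoint, z_polar, z_disk, z_closed_form) are illustrative assumptions, not the authors' code.

```python
import math


def midpoint(f, a, b, n=20000):
    """Composite midpoint rule; it never evaluates f at the endpoints."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def z_polar(sigma):
    """Integral in geodesic polar coordinates, where dV = sinh(r) dr dtheta."""
    # Truncate where exp(-r^2 / (2 sigma^2)) * sinh(r) is negligible.
    r_max = sigma ** 2 + 12.0 * sigma

    def integrand(r):
        return math.exp(-r * r / (2.0 * sigma ** 2)) * math.sinh(r)

    return 2.0 * math.pi * midpoint(integrand, 0.0, r_max)


def z_disk(sigma):
    """Same integral in the Poincare disk, where
    dV = 4 / (1 - rho^2)^2 * rho drho dtheta and d(x, o) = 2 * atanh(rho)."""

    def integrand(rho):
        r = 2.0 * math.atanh(rho)  # hyperbolic distance to the origin
        vol = 4.0 * rho / (1.0 - rho * rho) ** 2  # sqrt(det g) times rho
        return math.exp(-r * r / (2.0 * sigma ** 2)) * vol

    return 2.0 * math.pi * midpoint(integrand, 0.0, 1.0)


def z_closed_form(sigma):
    """2*pi * sigma * sqrt(pi/2) * exp(sigma^2 / 2) * erf(sigma / sqrt(2))."""
    return (2.0 * math.pi * sigma * math.sqrt(math.pi / 2.0)
            * math.exp(sigma ** 2 / 2.0) * math.erf(sigma / math.sqrt(2.0)))


if __name__ == "__main__":
    s = 1.0
    print(z_polar(s), z_disk(s), z_closed_form(s))
```

The three values should agree up to quadrature error because the volume form, not the coordinate chart, determines the integral. The uniform grid in rho is adequate only for small sigma: in the disk model the mass crowds toward the boundary as sigma grows, so a finer or non-uniform grid would be needed there.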

Original abstract

In recent years, with the large-scale expansion of graph data, there has been an increased focus on Riemannian manifold data spaces other than Euclidean space. In particular, the development of hyperbolic spaces has been remarkable, and they have high expressive power for graph data with hierarchical structures. Normalized Maximum Likelihood (NML) is employed in regret minimization and model selection. However, existing formulations of NML have been developed primarily in Euclidean spaces and are inherently dependent on the choice of coordinate systems, making it non-trivial to extend NML to Riemannian manifolds. In this study, we define a new NML that reflects the geometric structure of Riemannian manifolds, called the Riemannian manifold NML (Rm-NML). This Rm-NML is invariant under coordinate transformations and coincides with the conventional NML under the natural parameterization in Euclidean space. We extend existing computational techniques for NML to the setting of Riemannian manifolds. Furthermore, we derive a method to simplify the computation of Rm-NML on Riemannian symmetric spaces, which encompass data spaces of growing interest such as hyperbolic spaces. To illustrate the practical application of our proposed method, we explicitly computed the Rm-NML for normal distributions on hyperbolic spaces.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper defines a Riemannian manifold version of Normalized Maximum Likelihood (Rm-NML) by replacing the Euclidean Lebesgue measure in the normalizing integral with the Riemannian volume form induced by a chosen metric on the data manifold. The resulting quantity is claimed to be invariant under coordinate reparameterizations, to reduce exactly to ordinary NML on Euclidean space with the standard metric, and to admit simplified evaluation on Riemannian symmetric spaces. An explicit closed-form or computable expression is derived for the normal distribution on hyperbolic space.

Significance. If the finiteness of the new normalizing integral and the invariance property are rigorously established, the construction supplies a geometrically intrinsic model-selection criterion for data lying on manifolds that arise in graph embedding and hierarchical modeling. The reduction to the Euclidean case and the explicit hyperbolic-normal example are direct consequences of standard differential-geometric identities and therefore constitute a clean extension of existing NML techniques.

major comments (2)
  1. [§3] §3, Definition of Rm-NML: the normalizing integral is asserted to be finite once the Riemannian volume form is substituted, yet no explicit integrability argument or decay estimate is supplied for the non-compact hyperbolic case; without such a bound the central claim that Rm-NML is well-defined remains unsupported.
  2. [§5] §5, hyperbolic-normal example: the derivation of the explicit Rm-NML expression relies on the volume element of the hyperboloid model, but the manuscript does not verify that the resulting integral converges or that the maximum-likelihood density is properly normalized with respect to this measure.
minor comments (2)
  1. [Notation] The notation for the Riemannian metric g and its induced volume form dV_g is introduced without a short reminder of the coordinate-change rule for the volume element; a one-sentence recall would improve readability for readers outside differential geometry.
  2. [Introduction] A reference to Rissanen’s original NML papers or to the standard Euclidean regret bounds would help situate the new definition relative to the literature.
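
The coordinate-change rule requested in minor comment 1 is the standard identity below, written in assumed notation: $x = \varphi(y)$ is a change of coordinates with Jacobian $J = \partial x / \partial y$, and $\tilde g$ is the metric expressed in the $y$ coordinates.

```latex
% The metric transforms as a symmetric 2-tensor:
\tilde g(y) = J(y)^{\top}\, g(\varphi(y))\, J(y)
\quad\Longrightarrow\quad
\sqrt{\det \tilde g(y)} = |\det J(y)|\, \sqrt{\det g(\varphi(y))} .

% The coordinate volume picks up the same Jacobian factor, dx = |\det J(y)|\, dy, so
\sqrt{\det \tilde g(y)}\; dy = \sqrt{\det g(x)}\; dx .
```

The two Jacobian factors cancel, which is why an integral against $dV_g$ takes the same value in every chart.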

Simulated Author's Rebuttal

2 responses · 0 unresolved

We are grateful to the referee for the thorough review and for highlighting the importance of rigorously establishing the well-definedness of Rm-NML. We address each major comment below and will incorporate the necessary additions in the revised manuscript.

  1. Referee: [§3] §3, Definition of Rm-NML: the normalizing integral is asserted to be finite once the Riemannian volume form is substituted, yet no explicit integrability argument or decay estimate is supplied for the non-compact hyperbolic case; without such a bound the central claim that Rm-NML is well-defined remains unsupported.

    Authors: We agree that an explicit proof of finiteness is essential for the non-compact case. In the revised manuscript, we will include a dedicated subsection providing a decay estimate for the integrand. Specifically, we will show that the maximum likelihood density for the Riemannian normal distribution on hyperbolic space decays exponentially with the hyperbolic distance, which, when integrated against the exponentially growing volume element, yields a convergent integral. This argument relies on standard estimates for hyperbolic geometry and ensures Rm-NML is well-defined.

    revision: yes

  2. Referee: [§5] §5, hyperbolic-normal example: the derivation of the explicit Rm-NML expression relies on the volume element of the hyperboloid model, but the manuscript does not verify that the resulting integral converges or that the maximum-likelihood density is properly normalized with respect to this measure.

    Authors: The maximum-likelihood density is normalized by construction with respect to the Riemannian volume measure, as it is the density of the probability distribution on the manifold. For the Rm-NML integral, we acknowledge the lack of explicit verification in the original text. We will add a verification step in the revised version, either by direct computation for the specific parameters or by providing bounds that confirm convergence using the explicit volume form in the hyperboloid model.

    revision: yes
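
The kind of decay estimate under discussion can be illustrated on a simpler object. The sketch below assumes a density on $n$-dimensional hyperbolic space proportional to $\exp(-d(x,\mu)^2 / (2\sigma^2))$ with a fixed center $\mu$; this form and the symbols $Z$, $\omega_{n-1}$ are assumptions for illustration, not the paper's definitions. It bounds the normalizer of that single density and does not by itself establish convergence of the Rm-NML integral, whose integrand is maximized over the parameters.

```latex
% Geodesic polar coordinates around \mu; \omega_{n-1} is the area of the unit sphere S^{n-1}.
Z(\sigma)
  = \omega_{n-1} \int_0^{\infty} e^{-r^2/(2\sigma^2)} \sinh^{n-1}(r)\, dr
  \le \frac{\omega_{n-1}}{2^{\,n-1}} \int_0^{\infty} e^{-r^2/(2\sigma^2) + (n-1) r}\, dr
  < \infty ,
% using \sinh r \le e^{r}/2 for r \ge 0.
```

The bound works because the decay is Gaussian in $r$. A purely exponential decay $e^{-\lambda r}$ would give a convergent integral only when $\lambda > n - 1$, so the rate matters, not only the fact of exponential decay.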

Circularity Check

0 steps flagged

No significant circularity detected in Rm-NML definition

The central construction replaces the Lebesgue measure with the Riemannian volume form inside the normalizing integral of the maximum-likelihood density. This is a direct, coordinate-free redefinition that inherits invariance from the intrinsic properties of the volume element and reduces to ordinary NML on Euclidean space by specialization of the metric; neither step relies on fitted parameters renamed as predictions nor on self-citations whose content is presupposed. Extensions to symmetric spaces and the hyperbolic-normal example invoke only standard differential-geometric identities that are independent of the present paper's results. The derivation chain is therefore self-contained against external geometric facts and introduces no load-bearing circular reductions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the existence of a Riemannian volume measure that replaces the Euclidean Lebesgue measure and on the technical assumption that the resulting integral can be normalized and computed on symmetric spaces.

axioms (1)
  • domain assumption: The data space admits a Riemannian metric whose volume form can be used to define an invariant normalizing constant for the maximum-likelihood integral.
    Invoked when the authors replace coordinate-dependent Lebesgue measure with the intrinsic volume element to achieve invariance.

pith-pipeline@v0.9.0 · 5736 in / 1289 out tokens · 56593 ms · 2026-05-18T20:49:01.786752+00:00 · methodology

