Normalized Maximum Likelihood Code-Length on Riemannian Data Spaces
Pith reviewed 2026-05-18 20:49 UTC · model grok-4.3
The pith
A coordinate-invariant normalized maximum likelihood can be defined on Riemannian manifolds by using the volume form induced by the metric.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We define the Riemannian manifold NML (Rm-NML) by replacing the Lebesgue measure in the conventional NML normalizing constant with the Riemannian volume form. This ensures invariance under coordinate transformations and agreement with standard NML in Euclidean space. We extend computational techniques and derive simplifications for Riemannian symmetric spaces, including explicit computation for normal distributions on hyperbolic spaces.
What carries the argument
Rm-NML integrates the maximum-likelihood density against the Riemannian volume measure rather than Lebesgue measure; the volume measure is what carries the invariance property.
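In symbols, the substitution can be sketched as follows. The notation is an assumption made for illustration and is not taken from the paper: $\mathcal{M}$ is the data manifold with metric $g$, $x^n = (x_1, \dots, x_n)$ is a sample, $\hat\theta$ is the maximum-likelihood estimator, $dV_g(y^n)$ denotes the product of the volume forms over the $n$ points, and densities in the second formula are understood relative to $dV_g$.

```latex
% Conventional NML: the normalizer is a Lebesgue integral over the sample space.
p_{\mathrm{NML}}(x^n)
  = \frac{p(x^n;\hat\theta(x^n))}{\int p(y^n;\hat\theta(y^n))\,dy^n}

% Rm-NML (assumed notation): the same ratio with the Riemannian volume form.
p_{\mathrm{Rm\text{-}NML}}(x^n)
  = \frac{p(x^n;\hat\theta(x^n))}{\int_{\mathcal{M}^n} p(y^n;\hat\theta(y^n))\,dV_g(y^n)},
\qquad
dV_g = \sqrt{\det g(x)}\;dx^1 \cdots dx^d \quad \text{(single point, local coordinates)}

% Euclidean check: g = I in Cartesian coordinates gives \sqrt{\det g} = 1, so dV_g = dx.
```

With the identity metric the second formula reduces to the first, which is consistent with the claimed agreement with standard NML in Euclidean space.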
If this is right
- Model selection and regret minimization can now be performed directly on manifold-valued data without coordinate artifacts.
- On hyperbolic spaces the method supplies a concrete code length for hierarchical graph models.
- Existing NML approximation algorithms extend to the manifold setting once the volume form is substituted.
- The same construction applies to any Riemannian symmetric space arising in data analysis.
Where Pith is reading between the lines
- The approach suggests trying Rm-NML on sphere-valued or Grassmannian data to see whether it improves compression over Euclidean approximations.
- One could compare regret achieved by Rm-NML versus Euclidean NML when data are known to lie on a curved manifold.
- Closed-form expressions might be derived for other common distributions on hyperbolic space beyond Gaussians.
Load-bearing premise
A Riemannian metric and its associated volume measure must exist on the data space so the normalizing integral can be written in a coordinate-free manner.
What would settle it
Evaluate the Rm-NML value for the same distribution on the sphere or hyperbolic plane in two different coordinate systems and check whether the results agree to within numerical integration error.
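A minimal numerical sketch of this kind of check, under stated assumptions: the hyperbolic plane with curvature -1, a Gaussian-type density exp(-r^2 / (2 sigma^2)) centred at the origin with fixed sigma, and hypothetical helper names. It does not compute Rm-NML itself. It only checks that the volume-form integral takes the same value in geodesic polar coordinates and in Poincaré disk coordinates, which is the property the invariance claim rests on. The closed form used for comparison follows from completing the square in the polar integral.

```python
import math


def midpoint(f, a, b, n=200_000):
    """Composite midpoint rule; never evaluates f at the endpoints."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def normalizer_polar(sigma, r_max=20.0):
    """Geodesic polar coordinates (r, theta): dV = sinh(r) dr dtheta.

    r_max truncates the radial integral; 20 is ample for sigma near 1.
    """
    def f(r):
        return math.exp(-r * r / (2 * sigma**2)) * math.sinh(r)
    return 2 * math.pi * midpoint(f, 0.0, r_max)


def normalizer_disk(sigma):
    """Poincare disk coordinates (rho, theta): dV = 4 rho / (1 - rho^2)^2 drho dtheta."""
    def f(rho):
        r = 2 * math.atanh(rho)  # hyperbolic distance from the origin
        return math.exp(-r * r / (2 * sigma**2)) * 4 * rho / (1 - rho * rho) ** 2
    return 2 * math.pi * midpoint(f, 0.0, 1.0)


def normalizer_closed_form(sigma):
    """2 pi * sigma * sqrt(pi/2) * exp(sigma^2 / 2) * erf(sigma / sqrt(2))."""
    return (2 * math.pi * sigma * math.sqrt(math.pi / 2)
            * math.exp(sigma**2 / 2) * math.erf(sigma / math.sqrt(2)))


sigma = 1.0  # assumed dispersion, chosen only for illustration
z_polar = normalizer_polar(sigma)
z_disk = normalizer_disk(sigma)
z_exact = normalizer_closed_form(sigma)
print(z_polar, z_disk, z_exact)
```

The two coordinate systems are related by r = 2 artanh(rho), so sinh(r) dr equals 4 rho / (1 - rho^2)^2 drho; the three printed values should therefore agree up to quadrature error. A Lebesgue integral in the disk coordinates (dropping the metric factor) would not reproduce the polar value.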
Original abstract
In recent years, with the large-scale expansion of graph data, there has been an increased focus on Riemannian manifold data spaces other than Euclidean space. In particular, the development of hyperbolic spaces has been remarkable, and they have high expressive power for graph data with hierarchical structures. Normalized Maximum Likelihood (NML) is employed in regret minimization and model selection. However, existing formulations of NML have been developed primarily in Euclidean spaces and are inherently dependent on the choice of coordinate systems, making it non-trivial to extend NML to Riemannian manifolds. In this study, we define a new NML that reflects the geometric structure of Riemannian manifolds, called the Riemannian manifold NML (Rm-NML). This Rm-NML is invariant under coordinate transformations and coincides with the conventional NML under the natural parameterization in Euclidean space. We extend existing computational techniques for NML to the setting of Riemannian manifolds. Furthermore, we derive a method to simplify the computation of Rm-NML on Riemannian symmetric spaces, which encompass data spaces of growing interest such as hyperbolic spaces. To illustrate the practical application of our proposed method, we explicitly computed the Rm-NML for normal distributions on hyperbolic spaces.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper defines a Riemannian manifold version of Normalized Maximum Likelihood (Rm-NML) by replacing the Euclidean Lebesgue measure in the normalizing integral with the Riemannian volume form induced by a chosen metric on the data manifold. The resulting quantity is claimed to be invariant under coordinate reparameterizations, to reduce exactly to ordinary NML on Euclidean space with the standard metric, and to admit simplified evaluation on Riemannian symmetric spaces. An explicit closed-form or computable expression is derived for the normal distribution on hyperbolic space.
Significance. If the finiteness of the new normalizing integral and the invariance property are rigorously established, the construction supplies a geometrically intrinsic model-selection criterion for data lying on manifolds that arise in graph embedding and hierarchical modeling. The reduction to the Euclidean case and the explicit hyperbolic-normal example are direct consequences of standard differential-geometric identities and therefore constitute a clean extension of existing NML techniques.
major comments (2)
- [§3] §3, Definition of Rm-NML: the normalizing integral is asserted to be finite once the Riemannian volume form is substituted, yet no explicit integrability argument or decay estimate is supplied for the non-compact hyperbolic case; without such a bound the central claim that Rm-NML is well-defined remains unsupported.
- [§5] §5, hyperbolic-normal example: the derivation of the explicit Rm-NML expression relies on the volume element of the hyperboloid model, but the manuscript does not verify that the resulting integral converges or that the maximum-likelihood density is properly normalized with respect to this measure.
minor comments (2)
- [Notation] The notation for the Riemannian metric g and its induced volume form dV_g is introduced without a short reminder of the coordinate-change rule for the volume element; a one-sentence recall would improve readability for readers outside differential geometry.
- [Introduction] A reference to Rissanen’s original NML papers or to the standard Euclidean regret bounds would help situate the new definition relative to the literature.
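For readers who want the coordinate-change rule that the notation comment refers to, the standard identity is sketched here. The symbols $g^{(x)}$, $g^{(y)}$, and $J$ are notation assumed for this illustration, not the paper's.

```latex
% Change of coordinates x -> y with Jacobian J = \partial x / \partial y:
g^{(y)} = J^{\top} g^{(x)} J
\;\Longrightarrow\;
\sqrt{\det g^{(y)}}\;dy
  = \sqrt{\det g^{(x)}}\;|\det J|\;dy
  = \sqrt{\det g^{(x)}}\;dx .

% Example: Euclidean plane, Cartesian (x^1, x^2) -> polar (r, \theta):
g^{(r,\theta)} = \mathrm{diag}(1,\, r^2),
\qquad
dV_g = r\,dr\,d\theta = dx^1\,dx^2 .
```

The Jacobian factor picked up by the metric determinant cancels the one picked up by the coordinate differentials, which is why an integral against $dV_g$ does not depend on the chart.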
Simulated Author's Rebuttal
We are grateful to the referee for the thorough review and for highlighting the importance of rigorously establishing the well-definedness of Rm-NML. We address each major comment below and will incorporate the necessary additions in the revised manuscript.
Point-by-point responses
- Referee: [§3] §3, Definition of Rm-NML: the normalizing integral is asserted to be finite once the Riemannian volume form is substituted, yet no explicit integrability argument or decay estimate is supplied for the non-compact hyperbolic case; without such a bound the central claim that Rm-NML is well-defined remains unsupported.
  Authors: We agree that an explicit proof of finiteness is essential for the non-compact case. In the revised manuscript, we will include a dedicated subsection providing a decay estimate for the integrand. Specifically, we will show that the maximum likelihood density for the Riemannian normal distribution on hyperbolic space decays exponentially with the hyperbolic distance, which, when integrated against the exponentially growing volume element, yields a convergent integral. This argument relies on standard estimates for hyperbolic geometry and ensures Rm-NML is well-defined.
  Revision: yes
- Referee: [§5] §5, hyperbolic-normal example: the derivation of the explicit Rm-NML expression relies on the volume element of the hyperboloid model, but the manuscript does not verify that the resulting integral converges or that the maximum-likelihood density is properly normalized with respect to this measure.
  Authors: The maximum-likelihood density is normalized by construction with respect to the Riemannian volume measure, as it is the density of the probability distribution on the manifold. For the Rm-NML integral, we acknowledge the lack of explicit verification in the original text. We will add a verification step in the revised version, either by direct computation for the specific parameters or by providing bounds that confirm convergence using the explicit volume form in the hyperboloid model.
  Revision: yes
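The kind of decay estimate discussed in the responses above can be sketched for a density with fixed parameters $(\mu, \sigma)$ on $d$-dimensional hyperbolic space with curvature $-1$. This is an illustration under those assumptions, not the authors' proof: the Rm-NML integrand evaluates the density at the maximum-likelihood estimate, so the actual argument would need to control that dependence as well.

```latex
% Geodesic polar coordinates (r, \omega) around \mu, with r = \mathrm{dist}(x, \mu):
dV_g = \sinh^{d-1}(r)\,dr\,d\omega,
\qquad
\sinh r \le \tfrac{1}{2}\,e^{r} .

% Gaussian-type decay in the hyperbolic distance dominates the volume growth:
\int_0^\infty e^{-r^2/(2\sigma^2)}\,\sinh^{d-1}(r)\,dr
\;\le\;
2^{-(d-1)} \int_0^\infty e^{-r^2/(2\sigma^2) + (d-1)\,r}\,dr
\;<\; \infty .

% A purely exponential tail e^{-\alpha r} is integrable against dV_g only if \alpha > d - 1.
```

The last line shows why the rate matters: exponential decay alone is enough only when it outpaces the $e^{(d-1)r}$ growth of the volume element, whereas a tail in $e^{-r^2/(2\sigma^2)}$ converges for every $\sigma > 0$.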
Circularity Check
No significant circularity detected in Rm-NML definition
The central construction replaces the Lebesgue measure with the Riemannian volume form inside the normalizing integral of the maximum-likelihood density. This is a direct, coordinate-free redefinition that inherits invariance from the intrinsic properties of the volume element and reduces to ordinary NML on Euclidean space by specialization of the metric; neither step relies on fitted parameters renamed as predictions nor on self-citations whose content is presupposed. Extensions to symmetric spaces and the hyperbolic-normal example invoke only standard differential-geometric identities that are independent of the present paper's results. The derivation chain is therefore self-contained against external geometric facts and introduces no load-bearing circular reductions.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: the data space admits a Riemannian metric whose volume form can be used to define an invariant normalizing constant for the maximum-likelihood integral.