Normalized Maximum Likelihood Code-Length on Riemannian Data Spaces

Atsushi Suzuki; Kenji Yamanishi; Kota Fukuzawa

arxiv: 2508.21466 · v2 · submitted 2025-08-29 · cs.LG · cs.IT · math.IT

Kota Fukuzawa, Atsushi Suzuki, Kenji Yamanishi

Pith reviewed 2026-05-18 20:49 UTC · model grok-4.3

classification cs.LG · cs.IT · math.IT

keywords Normalized Maximum Likelihood · Riemannian Manifolds · Hyperbolic Space · Coordinate Invariance · Model Selection · Volume Form · Symmetric Spaces · Regret Minimization

The pith

A coordinate-invariant normalized maximum likelihood can be defined on Riemannian manifolds by using the volume form induced by the metric.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper defines a new version of normalized maximum likelihood, called Rm-NML, that respects the geometry of any Riemannian manifold instead of being tied to Euclidean coordinates. The central move is to replace the ordinary Lebesgue measure in the normalizing integral with the Riemannian volume element, which automatically makes the resulting code length unchanged when the coordinate system is altered. This new quantity reduces exactly to the classical NML when the manifold is flat Euclidean space equipped with its standard coordinates. The authors also extend existing approximation methods and obtain simplified expressions on symmetric spaces, with an explicit formula worked out for normal distributions on hyperbolic space.

Core claim

We define the Riemannian manifold NML (Rm-NML) by replacing the Lebesgue measure in the conventional NML normalizing constant with the Riemannian volume form. This ensures invariance under coordinate transformations and agreement with standard NML in Euclidean space. We extend computational techniques and derive simplifications for Riemannian symmetric spaces, including explicit computation for normal distributions on hyperbolic spaces.

What carries the argument

Rm-NML, formed by integrating the maximum likelihood density against the Riemannian volume measure rather than Lebesgue measure, which carries the invariance property.
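The substitution described above can be written out as follows. This is a sketch in assumed notation (data sequence $x^n \in M^n$, model density $p(\cdot;\theta)$, maximum-likelihood estimator $\hat\theta$, metric $g$ with volume measure $dV_g$); the paper's own symbols and any restrictions it places on the data domain may differ.

```latex
% Conventional NML on R^d: densities and normalizer taken w.r.t. Lebesgue measure.
\begin{align}
  p_{\mathrm{NML}}(x^n)
    &= \frac{p\bigl(x^n;\hat\theta(x^n)\bigr)}
            {\displaystyle\int p\bigl(y^n;\hat\theta(y^n)\bigr)\,dy^n},
\\[4pt]
% Rm-NML (assumed form): densities and normalizer taken w.r.t. the volume measure.
  p_{\mathrm{Rm\text{-}NML}}(x^n)
    &= \frac{p\bigl(x^n;\hat\theta(x^n)\bigr)}
            {\displaystyle\int_{M^n} p\bigl(y^n;\hat\theta(y^n)\bigr)\,
             dV_g(y_1)\cdots dV_g(y_n)},
\\[4pt]
% Local-coordinate expression of the volume measure on a d-dimensional manifold.
  dV_g(y) &= \sqrt{\det g(y)}\;dy^1\cdots dy^d .
\end{align}
% Euclidean check: for M = R^d with g = I we get \sqrt{\det g} = 1, so dV_g = dy
% and the second line reduces to the first.
```

The code length is then $-\log p_{\mathrm{Rm\text{-}NML}}(x^n)$, that is, the negative maximized log-likelihood plus the logarithm of the normalizer.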

If this is right

Model selection and regret minimization can now be performed directly on manifold-valued data without coordinate artifacts.
On hyperbolic spaces the method supplies a concrete code length for hierarchical graph models.
Existing NML approximation algorithms extend to the manifold setting once the volume form is substituted.
The same construction applies to any Riemannian symmetric space arising in data analysis.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach suggests trying Rm-NML on sphere-valued or Grassmannian data to see whether it improves compression over Euclidean approximations.
One could compare regret achieved by Rm-NML versus Euclidean NML when data are known to lie on a curved manifold.
Closed-form expressions might be derived for other common distributions on hyperbolic space beyond Gaussians.

Load-bearing premise

A Riemannian metric and its associated volume measure must exist on the data space so the normalizing integral can be written in a coordinate-free manner.

What would settle it

Evaluate the Rm-NML value for the same distribution on the sphere or hyperbolic plane in two different coordinate systems and check whether the numerical results match exactly.
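A much smaller version of this check can be sketched numerically. The example below does not compute Rm-NML itself; it only integrates one fixed function, exp(-d(o, x)^2 / (2 sigma^2)) with o the origin of the hyperbolic plane (curvature -1), against the volume form in two coordinate systems and compares both results with a closed form derived by completing the square. All function names are hypothetical and the integration rule is a plain midpoint rule chosen for simplicity.

```python
import math


def midpoint(f, a, b, n):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def integral_polar(sigma, r_max=20.0, n=200_000):
    # Geodesic polar coordinates (r, theta): dV = sinh(r) dr dtheta.
    # The radial integral is truncated at r_max (assumption: tail is negligible).
    def f(r):
        return math.exp(-r * r / (2 * sigma**2)) * math.sinh(r)
    return 2 * math.pi * midpoint(f, 0.0, r_max, n)


def integral_disk(sigma, n=200_000):
    # Poincare disk with Euclidean radius rho in [0, 1):
    # dV = 4 rho / (1 - rho^2)^2 drho dtheta, and d(o, x) = 2 artanh(rho).
    def f(rho):
        r = 2 * math.atanh(rho)
        return math.exp(-r * r / (2 * sigma**2)) * 4 * rho / (1 - rho * rho) ** 2
    return 2 * math.pi * midpoint(f, 0.0, 1.0, n)


def closed_form(sigma):
    # 2 pi * integral_0^inf exp(-r^2 / (2 sigma^2)) sinh(r) dr, by completing the square.
    return (2 * math.pi * sigma * math.sqrt(math.pi / 2)
            * math.exp(sigma**2 / 2) * math.erf(sigma / math.sqrt(2)))


if __name__ == "__main__":
    s = 1.0
    print(integral_polar(s), integral_disk(s), closed_form(s))
```

Agreement between the two coordinate systems here reflects only the invariance of the volume form; a full test of the claim would repeat the comparison with the maximized likelihood as the integrand.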

Original abstract

In recent years, with the large-scale expansion of graph data, there has been an increased focus on Riemannian manifold data spaces other than Euclidean space. In particular, the development of hyperbolic spaces has been remarkable, and they have high expressive power for graph data with hierarchical structures. Normalized Maximum Likelihood (NML) is employed in regret minimization and model selection. However, existing formulations of NML have been developed primarily in Euclidean spaces and are inherently dependent on the choice of coordinate systems, making it non-trivial to extend NML to Riemannian manifolds. In this study, we define a new NML that reflects the geometric structure of Riemannian manifolds, called the Riemannian manifold NML (Rm-NML). This Rm-NML is invariant under coordinate transformations and coincides with the conventional NML under the natural parameterization in Euclidean space. We extend existing computational techniques for NML to the setting of Riemannian manifolds. Furthermore, we derive a method to simplify the computation of Rm-NML on Riemannian symmetric spaces, which encompass data spaces of growing interest such as hyperbolic spaces. To illustrate the practical application of our proposed method, we explicitly computed the Rm-NML for normal distributions on hyperbolic spaces.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper defines Rm-NML by swapping the Riemannian volume form into the normalizing integral, which automatically gives coordinate invariance and reduces to ordinary NML on Euclidean space.

The main takeaway is that the authors replace the Lebesgue measure with the Riemannian volume form when they normalize the maximum-likelihood density. That single change makes the code length invariant under reparameterization, and it matches the usual NML exactly when the manifold is Euclidean with its standard metric. They then adapt existing NML approximation techniques to this setting and simplify the integral on symmetric spaces, which lets them compute an explicit expression for the normal distribution on hyperbolic space. That last calculation is the most concrete part of the work.

The construction is direct and the invariance property follows at once from the intrinsic nature of the volume form, so the central claim is on solid ground. The reduction to the Euclidean case is also immediate by construction. No obvious circularity appears in how the new quantity is introduced.

One practical question is how often the normalizing integral can be evaluated in closed form or with stable numerics beyond the hyperbolic-normal case they treat. If the paper only shows the analytic result and does not include comparisons on actual graph data or checks on convergence for other manifolds, the method stays mostly theoretical for now. Still, the math itself looks clean and the extension is a natural one.

This is aimed at people who already use hyperbolic or other manifold embeddings for hierarchical data and want an information-theoretic model-selection tool that respects the geometry. A reader comfortable with both differential geometry and normalized maximum likelihood will pick it up quickly.

I would send the paper to peer review. The idea is modest in scope but fills a clear gap, and the derivations are straightforward enough that referees can check them without unusual effort.

Referee Report

2 major / 2 minor

Summary. The paper defines a Riemannian manifold version of Normalized Maximum Likelihood (Rm-NML) by replacing the Euclidean Lebesgue measure in the normalizing integral with the Riemannian volume form induced by a chosen metric on the data manifold. The resulting quantity is claimed to be invariant under coordinate reparameterizations, to reduce exactly to ordinary NML on Euclidean space with the standard metric, and to admit simplified evaluation on Riemannian symmetric spaces. An explicit closed-form or computable expression is derived for the normal distribution on hyperbolic space.

Significance. If the finiteness of the new normalizing integral and the invariance property are rigorously established, the construction supplies a geometrically intrinsic model-selection criterion for data lying on manifolds that arise in graph embedding and hierarchical modeling. The reduction to the Euclidean case and the explicit hyperbolic-normal example are direct consequences of standard differential-geometric identities and therefore constitute a clean extension of existing NML techniques.

major comments (2)

[§3] Definition of Rm-NML: the normalizing integral is asserted to be finite once the Riemannian volume form is substituted, yet no explicit integrability argument or decay estimate is supplied for the non-compact hyperbolic case; without such a bound the central claim that Rm-NML is well-defined remains unsupported.
[§5] Hyperbolic-normal example: the derivation of the explicit Rm-NML expression relies on the volume element of the hyperboloid model, but the manuscript does not verify that the resulting integral converges or that the maximum-likelihood density is properly normalized with respect to this measure.

minor comments (2)

[Notation] The notation for the Riemannian metric g and its induced volume form dV_g is introduced without a short reminder of the coordinate-change rule for the volume element; a one-sentence recall would improve readability for readers outside differential geometry.
[Introduction] A reference to Rissanen’s original NML papers or to the standard Euclidean regret bounds would help situate the new definition relative to the literature.
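The coordinate-change rule mentioned in the notation comment is a standard identity and can be recalled in a few lines. The symbols below ($x = \varphi(y)$ for the change of coordinates, $J$ for its Jacobian, $\tilde g$ for the metric expressed in the new coordinates) are chosen here for illustration and are not taken from the manuscript.

```latex
% Change of coordinates x = \varphi(y) with Jacobian J = \partial x / \partial y.
\begin{align}
  \tilde g(y) &= J(y)^{\top}\, g\bigl(\varphi(y)\bigr)\, J(y)
  && \text{(the metric transforms as a (0,2)-tensor)} \\
  \sqrt{\det \tilde g(y)} &= \lvert \det J(y) \rvert \,\sqrt{\det g\bigl(\varphi(y)\bigr)}
  && \text{(take determinants and square roots)} \\
  dx^1\cdots dx^d &= \lvert \det J(y) \rvert \; dy^1\cdots dy^d
  && \text{(Lebesgue change of variables)} \\
  \sqrt{\det g(x)}\; dx^1\cdots dx^d &= \sqrt{\det \tilde g(y)}\; dy^1\cdots dy^d
  && \text{(the Jacobian factors coincide)}
\end{align}
```

The last line states that $dV_g$ has the same expression in every chart, which is why an integral against it does not depend on the coordinates used to evaluate it.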

Simulated Author's Rebuttal

2 responses · 0 unresolved

We are grateful to the referee for the thorough review and for highlighting the importance of rigorously establishing the well-definedness of Rm-NML. We address each major comment below and will incorporate the necessary additions in the revised manuscript.

Referee: [§3] Definition of Rm-NML: the normalizing integral is asserted to be finite once the Riemannian volume form is substituted, yet no explicit integrability argument or decay estimate is supplied for the non-compact hyperbolic case; without such a bound the central claim that Rm-NML is well-defined remains unsupported.

Authors: We agree that an explicit proof of finiteness is essential for the non-compact case. In the revised manuscript, we will include a dedicated subsection providing a decay estimate for the integrand. Specifically, we will show that the maximum likelihood density for the Riemannian normal distribution on hyperbolic space decays exponentially with the hyperbolic distance, which, when integrated against the exponentially growing volume element, yields a convergent integral. This argument relies on standard estimates for hyperbolic geometry and ensures Rm-NML is well-defined.
revision: yes

Referee: [§5] Hyperbolic-normal example: the derivation of the explicit Rm-NML expression relies on the volume element of the hyperboloid model, but the manuscript does not verify that the resulting integral converges or that the maximum-likelihood density is properly normalized with respect to this measure.

Authors: The maximum-likelihood density is normalized by construction with respect to the Riemannian volume measure, as it is the density of the probability distribution on the manifold. For the Rm-NML integral, we acknowledge the lack of explicit verification in the original text. We will add a verification step in the revised version, either by direct computation for the specific parameters or by providing bounds that confirm convergence using the explicit volume form in the hyperboloid model.
revision: yes
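The kind of decay estimate promised in the first response can be illustrated on a simpler object. The bound below is a sketch for a single fixed-center function $e^{-d(x,\mu)^2/(2\sigma^2)}$ on hyperbolic space $\mathbb{H}^n$ with curvature $-1$, where $\omega_{n-1}$ denotes the area of the unit sphere $S^{n-1}$; it is not the Rm-NML normalizer, whose integrand places the maximum-likelihood estimate at a data-dependent location and would need its own argument.

```latex
% Geodesic polar coordinates around \mu: dV = \sinh^{n-1}(r)\, dr\, d\omega.
\begin{align}
  \int_{\mathbb{H}^n} e^{-d(x,\mu)^2/(2\sigma^2)}\, dV(x)
    &= \omega_{n-1} \int_0^\infty e^{-r^2/(2\sigma^2)}\, \sinh^{n-1}(r)\, dr \\
    &\le \frac{\omega_{n-1}}{2^{\,n-1}} \int_0^\infty
         e^{-r^2/(2\sigma^2) + (n-1) r}\, dr
       && \bigl(\sinh r \le e^{r}/2\bigr) \\
    &\le \frac{\omega_{n-1}}{2^{\,n-1}}\; \sigma\sqrt{2\pi}\;
         e^{(n-1)^2 \sigma^2 / 2} \;<\; \infty
       && \text{(complete the square, extend to } \mathbb{R}\text{)}
\end{align}
```

The Gaussian factor decays faster than any exponential, so it dominates the volume growth $e^{(n-1)r}$ for every $\sigma$. A purely exponential decay $e^{-ar}$ would give a finite integral only when $a > n-1$.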

Circularity Check

0 steps flagged

No significant circularity detected in Rm-NML definition

The central construction replaces the Lebesgue measure with the Riemannian volume form inside the normalizing integral of the maximum-likelihood density. This is a direct, coordinate-free redefinition that inherits invariance from the intrinsic properties of the volume element and reduces to ordinary NML on Euclidean space by specialization of the metric; neither step relies on fitted parameters renamed as predictions nor on self-citations whose content is presupposed. Extensions to symmetric spaces and the hyperbolic-normal example invoke only standard differential-geometric identities that are independent of the present paper's results. The derivation chain is therefore self-contained against external geometric facts and introduces no load-bearing circular reductions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the existence of a Riemannian volume measure that replaces the Euclidean Lebesgue measure and on the technical assumption that the resulting integral can be normalized and computed on symmetric spaces.

axioms (1)

Domain assumption: the data space admits a Riemannian metric whose volume form can be used to define an invariant normalizing constant for the maximum-likelihood integral.
Invoked when the authors replace the coordinate-dependent Lebesgue measure with the intrinsic volume element to achieve invariance.

