Entropic Strict Minimum Message Length and Its Connections to PAC-Bayes and NML

Daniel F. Schmidt; Enes Makalic

arxiv: 2605.02099 · v2 · pith:6YRNMN2Enew · submitted 2026-05-03 · 🧮 math.ST · stat.TH

Pith reviewed 2026-05-20 23:52 UTC · model grok-4.3

classification 🧮 math.ST stat.TH

keywords: entropic SMML · minimum message length · PAC-Bayes · normalized maximum likelihood · risk-sensitive coding · exponential families · asymptotic analysis · information theory

The pith

Entropic SMML replaces expected codelength with an exponential certainty equivalent to create a tunable family of coding rules.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces entropic strict minimum message length as a risk-sensitive version of strict MML. It substitutes an exponential certainty equivalent for the usual expected two-part codelength under the prior predictive distribution. This substitution yields a one-parameter family of rules that moves continuously from Bayesian average-case coding to worst-case minimax coding. The construction recovers ordinary SMML at the risk-neutral end and the normalized maximum likelihood minimax-regret rule at the high-risk end. It also supplies a variational view that connects the criterion to PAC-Bayes and supplies joint asymptotics that locate the regime transitions on a logarithmic scale in sample size and risk parameter.

Core claim

Entropic SMML replaces the expected two-part codelength under the prior predictive distribution with an exponential certainty equivalent, thereby defining a one-parameter family of coding rules that interpolates between Bayesian average-case coding and worst-case minimax coding. Ordinary SMML is recovered in the risk-neutral limit, while the extreme risk-sensitive limit yields a minimax codelength criterion that coincides with the NML minimax-regret principle after centering by the oracle maximum-likelihood codelength. The criterion admits a variational characterization as a Kullback-Leibler-regularized worst-case expected codelength and, for regular exponential families, the fixed-codebook

What carries the argument

Entropic SMML criterion formed by replacing the expected two-part codelength with its exponential certainty equivalent under the prior predictive distribution.
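As a rough illustration of the exponential certainty equivalent, the sketch below evaluates (1/τ) log E[exp(τ L)] for a small discrete codelength distribution. The function name, the three toy codelengths, and their probabilities are hypothetical and not taken from the paper; the point is only that the value approaches the mean codelength as τ → 0 and the maximum codelength as τ grows.

```python
import math

def certainty_equivalent(codelengths, probs, tau):
    """Return (1/tau) * log E[exp(tau * L)]; tau = 0 is treated as the plain mean."""
    if tau == 0.0:
        return sum(p * l for p, l in zip(probs, codelengths))
    # Log-sum-exp with the largest exponent factored out for numerical stability.
    # Outcomes with zero probability are skipped.
    terms = [math.log(p) + tau * l for p, l in zip(probs, codelengths) if p > 0]
    m = max(terms)
    return (m + math.log(sum(math.exp(t - m) for t in terms))) / tau

# Hypothetical toy example: three outcomes with codelengths in nats.
lengths = [2.0, 3.0, 10.0]
probs = [0.6, 0.3, 0.1]

mean_length = certainty_equivalent(lengths, probs, 0.0)   # risk-neutral end
small_tau = certainty_equivalent(lengths, probs, 1e-6)    # close to the mean
large_tau = certainty_equivalent(lengths, probs, 200.0)   # close to the maximum
```

In this sketch `small_tau` stays near the expected codelength while `large_tau` is pulled toward the longest codelength, which is the average-case to worst-case interpolation the page describes.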

If this is right

Ordinary SMML is recovered exactly when the risk parameter approaches the neutral limit.
The high-risk limit, after centering by the oracle MLE codelength, coincides with the NML minimax-regret principle.
A variational representation as KL-regularized worst-case expected codelength supplies a PAC-Bayes interpretation.
In regular parametric models the transition between Bayesian, robust and minimax regimes occurs on a logarithmic scale in n and the risk parameter.
For regular exponential families the fixed-codebook partition stays affine in sufficient-statistic space and the codepoints satisfy tilted moment-matching as tilted Bregman centroids.
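The KL-regularized worst-case form mentioned in the list above is an instance of the standard Gibbs (Donsker–Varadhan) variational identity. The block below is a sketch of that identity in the page's notation, assuming τ > 0 and a finite exponential moment; writing the data as $X^n$ and the codelength as $\Lambda$ is an assumption about notation, and the paper's own statement may carry additional structure.

```latex
% Gibbs / Donsker--Varadhan identity, assuming \tau > 0 and
% E_{r_n}[exp(\tau \Lambda)] < \infty. Here r_n is the prior predictive
% distribution and \Lambda the two-part codelength.
\[
  \frac{1}{\tau}\log \mathbb{E}_{r_n}\!\left[e^{\tau \Lambda(X^n)}\right]
  \;=\;
  \sup_{Q \ll r_n}\left\{ \mathbb{E}_{Q}\!\left[\Lambda(X^n)\right]
    - \frac{1}{\tau}\,\mathrm{KL}\!\left(Q \,\|\, r_n\right) \right\},
\]
% with the supremum attained at the exponentially tilted distribution
\[
  \frac{dQ^{\star}}{dr_n}(x)
  \;=\;
  \frac{e^{\tau \Lambda(x)}}{\mathbb{E}_{r_n}\!\left[e^{\tau \Lambda(X^n)}\right]} .
\]
```

Read this way, a small τ penalizes departures from the prior predictive heavily and recovers the expected codelength, while a large τ lets the adversarial distribution $Q$ concentrate on the worst data.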

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The logarithmic scaling implies that moderate risk sensitivity produces distinct behavior from both Bayesian and minimax extremes even at moderately large sample sizes.
The PAC-Bayes link may be used to derive new generalization bounds that incorporate explicit risk sensitivity into model selection.
The tilted-Bregman-centroid view suggests possible extensions to other Bregman divergences or non-exponential-family models where the affine property fails.
Practical coding algorithms could tune the risk parameter to trade average-case efficiency against robustness on small or heterogeneous data sets.

Load-bearing premise

The joint asymptotic theory and the claims of affine partitions with tilted moment-matching assume regular parametric models and regular exponential families.

What would settle it

A concrete counter-example showing a non-affine codebook partition or a non-logarithmic transition between regimes inside a regular exponential family for large but finite n would falsify the asymptotic claims.

Figures

Figures reproduced from arXiv: 2605.02099 by Daniel F. Schmidt, Enes Makalic.

**Figure 1.** Binomial comparison of ordinary SMML, entropic SMML, and the worst-case codelength endpoint for

read the original abstract

We introduce entropic strict minimum message length (SMML), a risk-sensitive generalization of strict minimum message length coding. The proposed criterion replaces expected two-part codelength under the prior predictive distribution with an exponential certainty equivalent, thereby defining a one-parameter family of coding rules that interpolates between Bayesian average-case coding and worst-case minimax coding. We show that ordinary SMML is recovered in the risk-neutral limit, while the extreme risk-sensitive limit yields a minimax codelength criterion; when centered by the oracle maximum likelihood codelength, this criterion coincides with the normalized maximum likelihood (NML) minimax-regret principle. We further prove that entropic SMML admits a variational characterization as a Kullback--Leibler-regularized worst-case expected codelength, giving it a PAC--Bayes-type interpretation. We establish a joint asymptotic theory linking the sample size $n$ and the risk parameter $\tau$, showing that in regular parametric models the transition between Bayesian, robust, and minimax coding regimes occurs on a logarithmic scale. For regular exponential families, the fixed-codebook partition remains affine in sufficient-statistic space, while the codepoints satisfy a tilted moment-matching condition and admit an interpretation as tilted Bregman centroids. These results position entropic SMML as an information-theoretic bridge between MML, PAC--Bayes, and MDL.
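To make the NML endpoint concrete, the following sketch computes the Shtarkov normalizer for a binomial model and checks numerically that the NML code has constant regret, equal to the log of that normalizer, against the oracle maximum-likelihood codelength. The function names, the choice n = 10, and the uniform comparison code are illustrative assumptions, not material from the paper.

```python
import math

def binomial_log_pmf(x, n, mu):
    """log P(X = x) for X ~ Binomial(n, mu), using the 0 * log(0) = 0 convention."""
    out = math.log(math.comb(n, x))
    if x > 0:
        out += x * math.log(mu)
    if x < n:
        out += (n - x) * math.log(1.0 - mu)
    return out

def shtarkov_log_sum(n):
    """log of sum_x p(x | mle(x)), where mle(x) = x / n for the binomial model."""
    return math.log(sum(math.exp(binomial_log_pmf(x, n, x / n)) for x in range(n + 1)))

def max_regret(code_probs, n):
    """Worst-case excess codelength of a code over the oracle MLE codelength (nats)."""
    return max(
        -math.log(code_probs[x]) + binomial_log_pmf(x, n, x / n)
        for x in range(n + 1)
    )

n = 10  # hypothetical sample size
log_norm = shtarkov_log_sum(n)

# NML code: maximized likelihood divided by the Shtarkov sum.
nml_probs = [math.exp(binomial_log_pmf(x, n, x / n) - log_norm) for x in range(n + 1)]

# Comparison code: uniform over the n + 1 possible counts.
uniform_probs = [1.0 / (n + 1)] * (n + 1)

nml_regret = max_regret(nml_probs, n)
uniform_regret = max_regret(uniform_probs, n)
```

Under these assumptions `nml_regret` matches `log_norm` up to rounding and is smaller than `uniform_regret`, which is the minimax-regret property the abstract refers to.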

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Entropic SMML gives a clean risk-parameter bridge from ordinary MML to PAC-Bayes and NML, but the joint n-τ asymptotics need checking for uniform control under tilting.

The punchline is that this paper defines entropic SMML as a one-parameter family using exponential certainty equivalents on codelengths, which interpolates between ordinary SMML and NML minimax, and gives it a PAC-Bayes variational form plus some joint asymptotics. What is new is the specific entropic formulation and the claim that it recovers NML when centered by the MLE codelength, along with the tilted moment-matching for exponential families. The synthesis of MML, PAC-Bayes, and MDL ideas through this risk parameter looks fresh.

The paper does well in laying out the connections without forcing them. The variational characterization as KL-regularized worst-case expected codelength is a nice touch that makes the PAC-Bayes link natural. For exponential families, keeping the partition affine while tilting the centroids is a solid technical point.

The soft spots are mostly around the asymptotics. The joint theory for n and τ assumes regularity for uniform approximations, but the stress-test raises a fair point that when τ scales with log n the tilting could amplify tails and break uniformity. Without seeing the full proofs and error bounds, it's hard to tell if the regime transitions hold uniformly or just pointwise. The abstract mentions proofs exist, but regularity conditions and edge cases aren't detailed here.

This is for readers in statistical model selection and information theory who want to see bridges between these areas. Someone working on PAC-Bayes or MDL would get value from the variational view and the asymptotic scaling. It deserves a serious referee to check the derivations and see if the claims hold up under the stated assumptions. I recommend putting it through peer review rather than desk rejecting it.

Referee Report

2 major / 2 minor

Summary. The paper introduces entropic strict minimum message length (SMML) as a risk-sensitive generalization of strict MML coding. It replaces the expected two-part codelength under the prior predictive distribution with an exponential certainty equivalent, yielding a one-parameter family indexed by the risk parameter τ that interpolates between Bayesian average-case coding and worst-case minimax coding. The work claims that (i) the risk-neutral limit recovers ordinary SMML; (ii) the extreme risk-sensitive limit yields a minimax codelength that coincides with the normalized maximum likelihood (NML) principle when centered by the oracle MLE codelength; (iii) the criterion admits a variational characterization as a KL-regularized worst-case expected codelength, with a PAC-Bayes interpretation; (iv) a joint asymptotic theory for regular parametric models shows regime transitions on a logarithmic scale in n and τ; and (v) for regular exponential families, the fixed-codebook partition is affine in sufficient-statistic space and the codepoints satisfy tilted moment-matching, interpretable as tilted Bregman centroids.

Significance. If the stated derivations and asymptotic results hold under the assumed regularity conditions (twice-differentiable densities, positive definite Fisher information), the paper supplies a tunable information-theoretic criterion that formally bridges MML, PAC-Bayes, and MDL. The explicit PAC-Bayes-type variational form, the NML coincidence, and the geometric characterizations for exponential families are potentially useful for robust coding and model selection; the joint (n,τ) asymptotics, if rigorously controlled, would clarify the transition between average-case and worst-case regimes.

major comments (2)

[§4] Joint asymptotic theory: the claimed regime transitions on a logarithmic scale in n and τ rely on uniform Laplace-type approximations across the codebook. When τ scales as log n the exponential tilting amplifies tail contributions; without explicit uniform error bounds on the large-deviation rate function under the stated regularity, the interpolation between Bayesian and minimax regimes and the Bregman-centroid interpretation may hold only pointwise rather than uniformly.
[Abstract, §3] Abstract and the variational/NML sections: the claims that the variational form, the NML coincidence, and the asymptotic regimes are proven are central, yet the manuscript provides no explicit statement of the full regularity conditions or verification of edge cases (e.g., boundary behavior of the exponential family or non-compact parameter spaces). This leaves the support for the central claims at the level of plausibility until the derivations are inspected.

minor comments (2)

Define the exponential certainty equivalent explicitly in the main text (rather than only in the abstract) so that readers can follow the transition from expected codelength to the risk-sensitive objective without external references.
Add a short table or diagram illustrating the limiting regimes (τ→0, τ→∞) and the corresponding coding rules to improve readability of the one-parameter family.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thorough review and valuable feedback on our manuscript. The comments raise important issues about the rigor of our asymptotic results and the clarity of our regularity assumptions. We address each point below and commit to revisions that will strengthen the paper.

Referee: [§4] Joint asymptotic theory: the claimed regime transitions on a logarithmic scale in n and τ rely on uniform Laplace-type approximations across the codebook. When τ scales as log n the exponential tilting amplifies tail contributions; without explicit uniform error bounds on the large-deviation rate function under the stated regularity, the interpolation between Bayesian and minimax regimes and the Bregman-centroid interpretation may hold only pointwise rather than uniformly.

Authors: We appreciate this observation regarding the need for uniform error bounds in the asymptotic analysis. The current manuscript uses standard Laplace approximations for the regime transitions, but we agree that explicit uniform bounds are required to rigorously justify the claims when τ is of order log n. In the revision, we will include a lemma providing uniform large-deviation bounds under the assumed regularity conditions (twice-differentiable log-densities and positive definite Fisher information). This will ensure the interpolation holds uniformly, and we will update the Bregman-centroid interpretation accordingly. We believe this addresses the concern without altering the main results.

revision: yes

Referee: [Abstract, §3] Abstract and the variational/NML sections: the claims that the variational form, the NML coincidence, and the asymptotic regimes are proven are central, yet the manuscript provides no explicit statement of the full regularity conditions or verification of edge cases (e.g., boundary behavior of the exponential family or non-compact parameter spaces). This leaves the support for the central claims at the level of plausibility until the derivations are inspected.

Authors: The referee correctly identifies that the manuscript would benefit from an explicit enumeration of the regularity conditions supporting the variational characterization, NML coincidence, and asymptotic regimes. We will add a new section titled 'Regularity Conditions' that lists all assumptions, including compactness of the parameter space for the main results and interior-point assumptions for exponential families. Edge cases such as boundary behavior will be discussed with references to truncation techniques for non-compact spaces. This revision will make the proofs more transparent and verifiable, elevating the claims from plausible to fully supported.

revision: yes

Circularity Check

0 steps flagged

No circularity: new definition yields derived properties under standard regularity

The paper defines entropic SMML by replacing the expected two-part codelength with an exponential certainty equivalent, creating a parameterized family. It then derives limit cases (risk-neutral recovers ordinary SMML; risk-sensitive yields NML after centering), a variational KL-regularized form, and joint asymptotics for regular parametric models and exponential families (affine partitions, tilted moment-matching, Bregman centroids). These steps follow directly from the definition plus standard analytic techniques and regularity assumptions (twice-differentiable densities, positive definite Fisher information) without any reduction of a claimed result to a fitted input, self-citation load-bearing premise, or imported uniqueness theorem. The derivation chain is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central addition is the risk parameter τ that controls sensitivity; the remaining structure rests on standard regularity assumptions for parametric models and exponential families that are invoked for the asymptotic results.

free parameters (1)

risk parameter τ
Single scalar that sets the degree of risk sensitivity in the exponential certainty equivalent and controls the transition between coding regimes.

axioms (1)

domain assumption Regularity conditions on the parametric model family
Invoked to obtain the joint asymptotic theory linking n and τ and to guarantee that the fixed-codebook partition remains affine in sufficient-statistic space.



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel; costAlphaLog_high_calibrated_iff · echoes

echoes: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

$I_{n,\tau}(P,q,\theta) = \frac{1}{\tau}\log \mathbb{E}_{r_n}\left[\exp\left(\tau\,\Lambda_{P,q,\theta}(X^n)\right)\right]$; codepoints satisfy tilted moment-matching $n\nabla A(\nu^*) = \sum w_{j,\tau}(x;\nu^*)\,T(x)$ with $w_{j,\tau} \propto r_n(x)\,p_n(x\mid\theta)^{-\tau}$
IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean · alpha_pin_under_high_calibration; J_uniquely_calibrated_via_higher_derivative · refines

refines: Relation between the paper passage and the cited Recognition theorem.

entropic SMML codepoint is the m-projection of a τ-tilted distribution $s_{j,\tau} \propto r_n(x)\,p_n(x\mid\theta^*)^{-\tau}$ onto the model manifold
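The tilted moment-matching condition quoted above can be illustrated on a single binomial codebook cell. The sketch below is not the paper's algorithm: it assumes a uniform prior predictive over counts, a fixed cell of counts {3, ..., 8} with n = 20, and the mean parameterization μ (so that n∇A corresponds to nμ and T(x) = x). It solves nμ = Σ w(x; μ) x by bisection, with normalized weights w proportional to r(x) p(x | μ)^(-τ). All names are hypothetical.

```python
import math

def log_pmf(x, n, mu):
    """log P(X = x) for X ~ Binomial(n, mu) with 0 < mu < 1."""
    return math.log(math.comb(n, x)) + x * math.log(mu) + (n - x) * math.log(1.0 - mu)

def tilted_weights(cell, n, mu, tau, log_r):
    """Normalized weights proportional to r(x) * p(x | mu) ** (-tau) over the cell."""
    logs = [log_r(x) - tau * log_pmf(x, n, mu) for x in cell]
    m = max(logs)
    w = [math.exp(v - m) for v in logs]
    total = sum(w)
    return [v / total for v in w]

def moment_gap(cell, n, mu, tau, log_r):
    """n * mu minus the tilted mean of x; zero at a moment-matching codepoint."""
    w = tilted_weights(cell, n, mu, tau, log_r)
    return n * mu - sum(wi * x for wi, x in zip(w, cell))

def tilted_codepoint(cell, n, tau, log_r, iters=200):
    """Bisection on mu. For tau >= 0 the gap is nondecreasing in mu, because the
    cell objective is convex in the natural parameter."""
    lo, hi = 1e-9, 1.0 - 1e-9
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if moment_gap(cell, n, mid, tau, log_r) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

n = 20                                    # hypothetical number of trials
cell = list(range(3, 9))                  # hypothetical cell of counts 3..8
log_r = lambda x: -math.log(n + 1)        # assumed uniform prior predictive

mu_neutral = tilted_codepoint(cell, n, 0.0, log_r)  # untilted centroid: mean(cell) / n
mu_tilted = tilted_codepoint(cell, n, 2.0, log_r)   # tilted centroid for tau = 2
```

With τ = 0 the weights are proportional to r(x) alone and the codepoint reduces to the ordinary cell centroid; with τ > 0 the weights favor counts that the current codepoint fits poorly, which shifts the centroid.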

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

26 extracted references · 26 canonical work pages · 2 internal anchors

[1] Chris S. Wallace and David M. Boulton. An information measure for classification. Computer Journal, 11(2):185–194, August 1968.

[2] Chris S. Wallace and David M. Boulton. An invariant Bayes method for point estimation. Classification Society Bulletin, 3(3):11–34, 1975.

[3] Chris S. Wallace and Peter R. Freeman. Estimation and inference by compact coding. Journal of the Royal Statistical Society (Series B), 49(3):240–252, 1987.

[4] Chris S. Wallace. Statistical and inductive inference by minimum message length. Information Science and Statistics. Springer, first edition, 2005.

[5] Jorma Rissanen. Modeling by shortest data description. Automatica, 14(5):465–471, September 1978.

[6] Jorma Rissanen. Universal coding, information, prediction, and estimation. IEEE Transactions on Information Theory, 30(4):629–636, July 1984.

[7] Jorma Rissanen. Fisher information and stochastic complexity. IEEE Transactions on Information Theory, 42(1):40–47, January 1996.

[8] Jorma Rissanen. Strong optimality of the normalized ML models as universal codes and information in data. IEEE Transactions on Information Theory, 47(5):1712–1717, July 2001.

[9] Jorma Rissanen. Information and Complexity in Statistical Modeling. Information Science and Statistics. Springer, first edition, 2007.

[10] Peter D. Grünwald. The Minimum Description Length Principle. Adaptive Communication and Machine Learning. The MIT Press, 2007.

[11] Peter Grünwald and Teemu Roos. Minimum description length revisited. International Journal of Mathematics for Industry, 11(01), December 2019.

[12] M. D. Donsker and S. R. S. Varadhan. Asymptotic evaluation of certain Markov process expectations for large time, I. Communications on Pure and Applied Mathematics, 28(1):1–47, January 1975.

[13] Hans Föllmer and Thomas Knispel. Entropic risk measures: Coherence vs. convexity, model ambiguity and robust large deviations. Stochastics and Dynamics, 11(02n03):333–351, 2011.

[14] S. Kullback and R. A. Leibler. On information and sufficiency. The Annals of Mathematical Statistics, 22(1):79–86, March 1951.

[15] Olivier Catoni. PAC-Bayesian supervised classification: The thermodynamics of statistical learning. IMS Lecture Notes Monograph Series, 56:1–163, 2007.

[16] Enes Makalic and Daniel F. Schmidt. Information geometry and asymptotic theory for SMML estimators. arXiv:2604.05241, 2026.

[17] Arindam Banerjee, Srujana Merugu, Inderjit S. Dhillon, and Joydeep Ghosh. Clustering with Bregman divergences. Journal of Machine Learning Research, 6(58):1705–1749, 2005.

[18] L. L. Campbell. A coding theorem and Rényi’s entropy. Information and Control, 8(4):423–429, August 1965.

[19] Alfréd Rényi. On measures of entropy and information. In Jerzy Neyman, editor, Berkeley Symp. on Math. Statist. and Prob., volume I, pages 547–561. University of California Press, 1961.

[20] J.-F. Bercher. Source coding with escort distributions and Rényi entropy bounds. Physics Letters A, 373(36):3235–3238, 2009.

[21] I. Csiszar. Generalized cutoff rates and Rényi’s information measures. IEEE Transactions on Information Theory, 41(1):26–34, 1995.

[22] James O. Berger. Statistical Decision Theory and Bayesian Analysis. Springer New York, 1985.

[23] Y. M. Shtarkov. Universal sequential coding of single messages. Probl. Inform. Transm., 23(3):3–17, 1987.

[24] Kohei Miyaguchi. Normalized maximum likelihood with luckiness for multivariate normal distributions, 2017.

[25] Shun’ichi Amari and Hiroshi Nagaoka. Methods of Information Geometry, volume 191 of Translations of mathematical monographs. American Mathematical Society, 2000.

[26] James G. Dowty. SMML estimators for exponential families with continuous sufficient statistics. arXiv:1302.0581, 2013.
