Empirical Bayes in Bayesian learning: understanding a common practice

Judith Rousseau; Sonia Petrone; Stefano Rizzelli

arxiv: 2402.19036 · v2 · submitted 2024-02-29 · 🧮 math.ST · stat.TH

Pith reviewed 2026-05-24 03:53 UTC · model grok-4.3

keywords: empirical Bayes · maximum marginal likelihood · Bernstein-von Mises · posterior merging · parametric models · mixture models · hyperparameter estimation

The pith

Empirical Bayes via maximum marginal likelihood approximates the oracle posterior, the one built on the most informative prior in the class, at a faster rate than classic Bernstein-von Mises results provide.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a theoretical framework treating the common practice of setting prior hyperparameters by their maximum marginal likelihood estimates as a computational strategy to approximate a genuine Bayesian posterior. It establishes the limit behavior of these estimates for parametric models in both identifiable and non-identifiable cases, including overfitted mixtures, and proves higher-order merging results. A sympathetic reader would care because the work supplies rigorous justification for an everyday shortcut that previously lacked formal support, showing faster approximation to the posterior based on the prior that carries the most information about the true parameters.

Core claim

When not degenerate, the EB posterior approximates at a faster rate an oracle-Bayes posterior distribution based on the prior law that, within the given class of priors, expresses the most information on the true model's parameters. This is a faster approximation than classic Bernstein-von Mises results. The framework also supplies general properties of the MMLE and a simple proxy for its computation.

What carries the argument

The maximum marginal likelihood estimate (MMLE) of the hyperparameters within a fixed class of priors, which selects the prior expressing the most information and drives the higher-order merging of the EB posterior to the oracle posterior.
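
For orientation, the objects named here can be written in generic notation. This is a sketch using standard definitions; the symbols ($\theta$ for the model parameter, $\lambda \in \Lambda$ for the hyperparameter, $p_\theta^{(n)}$ for the likelihood of the sample $x_{1:n}$, $\pi(\cdot \mid \lambda)$ for the prior) are assumed for illustration and are not taken from the paper.

```latex
% Marginal likelihood of the data under the prior indexed by \lambda
m(x_{1:n} \mid \lambda) = \int_\Theta p_\theta^{(n)}(x_{1:n}) \, \pi(\theta \mid \lambda) \, d\theta

% Maximum marginal likelihood estimate (MMLE) of the hyperparameter
\hat\lambda_n \in \arg\max_{\lambda \in \Lambda} m(x_{1:n} \mid \lambda)

% Empirical Bayes posterior: the usual posterior with \hat\lambda_n plugged in
\pi(\theta \mid x_{1:n}, \hat\lambda_n)
  = \frac{p_\theta^{(n)}(x_{1:n}) \, \pi(\theta \mid \hat\lambda_n)}{m(x_{1:n} \mid \hat\lambda_n)}
```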

If this is right

The MMLE has a well-characterized limit in general parametric settings, including non-identifiable models such as overfitted mixtures.
EB posteriors serve as a computational strategy for approximating genuine Bayesian posteriors.
Higher-order merging holds, yielding faster approximation than first-order asymptotic theorems.
Simple proxies for computing the MMLE become available under the stated regularity conditions.
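
To make the plug-in strategy concrete, here is a minimal sketch in a toy conjugate model that is not taken from the paper: observations X_i ~ N(θ, 1) with prior θ ~ N(λ, τ²) and τ² held fixed, so λ is the only hyperparameter. The function names, the golden-section search, and the choice of the oracle value λ* = θ0 (the prior centred at the truth, which is where the prior density at θ0 is largest in this class) are assumptions of the sketch.

```python
import math
import random

def log_marginal(xs, lam, tau2=1.0, sigma2=1.0):
    """Log marginal likelihood of xs when X_i | theta ~ N(theta, sigma2)
    and theta ~ N(lam, tau2); theta is integrated out in closed form."""
    n = len(xs)
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)
    v = tau2 + sigma2 / n  # marginal variance of the sample mean
    return (-0.5 * n * math.log(2 * math.pi * sigma2) - ss / (2 * sigma2)
            + 0.5 * math.log(2 * math.pi * sigma2 / n)
            - 0.5 * math.log(2 * math.pi * v) - (xbar - lam) ** 2 / (2 * v))

def argmax_1d(f, lo, hi, tol=1e-8):
    """Golden-section search for the maximiser of a unimodal f on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) > f(d):
            b = d  # maximiser lies in [a, d]
        else:
            a = c  # maximiser lies in [c, b]
    return (a + b) / 2

def posterior(xs, lam, tau2=1.0, sigma2=1.0):
    """Mean and variance of theta | xs under the N(lam, tau2) prior."""
    precision = len(xs) / sigma2 + 1.0 / tau2
    mean = (sum(xs) / sigma2 + lam / tau2) / precision
    return mean, 1.0 / precision

random.seed(0)
theta0 = 2.0  # true parameter, known only because this is a simulation
xs = [random.gauss(theta0, 1.0) for _ in range(200)]
xbar = sum(xs) / len(xs)

# MMLE by numerical maximisation; in this toy model it coincides with xbar.
lam_hat = argmax_1d(lambda lam: log_marginal(xs, lam), min(xs), max(xs))
eb_mean, eb_var = posterior(xs, lam_hat)     # empirical Bayes posterior
or_mean, or_var = posterior(xs, theta0)      # oracle posterior (assumed lambda* = theta0)
fx_mean, fx_var = posterior(xs, 10.0)        # a fixed, poorly centred prior

print(f"MMLE = {lam_hat:.4f}, sample mean = {xbar:.4f}")
print(f"|EB mean - oracle mean|    = {abs(eb_mean - or_mean):.2e}")
print(f"|fixed mean - oracle mean| = {abs(fx_mean - or_mean):.2e}")
```

In this model all three posteriors share the same variance, so the gap between posterior means is what separates them; the plug-in posterior sits much closer to the oracle than the badly centred fixed prior does.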

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

In practice the results may favor choosing a prior class that is rich enough to contain a near-oracle member but still allows stable MMLE computation.
The merging properties could extend to sequential updating schemes where hyperparameters are refreshed as new data arrive.
Modelers working with complex likelihoods might test whether the faster rate improves finite-sample coverage of credible intervals compared with fixed-hyperparameter Bayes.

Load-bearing premise

There exists a fixed class of priors together with regularity conditions that make the maximum marginal likelihood estimate converge to the value selecting the most informative prior, in both identifiable and non-identifiable models.

What would settle it

An explicit calculation or simulation in an overfitted mixture model showing that the EB posterior merges to the oracle posterior at the same first-order rate as standard Bernstein-von Mises rather than at the claimed faster rate.
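
The check described here concerns overfitted mixtures, which the following sketch does not attempt. As a far simpler illustration of how merging rates could be compared by simulation, it uses a toy conjugate model that is not from the paper: X_i ~ N(θ, 1), prior θ ~ N(λ, τ²) with τ² fixed, oracle hyperparameter assumed to be λ* = θ0, and the MMLE equal to the sample mean (a property of this toy model). All names and parameter values are hypothetical.

```python
import math
import random

def tv_equal_var(mean_a, mean_b, var):
    """Total variation distance between N(mean_a, var) and N(mean_b, var)."""
    return math.erf(abs(mean_a - mean_b) / (2.0 * math.sqrt(2.0 * var)))

def post_mean(xbar, n, lam, tau2=1.0):
    """Posterior mean of theta for X_i ~ N(theta, 1), theta ~ N(lam, tau2)."""
    return (n * xbar + lam / tau2) / (n + 1.0 / tau2)

def merging_gaps(n, theta0=0.0, lam_fixed=3.0, tau2=1.0, reps=500, seed=0):
    """Average TV distance to the oracle posterior for the EB posterior
    and for a posterior with a fixed hyperparameter lam_fixed."""
    rng = random.Random(seed)
    var = 1.0 / (n + 1.0 / tau2)  # posterior variance, the same for every lam
    eb_gap = fixed_gap = 0.0
    for _ in range(reps):
        # Draw the sample mean directly from its exact law N(theta0, 1/n).
        xbar = rng.gauss(theta0, 1.0 / math.sqrt(n))
        oracle = post_mean(xbar, n, theta0, tau2)
        # In this toy model the MMLE of lam is xbar itself.
        eb_gap += tv_equal_var(post_mean(xbar, n, xbar, tau2), oracle, var)
        fixed_gap += tv_equal_var(post_mean(xbar, n, lam_fixed, tau2), oracle, var)
    return eb_gap / reps, fixed_gap / reps

results = {n: merging_gaps(n) for n in (100, 1000, 10000)}
for n, (eb, fixed) in results.items():
    print(f"n={n:>6}  TV(EB, oracle)={eb:.2e}  TV(fixed, oracle)={fixed:.2e}")
```

In this toy setting the fixed-hyperparameter gap shrinks roughly like n^(-1/2) while the EB gap shrinks roughly like n^(-1). A counterexample of the kind described would show the two columns decaying at the same rate in the mixture setting.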

Figures

Figures reproduced from arXiv: 2402.19036 by Judith Rousseau, Sonia Petrone, Stefano Rizzelli.

**Figure 2.** Bayesian LASSO. Posterior densities of β1 and β14: EB with MMLE (black solid), Bayes with oracle hyperparameter λ* (gray solid), Bayes with λ = 1 (dotted) and λ = 8 (dashed). True values β0,j are marked as black bullets and EB posterior means as empty triangles.

Original abstract

In applications of Bayesian procedures, once a class of priors has been chosen, it may be tempting to fix the prior's hyperparameters from the data, in an empirical Bayes (EB) fashion, usually by their maximum marginal likelihood estimates (MMLE). This is a quite common but questionable practice, lacking a rigorous theoretical basis. We provide a theoretical framework where this form of EB is regarded as a computational strategy for approximating a genuine Bayesian posterior distribution and prove its general properties for parametric models. While computing the MMLE may still be demanding, we prove novel results that allow us to provide a simple proxy. These results establish the limit behavior of the MMLE in quite general settings, including both identifiable and non-identifiable models - specifically, overfitted mixture models - significantly filling a gap in the literature. Moreover, we study higher order merging, showing that, when not degenerate, the EB posterior approximates at a faster rate an oracle-Bayes posterior distribution based on the prior law that, within the given class of priors, expresses the most information on the true model's parameters. This is a faster approximation than classic Bernstein-von Mises results. Our work provides formal content to common beliefs on this popular practice.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper supplies new limit results for MMLE in non-identifiable models and claims faster-than-BvM merging for the EB posterior, but the supporting regularity conditions look like the weakest part.

The core contribution is a set of limit theorems for the maximum marginal likelihood estimator of hyperparameters inside a fixed prior class, covering both identifiable and non-identifiable parametric models, plus a higher-order merging result that says the resulting EB posterior tracks an oracle posterior faster than the usual Bernstein-von Mises rate when the limit is non-degenerate. That is the part that actually fills the gap the abstract mentions for overfitted mixtures and similar settings. The authors also give a simple proxy for the MMLE that avoids full marginal likelihood maximization, which is practically useful if the proxy works under the same conditions. Those two pieces are the real novelty and the reason the work is worth reading. The general properties of the EB approximation as a computational device are cleanly stated but feel more like setup than the main advance.

The soft spot is the regularity conditions needed for the MMLE limit in the non-identifiable case. The marginal likelihood surface can be flat or have multiple modes, so convergence to the single most informative prior is not automatic; the paper must impose extra assumptions on the prior class or on the rate of concentration that are not inherited from standard identifiability arguments. Without seeing the precise statements and error bounds, it is hard to judge how restrictive those conditions end up being. The faster merging claim stands or falls with that step.

This is aimed at researchers in Bayesian asymptotics who already work with mixtures or other non-identifiable models and want a justification for the common EB shortcut. It is not a broad methods paper. The topic is important enough and the claimed results are sharp enough that a serious editor should send it to referees rather than desk-reject, even though the proofs will need careful checking on the non-identifiable side.

Referee Report

2 major / 1 minor

Summary. The manuscript develops a theoretical framework treating empirical Bayes (EB) via maximum marginal likelihood estimates (MMLE) as a computational approximation to a genuine Bayesian posterior within a fixed class of priors for parametric models. It proves general properties of this approximation, establishes novel limit results for the MMLE in both identifiable and non-identifiable settings (explicitly including overfitted mixture models), and demonstrates higher-order merging in which the EB posterior approximates an oracle-Bayes posterior (corresponding to the most informative prior in the class) at a rate faster than classical Bernstein-von Mises theorems when the setup is non-degenerate.

Significance. If the derivations hold under the stated conditions, the work would supply rigorous justification for a widespread but previously loosely grounded practice in Bayesian statistics. The extension of MMLE limit theory to non-identifiable models and the faster-than-BvM merging rate would constitute concrete advances over existing approximation results, giving formal content to common intuitions about EB methods.

major comments (2)

[Abstract and the section establishing MMLE limits in non-identifiable models] The higher-order merging result (abstract) rests on the MMLE converging to the hyperparameter yielding the most informative prior within the class; this convergence in non-identifiable models (e.g., overfitted mixtures) requires regularity conditions on the marginal likelihood surface that are not automatically inherited from standard identifiability arguments and are not shown to hold against known flat or multimodal cases in the literature.
[Section on higher-order merging] The claim of a faster approximation rate than Bernstein-von Mises is load-bearing for the paper's novelty, yet it is not accompanied by explicit error bounds, rates, or verification that the required MMLE limit persists when the marginal likelihood is non-concave; without these, the faster merging does not necessarily follow from the general properties proved for identifiable cases.

minor comments (1)

The abstract would benefit from a brief parenthetical clarification of the precise class of priors under consideration.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. We address each major comment below. Where the comments identify gaps in explicit verification or bounds, we will revise the manuscript to strengthen the presentation while preserving the core results on MMLE limits and higher-order merging.

Referee: [Abstract and the section establishing MMLE limits in non-identifiable models] The higher-order merging result (abstract) rests on the MMLE converging to the hyperparameter yielding the most informative prior within the class; this convergence in non-identifiable models (e.g., overfitted mixtures) requires regularity conditions on the marginal likelihood surface that are not automatically inherited from standard identifiability arguments and are not shown to hold against known flat or multimodal cases in the literature.

Authors: Our general theorem on MMLE convergence is stated under explicit regularity conditions on the marginal likelihood (local identifiability of the maximizer and suitable curvature away from degeneracy) that apply equally to identifiable and non-identifiable models. For overfitted mixtures we verify these conditions directly by exploiting the known structure of the marginal likelihood in that class. We agree that a more explicit cross-reference to known flat or multimodal counter-examples in the literature would strengthen the exposition; we will add a short remark clarifying which of those examples fall outside our regularity assumptions and which are covered.

Revision: yes

Referee: [Section on higher-order merging] The claim of a faster approximation rate than Bernstein-von Mises is load-bearing for the paper's novelty, yet it is not accompanied by explicit error bounds, rates, or verification that the required MMLE limit persists when the marginal likelihood is non-concave; without these, the faster merging does not necessarily follow from the general properties proved for identifiable cases.

Authors: The higher-order merging rate is expressed in terms of the convergence rate of the MMLE to the oracle hyperparameter; the proof does not rely on global concavity but only on the local behavior guaranteed by our MMLE limit theorem, which already covers non-concave surfaces provided the stated regularity conditions hold. To make the argument fully self-contained we will insert explicit big-O error bounds (in terms of the MMLE rate) and a short paragraph confirming that the same local conditions suffice for the non-concave case. This revision will not alter the stated results but will improve readability.

Revision: yes

Circularity Check

0 steps flagged

No circularity; independent theoretical derivations on MMLE limits and merging rates

The paper derives limit behavior of the MMLE and higher-order merging of the EB posterior to an oracle posterior from standard regularity conditions on the marginal likelihood in both identifiable and non-identifiable parametric models. These are presented as novel asymptotic results filling a literature gap, not as reductions of fitted quantities to predictions or as self-definitional constructs. No load-bearing self-citations, imported uniqueness theorems, or ansatzes smuggled via prior work appear in the abstract or described framework; the central claims rest on external mathematical analysis rather than the paper's own inputs. The work is self-contained against the stated assumptions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Based solely on the abstract, the framework rests on standard regularity conditions for asymptotic analysis in parametric Bayesian models; no free parameters or invented entities are described.

axioms (1)

domain assumption: Regularity conditions sufficient for the limit behavior of the MMLE to hold in parametric models (identifiable and non-identifiable)
Invoked to establish the general properties and higher-order merging results for the EB posterior.

