arxiv: 1701.00311 · v2 · pith:4HXXM4XTnew · submitted 2017-01-02 · 🧮 math.ST · stat.ME · stat.ML · stat.TH

Bayesian model selection consistency and oracle inequality with intractable marginal likelihood

classification 🧮 math.ST stat.ME stat.ML stat.TH
keywords model bayesian selection function general local oracle procedures

In this article, we investigate large sample properties of model selection procedures in a general Bayesian framework when a closed form expression of the marginal likelihood function is not available or a local asymptotic quadratic approximation of the log-likelihood function does not exist. Under appropriate identifiability assumptions on the true model, we provide sufficient conditions for a Bayesian model selection procedure to be consistent and obey the Occam's razor phenomenon, i.e., the probability of selecting the "smallest" model that contains the truth tends to one as the sample size goes to infinity. In order to show that a Bayesian model selection procedure selects the smallest model containing the truth, we impose a prior anti-concentration condition, requiring the prior mass assigned by large models to a neighborhood of the truth to be sufficiently small. In a more general setting where the strong model identifiability assumption may not hold, we introduce the notion of local Bayesian complexity and develop oracle inequalities for Bayesian model selection procedures. Our Bayesian oracle inequality characterizes a trade-off between the approximation error and a Bayesian characterization of the local complexity of the model, illustrating the adaptive nature of averaging-based Bayesian procedures towards achieving an optimal rate of posterior convergence. Specific applications of the model selection theory are discussed in the context of high-dimensional nonparametric regression and density regression where the regression function or the conditional density is assumed to depend on a fixed subset of predictors. As a result of independent interest, we propose a general technique for obtaining upper bounds of certain small ball probability of stationary Gaussian processes.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Early-stopped aggregation: Adaptive inference with computational efficiency

    math.ST 2026-04 unverdicted novelty 6.0

    Early-stopped aggregation performs adaptive model selection and aggregation by halting at simpler models via an early-stopping criterion, achieving optimal contraction rates with reduced computation in variational Bay...