Universal priors: solving empirical Bayes via Bayesian inference and pretraining
We theoretically justify the recent empirical finding of [Teh et al., 2025] that a transformer pretrained on synthetically generated data achieves strong performance on empirical Bayes (EB) problems. We take an indirect approach to this question: rather than analyzing the model architecture or training dynamics, we ask why a pretrained Bayes estimator, trained under a prespecified training distribution, can adapt to arbitrary test distributions. Focusing on Poisson EB problems, we identify the existence of universal priors such that training under these priors yields a near-optimal regret bound of $\widetilde{O}(\frac{1}{n})$ uniformly over all test distributions. Our analysis leverages the classical phenomenon of posterior contraction in Bayesian statistics, showing that the pretrained transformer adapts to unknown test distributions precisely through posterior contraction. This perspective also explains the phenomenon of length generalization, in which the test sequence length exceeds the training length, as the model performs Bayesian inference using a generalized posterior.
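To make the Poisson EB setting concrete, here is a minimal Python sketch. It assumes the standard model, which the abstract does not spell out: $\theta_i \sim G$ and $X_i \mid \theta_i \sim \mathrm{Poisson}(\theta_i)$, with squared-error loss. Under that assumption the Bayes estimator is the posterior mean $(x+1) f_G(x+1) / f_G(x)$, where $f_G$ is the marginal pmf of $X$. The function names and the discrete prior are hypothetical, and Robbins' classical plug-in estimator is used only as a simple baseline; it is not the pretrained estimator studied in the paper.

```python
import math

def poisson_pmf(x, theta):
    """P(X = x) for X ~ Poisson(theta)."""
    return math.exp(-theta) * theta ** x / math.factorial(x)

def marginal_pmf(x, atoms, weights):
    """Marginal f_G(x) = sum_j w_j * Poisson(x; theta_j) for a discrete prior G.

    `atoms` and `weights` describe an illustrative (assumed) discrete prior.
    """
    return sum(w * poisson_pmf(x, t) for t, w in zip(atoms, weights))

def bayes_estimator(x, atoms, weights):
    """Posterior mean E[theta | X = x] = (x + 1) * f_G(x + 1) / f_G(x)."""
    return (x + 1) * marginal_pmf(x + 1, atoms, weights) / marginal_pmf(x, atoms, weights)

def robbins_estimator(x, counts):
    """Robbins' plug-in: replace f_G by the empirical frequencies of the sample.

    `counts` maps each observed value to how often it appeared.
    Returns 0.0 when x was never observed (an arbitrary convention for this sketch).
    """
    n_x = counts.get(x, 0)
    return (x + 1) * counts.get(x + 1, 0) / n_x if n_x > 0 else 0.0
```

In this setup, the regret of an estimator is its excess mean squared error over the Bayes estimator that knows $G$; the abstract's $\widetilde{O}(\frac{1}{n})$ bound refers to that quantity.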
Forward citations
Cited by 3 Pith papers
- Poisson Empirical Bayes via Gamma-Smoothed Nonparametric Maximum Likelihood
  A Gamma-smoothed NPMLE for Poisson empirical Bayes achieves optimal nearly parametric rates for posterior means and enables asymptotically exact, shorter marginal coverage confidence sets under compact support.
- Quasi-Bayes empirical Bayes estimation of sums of random variables
  A nonparametric quasi-Bayes empirical Bayes procedure is proposed for estimating sums of random variables, with recursive mixing distribution estimation, asymptotic guarantees, and uncertainty quantification.
- Merging of Bayes and quasi-Bayes empirical Bayes procedures for Poisson compound decisions
  Proves frequentist merging of Bayesian (Dirichlet process) and quasi-Bayesian (Newton's algorithm) empirical Bayes estimators for Poisson compound decisions via concentration rates on marginal PMFs and excess risks, w...