Variational Inference: A Review for Statisticians

Alp Kucukelbir; David M. Blei; Jon D. McAuliffe

arxiv: 1601.00670 · v9 · pith:N53GHUGVnew · submitted 2016-01-04 · stat.CO · cs.LG · stat.ML

David M. Blei, Alp Kucukelbir, Jon D. McAuliffe

classification stat.CO · cs.LG · stat.ML

keywords inference · densities · family · review · variational · bayesian · behind · discuss

One of the core problems of modern statistics is to approximate difficult-to-compute probability densities. This problem is especially important in Bayesian statistics, which frames all inference about unknown quantities as a calculation involving the posterior density. In this paper, we review variational inference (VI), a method from machine learning that approximates probability densities through optimization. VI has been used in many applications and tends to be faster than classical methods, such as Markov chain Monte Carlo sampling. The idea behind VI is to first posit a family of densities and then to find the member of that family which is close to the target. Closeness is measured by Kullback-Leibler divergence. We review the ideas behind mean-field variational inference, discuss the special case of VI applied to exponential family models, present a full example with a Bayesian mixture of Gaussians, and derive a variant that uses stochastic optimization to scale up to massive data. We discuss modern research in VI and highlight important open problems. VI is powerful, but it is not yet well understood. Our hope in writing this paper is to catalyze statistical research on this class of algorithms.
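
As a rough illustration of the idea summarized in the abstract (posit a family of densities, then optimize to find the member closest to the target in Kullback-Leibler divergence), the sketch below fits a Gaussian q(mu) = N(m, s^2) to the posterior of a toy model. It is not taken from the paper. The model (x_i ~ N(mu, 1) with prior mu ~ N(0, prior_var)), the function names `elbo`, `fit_gaussian_vi`, and `exact_posterior`, and the step size are all assumptions chosen for illustration. Minimizing KL(q || posterior) is equivalent to maximizing the evidence lower bound (ELBO), which is done here by plain gradient ascent.

```python
import numpy as np


def elbo(m, s, x, prior_var=1.0):
    """ELBO for q(mu) = N(m, s^2), up to an additive constant.

    Assumed toy model: x_i ~ N(mu, 1), mu ~ N(0, prior_var).
    """
    n = x.size
    expected_log_lik = -0.5 * (np.sum((x - m) ** 2) + n * s ** 2)
    expected_log_prior = -0.5 * (m ** 2 + s ** 2) / prior_var
    entropy = np.log(s)
    return expected_log_lik + expected_log_prior + entropy


def fit_gaussian_vi(x, prior_var=1.0, lr=0.01, steps=2000):
    """Maximize the ELBO over (m, log s) by gradient ascent.

    The step size lr must be smaller than 2 / (n + 1 / prior_var)
    for the update of m to converge.
    """
    n = x.size
    precision = n + 1.0 / prior_var  # posterior precision of the toy model
    m, rho = 0.0, 0.0                # rho = log s keeps s positive
    for _ in range(steps):
        s = np.exp(rho)
        grad_m = np.sum(x) - precision * m      # d ELBO / d m
        grad_rho = 1.0 - precision * s ** 2     # d ELBO / d rho
        m += lr * grad_m
        rho += lr * grad_rho
    return m, np.exp(rho)


def exact_posterior(x, prior_var=1.0):
    """Closed-form posterior mean and variance for the same toy model."""
    precision = x.size + 1.0 / prior_var
    return np.sum(x) / precision, 1.0 / precision


rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=50)

m, s = fit_gaussian_vi(x)
post_mean, post_var = exact_posterior(x)
print("variational mean, variance:", m, s ** 2)
print("exact mean, variance:      ", post_mean, post_var)
```

Because the exact posterior in this toy model is itself Gaussian, the variational family contains the target and the optimized q should match it closely. In realistic models the family usually does not contain the posterior, and the result is only an approximation.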

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Variational Inference for Evidential Deep Learning
cs.LG · 2026-05 · unverdicted · novelty 6.0

VI-EDL reformulates evidential deep learning via variational inference to derive an ELBO that limits excessive evidence and a generalization bound that justifies setting Dirichlet parameters to e+1.

HuggingFace's Transformers: State-of-the-art Natural Language Processing
cs.CL · 2019-10 · accept · novelty 6.0

Hugging Face releases an open-source Python library that supplies a unified API and pretrained weights for major Transformer architectures used in natural language processing.

From Pixels to Temporal Correlations: Learning Informative Representations for Reinforcement Learning Pre-training
cs.LG · 2026-07 · unverdicted · novelty 5.0

MTCL learns multi-scale temporal correlations in videos via contrastive learning to produce more informative representations that improve sample efficiency and performance in downstream RL tasks.