Deep Amortized Inference for Probabilistic Programs

Daniel Ritchie; Noah D. Goodman; Paul Horsfall

arxiv: 1610.05735 · v1 · pith:ADAIKR3Enew · submitted 2016-10-18 · 💻 cs.AI · cs.LG · stat.ML

classification 💻 cs.AI cs.LG stat.ML

keywords guide inference amortized program programs ppls probabilistic data

Probabilistic programming languages (PPLs) are a powerful modeling tool, able to represent any computable probability distribution. Unfortunately, probabilistic program inference is often intractable, and existing PPLs mostly rely on expensive, approximate sampling-based methods. To alleviate this problem, one could try to learn from past inferences, so that future inferences run faster. This strategy is known as amortized inference; it has recently been applied to Bayesian networks and deep generative models. This paper proposes a system for amortized inference in PPLs. In our system, amortization comes in the form of a parameterized guide program. Guide programs have similar structure to the original program, but can have richer data flow, including neural network components. These networks can be optimized so that the guide approximately samples from the posterior distribution defined by the original program. We present a flexible interface for defining guide programs and a stochastic gradient-based scheme for optimizing guide parameters, as well as some preliminary results on automatically deriving guide programs. We explore in detail the common machine learning pattern in which a 'local' model is specified by 'global' random values and used to generate independent observed data points; this gives rise to amortized local inference supporting global model learning.
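
The idea of optimizing a guide so that it approximately samples from the posterior can be illustrated with a toy example. The sketch below is not the paper's system or interface; it is a minimal illustration under stated assumptions. The model (z ~ N(0, 1), x | z ~ N(z, 1)), the linear-Gaussian guide q(z | x) = N(a*x + b, sigma^2) standing in for a neural network, the reparameterized ELBO gradient, and the names train_guide, a, b, and s are all hypothetical choices made here for illustration. For this model the exact posterior is N(x/2, 1/2), so the learned parameters can be checked against it.

```python
import numpy as np

def train_guide(steps=3000, batch=256, lr=0.02, seed=0):
    """Fit an amortized guide q(z | x) = N(a*x + b, exp(s)^2) by
    stochastic gradient ascent on the ELBO (illustrative toy only)."""
    rng = np.random.default_rng(seed)
    a, b, s = 0.0, 0.0, 0.0  # guide parameters; s is log(sigma)
    for _ in range(steps):
        # Simulate observations from the model: z ~ N(0,1), x | z ~ N(z,1).
        z_true = rng.standard_normal(batch)
        x = z_true + rng.standard_normal(batch)

        # Reparameterized sample from the guide: z = mu(x) + sigma * eps.
        eps = rng.standard_normal(batch)
        sigma = np.exp(s)
        z = a * x + b + sigma * eps

        # d/dz [log p(z) + log p(x | z)] = -z + (x - z) = x - 2z.
        dz = x - 2.0 * z

        # Chain rule through z; the "+ 1" is the gradient of the
        # Gaussian entropy of q with respect to s = log(sigma).
        grad_a = np.mean(dz * x)
        grad_b = np.mean(dz)
        grad_s = np.mean(dz * sigma * eps) + 1.0

        # Gradient ascent step on the ELBO.
        a += lr * grad_a
        b += lr * grad_b
        s += lr * grad_s
    return a, b, np.exp(s)

if __name__ == "__main__":
    a, b, sigma = train_guide()
    print(f"a={a:.3f}  b={b:.3f}  sigma={sigma:.3f}")
```

After training, a single evaluation of a*x + b gives an approximate posterior mean for any new x without rerunning inference, which is the sense in which the cost is amortized. The learned values should approach a = 0.5, b = 0, and sigma = sqrt(1/2), the parameters of the exact posterior for this toy model.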

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Revisiting the Volume Hypothesis
cs.LG 2026-06 unverdicted novelty 6.0

The generalization advantage of SGD over random sampling diminishes with growing training set size in binary networks, as measured by joint density of states over train and test accuracy.