Deep Amortized Inference for Probabilistic Programs

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

Probabilistic programming languages (PPLs) are a powerful modeling tool, able to represent any computable probability distribution. Unfortunately, probabilistic program inference is often intractable, and existing PPLs mostly rely on expensive, approximate sampling-based methods. To alleviate this problem, one could try to learn from past inferences, so that future inferences run faster. This strategy is known as amortized inference; it has recently been applied to Bayesian networks and deep generative models. This paper proposes a system for amortized inference in PPLs. In our system, amortization comes in the form of a parameterized guide program. Guide programs have similar structure to the original program, but can have richer data flow, including neural network components. These networks can be optimized so that the guide approximately samples from the posterior distribution defined by the original program. We present a flexible interface for defining guide programs and a stochastic gradient-based scheme for optimizing guide parameters, as well as some preliminary results on automatically deriving guide programs. We explore in detail the common machine learning pattern in which a 'local' model is specified by 'global' random values and used to generate independent observed data points; this gives rise to amortized local inference supporting global model learning.
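To make the idea of a parameterized guide concrete, here is a minimal sketch in plain Python. It is not the paper's system or interface. The toy model (z ~ Normal(0, 1), x | z ~ Normal(z, 1)), the linear-Gaussian guide q(z | x) = Normal(a*x + b, sigma), the function name train_guide, and all hyperparameters are assumptions chosen for illustration. The guide parameters are fit by stochastic gradient ascent on the evidence lower bound using reparameterized samples, with observations simulated from the model, so that a single guide serves every future x.

```python
import math
import random


def train_guide(steps=3000, batch=100, lr=0.01, seed=0):
    """Fit q(z | x) = Normal(a*x + b, sigma) to the posterior of a toy model.

    Toy model (an assumption for illustration, not taken from the paper):
        z ~ Normal(0, 1),  x | z ~ Normal(z, 1)
    Its exact posterior is z | x ~ Normal(x / 2, sqrt(1 / 2)).
    """
    rng = random.Random(seed)
    a, b, log_sigma = 0.0, 0.0, 0.0  # guide parameters
    for _ in range(steps):
        grad_a = grad_b = grad_s = 0.0
        sigma = math.exp(log_sigma)
        for _ in range(batch):
            # Simulate an observation by running the model forward.
            z_true = rng.gauss(0.0, 1.0)
            x = z_true + rng.gauss(0.0, 1.0)
            # Reparameterized draw from the guide: z = mean + sigma * eps.
            eps = rng.gauss(0.0, 1.0)
            z = a * x + b + sigma * eps
            # d/dz [log p(z) + log p(x | z)] = -z + (x - z)
            dz = x - 2.0 * z
            grad_a += dz * x
            grad_b += dz
            # The +1 is the gradient of the guide's entropy (log sigma + const).
            grad_s += dz * sigma * eps + 1.0
        # Gradient ascent on the minibatch estimate of the lower bound.
        a += lr * grad_a / batch
        b += lr * grad_b / batch
        log_sigma += lr * grad_s / batch
    return a, b, math.exp(log_sigma)
```

Calling a, b, sigma = train_guide() should yield values near the exact posterior's 0.5, 0, and sqrt(0.5). After training, approximate posterior sampling for a new x costs one evaluation of a*x + b plus one Gaussian draw, which is the sense in which inference has been amortized.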

fields

cs.LG 1

years

2026 1

verdicts

UNVERDICTED 1

representative citing papers

Revisiting the Volume Hypothesis

cs.LG · 2026-06-30 · unverdicted · novelty 6.0

The generalization advantage of SGD over random sampling diminishes with growing training set size in binary networks, as measured by joint density of states over train and test accuracy.

citing papers explorer

Showing 1 of 1 citing paper.

  • Revisiting the Volume Hypothesis cs.LG · 2026-06-30 · unverdicted · none · ref 40 · internal anchor
