Backpropagation through the Void: Optimizing control variates for black-box gradient estimation

Dami Choi; David Duvenaud; Geoffrey Roeder; Will Grathwohl; Yuhuai Wu

arxiv: 1711.00123 · v3 · pith:5LR5GJA3new · submitted 2017-10-31 · 💻 cs.LG

Will Grathwohl, Dami Choi, Yuhuai Wu, Geoffrey Roeder, David Duvenaud

classification 💻 cs.LG

keywords learning, gradient, black-box, discrete, framework, optimization, reinforcement, unbiased

Gradient-based optimization is the foundation of deep learning and reinforcement learning. Even when the mechanism being optimized is unknown or not differentiable, optimization using high-variance or biased gradient estimates is still often the best strategy. We introduce a general framework for learning low-variance, unbiased gradient estimators for black-box functions of random variables. Our method uses gradients of a neural network trained jointly with model parameters or policies, and is applicable in both discrete and continuous settings. We demonstrate this framework for training discrete latent-variable models. We also give an unbiased, action-conditional extension of the advantage actor-critic reinforcement learning algorithm.
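
The control-variate idea in the abstract can be illustrated with a much simpler estimator than the one the paper proposes. The sketch below is not the paper's method: it uses the plain score-function (REINFORCE) estimator for a single Bernoulli variable and subtracts a constant baseline, whereas the paper learns a neural-network control variate. The objective `f`, the target `t`, the value of `theta`, and the function name `score_function_grads` are all assumptions chosen for illustration.

```python
import numpy as np


def f(b, t=0.45):
    """Hypothetical black-box objective: squared distance to a target t."""
    return (b - t) ** 2


def score_function_grads(theta, n_samples, baseline=0.0, seed=0):
    """Per-sample score-function estimates of d/dtheta E_{b~Bernoulli(theta)}[f(b)].

    Subtracting a constant baseline leaves the estimator unbiased because
    E[d/dtheta log p(b | theta)] = 0.
    """
    rng = np.random.default_rng(seed)
    b = (rng.random(n_samples) < theta).astype(float)
    # Score of a Bernoulli(theta) sample: b/theta - (1 - b)/(1 - theta).
    dlogp = b / theta - (1.0 - b) / (1.0 - theta)
    return (f(b) - baseline) * dlogp


theta = 0.3
# Exact gradient of theta*f(1) + (1 - theta)*f(0) with respect to theta.
exact = f(1.0) - f(0.0)

plain = score_function_grads(theta, 200_000)
# Illustrative baseline: the expected value of f under the current theta.
baseline = theta * f(1.0) + (1.0 - theta) * f(0.0)
with_cv = score_function_grads(theta, 200_000, baseline=baseline)

print("exact gradient:   ", exact)
print("plain mean / var: ", plain.mean(), plain.var())
print("with baseline:    ", with_cv.mean(), with_cv.var())
```

Both sample means should agree with the exact gradient up to Monte Carlo error, while the per-sample variance with the baseline is considerably lower. Here the baseline is computed in closed form only because the toy problem allows it; in a genuinely black-box setting it would have to be estimated or learned.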

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Low-variance estimators overcome the phase-gradient bottleneck in complex-valued neural quantum states
cond-mat.dis-nn 2026-06 unverdicted novelty 7.0

Direct differentiation of the local energy at fixed samples yields an unbiased low-variance estimator for the variational Monte Carlo phase force in complex neural quantum states, with an adaptive mixture extending it...

Learning to Theorize the World from Observation
cs.LG 2026-05 unverdicted novelty 6.0

NEO induces compositional latent programs as world theories from observations and executes them to enable explanation-driven generalization.