Gradient Estimation Using Stochastic Computation Graphs

John Schulman; Nicolas Heess; Pieter Abbeel; Theophane Weber

arxiv: 1506.05254 · v3 · pith:7CJR2KIZnew · submitted 2015-06-17 · 💻 cs.LG

classification 💻 cs.LG

keywords gradient · function · loss · stochastic · algorithm · computation · deterministic · estimator

abstract

In a variety of problems originating in supervised, unsupervised, and reinforcement learning, the loss function is defined by an expectation over a collection of random variables, which might be part of a probabilistic model or the external world. Estimating the gradient of this loss function, using samples, lies at the core of gradient-based learning algorithms for these problems. We introduce the formalism of stochastic computation graphs (directed acyclic graphs that include both deterministic functions and conditional probability distributions) and describe how to easily and automatically derive an unbiased estimator of the loss function's gradient. The resulting algorithm for computing the gradient estimator is a simple modification of the standard backpropagation algorithm. The generic scheme we propose unifies estimators derived in a variety of prior work, along with variance-reduction techniques therein. It could assist researchers in developing intricate models involving a combination of stochastic and deterministic operations, enabling, for example, attention, memory, and control actions.
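To make the idea of a sample-based, unbiased gradient estimator concrete, here is a minimal sketch in plain Python. It is not the paper's algorithm and does not build a stochastic computation graph. It only compares two well-known estimators on a toy objective chosen for illustration, E[x^2] with x ~ N(theta, 1), whose exact gradient with respect to theta is 2 * theta. The objective, the function names, and the sample count are all assumptions made for this example.

```python
import random


def score_function_grad(theta, n, rng):
    """Score-function (likelihood-ratio) estimate of d/dtheta E[x^2].

    For x ~ N(theta, 1), d/dtheta log p(x; theta) = x - theta, so each
    sample contributes f(x) * (x - theta) with f(x) = x^2.
    """
    total = 0.0
    for _ in range(n):
        x = rng.gauss(theta, 1.0)
        total += (x ** 2) * (x - theta)
    return total / n


def pathwise_grad(theta, n, rng):
    """Pathwise (reparameterization) estimate of the same gradient.

    Write x = theta + eps with eps ~ N(0, 1); then d/dtheta f(x) = 2 * x.
    """
    total = 0.0
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)
        x = theta + eps
        total += 2.0 * x
    return total / n


rng = random.Random(0)  # fixed seed so runs are repeatable
theta = 1.5             # illustrative parameter value
exact = 2.0 * theta     # closed-form gradient of E[x^2] = theta^2 + 1

sf = score_function_grad(theta, 200_000, rng)
pw = pathwise_grad(theta, 200_000, rng)
print(f"exact={exact:.3f}  score_function={sf:.3f}  pathwise={pw:.3f}")
```

Both estimates should land close to the exact value, with the score-function estimate noticeably noisier on this toy problem. That gap in variance is one reason variance-reduction techniques matter for estimators of this kind.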

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Evolvability ES: Scalable and Direct Optimization of Evolvability
cs.NE · 2019-07 · unverdicted · novelty 6.0

Evolvability ES is an evolutionary strategy variant that directly optimizes for evolvability by maximizing behavioral diversity under mutations, tested on 2D/3D locomotion tasks and shown competitive with MAML.