Deep Generative Stochastic Networks Trainable by Backprop

Yoshua Bengio , \'Eric Thibodeau-Laufer , Guillaume Alain , Jason Yosinski

Authors on Pith no claims yet

classification 💻 cs.LG

keywords distributionbackpropnetworksalongchainconditionaldeepeasier

read the original abstract

We introduce a novel training principle for probabilistic models that is an alternative to maximum likelihood. The proposed Generative Stochastic Networks (GSN) framework is based on learning the transition operator of a Markov chain whose stationary distribution estimates the data distribution. The transition distribution of the Markov chain is conditional on the previous state, generally involving a small move, so this conditional distribution has fewer dominant modes, being unimodal in the limit of small moves. Thus, it is easier to learn because it is easier to approximate its partition function, more like learning to perform supervised function approximation, with gradients that can be obtained by backprop. We provide theorems that generalize recent work on the probabilistic interpretation of denoising autoencoders and obtain along the way an interesting justification for dependency networks and generalized pseudolikelihood, along with a definition of an appropriate joint distribution and sampling mechanism even when the conditionals are not consistent. GSNs can be used with missing inputs and can be used to sample subsets of variables given the rest. We validate these theoretical results with experiments on two image datasets using an architecture that mimics the Deep Boltzmann Machine Gibbs sampler but allows training to proceed with simple backprop, without the need for layerwise pretraining.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Deep Unsupervised Learning using Nonequilibrium Thermodynamics
cs.LG 2015-03 accept novelty 8.0

A forward diffusion process adds noise iteratively to data until it is unstructured, and a neural network learns the reverse process to generate new samples from the original distribution.