Variational Lossy Autoencoder

Diederik P. Kingma; Ilya Sutskever; John Schulman; Pieter Abbeel; Prafulla Dhariwal; Tim Salimans; Xi Chen; Yan Duan

arxiv: 1611.02731 · v2 · pith:RLDMPK6Inew · submitted 2016-11-08 · 💻 cs.LG · stat.ML

Xi Chen, Diederik P. Kingma, Tim Salimans, Yan Duan, Prafulla Dhariwal, John Schulman, Ilya Sutskever, Pieter Abbeel

keywords: global, representation, autoencoder, autoregressive, code, data, distribution, images

Representation learning seeks to expose certain aspects of observed data in a learned representation that's amenable to downstream tasks like classification. For instance, a good representation for 2D images might be one that describes only global structure and discards information about detailed texture. In this paper, we present a simple but principled method to learn such global representations by combining Variational Autoencoder (VAE) with neural autoregressive models such as RNN, MADE and PixelRNN/CNN. Our proposed VAE model allows us to have control over what the global latent code can learn and, by designing the architecture accordingly, we can force the global latent code to discard irrelevant information such as texture in 2D images, and hence the VAE only "autoencodes" data in a lossy fashion. In addition, by leveraging autoregressive models as both prior distribution $p(z)$ and decoding distribution $p(x|z)$, we can greatly improve generative modeling performance of VAEs, achieving new state-of-the-art results on MNIST, OMNIGLOT and Caltech-101 Silhouettes density estimation tasks.
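
For orientation, the setup the abstract describes can be sketched with the standard VAE evidence lower bound and an autoregressive factorization of the decoder. The notation here is assumed for illustration and does not come from the abstract: $q(z|x)$ is the encoder, $i$ indexes data dimensions (for example pixels), and $W(i)$ is the set of earlier dimensions the decoder may condition on.

$$
\log p(x) \;\ge\; \mathbb{E}_{q(z|x)}\big[\log p(x|z)\big] \;-\; D_{\mathrm{KL}}\big(q(z|x)\,\|\,p(z)\big),
\qquad
p(x|z) \;=\; \prod_{i} p\big(x_i \,\big|\, z,\; x_{W(i)}\big),
\quad W(i) \subseteq \{1, \dots, i-1\}.
$$

Under this sketch, restricting $W(i)$ to a small local neighborhood is one plausible reading of "designing the architecture accordingly": the decoder can then account for local detail such as texture by itself, so $z$ only needs to carry the remaining global structure.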

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Understanding Multimodal Failure in Action-Chunking Behavioral Cloning
cs.LG · 2026-05 · unverdicted · novelty 7.0

The paper identifies distinct failure mechanisms: excessive posterior-prior regularization erases mode information in latent policies, while smooth base-to-action maps limit mode coverage in generative policies.

Tessellations of Semi-Discrete Flow Matching
cs.LG · 2026-05 · unverdicted · novelty 7.0

Semi-discrete Flow Matching produces terminal assignment regions that are topologically simple (open, simply connected, homeomorphic to the ball under assumption) yet geometrically distinct from optimal transport Lagu...

A renormalization-group inspired lattice-based framework for piecewise generalized linear models
stat.ME · 2026-05 · unverdicted · novelty 6.0

RG-inspired lattice models for piecewise GLMs provide explicit interpretable partitions and a replica-analysis-derived scaling law for regularization that allows increasing complexity without expected rise in generali...

Axiomatizing Neural Networks via Pursuit of Subspaces
cs.LG · 2026-05 · unverdicted · novelty 5.0

Authors introduce the Pursuit of Subspaces (PoS) hypothesis, an axiomatic geometric framework that unifies explanations for representation, computation, and generalization in shallow and deep neural networks.