pith. machine review for the scientific record. sign in

arxiv: 1611.05013 · v1 · submitted 2016-11-15 · 💻 cs.LG

Recognition: unknown

PixelVAE: A Latent Variable Model for Natural Images

Authors on Pith no claims yet
classification 💻 cs.LG
keywords modellatentcapturingpixelcnnautoregressivedetailsnaturalperformance
0
0 comments X
read the original abstract

Natural image modeling is a landmark challenge of unsupervised learning. Variational Autoencoders (VAEs) learn a useful latent representation and model global structure well but have difficulty capturing small details. PixelCNN models details very well, but lacks a latent code and is difficult to scale for capturing large structures. We present PixelVAE, a VAE model with an autoregressive decoder based on PixelCNN. Our model requires very few expensive autoregressive layers compared to PixelCNN and learns latent codes that are more compressed than a standard VAE while still capturing most non-trivial structure. Finally, we extend our model to a hierarchy of latent variables at different scales. Our model achieves state-of-the-art performance on binarized MNIST, competitive performance on 64x64 ImageNet, and high-quality samples on the LSUN bedrooms dataset.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Latent Fourier Transform

    cs.SD 2026-04 unverdicted novelty 7.0

    LatentFT uses latent-space Fourier transforms and frequency masking in diffusion autoencoders to enable timescale-specific manipulation of musical structure in generative models.