Preventing Posterior Collapse with delta-VAEs

Aäron van den Oord; Ali Razavi; Ben Poole; Oriol Vinyals

arxiv: 1901.03416 · v1 · pith:JAWEIQ6Anew · submitted 2019-01-10 · 💻 cs.LG · stat.ML

keywords approach · latent · models · posterior · collapse · generative · learning · modeling

Due to the phenomenon of "posterior collapse," current latent variable generative models pose a challenging design choice that either weakens the capacity of the decoder or requires augmenting the objective so it does not only maximize the likelihood of the data. In this paper, we propose an alternative that utilizes the most powerful generative models as decoders and optimises the variational lower bound, while ensuring that the latent variables preserve and encode useful information. Our proposed $\delta$-VAEs achieve this by constraining the variational family for the posterior to have a minimum distance to the prior. For sequential latent variable models, our approach resembles the classic representation learning approach of slow feature analysis. We demonstrate the efficacy of our approach at modeling text on LM1B and modeling images: learning representations, improving sample quality, and achieving state-of-the-art log-likelihood on CIFAR-10 and ImageNet $32\times 32$.
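
The idea of constraining the variational family so that the posterior keeps a minimum distance to the prior can be illustrated with a small sketch. It is not taken from the paper: it assumes a single-dimension Gaussian posterior N(mu, sigma^2), a standard normal prior, and a hypothetical cap `sigma_max` below 1 on the posterior standard deviation. The function names are illustrative. Under these assumptions the KL divergence can never fall below a fixed positive value, whatever the encoder outputs.

```python
import math


def kl_to_standard_normal(mu, sigma):
    """KL( N(mu, sigma^2) || N(0, 1) ) in nats, for one dimension."""
    return 0.5 * (mu ** 2 + sigma ** 2 - math.log(sigma ** 2) - 1.0)


def committed_rate(sigma_max):
    """Smallest KL reachable when sigma is restricted to (0, sigma_max].

    Assumes 0 < sigma_max < 1. On that interval sigma^2 - ln(sigma^2) - 1
    is decreasing in sigma, and mu^2 >= 0, so the minimum is attained at
    mu = 0, sigma = sigma_max.
    """
    if not 0.0 < sigma_max < 1.0:
        raise ValueError("sigma_max must lie strictly between 0 and 1")
    return kl_to_standard_normal(0.0, sigma_max)


def constrain_sigma(raw, sigma_max):
    """Map an unconstrained encoder output to (0, sigma_max) via a sigmoid.

    This parameterisation is an assumption made for illustration only.
    """
    return sigma_max / (1.0 + math.exp(-raw))


if __name__ == "__main__":
    sigma_max = 0.5  # hypothetical choice
    delta = committed_rate(sigma_max)
    print(f"minimum KL (delta) for sigma_max={sigma_max}: {delta:.4f} nats")
    for mu, raw in [(0.0, 5.0), (0.0, 0.0), (1.5, -2.0)]:
        sigma = constrain_sigma(raw, sigma_max)
        kl = kl_to_standard_normal(mu, sigma)
        print(f"mu={mu:+.1f} sigma={sigma:.4f} KL={kl:.4f} >= delta: {kl >= delta}")
```

Because the lower bound on the KL term is built into the parameterisation, the posterior cannot collapse onto the prior even when the decoder is powerful enough to ignore the latent variables. No extra term is added to the objective in this sketch.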

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Learning to Theorize the World from Observation
cs.LG 2026-05 unverdicted novelty 6.0

NEO induces compositional latent programs as world theories from observations and executes them to enable explanation-driven generalization.

Shaping Belief States with Generative Environment Models for RL
cs.LG 2019-06 unverdicted novelty 5.0

Multi-step predictive generative models form stable belief states capturing environment layout and agent pose, yielding higher data efficiency on RL tasks than model-free agents.