Importance Weighted Autoencoders

Yuri Burda; Roger Grosse; Ruslan Salakhutdinov

arxiv: 1509.00519 · v4 · pith:ODFUJIFPnew · submitted 2015-09-01 · 💻 cs.LG · stat.ML

keywords: network, posterior, generative, importance, model, assumptions, autoencoder, empirically

The variational autoencoder (VAE; Kingma, Welling (2014)) is a recently proposed generative model pairing a top-down generative network with a bottom-up recognition network which approximates posterior inference. It typically makes strong assumptions about posterior inference, for instance that the posterior distribution is approximately factorial, and that its parameters can be approximated with nonlinear regression from the observations. As we show empirically, the VAE objective can lead to overly simplified representations which fail to use the network's entire modeling capacity. We present the importance weighted autoencoder (IWAE), a generative model with the same architecture as the VAE, but which uses a strictly tighter log-likelihood lower bound derived from importance weighting. In the IWAE, the recognition network uses multiple samples to approximate the posterior, giving it increased flexibility to model complex posteriors which do not fit the VAE modeling assumptions. We show empirically that IWAEs learn richer latent space representations than VAEs, leading to improved test log-likelihood on density estimation benchmarks.
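To make the importance-weighted bound concrete: with k samples h_1, ..., h_k drawn from the recognition distribution q(h|x), the bound is the expectation of log((1/k) Σ_i p(x, h_i) / q(h_i|x)), and k = 1 recovers the standard VAE objective. The following is a minimal sketch on a toy one-dimensional Gaussian model rather than the paper's networks. The model (prior N(0, 1), likelihood N(h, 1)), the deliberately too-wide proposal N(x/2, 1), and the helper names `log_normal` and `iwae_bound` are all assumptions chosen for illustration, picked so that the exact log p(x) is available for comparison.

```python
import numpy as np

def log_normal(x, mean, var):
    """Log density of a univariate Gaussian N(mean, var) evaluated at x."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

def iwae_bound(x, k, n_outer=20000, seed=0):
    """Monte Carlo estimate of the k-sample importance-weighted bound.

    Toy model (an assumption for illustration, not from the paper):
      prior       p(h)   = N(0, 1)
      likelihood  p(x|h) = N(h, 1)
      proposal    q(h|x) = N(x/2, 1)   (true posterior is N(x/2, 1/2))
    """
    rng = np.random.default_rng(seed)
    q_mean, q_var = x / 2.0, 1.0
    # n_outer independent groups of k samples from the proposal.
    h = rng.normal(q_mean, np.sqrt(q_var), size=(n_outer, k))
    # Log importance weights: log p(x, h) - log q(h|x).
    log_w = (log_normal(h, 0.0, 1.0)
             + log_normal(x, h, 1.0)
             - log_normal(h, q_mean, q_var))
    # Numerically stable log of the mean weight within each group.
    m = log_w.max(axis=1, keepdims=True)
    log_mean_w = m[:, 0] + np.log(np.mean(np.exp(log_w - m), axis=1))
    # Average over groups to approximate the outer expectation.
    return log_mean_w.mean()

x = 1.5
exact = log_normal(x, 0.0, 2.0)  # marginal p(x) = N(0, 2) for this toy model
for k in (1, 5, 50):
    print(f"k={k:2d}  bound={iwae_bound(x, k):.4f}  exact log p(x)={exact:.4f}")
```

In this toy setting the k = 1 estimate sits below the exact log-likelihood by roughly the KL divergence between the proposal and the true posterior, and the gap shrinks as k grows, which is the behavior the abstract attributes to the tighter bound.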

Forward citations

Cited by 11 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Density estimation using Real NVP
cs.LG 2016-05 accept novelty 8.0

Real NVP uses affine coupling layers to create invertible transformations that support exact density estimation, sampling, and latent inference without approximations.

End-to-End Identifiable and Consistent Recurrent Switching Dynamical Systems
stat.ML 2026-05 unverdicted novelty 7.0

Identifiability is proven for recurrent nonlinear switching dynamical systems under flexible assumptions, and ΩSDS is introduced as a flow-based estimator that improves disentanglement and forecasting over VAE-based methods.

MirrorCheck: Efficient Adversarial Defense for Vision-Language Models
cs.CV 2024-06 unverdicted novelty 7.0

MirrorCheck detects adversarial attacks on VLMs via T2I regeneration for semantic consistency checks, using stochastic model selection and one-time perturbations for robustness against adaptive attacks.

Efficient Learning of Deep State Space Models via Importance Smoothing
cs.LG 2026-05 unverdicted novelty 6.0

Introduces PVMC, a parallelizable training method for deep state space models that claims state-of-the-art results and 10x faster training than prior SMC approaches.

Continuous Diffusion Scales Competitively with Discrete Diffusion for Language
cs.CL 2026-05 conditional novelty 6.0

RePlaid achieves a 20x compute gap to autoregressive models, new SOTA PPL of 22.1 among continuous DLMs on OpenWebText, and competitive scaling laws by aligning architecture with modern discrete DLMs.

A renormalization-group inspired lattice-based framework for piecewise generalized linear models
stat.ME 2026-05 unverdicted novelty 6.0

RG-inspired lattice models for piecewise GLMs provide explicit interpretable partitions and a replica-analysis-derived scaling law for regularization that allows increasing complexity without expected rise in generali...

Learning to Theorize the World from Observation
cs.LG 2026-05 unverdicted novelty 6.0

NEO induces compositional latent programs as world theories from observations and executes them to enable explanation-driven generalization.

QHyer: Q-conditioned Hybrid Attention-mamba Transformer for Offline Goal-conditioned RL
cs.LG 2026-05 unverdicted novelty 6.0

QHyer achieves state-of-the-art results in offline goal-conditioned RL by replacing return-to-go with a state-conditioned Q-estimator and introducing a gated hybrid attention-mamba backbone for content-adaptive histor...

QHyer: Q-conditioned Hybrid Attention-mamba Transformer for Offline Goal-conditioned RL
cs.LG 2026-05 unverdicted novelty 6.0

QHyer replaces return-to-go with a state-conditioned Q-estimator and adds a gated hybrid attention-mamba backbone to achieve state-of-the-art performance in offline goal-conditioned RL on both Markovian and non-Markov...

Self-Supervised Bootstrapping of Action-Predictive Embodied Reasoning
cs.RO 2026-02 unverdicted novelty 6.0

R&B-EnCoRe uses self-supervised importance-weighted variational inference to distill action-predictive reasoning datasets that improve VLA performance on manipulation, navigation, and driving tasks without external verifiers.

Mitigating Barren Plateaus in Quantum Denoising Diffusion Probabilistic Model
cs.LG 2025-12 unverdicted novelty 5.0

Quantum diffusion models develop a distinct barren plateau beyond small qubit counts; an architectural enhancement and conditional formulation restore trainability for Hamiltonian-parameterized ground-state generation.