Auto-Encoding Variational Bayes

Diederik P Kingma; Max Welling

arXiv: 1312.6114 · v11 · submitted 2013-12-20 · stat.ML · cs.LG

keywords inference · bound · datasets · intractable · lower · posterior · variational · continuous

Abstract

How can we perform efficient inference and learning in directed probabilistic models, in the presence of continuous latent variables with intractable posterior distributions, and large datasets? We introduce a stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case. Our contributions are two-fold. First, we show that a reparameterization of the variational lower bound yields a lower bound estimator that can be straightforwardly optimized using standard stochastic gradient methods. Second, we show that for i.i.d. datasets with continuous latent variables per datapoint, posterior inference can be made especially efficient by fitting an approximate inference model (also called a recognition model) to the intractable posterior using the proposed lower bound estimator. Theoretical advantages are reflected in experimental results.
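
The reparameterization mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy model (prior p(z) = N(0, 1), likelihood p(x | z) = N(z, 1), approximate posterior q(z | x) = N(mu, sigma^2)) and every function name are assumptions chosen so that the exact answer is known in closed form. A sample from q is written as z = mu + sigma * eps with eps drawn from N(0, 1), so the randomness no longer depends on mu or sigma and the Monte Carlo estimate of the lower bound becomes a deterministic, differentiable function of those parameters for fixed noise.

```python
import math
import random


def gaussian_kl_to_standard_normal(mu, sigma):
    """KL( N(mu, sigma^2) || N(0, 1) ) in closed form."""
    return 0.5 * (mu * mu + sigma * sigma - 1.0) - math.log(sigma)


def log_normal_pdf(x, mean, std):
    """Log density of N(mean, std^2) evaluated at x."""
    return (-0.5 * math.log(2.0 * math.pi) - math.log(std)
            - 0.5 * ((x - mean) / std) ** 2)


def elbo_estimate(x, mu, sigma, num_samples, rng):
    """Monte Carlo estimate of the lower bound for the assumed toy model.

    Toy model (an illustration, not taken from the paper):
    p(z) = N(0, 1), p(x | z) = N(z, 1), q(z | x) = N(mu, sigma^2).
    """
    total = 0.0
    for _ in range(num_samples):
        eps = rng.gauss(0.0, 1.0)  # noise drawn independently of mu and sigma
        z = mu + sigma * eps       # reparameterized sample from q(z | x)
        total += log_normal_pdf(x, z, 1.0)
    # Expected reconstruction term minus the analytic KL term.
    return total / num_samples - gaussian_kl_to_standard_normal(mu, sigma)


rng = random.Random(0)
x = 1.0

# In this toy model the exact posterior is N(x / 2, 1 / 2), so the bound is tight there.
tight = elbo_estimate(x, 0.5, math.sqrt(0.5), 20000, rng)

# A poorly matched q gives a strictly looser bound.
loose = elbo_estimate(x, -1.0, 1.0, 20000, rng)

# Exact log evidence: marginally x ~ N(0, 2).
log_evidence = log_normal_pdf(x, 0.0, math.sqrt(2.0))
```

With these settings, `tight` should land close to `log_evidence`, while `loose` falls well below it. In practice mu and sigma would be outputs of a recognition model, and the estimate would be maximized with a stochastic gradient method using automatic differentiation.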

Forward citations

Cited by 60 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Disentanglement Beyond Generative Models with Riemannian ICA
cs.LG 2026-05 unverdicted novelty 8.0

RICA replaces ICA's global generative model with local Riemannian geometry, introducing a disentanglement tensor based on the Hessian of the log-likelihood and Ricci curvature to measure pointwise disentanglement, whi...
Systematic Discovery of Semantic Attacks in Online Map Construction through Conditional Diffusion
cs.CV 2026-05 unverdicted novelty 8.0

MIRAGE discovers semantic attacks on online HD map construction via conditional diffusion, enabling boundary removal and injection that degrade AV performance while passing as realistic environmental changes.
Inference-Time Refinement Closes the Synthetic-Real Gap in Tabular Diffusion
cs.LG 2026-05 unverdicted novelty 8.0

Inference-time refinement of pre-trained tabular diffusion models via Bidirectional Chamfer Refinement achieves median 8.6% better downstream performance than real data across 15 benchmarks while preserving fidelity a...
Gradient-Based Program Synthesis with Neurally Interpreted Languages
cs.LG 2026-04 unverdicted novelty 8.0

NLI autonomously discovers a vocabulary of primitive operations and interprets variable-length programs via a neural executor, allowing end-to-end training and gradient-based test-time adaptation that outperforms prio...
GIANTS: Generative Insight Anticipation from Scientific Literature
cs.CL 2026-04 unverdicted novelty 8.0

GIANTS-4B, trained with RL on a new 17k-example benchmark of parent-to-child paper insights, achieves 34% relative improvement over gemini-3-pro in LM-judge similarity and is rated higher-impact by a citation predictor.
Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution
cs.CL 2023-09 unverdicted novelty 8.0

Promptbreeder evolves both task prompts and the mutation prompts that improve them using LLMs, outperforming Chain-of-Thought and Plan-and-Solve on arithmetic and commonsense reasoning benchmarks.
Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow
cs.LG 2022-09 unverdicted novelty 8.0

Rectified flow learns straight-path neural ODEs for distribution transport, yielding efficient generative models and domain transfers that work well even with a single simulation step.
Denoising Diffusion Implicit Models
cs.LG 2020-10 unverdicted novelty 8.0

DDIMs construct non-Markovian diffusion processes that share DDPM training objectives but allow much faster reverse sampling, demonstrated empirically at 10-50x wall-clock speedup.
Denoising Diffusion Probabilistic Models
cs.LG 2020-06 accept novelty 8.0

Denoising diffusion probabilistic models generate high-quality images by learning to reverse a fixed forward diffusion process, achieving FID 3.17 on CIFAR10.
PathVQA: 30000+ Questions for Medical Visual Question Answering
cs.CL 2020-03 accept novelty 8.0

PathVQA is the first public dataset of over 32,000 questions on nearly 5,000 pathology images for medical visual question answering.
Categorical Reparameterization with Gumbel-Softmax
stat.ML 2016-11 unverdicted novelty 8.0

Gumbel-Softmax provides a continuous relaxation of categorical sampling that anneals to discrete samples for gradient-based optimization.
Density estimation using Real NVP
cs.LG 2016-05 accept novelty 8.0

Real NVP uses affine coupling layers to create invertible transformations that support exact density estimation, sampling, and latent inference without approximations.
Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
cs.LG 2015-11 accept novelty 8.0

DCGANs with architectural constraints learn a hierarchy of representations from object parts to scenes in both generator and discriminator across image datasets.
GeoCycler: Reward-Aligned 3D Diffusion for Constraint-Conditioned Cyclic Peptide Design
cs.CE 2026-05 unverdicted novelty 7.0

GeoCycler aligns latent diffusion models via reward-weighted training with a type-gated stair reward to raise cyclic peptide closure rates across multiple topologies on the LNR benchmark.
MotiMotion: Motion-Controlled Video Generation with Visual Reasoning
cs.CV 2026-05 unverdicted novelty 7.0

MotiMotion adds visual reasoning via a training-free VLM to refine primary trajectories and hallucinate secondary motions, plus a confidence-aware guidance scheme, yielding more plausible interactions on the new MotiB...
DrawMotion: Generating 3D Human Motions by Freehand Drawing
cs.CV 2026-05 unverdicted novelty 7.0

DrawMotion is a diffusion-based framework that fuses text and hand-drawn stickman conditions via a Multi-Condition Module and training-free guidance to generate 3D human motions.
Cumulative Meta-Learning from Active Learning Queries for Robustness to Spurious Correlations
cs.LG 2026-05 unverdicted novelty 7.0

CAML meta-learns a progressively refined inductive bias from active-learning queries to improve robustness to spurious correlations, reporting accuracy gains on minority groups across several benchmarks.
When Does Model Collapse Occur in Structured Interactive Learning?
cs.LG 2026-05 unverdicted novelty 7.0

Model collapse occurs in structured interactive learning if and only if the directed interaction graph satisfies a specific topological condition, with finite-sample guarantees for linear regression and asymptotic res...
CAMERA: Adapting to Semantic Camouflage in Unsupervised Text-Attributed Graph Fraud Detection
cs.LG 2026-05 unverdicted novelty 7.0

CAMERA is an ego-decoupled mixture-of-experts model with context-informed gating and one-class objectives for unsupervised fraud detection in text-attributed graphs facing semantic camouflage.
An Exterior Method for Nonnegative Matrix Factorization
cs.LG 2026-05 conditional novelty 7.0

eNMF is a new exterior-point algorithm for NMF that initializes from unconstrained factorization, applies a rotation to reach the nonnegative boundary, and empirically outperforms 81 baseline combinations on real and ...
Functionalization via Structure Completion and Motion Rectification
cs.CV 2026-05 unverdicted novelty 7.0

Object functionalization is cast as neural graph completion over a functional graph of parts, contacts, and motions, followed by geometry realization that also rectifies erroneous motions, demonstrated on furniture wi...
Structured Neural Marked Point Processes for Interpretable Event Interaction Modeling
cs.LG 2026-05 unverdicted novelty 7.0

SNMPP builds a product-form neural influence kernel from a signed class-wise interaction network and a monotonic delay-aware temporal network to enable interpretable multi-class event stream modeling.
Causal Anomaly Detection for Lithium-Ion Battery Degradation
cond-mat.mtrl-sci 2026-05 unverdicted novelty 7.0

CausalHealth detects lithium-ion battery degradation with 100% sensitivity and up to 402-cycle lead time using causal anomaly scores from voltage, current, temperature, and resistance time series across seven cells.
When Bits Break Recourse: Counterfactual-Faithful Quantization
cs.LG 2026-05 unverdicted novelty 7.0

CFQ trains quantizer parameters and mixed-precision allocation to preserve counterfactual recourse validity, cost, and direction on Adult, German Credit, and COMPAS while matching accuracy of standard quantizers.
Towards Generalized Image Manipulation Localization via Score-based Model
cs.CV 2026-05 conditional novelty 7.0

DiffIML applies score-based generative modeling to image manipulation localization, recovering coherent masks iteratively from noise to improve generalization on unseen manipulation types.
Seeking the Unfamiliar but Memorable: Conceptual Creativity as Meta-Learning
cs.LG 2026-05 unverdicted novelty 7.0

Creativity is defined as meta-learning where a frozen diffusion creator optimizes candidates for rapid improvement by an adapting appraiser such as an autoencoder or CLIP adapter.
AnyAct: Towards Human Reenactment of Character Motion From Video
cs.CV 2026-05 unverdicted novelty 7.0

AnyAct generates plausible human reenactments from non-human character videos via conditional motion generation from transferable sparse local 2D articulated cues, using human-only supervision, progressive training, a...
UniTriGen: Unified Triplet Generation of Aligned Visible-Infrared-Label for Few-Shot RGB-T Semantic Segmentation
cs.CV 2026-05 unverdicted novelty 7.0

UniTriGen uses unified diffusion in a shared latent space plus lightweight adapters and scene-balanced sampling to produce high-quality aligned VIS-IR-Label triplets from limited paired data, improving few-shot RGB-T ...
R-DMesh: Video-Guided 3D Animation via Rectified Dynamic Mesh Flow
cs.CV 2026-05 unverdicted novelty 7.0

R-DMesh generates high-fidelity 4D meshes aligned to video by disentangling base mesh, motion, and a learned rectification jump offset inside a VAE, then using Triflow Attention and rectified-flow diffusion.
Proximal-Based Generative Modeling for Bayesian Inverse Problems
math.OC 2026-05 unverdicted novelty 7.0

PGM replaces the intractable likelihood score in diffusion models with a closed-form Moreau score computed via proximal operators, enabling non-asymptotic sampling for inverse problems trained only on prior data.
Amortized Neural Clustering of Time Series based on Statistical Features
stat.ML 2026-05 unverdicted novelty 7.0

Neural networks trained on simulated time series learn to cluster real data using features like autocorrelations, matching or exceeding traditional methods and sometimes auto-selecting the number of clusters.
Bridging Domain Gaps with Target-Aligned Generation for Offline Reinforcement Learning
cs.LG 2026-05 unverdicted novelty 7.0

TCE bridges domain gaps in offline RL by selectively using source data or generating target-aligned transitions via a dual score-based model, outperforming baselines in experiments.
Coreset-Induced Conditional Velocity Flow Matching
stat.ML 2026-05 unverdicted novelty 7.0

CCVFM replaces the inner noise source in hierarchical rectified flow matching with a data-informed Gaussian mixture surrogate from a Sinkhorn coreset, yielding a closed-form conditional velocity law and competitive fe...
The Expressivity Boundary of Probabilistic Circuits: A Comparison with Large Language Models
cs.LG 2026-05 unverdicted novelty 7.0

Probabilistic circuits have an output bottleneck with convex probability combinations and a context bottleneck limited to fixed vtree-aligned partitions, making them less expressive than transformers for language data...
DirectTryOn: One-Step Virtual Try-On via Straightened Conditional Transport
cs.CV 2026-05 unverdicted novelty 7.0

DirectTryOn achieves state-of-the-art one-step virtual try-on performance by applying pure conditional transport, garment preservation loss, and self-consistency loss to straighten trajectories in pretrained generativ...
Human face perception reflects inverse-generative and naturalistic discriminative objectives
q-bio.NC 2026-05 unverdicted novelty 7.0

Human face perception aligns with neural networks trained on inverse-generative and naturalistic discriminative tasks, as these best predict human dissimilarity judgments on controversial and random face pairs.
DriftXpress: Faster Drifting Models via Projected RKHS Fields
cs.LG 2026-05 unverdicted novelty 7.0

DriftXpress approximates drifting kernels via projected RKHS fields to lower training cost of one-step generative models while matching original FID scores.
The finite expression method for turbulent dynamics with high-order moment recovery
cs.LG 2026-05 unverdicted novelty 7.0

A two-stage symbolic regression plus generative model framework recovers governing interaction terms and forcing in stochastic triad models while accurately predicting statistical moments up to order five.
Muninn: Your Trajectory Diffusion Model But Faster
cs.RO 2026-05 unverdicted novelty 7.0

Muninn accelerates diffusion trajectory planners up to 4.6x by spending an uncertainty budget to decide when to cache denoiser outputs, preserving performance and certifying bounded deviation from full computation.
HapticLDM: A Diffusion Model for Text-to-Vibrotactile Generation
cs.HC 2026-05 unverdicted novelty 7.0

HapticLDM is the first latent diffusion model that generates vibrotactile signals directly from text, using dynamic text curation and global denoising to improve realism and semantic alignment over autoregressive baselines.
Learning to Explore: Scaling Agentic Reasoning via Exploration-Aware Policy Optimization
cs.AI 2026-05 unverdicted novelty 7.0

An exploration-aware RL framework lets LLM agents adaptively explore only under high uncertainty via variational rewards and action grouping, yielding consistent gains on text and GUI agent benchmarks.
HairGPT: Strand-as-Language Autoregressive Modeling for Realistic 3D Hairstyle Synthesis
cs.GR 2026-05 unverdicted novelty 7.0

HairGPT reframes 3D hairstyle synthesis as dual-decoupled autoregressive strand sequence modeling with geometric tokenization for semantic control and rare style generation.
Flow Matching for Count Data
stat.ML 2026-05 unverdicted novelty 7.0

Count-FM is a new flow-matching method for count data based on birth-death processes that achieves better sample quality with fewer parameters than baselines on simulations and real scRNA-seq and spike-train data.
Tessellations of Semi-Discrete Flow Matching
cs.LG 2026-05 unverdicted novelty 7.0

Semi-discrete Flow Matching produces terminal assignment regions that are topologically simple (open, simply connected, homeomorphic to the ball under assumption) yet geometrically distinct from optimal transport Lagu...
LENS: Low-Frequency Eigen Noise Shaping for Efficient Diffusion Sampling
cs.CV 2026-05 unverdicted novelty 7.0

LENS shapes low-frequency eigen noise with a lightweight network to enable efficient, high-quality sampling in distilled diffusion models.
Risk-Controlled Post-Processing of Decision Policies
stat.ML 2026-05 unverdicted novelty 7.0

Risk-controlled post-processing yields a threshold-structured policy that follows the baseline except where an oracle fallback sharply reduces conditional violation risk, achieving O(log n/n) expected excess risk in i...
Order-Agnostic Autoregressive Modelling with Missing Data
cs.LG 2026-05 unverdicted novelty 7.0

Order-agnostic autoregressive models are extended via a missingness-aware training framework (MO-ARM) that enables direct learning from incomplete data and active information acquisition, outperforming standard imputa...
Autoregressive Visual Generation Needs a Prologue
cs.CV 2026-05 unverdicted novelty 7.0

Prologue introduces dedicated prologue tokens to decouple generation and reconstruction in AR visual models, significantly improving generation FID scores on ImageNet while maintaining reconstruction quality.
R2H-Diff: Guided Spectral Diffusion Model for RGB-to-Hyperspectral Reconstruction
cs.CV 2026-05 unverdicted novelty 7.0

R2H-Diff is a guided spectral diffusion framework that reconstructs hyperspectral images from RGB observations using RGB-conditioned feature fusion, transposed attention, and a five-step linear noise schedule to achie...
Beyond Penalization: Diffusion-based Out-of-Distribution Detection and Selective Regularization in Offline Reinforcement Learning
cs.LG 2026-05 unverdicted novelty 7.0

DOSER detects OOD actions via diffusion-model denoising error and applies selective regularization based on predicted transitions, proving gamma-contraction with performance bounds and outperforming priors on offline ...
DMGD: Train-Free Dataset Distillation with Semantic-Distribution Matching in Diffusion Models
cs.CV 2026-05 unverdicted novelty 7.0

DMGD achieves better performance than fine-tuned SOTA methods in dataset distillation on ImageNet subsets by using semantic matching through conditional likelihood optimization and OT-based distribution matching in a ...
PerFlow: Physics-Embedded Rectified Flow for Efficient Reconstruction and Uncertainty Quantification of Spatiotemporal Dynamics
cs.LG 2026-05 unverdicted novelty 7.0

PerFlow embeds physics constraints into rectified flow sampling through guidance-free conditioning and constraint-preserving projections, achieving efficient sparse reconstruction and uncertainty quantification for sp...
iGENE: A Differentiable Flux-Tube Gyrokinetic Code in TensorFlow
physics.plasm-ph 2026-05 unverdicted novelty 7.0

A fully differentiable TensorFlow gyrokinetic code allows approximate gradients of nonlinear turbulence quantities to be used for outer-loop tasks such as profile prediction despite stochasticity.
Generative Modeling with Orbit-Space Particle Flow Matching
cs.GR 2026-05 unverdicted novelty 7.0

OGPP is a particle flow-matching method using orbit-space canonicalization and geometric paths that achieves lower error and fewer steps than prior approaches on 3D benchmarks.
Manifold-Aligned Guided Integrated Gradients for Reliable Feature Attribution
cs.LG 2026-05 unverdicted novelty 7.0

MA-GIG uses VAE latent space to align Integrated Gradients paths with the data manifold for more faithful feature attributions in deep neural networks.
VAnim: Rendering-Aware Sparse State Modeling for Structure-Preserving Vector Animation
cs.CV 2026-05 unverdicted novelty 7.0

VAnim creates open-domain text-to-SVG animations via sparse state updates on a persistent DOM tree, identification-first planning, and rendering-aware RL with a new 134k-example benchmark.
Arbitrarily Conditioned Hierarchical Flows for Spatiotemporal Events
cs.LG 2026-05 unverdicted novelty 7.0

ARCH is a hierarchical flow-based generative model that enables tractable conditional intensity computation and arbitrary conditioning for spatiotemporal event distributions.
Local Hessian Spectral Filtering for Robust Intrinsic Dimension Estimation
cs.LG 2026-05 unverdicted novelty 7.0

LHSD uses spectral filtering on the log-density Hessian to isolate tangent directions from noise and estimate local intrinsic dimension scalably via Stochastic Lanczos Quadrature.
RopeDreamer: A Kinematic Recurrent State Space Model for Dynamics of Flexible Deformable Linear Objects
cs.RO 2026-04 unverdicted novelty 7.0

RopeDreamer uses quaternionic kinematic chains in a recurrent state space model with a dual decoder to cut open-loop prediction error by 40.52% over 50 steps on simulated DLO trajectories while preserving physical con...
Fake3DGS: A Benchmark for 3D Manipulation Detection in Neural Rendering
cs.CV 2026-04 unverdicted novelty 7.0

Fake3DGS benchmark shows state-of-the-art 2D fake detectors fail on 3D-manipulated Gaussian Splatting images while a new multi-view coherence method improves detection.