Wasserstein Auto-Encoders

Bernhard Schoelkopf; Ilya Tolstikhin; Olivier Bousquet; Sylvain Gelly

arxiv: 1711.01558 · v4 · pith:JY7Z7SZ2new · submitted 2017-11-05 · stat.ML · cs.LG

Ilya Tolstikhin, Olivier Bousquet, Sylvain Gelly, Bernhard Schoelkopf

classification stat.ML cs.LG

keywords distribution, wasserstein, algorithm, auto-encoder, auto-encoders, model, regularizer, training

We propose the Wasserstein Auto-Encoder (WAE), a new algorithm for building a generative model of the data distribution. WAE minimizes a penalized form of the Wasserstein distance between the model distribution and the target distribution, which leads to a different regularizer than the one used by the Variational Auto-Encoder (VAE). This regularizer encourages the encoded training distribution to match the prior. We compare our algorithm with several other techniques and show that it is a generalization of adversarial auto-encoders (AAE). Our experiments show that WAE shares many of the properties of VAEs (stable training, encoder-decoder architecture, nice latent manifold structure) while generating samples of better quality, as measured by the FID score.
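
The penalized objective described in the abstract can be sketched roughly as a reconstruction cost plus a weighted divergence between the encoded codes and samples from the prior. The abstract does not say which divergence is used, so the sketch below assumes a maximum mean discrepancy (MMD) penalty with an inverse multiquadratic kernel and a squared-error reconstruction cost. The function names (imq_kernel, mmd_penalty, wae_objective) and the parameters scale and lam are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import numpy as np

def imq_kernel(a, b, scale=1.0):
    """Inverse multiquadratic kernel k(x, y) = scale / (scale + ||x - y||^2).

    The kernel choice and `scale` are assumptions made for this sketch.
    """
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return scale / (scale + sq_dists)

def mmd_penalty(codes, prior_samples, scale=1.0):
    """Estimate the squared MMD between encoded codes and prior samples."""
    n = codes.shape[0]
    m = prior_samples.shape[0]
    k_zz = imq_kernel(codes, codes, scale)
    k_pp = imq_kernel(prior_samples, prior_samples, scale)
    k_zp = imq_kernel(codes, prior_samples, scale)
    # Exclude diagonal entries so the within-sample averages are unbiased.
    term_zz = (k_zz.sum() - np.trace(k_zz)) / (n * (n - 1))
    term_pp = (k_pp.sum() - np.trace(k_pp)) / (m * (m - 1))
    return term_zz + term_pp - 2.0 * k_zp.mean()

def wae_objective(x, x_recon, codes, prior_samples, lam=10.0):
    """Reconstruction cost plus lam times the latent-matching penalty."""
    recon_cost = ((x - x_recon) ** 2).sum(axis=1).mean()
    return recon_cost + lam * mmd_penalty(codes, prior_samples)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    prior = rng.standard_normal((256, 2))    # samples from a Gaussian prior
    matched = rng.standard_normal((256, 2))  # codes that already follow the prior
    shifted = matched + 3.0                  # codes far from the prior
    print("penalty, matched codes:", mmd_penalty(matched, prior))
    print("penalty, shifted codes:", mmd_penalty(shifted, prior))
```

In this sketch the penalty stays near zero when the encoded codes follow the prior and grows when they drift away from it, which is the behaviour the abstract attributes to the regularizer. In a real model the codes and reconstructions would come from trained encoder and decoder networks rather than random arrays.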

Forward citations

Cited by 13 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Understanding Multimodal Failure in Action-Chunking Behavioral Cloning
cs.LG 2026-05 unverdicted novelty 7.0

The paper identifies distinct failure mechanisms: excessive posterior-prior regularization erases mode information in latent policies, while smooth base-to-action maps limit mode coverage in generative policies.
One-Step Generative Modeling via Wasserstein Gradient Flows
cs.LG 2026-05 conditional novelty 7.0

W-Flow achieves state-of-the-art one-step ImageNet 256x256 generation at 1.29 FID by training a static neural network to follow a Wasserstein gradient flow that minimizes Sinkhorn divergence, delivering roughly 100x f...
Optimal Stability of KL Divergence under Gaussian Perturbations
cs.LG 2026-04 unverdicted novelty 7.0

KL divergence between a general distribution and a perturbed Gaussian reference remains stable with an optimal sqrt(ε) degradation rate under finite second-moment conditions.
Continuous Reasoning for Vision-Language-Action
cs.RO 2026-05 unverdicted novelty 6.0

Continuous Reasoning for VLA introduces a shared Gaussian latent for continuous thoughts, trained with self-verification to improve action prediction on LIBERO-PRO and real robots.
Mechanisms of Misgeneralization in Physical Sequence Modeling
cs.LG 2026-05 unverdicted novelty 6.0

Generative sequence models for physical tasks exhibit physical misgeneralization where local prediction errors propagate through physical measurements to distort aggregate distributions over quantities like distance o...
ArcVQ-VAE: A Spherical Vector Quantization Framework with ArcCosine Additive Margin
cs.CV 2026-05 unverdicted novelty 6.0

ArcVQ-VAE constrains VQ-VAE codebook vectors inside a time-dependent ball and adds angular margin loss to increase separability and codebook utilization.
One-Step Generative Modeling via Wasserstein Gradient Flows
cs.LG 2026-05 unverdicted novelty 6.0

W-Flow compresses a Wasserstein gradient flow defined via Sinkhorn divergence into a single-step neural generator, reporting 1.29 FID on ImageNet 256x256 with improved mode coverage.
Nonlinear Stochastic Model Predictive Control with Generative Uncertainty in Homogeneous Charge Compression Ignition
eess.SY 2026-04 unverdicted novelty 6.0

A stochastic MPC controller for HCCI engines using learned uncertainty distributions, polynomial chaos expansion, and an MMD-based cost reduces combustion phasing variation by over 28% and improves load tracking by ov...
Scalable Topological Data Analysis and Visualization for Evaluating Data-Driven Models in Scientific Applications
cs.LG 2019-07 unverdicted novelty 6.0

A scalable framework combining streaming graphs, topology computation, and topology-aware datacubes enables interactive analysis of high-dimensional functions in scientific ML applications.
Local Bures-Wasserstein Transport: A Practical and Fast Mapping Approximation
stat.ML 2019-06 unverdicted novelty 6.0

A local Gaussian Bures-Wasserstein method approximates transport maps and barycenters, claimed to run 80x faster than kernel baselines while using fewer components.
ArcVQ-VAE: A Spherical Vector Quantization Framework with ArcCosine Additive Margin
cs.CV 2026-05 unverdicted novelty 5.0

ArcVQ-VAE adds spherical angular-margin regularization consisting of ball-bounded norms and arc-cosine margin loss to improve codebook utilization in VQ-VAE for image tasks.
Molecular Design beyond Training Data with Novel Extended Objective Functionals of Generative AI Models Driven by Quantum Annealing Computer
q-bio.QM 2026-02 unverdicted novelty 5.0

Quantum annealing combined with a Neural Hash Function lets generative models create molecules that are more drug-like than classical versions or the training set itself.
Enhancing Few-Shot Classification of Benchmark and Disaster Imagery with ABHFA-Net
cs.CV 2025-10 conditional novelty 5.0

ABHFA-Net is a novel few-shot classification framework that models prototypes as distributions, applies spatial-channel attention, and uses Bhattacharyya-based contrastive loss, achieving state-of-the-art accuracies o...