Tutorial: Deriving the Standard Variational Autoencoder (VAE) Loss Function
Pith reviewed 2026-05-24 18:44 UTC · model grok-4.3
The pith
The standard VAE loss is the evidence lower bound on data log-likelihood, which simplifies to a closed-form expression when the latent prior and approximate posterior are both Gaussian.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper establishes that the variational lower bound on the log-likelihood equals the expected reconstruction log-probability under the approximate posterior minus the Kullback-Leibler (KL) divergence between the approximate posterior and the prior. When the prior is a standard normal N(0, I) and the approximate posterior is a diagonal Gaussian with means mu_j and variances sigma_j^2, this divergence has the closed form -(1/2) sum_j (1 + log sigma_j^2 - mu_j^2 - sigma_j^2): negative one-half times the sum over latent dimensions of one plus the log variance minus the squared mean minus the variance.
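The closed-form KL term can be sketched in a few lines of NumPy. This is an illustrative sketch, not the paper's code; the function name is hypothetical, and it assumes the encoder outputs a mean and a log-variance per latent dimension, with a standard normal prior. The sum is rearranged as 1/2 * sum(sigma^2 + mu^2 - 1 - log sigma^2), which is algebraically identical to the expression above.

```python
import numpy as np

def kl_diag_gaussian_to_std_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over the last axis.

    Equivalent to -1/2 * sum(1 + log_var - mu**2 - exp(log_var)).
    Works on a single vector or on a batch of shape (batch, latent_dim).
    """
    mu = np.asarray(mu, dtype=float)
    log_var = np.asarray(log_var, dtype=float)
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)

# Posterior identical to the prior: the divergence vanishes.
print(kl_diag_gaussian_to_std_normal([0.0, 0.0], [0.0, 0.0]))  # 0.0
# Unit variances, shifted mean: the KL reduces to 0.5 * ||mu||^2.
print(kl_diag_gaussian_to_std_normal([1.0, 2.0], [0.0, 0.0]))  # 2.5
```

Parameterizing by the log-variance rather than the variance keeps sigma^2 = exp(log_var) positive without any constraint on the encoder output.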
What carries the argument
The evidence lower bound (ELBO), obtained by decomposing the marginal log-likelihood log p(x) into the bound plus the non-negative divergence KL(q(z|x)||p(z|x)) between the approximate and the true posterior, and then dropping that KL term.
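Written out, the decomposition follows from Bayes' theorem, p(z|x) = p(x,z)/p(x), and from inserting q(z|x) inside the logarithm. The block below is a sketch of this standard derivation in generic notation (p for the model, q for the encoder), not a transcription of the paper's own equations.

```latex
\begin{aligned}
\log p(x) &= \mathbb{E}_{q(z\mid x)}\big[\log p(x)\big]
           = \mathbb{E}_{q(z\mid x)}\left[\log \frac{p(x,z)}{p(z\mid x)}\right] \\
          &= \mathbb{E}_{q(z\mid x)}\left[\log \frac{p(x,z)}{q(z\mid x)}\right]
           + \mathbb{E}_{q(z\mid x)}\left[\log \frac{q(z\mid x)}{p(z\mid x)}\right] \\
          &= \underbrace{\mathbb{E}_{q(z\mid x)}\big[\log p(x\mid z)\big]
             - \mathrm{KL}\big(q(z\mid x)\,\|\,p(z)\big)}_{\text{ELBO}}
           + \underbrace{\mathrm{KL}\big(q(z\mid x)\,\|\,p(z\mid x)\big)}_{\ge 0}
\end{aligned}
```

The last step splits log p(x,z) = log p(x|z) + log p(z). Since the final KL is non-negative, log p(x) >= ELBO, with equality exactly when q(z|x) = p(z|x).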
If this is right
- Training reduces to maximizing the expected log reconstruction probability while penalizing deviation of the encoder distribution from the prior.
- Gradients of the KL term can be computed analytically without sampling or numerical integration.
- The overall loss separates into a KL term that depends only on the encoder parameters and a reconstruction term that depends on the decoder and, through the expectation under q(z|x), on the encoder as well.
- The bound becomes tight only when the approximate posterior equals the true posterior.
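The claim that the KL gradients are analytic can be checked numerically. In this sketch (function names hypothetical, same mean/log-variance parameterization and standard normal prior as above), the derivatives of 1/2 * sum(exp(log_var) + mu^2 - 1 - log_var) are mu with respect to the mean and 1/2 * (exp(log_var) - 1) with respect to the log-variance. They are compared against central finite differences, which are used here only as a reference.

```python
import numpy as np

def kl(mu, log_var):
    # Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ).
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

def kl_grad(mu, log_var):
    # Analytic partial derivatives of kl(); no sampling or integration required.
    return mu.copy(), 0.5 * (np.exp(log_var) - 1.0)

def finite_diff(f, x, eps=1e-6):
    # Central differences, one coordinate at a time, for comparison only.
    g = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        g[i] = (f(x + step) - f(x - step)) / (2.0 * eps)
    return g

rng = np.random.default_rng(0)
mu = rng.normal(size=4)
log_var = rng.normal(size=4)

g_mu, g_lv = kl_grad(mu, log_var)
assert np.allclose(g_mu, finite_diff(lambda m: kl(m, log_var), mu), atol=1e-6)
assert np.allclose(g_lv, finite_diff(lambda v: kl(mu, v), log_var), atol=1e-6)
```

Both gradients vanish at mu = 0, log_var = 0, that is, when the encoder distribution coincides with the prior.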
Where Pith is reading between the lines
- The same derivation strategy can be applied to other pairs of distributions that admit closed-form KL terms, such as certain exponential-family members.
- If the covariance matrices are full rather than diagonal, the closed-form expression changes (it then involves a matrix trace and a log-determinant), but the overall structure of the bound is unchanged.
- The tutorial's emphasis on deriving everything from Bayes' theorem makes the same steps reusable for other variational models beyond the autoencoder setting.
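For the full-covariance case, the standard closed form is KL(N(mu0, S0) || N(mu1, S1)) = 1/2 [tr(S1^-1 S0) + (mu1 - mu0)^T S1^-1 (mu1 - mu0) - k + log det S1 - log det S0]. The sketch below (function name hypothetical) implements it with NumPy's linear solver instead of an explicit inverse, and checks that it collapses to the diagonal expression from the core claim when S0 is diagonal and the prior is N(0, I).

```python
import numpy as np

def kl_mvn(mu0, cov0, mu1, cov1):
    """KL( N(mu0, cov0) || N(mu1, cov1) ) for positive-definite covariances."""
    k = mu0.shape[0]
    diff = mu1 - mu0
    trace_term = np.trace(np.linalg.solve(cov1, cov0))   # tr(cov1^{-1} cov0)
    mahalanobis = diff @ np.linalg.solve(cov1, diff)      # diff^T cov1^{-1} diff
    _, logdet0 = np.linalg.slogdet(cov0)
    _, logdet1 = np.linalg.slogdet(cov1)
    return 0.5 * (trace_term + mahalanobis - k + logdet1 - logdet0)

# Diagonal posterior against a standard normal prior: the general formula
# reduces to -1/2 * sum(1 + log var - mu^2 - var).
mu = np.array([0.3, -1.2, 0.8])
var = np.array([0.5, 2.0, 1.3])
diag_closed_form = -0.5 * np.sum(1.0 + np.log(var) - mu**2 - var)
assert np.isclose(kl_mvn(mu, np.diag(var), np.zeros(3), np.eye(3)), diag_closed_form)

# A correlated posterior: zero mean, unit variances, correlation 0.5.
cov_full = np.array([[1.0, 0.5], [0.5, 1.0]])
print(kl_mvn(np.zeros(2), cov_full, np.zeros(2), np.eye(2)))  # equals -0.5 * log(0.75)
```

With a correlated posterior, only the log-determinant term departs from the diagonal formula, which is why the structure of the bound itself does not change.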
Load-bearing premise
The prior distribution over latent variables and the approximate posterior produced by the encoder are both Gaussian.
What would settle it
A step-by-step algebraic check that the KL divergence integral between a diagonal Gaussian approximate posterior and a standard normal prior equals the stated closed-form expression, together with confirmation that the expression no longer applies once either distribution is made non-Gaussian.
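A numerical counterpart to that algebraic check, in one dimension: integrate q(z) (log q(z) - log p(z)) on a fine grid with the trapezoidal rule and compare it with the closed form. This is a sketch; the grid bounds, resolution, and function names are arbitrary choices made here for illustration, not values from the paper.

```python
import numpy as np

def log_normal_pdf(x, mean, var):
    # Log-density of N(mean, var) evaluated at x.
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

def kl_numeric(mu, var, lo=-20.0, hi=20.0, n=200_001):
    # Trapezoidal rule for the integral of q(z) * (log q(z) - log p(z)),
    # with q = N(mu, var) and p = N(0, 1).
    z = np.linspace(lo, hi, n)
    log_q = log_normal_pdf(z, mu, var)
    log_p = log_normal_pdf(z, 0.0, 1.0)
    f = np.exp(log_q) * (log_q - log_p)
    dz = z[1] - z[0]
    return dz * (f.sum() - 0.5 * (f[0] + f[-1]))

def kl_closed_form(mu, var):
    # One-dimensional case of -1/2 * (1 + log var - mu^2 - var).
    return -0.5 * (1.0 + np.log(var) - mu**2 - var)

for mu, var in [(0.0, 1.0), (1.5, 0.3), (-2.0, 4.0)]:
    print(mu, var, kl_numeric(mu, var), kl_closed_form(mu, var))
```

Replacing q with a non-Gaussian density in kl_numeric would still give a valid integral, but kl_closed_form would then no longer be the right comparison, which is the failure mode the check is meant to expose.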
Original abstract
In Bayesian machine learning, the posterior distribution is typically computationally intractable, hence variational inference is often required. In this approach, an evidence lower bound on the log likelihood of data is maximized during training. Variational Autoencoders (VAE) are one important example where variational inference is utilized. In this tutorial, we derive the variational lower bound loss function of the standard variational autoencoder. We do so in the instance of a Gaussian latent prior and Gaussian approximate posterior, under which assumptions the Kullback-Leibler term in the variational lower bound has a closed-form solution. We derive essentially everything we use along the way, from Bayes' theorem to the Kullback-Leibler divergence.
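The role of Bayes' theorem and the lower bound can be illustrated on a toy model small enough that the posterior is tractable. In this sketch, the three-state latent variable, its prior, and the likelihood values are all hypothetical numbers chosen for illustration. It checks that log p(x) minus the ELBO equals KL(q || p(z|x)) for an arbitrary q, and that the gap closes when q is the exact posterior.

```python
import numpy as np

# Hypothetical toy model: three latent states and one fixed observation x.
prior = np.array([0.5, 0.3, 0.2])   # p(z)
lik = np.array([0.1, 0.6, 0.3])     # p(x | z) for the observed x

evidence = np.sum(lik * prior)       # p(x) = sum_z p(x | z) p(z)
posterior = lik * prior / evidence   # Bayes' theorem: p(z | x)

def kl(q, p):
    # Discrete KL(q || p) for strictly positive probability vectors.
    return np.sum(q * (np.log(q) - np.log(p)))

def elbo(q):
    # E_q[log p(x | z)] - KL(q(z | x) || p(z))
    return np.sum(q * np.log(lik)) - kl(q, prior)

q = np.array([0.2, 0.5, 0.3])        # an arbitrary approximate posterior
gap = np.log(evidence) - elbo(q)
print(gap, kl(q, posterior))         # the two numbers agree
print(np.log(evidence) - elbo(posterior))  # zero up to rounding: the bound is tight
```

In a VAE, the sum over z that gives p(x) becomes an intractable integral, which is why the ELBO is optimized instead of log p(x) directly.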
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper is a tutorial deriving the evidence lower bound (ELBO) loss for the standard VAE. It begins from Bayes' theorem, introduces variational inference to obtain the ELBO log p(x) >= E_{q(z|x)}[log p(x|z)] - KL(q(z|x)||p(z)), and shows that the KL term admits a closed-form expression when both the prior p(z) and approximate posterior q(z|x) are Gaussian.
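Put together, the ELBO becomes a trainable loss by negating it and estimating the reconstruction expectation with a reparameterized sample z = mu + sigma * eps, eps ~ N(0, I). The sketch below makes assumptions that are not taken from the paper: a Bernoulli decoder over binary inputs, a single Monte Carlo sample per data point, and a hypothetical linear decoder with random weights standing in for a neural network.

```python
import numpy as np

def bernoulli_log_lik(x, logits):
    # log p(x | z) for binary x under independent Bernoulli(sigmoid(logits)),
    # written stably as x * logits - log(1 + exp(logits)).
    return np.sum(x * logits - np.logaddexp(0.0, logits), axis=-1)

def negative_elbo(x, mu, log_var, decode, rng):
    # Reparameterization: z = mu + sigma * eps with eps ~ N(0, I).
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * log_var) * eps
    recon = bernoulli_log_lik(x, decode(z))  # one-sample estimate of E_q[log p(x | z)]
    kl = 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)
    return np.mean(kl - recon)               # negative ELBO averaged over the batch

# Hypothetical linear decoder (latent_dim=2 -> 5 binary pixels), illustration only.
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 5))
b = np.zeros(5)

def decode(z):
    return z @ W + b

x = (rng.random((3, 5)) > 0.5).astype(float)  # batch of 3 binary vectors
mu = rng.normal(size=(3, 2))                  # stand-in encoder means
log_var = rng.normal(size=(3, 2))             # stand-in encoder log-variances
print(negative_elbo(x, mu, log_var, decode, rng))
```

Minimizing this quantity maximizes the ELBO; the KL part is exact, and only the reconstruction part carries Monte Carlo noise.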
Significance. If the derivation holds, the manuscript supplies a self-contained, from-first-principles exposition of the Gaussian VAE ELBO that explicitly derives Bayes' rule, the definition of KL divergence, and the analytic KL between two Gaussians. This level of explicitness is a pedagogical strength for readers learning variational methods.
Simulated Author's Rebuttal
We thank the referee for their positive assessment and recommendation to accept. The referee's summary correctly identifies the manuscript's focus on a self-contained derivation of the Gaussian VAE ELBO.
Circularity Check
No significant circularity
The paper is a tutorial deriving the standard Gaussian VAE ELBO from elementary probability (Bayes' theorem onward) under explicitly stated Gaussian prior/posterior assumptions that enable the known closed-form KL. No equation reduces to a fitted input, self-citation chain, or renamed ansatz; the central claim is the step-by-step assembly of the familiar result, which remains independent of the paper's own outputs.
Axiom & Free-Parameter Ledger
axioms (3)
- Bayes' theorem (standard math)
- Definition and properties of the Kullback-Leibler divergence (standard math)
- Closed-form KL divergence between two univariate or multivariate Gaussians (standard math)
Forward citations
Cited by 2 Pith papers
- What's in the latent space? Exploring coupled tropical Pacific variability within a Multi-branch β-Variational Autoencoder
  A multi-branch β-VAE on tropical Pacific SST, OHC, and OLR fields yields a latent space that reconstructs data well and aligns with physical ENSO and longer-term coupled variability modes.
- VAMP-Diff: VampPrior Latent Diffusion for Photoplethysmography Modeling
  VAMP-Diff is a jointly trained variational diffusion model using VampPrior on pooled latents to generate realistic PPG waveforms with better reconstruction fidelity and physiological rate preservation than Gaussian-pr...