Tutorial: Deriving the Standard Variational Autoencoder (VAE) Loss Function

Stephen Odaibo

arxiv: 1907.08956 · v1 · pith:GKDJDK53new · submitted 2019-07-21 · 💻 cs.LG · stat.ML

Pith reviewed 2026-05-24 18:44 UTC · model grok-4.3

keywords: variational autoencoder, evidence lower bound, variational inference, Kullback-Leibler divergence, Gaussian prior, Gaussian posterior, loss function derivation

The pith

The standard VAE loss is the evidence lower bound on data log-likelihood, which simplifies to a closed-form expression when the latent prior and approximate posterior are both Gaussian.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This tutorial derives the training objective for variational autoencoders starting from Bayes' theorem and the definition of the Kullback-Leibler divergence. It shows that the intractable log marginal likelihood of the data can be bounded from below by an expression consisting of a reconstruction term and a regularization term. When the prior over latent variables is a standard Gaussian and the encoder outputs parameters of another Gaussian, the regularization term evaluates to an explicit formula involving only the mean and variance vectors. A reader cares because the derivation explains the origin of the loss used in nearly all VAE implementations and removes the need to treat the formula as a black box. The result is a practical recipe for writing the objective that can be optimized by gradient descent.

Core claim

The paper establishes that the variational lower bound on the log likelihood equals the expected reconstruction log probability under the approximate posterior minus the Kullback-Leibler divergence between the approximate posterior and the prior; under the stated Gaussian assumptions this divergence admits the closed-form expression negative one-half times the sum over dimensions of one plus the log variance minus the squared mean minus the variance.
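
Written out in symbols, the claim reads as follows. The notation here (encoder $q_\phi$, decoder $p_\theta$, latent dimension $J$, per-dimension mean $\mu_j$ and variance $\sigma_j^2$) is assumed for illustration and may differ from the paper's own.

```latex
\begin{align}
\log p_\theta(x) \;\ge\; \mathcal{L}(\theta,\phi;x)
  &= \mathbb{E}_{q_\phi(z\mid x)}\big[\log p_\theta(x\mid z)\big]
   - D_{\mathrm{KL}}\big(q_\phi(z\mid x)\,\|\,p(z)\big), \\
D_{\mathrm{KL}}\big(\mathcal{N}(\mu,\operatorname{diag}(\sigma^2))\,\|\,\mathcal{N}(0,I)\big)
  &= -\frac{1}{2}\sum_{j=1}^{J}\big(1+\log\sigma_j^2-\mu_j^2-\sigma_j^2\big).
\end{align}
```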

What carries the argument

The evidence lower bound (ELBO), obtained by decomposing the marginal log likelihood into the bound plus a non-negative KL term and then dropping the KL.
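
The decomposition can be sketched in a few lines. This is the standard argument in assumed notation ($q(z\mid x)$ for the approximate posterior, $p(z\mid x)$ for the true posterior), not a transcription of the paper's steps.

```latex
\begin{align}
\log p(x)
  &= \mathbb{E}_{q(z\mid x)}\big[\log p(x)\big]
   = \mathbb{E}_{q(z\mid x)}\!\left[\log\frac{p(x,z)}{p(z\mid x)}\right] \\
  &= \mathbb{E}_{q(z\mid x)}\!\left[\log\frac{p(x,z)}{q(z\mid x)}\right]
   + \mathbb{E}_{q(z\mid x)}\!\left[\log\frac{q(z\mid x)}{p(z\mid x)}\right] \\
  &= \underbrace{\mathbb{E}_{q(z\mid x)}\big[\log p(x\mid z)\big]
   - D_{\mathrm{KL}}\big(q(z\mid x)\,\|\,p(z)\big)}_{\text{ELBO}}
   + \underbrace{D_{\mathrm{KL}}\big(q(z\mid x)\,\|\,p(z\mid x)\big)}_{\ge 0}.
\end{align}
```

Dropping the last term, which is non-negative and involves the intractable true posterior, leaves the bound.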

If this is right

Training reduces to maximizing the expected log reconstruction probability while penalizing deviation of the encoder distribution from the prior.
Gradients of the KL term can be computed analytically without sampling or numerical integration.
The overall loss separates into a reconstruction term, which depends on the decoder and on the encoder through the sampled latents, and a KL term that depends only on the encoder outputs.
The bound becomes tight only when the approximate posterior equals the true posterior.
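
The tightness statement can be checked on a toy model where everything is available in closed form. The sketch below assumes a one-dimensional linear-Gaussian model (z ~ N(0, 1), x | z ~ N(z, 1)), which is chosen here for illustration and is not taken from the paper. In that model the evidence is N(0, 2) and the true posterior is N(x/2, 1/2).

```python
import math

def log_evidence(x):
    # Exact log p(x) for the assumed toy model: x ~ N(0, 2).
    return -0.5 * math.log(2.0 * math.pi * 2.0) - x * x / 4.0

def elbo(x, m, s2):
    # Approximate posterior q(z | x) = N(m, s2).
    # Expected log-likelihood E_q[log N(x; z, 1)] in closed form.
    recon = -0.5 * math.log(2.0 * math.pi) - 0.5 * ((x - m) ** 2 + s2)
    # KL(N(m, s2) || N(0, 1)) in closed form.
    kl = 0.5 * (s2 + m * m - 1.0 - math.log(s2))
    return recon - kl

x = 1.3
# Gap when q equals the true posterior N(x/2, 1/2).
gap_true = log_evidence(x) - elbo(x, x / 2.0, 0.5)
# Gap when q is left at the prior N(0, 1).
gap_other = log_evidence(x) - elbo(x, 0.0, 1.0)
```

In this toy setting the gap vanishes for the true posterior and is strictly positive for the mismatched choice, as the bound predicts.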

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same derivation strategy can be applied to other pairs of distributions that admit closed-form KL terms, such as certain exponential-family members.
If the covariance matrices are full rather than diagonal the closed-form expression changes but the overall structure of the bound remains identical.
The tutorial's emphasis on deriving everything from Bayes' theorem makes the same steps reusable for other variational models beyond the autoencoder setting.
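
For the full-covariance case mentioned above, the KL divergence against a standard normal prior has a well-known closed form, 0.5 * (tr(Σ) + μᵀμ - d - log det Σ). The following is a minimal NumPy sketch of it next to the diagonal formula; the function names are hypothetical and the code is illustrative rather than part of the paper.

```python
import numpy as np

def kl_full_to_standard_normal(mu, cov):
    """KL( N(mu, cov) || N(0, I) ) for a full, positive-definite covariance."""
    mu = np.asarray(mu, dtype=float)
    cov = np.asarray(cov, dtype=float)
    d = mu.shape[0]
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("cov must be positive definite")
    return 0.5 * (np.trace(cov) + mu @ mu - d - logdet)

def kl_diag_to_standard_normal(mu, log_var):
    """Diagonal special case, parameterized by the log variance."""
    mu = np.asarray(mu, dtype=float)
    log_var = np.asarray(log_var, dtype=float)
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var))

mu = np.array([0.5, -1.0])
log_var = np.array([0.2, -0.3])
kl_diag = kl_diag_to_standard_normal(mu, log_var)
kl_full = kl_full_to_standard_normal(mu, np.diag(np.exp(log_var)))
```

With a diagonal covariance matrix the two functions agree, which is one way to sanity-check the general expression.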

Load-bearing premise

The prior distribution over latent variables and the approximate posterior produced by the encoder are both Gaussian.

What would settle it

A step-by-step algebraic check that the KL divergence integral between a diagonal Gaussian approximate posterior and the standard Gaussian prior equals the stated closed-form expression, or that the expression fails to match when the distributions are changed to non-Gaussian ones.
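
A numerical version of such a check is easy to sketch in one dimension. The code below compares the closed-form expression against a midpoint Riemann sum of the KL integral for a Gaussian q(z) = N(mu, var) and prior p(z) = N(0, 1). The function names, integration range, and step count are arbitrary choices made here for illustration.

```python
import math

def log_normal_pdf(z, mu, var):
    # Log density of N(mu, var) evaluated at z.
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (z - mu) ** 2 / var

def kl_closed_form(mu, var):
    # One-dimensional case of -1/2 * (1 + log var - mu^2 - var).
    return -0.5 * (1.0 + math.log(var) - mu * mu - var)

def kl_numeric(mu, var, half_width=12.0, steps=20_000):
    # Midpoint Riemann sum of q(z) * (log q(z) - log p(z)) over
    # mu +/- half_width standard deviations of q.
    sd = math.sqrt(var)
    lo = mu - half_width * sd
    dz = 2.0 * half_width * sd / steps
    total = 0.0
    for i in range(steps):
        z = lo + (i + 0.5) * dz
        log_q = log_normal_pdf(z, mu, var)
        log_p = log_normal_pdf(z, 0.0, 1.0)
        total += math.exp(log_q) * (log_q - log_p) * dz
    return total

closed = kl_closed_form(0.7, 0.3)
numeric = kl_numeric(0.7, 0.3)
```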

Original abstract

In Bayesian machine learning, the posterior distribution is typically computationally intractable, hence variational inference is often required. In this approach, an evidence lower bound on the log likelihood of data is maximized during training. Variational Autoencoders (VAE) are one important example where variational inference is utilized. In this tutorial, we derive the variational lower bound loss function of the standard variational autoencoder. We do so in the instance of a gaussian latent prior and gaussian approximate posterior, under which assumptions the Kullback-Leibler term in the variational lower bound has a closed form solution. We derive essentially everything we use along the way; everything from Bayes' theorem to the Kullback-Leibler divergence.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

A clear tutorial on the standard Gaussian VAE ELBO derivation with no new results.

This paper is a tutorial that derives the standard VAE evidence lower bound under Gaussian prior and approximate posterior, reaching the closed-form KL term. It starts from basic probability and builds through Bayes' theorem and KL definitions without skipping steps. That explicit chain is the main strength and can help someone who wants to see every piece laid out for implementation or teaching purposes. The math follows the usual path and matches what is already in the literature.

Nothing here is new. The derivation is the familiar one from the original VAE work and textbooks, and the paper presents itself as a tutorial rather than a research contribution. The Gaussian assumption is stated up front, so there is no hidden circularity or unstated restriction. The only real limitation is the lack of novelty or fresh angle; it assembles existing material rather than advancing understanding.

A reader new to variational inference might find it a convenient reference. Anyone who already knows the ELBO derivation will not gain anything. I would not bring this to a research-focused reading group. I would not cite it. It does not need peer review because it is not research; it is educational material that could fit a tutorials venue if one existed, but it does not rise to the level of a paper requiring referee time.

Referee Report

0 major / 0 minor

Summary. The paper is a tutorial deriving the evidence lower bound (ELBO) loss for the standard VAE. It begins from Bayes' theorem, introduces variational inference to obtain the ELBO $\log p(x) \ge \mathbb{E}_{q(z|x)}[\log p(x|z)] - \mathrm{KL}(q(z|x)\,\|\,p(z))$, and shows that the KL term admits a closed-form expression when both the prior $p(z)$ and approximate posterior $q(z|x)$ are Gaussian.
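
As a rough illustration of how this bound turns into a training loss, the NumPy sketch below computes the negative single-sample ELBO for a batch. It assumes a Bernoulli decoder (binary cross-entropy reconstruction term), which is a common choice but is an assumption here rather than something the summary specifies. The names `vae_loss` and `reparameterize` are hypothetical.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    # Draw z = mu + sigma * eps with eps ~ N(0, I), so z ~ N(mu, diag(sigma^2)).
    return mu + np.exp(0.5 * log_var) * rng.standard_normal(mu.shape)

def vae_loss(x, x_prob, mu, log_var, eps=1e-7):
    """Negative ELBO estimate averaged over the batch.

    x       : (batch, features) binary targets (assumed Bernoulli decoder)
    x_prob  : (batch, features) decoder probabilities for a sampled z
    mu      : (batch, latent) encoder means
    log_var : (batch, latent) encoder log variances
    """
    x_prob = np.clip(x_prob, eps, 1.0 - eps)  # avoid log(0)
    # Reconstruction term: log p(x | z) under the Bernoulli assumption.
    log_lik = np.sum(x * np.log(x_prob) + (1.0 - x) * np.log(1.0 - x_prob), axis=1)
    # Regularization term: closed-form KL(q(z | x) || N(0, I)).
    kl = -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=1)
    return float(np.mean(kl - log_lik))

rng = np.random.default_rng(0)
z = reparameterize(np.zeros((1, 3)), np.zeros((1, 3)), rng)
loss = vae_loss(
    x=np.array([[1.0, 0.0]]),
    x_prob=np.array([[0.5, 0.5]]),
    mu=np.zeros((1, 3)),
    log_var=np.zeros((1, 3)),
)
```

In a real model `x_prob` would come from a decoder network applied to `z`; here it is supplied directly so the sketch stays self-contained. Minimizing this quantity corresponds to maximizing the ELBO.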

Significance. If the derivation holds, the manuscript supplies a self-contained, from-first-principles exposition of the Gaussian VAE ELBO that explicitly derives Bayes' rule, the definition of KL divergence, and the analytic KL between two Gaussians. This level of explicitness is a pedagogical strength for readers learning variational methods.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive assessment and recommendation to accept. The referee's summary correctly identifies the manuscript's focus on a self-contained derivation of the Gaussian VAE ELBO.

Circularity Check

0 steps flagged

No significant circularity

The paper is a tutorial deriving the standard Gaussian VAE ELBO from elementary probability (Bayes' theorem onward) under explicitly stated Gaussian prior/posterior assumptions that enable the known closed-form KL. No equation reduces to a fitted input, self-citation chain, or renamed ansatz; the central claim is the step-by-step assembly of the familiar result, which remains independent of the paper's own outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 3 axioms · 0 invented entities

The paper relies exclusively on standard results from probability theory. No free parameters are fitted, no new entities are postulated, and the axioms invoked are background mathematical facts rather than domain-specific modeling choices.

axioms (3)

standard math · Bayes' theorem
Invoked to express the posterior in terms of likelihood, prior, and evidence.
standard math · Definition and properties of Kullback-Leibler divergence
Used to obtain the evidence lower bound from the log-likelihood.
standard math · Closed-form KL divergence between two univariate or multivariate Gaussians
Required to obtain an analytic expression under the Gaussian prior and posterior assumption.
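
Because the Gaussian KL term is an explicit function of the encoder outputs, its gradients are explicit as well: the derivative with respect to each mean is the mean itself, and the derivative with respect to each log variance is 0.5 * (exp(log_var) - 1). The sketch below checks these against central finite differences. The log-variance parameterization and the function names are assumptions made for illustration.

```python
import math

def kl(mu, log_var):
    # KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over dimensions.
    return -0.5 * sum(
        1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var)
    )

def kl_grad(mu, log_var):
    # Analytic partial derivatives of kl with respect to mu and log_var.
    d_mu = list(mu)
    d_log_var = [0.5 * (math.exp(lv) - 1.0) for lv in log_var]
    return d_mu, d_log_var

def finite_diff(f, v, i, h=1e-6):
    # Central difference of f along coordinate i of the list v.
    up = list(v)
    dn = list(v)
    up[i] += h
    dn[i] -= h
    return (f(up) - f(dn)) / (2.0 * h)

mu = [0.3, -1.2]
log_var = [0.1, -0.7]
d_mu, d_log_var = kl_grad(mu, log_var)
num_d_mu = [finite_diff(lambda v: kl(v, log_var), mu, i) for i in range(2)]
num_d_lv = [finite_diff(lambda v: kl(mu, v), log_var, i) for i in range(2)]
```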

pith-pipeline@v0.9.0 · 5638 in / 1383 out tokens · 24484 ms · 2026-05-24T18:44:48.262800+00:00 · methodology

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

What's in the latent space? Exploring coupled tropical Pacific variability within a Multi-branch $\beta$-Variational Autoencoder
physics.ao-ph · 2026-04 · unverdicted · novelty 6.0
A multi-branch β-VAE on tropical Pacific SST, OHC, and OLR fields yields a latent space that reconstructs data well and aligns with physical ENSO and longer-term coupled variability modes.

VAMP-Diff: VampPrior Latent Diffusion for Photoplethysmography Modeling
eess.SP · 2026-05 · unverdicted · novelty 5.0
VAMP-Diff is a jointly trained variational diffusion model using VampPrior on pooled latents to generate realistic PPG waveforms with better reconstruction fidelity and physiological rate preservation than Gaussian-pr...