Consistency of variational approximations under bounded Kullback-Leibler divergence
Pith reviewed 2026-06-27 05:10 UTC · model grok-4.3
The pith
On general metric spaces, a uniform bound on Kullback-Leibler divergence from approximations to tight targets forces the approximations to be tight.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
On a general metric space, a uniform bound on the Kullback-Leibler divergence from the approximating measures to a tight sequence of target measures forces the approximating sequence to be tight. It follows that if the target posteriors converge weakly to a Dirac mass at the true parameter, then any variational sequence with bounded Kullback-Leibler divergence to the targets is also consistent.
What carries the argument
The uniform bound on Kullback-Leibler divergence, which transfers tightness from the target sequence to the variational approximating sequence.
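One standard way such a transfer can be obtained is the data-processing inequality applied to a single event. The sketch below is an assumption about the mechanism, not a reproduction of the paper's proof; the symbols $C$, $K_\varepsilon$, $q$ and $p$ are introduced here for illustration only.

```latex
% Illustrative sketch; C, K_\varepsilon, q, p are not notation taken from the paper.
Let $A$ be a measurable set, $q = Q(A)$ and $p = P(A) \in (0,1)$.
Coarsening to the partition $\{A, A^c\}$ cannot increase the divergence, so
\[
\mathrm{KL}(Q \,\|\, P)
\;\ge\; q \log\frac{q}{p} + (1-q)\log\frac{1-q}{1-p}
\;\ge\; q \log\frac{1}{p} - \log 2 ,
\]
where the last step uses $q\log q + (1-q)\log(1-q) \ge -\log 2$
and $(1-q)\log\frac{1}{1-p} \ge 0$. Rearranging,
\[
Q(A) \;\le\; \frac{\mathrm{KL}(Q \,\|\, P) + \log 2}{\log\bigl(1/P(A)\bigr)} .
\]
If $\sup_n \mathrm{KL}(Q_n \,\|\, P_n) \le C$ and $K_\varepsilon$ is a compact set with
$\sup_n P_n(K_\varepsilon^c) \le \varepsilon < 1$, then
\[
\sup_n Q_n(K_\varepsilon^c) \;\le\; \frac{C + \log 2}{\log(1/\varepsilon)}
\;\longrightarrow\; 0 \quad \text{as } \varepsilon \to 0 .
\]
% When P_n(K_\varepsilon^c) = 0, finite KL forces Q_n \ll P_n, hence Q_n(K_\varepsilon^c) = 0.
```

The bound degrades only logarithmically in $1/\varepsilon$, which is why a uniform constant $C$ is enough and no rate on the divergence is needed.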
If this is right
- If target posteriors converge weakly to a Dirac at the true parameter, variational approximations with bounded KL are consistent.
- Logarithmic-moment conditions on the data suffice to establish the bounded-KL hypothesis for smooth generalized posteriors.
- The tightness transfer holds on arbitrary metric spaces, including infinite-dimensional settings.
- The result supplies a general sufficient condition for consistency of variational methods whenever the targets are consistent.
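The first implication in the list above can be illustrated on a toy Gaussian family. The following is a minimal sketch under assumptions that are not from the paper: the target is taken to be N(theta0, 1/n), and the approximation is shifted by one target standard deviation and under-dispersed by a factor of two. The names `kl_gauss`, `mass_outside`, `theta0` and `eps` are hypothetical.

```python
import math

def kl_gauss(m_q, s_q, m_p, s_p):
    """KL(N(m_q, s_q^2) || N(m_p, s_p^2)) for univariate Gaussians."""
    return math.log(s_p / s_q) + (s_q**2 + (m_q - m_p)**2) / (2 * s_p**2) - 0.5

def mass_outside(m, s, center, eps):
    """Probability that N(m, s^2) falls outside [center - eps, center + eps]."""
    def cdf(x):
        return 0.5 * (1.0 + math.erf((x - m) / (s * math.sqrt(2.0))))
    return 1.0 - (cdf(center + eps) - cdf(center - eps))

theta0 = 0.0  # hypothetical true parameter (assumption)
eps = 0.1     # radius of the neighbourhood around theta0 (assumption)

for n in [10, 100, 1000, 10000, 100000]:
    s_p = 1.0 / math.sqrt(n)            # toy target posterior N(theta0, 1/n)
    m_q = theta0 + 1.0 / math.sqrt(n)   # approximation shifted by one target sd
    s_q = 0.5 * s_p                     # and under-dispersed by a factor of two
    kl = kl_gauss(m_q, s_q, theta0, s_p)
    # The KL value does not depend on n here, while the mass outside shrinks.
    print(n, round(kl, 4), mass_outside(m_q, s_q, theta0, eps))
```

In this toy family the divergence stays constant in n while the approximation's mass outside the fixed neighbourhood of theta0 vanishes, which is the behaviour the consistency claim describes.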
Where Pith is reading between the lines
- The same tightness argument could be adapted to other f-divergences if they control total variation or weak convergence in a comparable way.
- In practice, the log-moment conditions may be easier to check than direct tightness of the variational family itself.
- The result suggests that posterior consistency proofs for variational methods can reduce to verifying a single uniform bound rather than reproving convergence from scratch.
Load-bearing premise
The sequence of target measures must itself be tight.
What would settle it
A tight sequence of target measures on a metric space together with approximating measures whose Kullback-Leibler divergences remain uniformly bounded, yet whose sequence fails to be tight, would falsify the main claim.
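A crude numerical search for such a counterexample can be sketched on finite sample spaces, where the event-level bound Q(A) <= (KL(Q || P) + log 2) / log(1 / P(A)) is the natural thing to test. This is only a sanity check under stated assumptions: the bound is the standard data-processing consequence, not a formula quoted from the paper, and the helper names (`kl_discrete`, `event_bound`, `random_dist`, `count_violations`) are hypothetical.

```python
import math
import random

def kl_discrete(q, p):
    """KL(q || p) for finite distributions; assumes p[i] > 0 wherever q[i] > 0."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

def event_bound(kl, p_event):
    """Upper bound on Q(A) given KL(Q || P) and P(A), valid for 0 < P(A) < 1."""
    return (kl + math.log(2.0)) / math.log(1.0 / p_event)

def random_dist(k, rng):
    """Random strictly positive probability vector of length k."""
    weights = [rng.random() + 0.01 for _ in range(k)]
    total = sum(weights)
    return [w / total for w in weights]

def count_violations(trials=2000, k=6, seed=0):
    """Count random (P, Q, A) triples for which Q(A) exceeds the bound."""
    rng = random.Random(seed)
    violations = 0
    for _ in range(trials):
        p, q = random_dist(k, rng), random_dist(k, rng)
        size = rng.randint(1, k - 1)          # proper, non-empty event A
        event = rng.sample(range(k), size)
        p_a = sum(p[i] for i in event)
        q_a = sum(q[i] for i in event)
        if q_a > event_bound(kl_discrete(q, p), p_a) + 1e-12:
            violations += 1
    return violations

print(count_violations())
```

A random search like this cannot establish the claim; it can only fail to find a violation. A genuine counterexample would have to live in a setting the event-level bound does not cover.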
Original abstract
Variational methods are widely used to approximate posterior distributions in Bayesian inference when exact computation is infeasible. We study when such approximations inherit posterior consistency. Our first result shows that, on a general metric space, a uniform bound on the Kullback-Leibler divergence from the approximating measures to a tight sequence of target measures forces the approximating sequence to be tight. It follows that if the target posteriors converge weakly to a Dirac mass at the true parameter, then any variational sequence with bounded Kullback-Leibler divergence to the targets is also consistent. We also give simple logarithmic-moment conditions that verify this boundedness condition, and illustrate them for smooth generalised posterior distributions.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that on a general metric space, a uniform bound on KL(Q_n || P_n) for a tight sequence of target measures {P_n} implies tightness of the approximating sequence {Q_n}. It follows that if {P_n} converges weakly to a Dirac mass at the true parameter, then any variational sequence with bounded KL to the targets is consistent. The paper also supplies logarithmic-moment conditions to verify the bounded-KL hypothesis and illustrates them on smooth generalised posteriors.
Significance. If the central tightness implication holds under the stated hypotheses, the result supplies a broadly applicable criterion linking bounded KL to consistency of variational approximations, extending beyond case-by-case analyses. The logarithmic-moment verification conditions constitute a concrete, checkable strength that could be used in applications.
Major comments (1)
- [Abstract] Abstract (and presumably §2 or the main theorem statement): the result is stated for a 'general metric space,' yet the passage from tightness of {Q_n} to weak convergence (hence consistency) to the Dirac limit of {P_n} relies on relative compactness. Prohorov's theorem requires the space to be Polish (separable and complete); on a non-separable or incomplete metric space tightness need not yield relatively compact subsequences, so the consistency conclusion does not follow in full generality. This assumption is load-bearing for the central claim.
Simulated Author's Rebuttal
We thank the referee for the careful reading and the precise observation on topological assumptions. We address the comment below.
Point-by-point responses
Referee: [Abstract] Abstract (and presumably §2 or the main theorem statement): the result is stated for a 'general metric space,' yet the passage from tightness of {Q_n} to weak convergence (hence consistency) to the Dirac limit of {P_n} relies on relative compactness. Prohorov's theorem requires the space to be Polish (separable and complete); on a non-separable or incomplete metric space tightness need not yield relatively compact subsequences, so the consistency conclusion does not follow in full generality. This assumption is load-bearing for the central claim.
Authors: We agree that the comment is correct. The first result (bounded KL divergence implies tightness of the approximating sequence) holds on arbitrary metric spaces. However, the passage from tightness to relative compactness, and hence to weak convergence to the Dirac measure, invokes Prohorov's theorem and therefore requires the underlying space to be Polish. We will revise the abstract, the statement of the main theorem, and the surrounding discussion to explicitly assume that the metric space is Polish. This does not change the tightness implication but correctly restricts the consistency conclusion to the setting where Prohorov's theorem applies.
Revision: yes
Circularity Check
No circularity: purely theoretical derivation of tightness from bounded KL on metric spaces
The paper presents a mathematical theorem establishing that a uniform bound on KL(Q_n || P_n) implies tightness of {Q_n} when {P_n} is tight, followed by a consistency implication when P_n converges weakly to a Dirac. No parameters are fitted, no predictions are made from subsets of data, and no self-citations or ansatzes are invoked as load-bearing steps in the provided abstract or description. The derivation is self-contained as a direct proof in measure-theoretic probability, with no reduction of outputs to inputs by construction. The skeptic's concern about Polish vs. general metric spaces pertains to correctness of the statement (Prohorov's theorem), not to circularity in the derivation chain.
Axiom & Free-Parameter Ledger
Axioms (2)
- [standard math] Kullback-Leibler divergence is well-defined and non-negative on probability measures on a metric space
- [standard math] Weak convergence to a Dirac measure implies consistency of the sequence
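For reference, the first axiom can be written out with the usual measure-theoretic definition. The direction matches the KL(Q_n || P_n) used in the referee summary; the symbols $Q$ and $P$ are generic placeholders rather than notation confirmed from the paper.

```latex
\[
\mathrm{KL}(Q \,\|\, P) =
\begin{cases}
\displaystyle \int \log\frac{dQ}{dP}\, dQ, & Q \ll P, \\[1ex]
+\infty, & \text{otherwise},
\end{cases}
\qquad
\mathrm{KL}(Q \,\|\, P) \ge 0 \ \text{ with equality iff } Q = P .
\]
% Non-negativity follows from Jensen's inequality applied to the convex map t \mapsto t \log t.
```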