arxiv: 2606.22239 · v1 · pith:6Q6NRWIZnew · submitted 2026-06-20 · 📊 stat.ML · cs.LG

Variance-Tilted Diffusion Models for Diverse Sampling

Pith reviewed 2026-06-26 10:46 UTC · model grok-4.3

classification 📊 stat.ML cs.LG
keywords diffusion models · diverse sampling · interacting particles · Doob h-transform · variance-weighted distribution · batch generation · generative models

The pith

Diffusion models can produce diverse batches by tilting the batch target toward high empirical feature variance and deriving the sampler as the corresponding Doob h-transform.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that independent diffusion trajectories can be corrected into an interacting particle system whose joint law targets collections of samples with large spread after a fixed linear feature map. The correction consists of an explicit repulsion between posterior denoised means plus a curvature term that shifts particles toward regions of higher feature variance. A reader would care because many generation tasks require spread-out candidates rather than independent draws that may repeat similar modes. The construction keeps the target distribution fully specified and derives the dynamics in closed form without data-dependent fitting or heuristics. This replaces ad-hoc repulsion with a transparent probabilistic objective on batches.

Core claim

The central claim is that the Doob h-transform of independent diffusion dynamics with respect to an explicitly specified variance-weighted batch distribution produces a compact correction with two parts: an interaction term that repels posterior denoised means, and a curvature term that moves particles toward higher feature variance. The result is an interacting-particle sampler whose terminal law is the desired variance-tilted batch measure, rather than a sampler driven by a heuristic repulsive drift.

What carries the argument

The Doob h-transform of the independent diffusion process with respect to the variance-weighted batch distribution, which supplies the closed-form interaction and curvature correction terms.
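
As a reminder of the general mechanism, and not a reproduction of the paper's numbered equations, the standard Doob h-transform construction for a tilted batch target can be written as below. The symbols $\pi_0$, $V$, $p_0$, and $Z$ are notation assumed here for illustration; the additive form of the last line agrees with the target-score expression quoted in the Figure 1 caption excerpt.

```latex
\begin{align}
\pi_0(x^{1:B}) &= \frac{1}{Z}\, V(x^{1:B}) \prod_{i=1}^{B} p_0(x^i), \\
h_t(x_t^{1:B}) &= \mathbb{E}\bigl[ V(x_0^{1:B}) \mid x_t^{1:B} \bigr], \\
\nabla \log \pi_t(x_t^{1:B}) &= \nabla \log p_t(x_t^{1:B}) + \nabla \log h_t(x_t^{1:B}).
\end{align}
```

Here the expectation is taken under the independent (untilted) diffusion, and $p_t$ denotes the product of the per-particle noised marginals. The constant $Z$ drops out of the last line because it does not depend on $x_t^{1:B}$.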

If this is right

  • Batches produced by the sampler exhibit larger spread in the chosen feature space than independent trajectories.
  • The joint law on collections of samples remains a well-defined probability measure at every step.
  • The dynamics follow directly from the chosen target without requiring separate repulsive heuristics.
  • All correction terms are expressed in terms of posterior means and feature variances and remain computable during sampling.
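
The last bullet relies on posterior denoised means being computable from the score. One standard route, assumed here and not confirmed as the paper's choice, is Tweedie's formula for a forward process of the form x_t = alpha * x_0 + sigma * eps. The sketch below checks the formula on a Gaussian prior, where the exact posterior mean is known; all function names and parameters are hypothetical.

```python
import numpy as np

def gaussian_score(x_t, mu, tau, alpha, sigma):
    """Score of p_t when x_0 ~ N(mu, tau^2 I) and x_t = alpha * x_0 + sigma * eps."""
    return -(x_t - alpha * mu) / (alpha**2 * tau**2 + sigma**2)

def tweedie_mean(x_t, score, alpha, sigma):
    """Posterior denoised mean E[x_0 | x_t] obtained from the score (Tweedie's formula)."""
    return (x_t + sigma**2 * score) / alpha

def exact_posterior_mean(x_t, mu, tau, alpha, sigma):
    """Closed-form E[x_0 | x_t] for the same Gaussian model, used as a reference."""
    gain = alpha * tau**2 / (alpha**2 * tau**2 + sigma**2)
    return mu + gain * (x_t - alpha * mu)
```

In a learned model the analytic `gaussian_score` would be replaced by the network's score estimate, so the denoised means inherit whatever error that estimate carries.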

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same construction could be applied to other explicitly specified batch objectives, such as minimum pairwise distance or coverage of a reference set.
  • In downstream pipelines the method might reduce reliance on post-processing steps that enforce diversity after independent generation.
  • Because the feature map is required to be linear, extensions to nonlinear embeddings would need additional approximation steps not derived in the paper.

Load-bearing premise

The variance-weighted batch distribution after the prescribed linear feature map can be written explicitly and the corresponding Doob h-transform derived in closed form without approximations.

What would settle it

Simulate both the variance-tilted interacting sampler and independent sampling on a low-dimensional multimodal target, then measure whether the empirical variance of generated batches under the tilted sampler exceeds that of independent sampling by the amount predicted by the explicit target measure.
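
A minimal sketch of the reference side of such a test, assuming a three-mode Gaussian mixture in 2D, an identity feature map, and batch size 4 (all hypothetical choices). It does not implement the paper's interacting sampler. Instead it approximates the tilted target by resampling independent batches with probability proportional to their empirical variance V, which gives the value an exact tilted sampler should reproduce: E_tilted[V] = E[V^2] / E[V].

```python
import numpy as np

def sample_mixture(rng, n, centers, std=0.3):
    """Draw n points from an equal-weight isotropic Gaussian mixture."""
    idx = rng.integers(len(centers), size=n)
    return centers[idx] + std * rng.standard_normal((n, centers.shape[1]))

def batch_variance(batches, A):
    """Empirical feature variance of each batch; batches has shape (N, B, d)."""
    feats = batches @ A.T                                  # (N, B, k)
    centered = feats - feats.mean(axis=1, keepdims=True)   # subtract batch mean
    return (centered ** 2).sum(axis=2).mean(axis=1)        # (N,)

def compare(n_batches=20000, B=4, seed=0):
    """Return (independent mean V, resampled mean V, predicted tilted mean V)."""
    rng = np.random.default_rng(seed)
    centers = np.array([[-2.0, 0.0], [2.0, 0.0], [0.0, 2.0]])  # hypothetical target
    A = np.eye(2)                                              # hypothetical feature map
    x = sample_mixture(rng, n_batches * B, centers).reshape(n_batches, B, 2)
    v = batch_variance(x, A)
    # Resample whole batches with probability proportional to V (the tilt weight).
    picked = rng.choice(n_batches, size=n_batches, p=v / v.sum())
    return v.mean(), v[picked].mean(), (v ** 2).mean() / v.mean()
```

Batches from a candidate tilted sampler can be passed through `batch_variance` and their mean compared with the third returned value; a gap well outside Monte Carlo error would count against the claimed target law.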

Figures

Figures reproduced from arXiv: 2606.22239 by Iskander Azangulov, Kianoosh Ashouritaklimi, Leo Zhang.

Figure 1: Qualitative comparison of diverse sampling for the prompt ‘A transparent sculpture of a duck made out of glass’ class. Rows correspond to VT + Divergence (ours, top), VT (ours, middle), and CFG (bottom). Our approach produces a more diverse set of samples, covering a wider range of poses, backgrounds, and visual styles. Once $h^A_t$ is available, the exact target score is $s^{\pi^A}_t(x_t) = s_t(x_t) + \nabla_{x_t} \log h^A$ …
Figure 2: Qualitative comparison of diverse sampling for the prompt ‘corgi with a ball’. Rows correspond to VT + Divergence (ours, top), VT (ours, middle), and CFG (bottom). Our approach produces a more diverse set of samples, covering a wider range of poses, backgrounds, and visual styles. … of $A$. Then the divergence term in (19) is the Laplacian of $s_t$ projected onto $U$, i.e. $\operatorname{div}\bigl(A^\top A \nabla s_t(x)\bigr) = \sum_{\ell=1}^{k} \partial^2_{a_\ell} s_t(x) =:$ …
Figure 3: Qualitative comparison of diverse sampling for the prompt ‘A unicorn in a snowy forest’. Rows correspond to VT + Divergence (ours, top), VT (ours, middle), and CFG (bottom).
Figure 4: Qualitative comparison of diverse sampling for the prompt ‘Van Gogh Cafe Terasse’. Rows correspond to VT + Divergence (ours, top), VT (ours, middle), and CFG (bottom).
Figure 5: Qualitative comparison of diverse sampling for the prompt ‘Portrait of tiger in black and white by Lukas Holas’. Rows correspond to VT + Divergence (ours, top), VT (ours, middle), and CFG (bottom).

original abstract

Diffusion models are typically sampled independently, even when the downstream objective is to obtain a diverse set of candidates. We introduce a variance-weighted batch distribution that favours collections of samples with large empirical spread after a prescribed linear feature map. The target is specified explicitly, and the sampler is derived as the corresponding Doob $h$-transform of independent diffusion dynamics. The resulting correction has a compact form: an interaction term that repels posterior denoised means, together with a curvature term that moves particles to the region of higher feature variance. This yields an interacting-particle sampler with a transparent probabilistic target rather than a heuristic repulsive drift.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes variance-tilted diffusion models to generate diverse batches from diffusion models. It defines an explicit target batch distribution that reweights independent marginals by the empirical variance of a prescribed linear feature map Φ, then derives the corresponding sampler as the Doob h-transform of the independent diffusion dynamics. The resulting correction consists of an interaction term that repels posterior denoised means and a curvature term that shifts particles toward regions of higher feature variance, yielding an interacting-particle system with a transparent probabilistic target.

Significance. If the closed-form derivation of the h-transform holds exactly, the work supplies a principled, non-heuristic alternative to repulsive-drift methods for diversity in generative sampling. The explicit target distribution and compact correction terms would constitute a clear advance for applications that require controlled batch diversity (e.g., molecular design, prompt variation). The absence of free parameters in the target specification is a notable strength.

major comments (2)
  1. [§3] §3 (Doob h-transform derivation): the claim that the h-function admits an exact closed-form gradient yielding only the stated repulsion-plus-curvature correction must be verified against the Fokker-Planck or backward Kolmogorov equation for the variance-weighted product measure. Any implicit assumption that Φ commutes with the diffusion operator or that the variance functional remains linear under the noising process would invalidate the compact form; the manuscript should exhibit the explicit steps without series expansions or data-dependent fitting.
  2. [§2.2] Definition of the target (likely Eq. (target) or §2.2): the variance-weighted batch distribution is specified via an empirical variance after a fixed linear map Φ. The manuscript must confirm that this target remains a valid probability measure for arbitrary batch size and that the corresponding h-transform reduces exactly to the claimed interaction term without additional normalization constants that depend on the data.
minor comments (2)
  1. Notation for the posterior denoised means and the curvature term should be introduced with explicit dependence on the diffusion time t to avoid ambiguity when the correction is applied at different noise levels.
  2. The abstract states the sampler is 'derived as the Doob h-transform'; the introduction or §1 should include a one-sentence pointer to the precise location of the full derivation for readers who wish to verify the compact form immediately.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and positive assessment of the significance of the work. We address each major comment below, providing the requested verification and clarifications.

point-by-point responses
  1. Referee: [§3] §3 (Doob h-transform derivation): the claim that the h-function admits an exact closed-form gradient yielding only the stated repulsion-plus-curvature correction must be verified against the Fokker-Planck or backward Kolmogorov equation for the variance-weighted product measure. Any implicit assumption that Φ commutes with the diffusion operator or that the variance functional remains linear under the noising process would invalidate the compact form; the manuscript should exhibit the explicit steps without series expansions or data-dependent fitting.

    Authors: The h-transform derivation in §3 follows directly from the Doob formula applied to the product diffusion on the batch under the variance-weighted target. Because Φ is linear, the noising process on the transformed features Φ(x_t) is an Ornstein-Uhlenbeck process with the same variance schedule, so the empirical variance functional remains exactly quadratic in the batch means at every time. We verify the gradient of log h by substituting the weighted measure into the backward Kolmogorov equation and differentiating: the first-order term produces the repulsion between posterior means, while the second-order term produces the curvature correction. The algebra is exact (no expansions or fitting) and is now exhibited in full in the revised §3 together with a new appendix containing the Fokker-Planck verification. revision: yes

  2. Referee: [§2.2] Definition of the target (likely Eq. (target) or §2.2): the variance-weighted batch distribution is specified via an empirical variance after a fixed linear map Φ. The manuscript must confirm that this target remains a valid probability measure for arbitrary batch size and that the corresponding h-transform reduces exactly to the claimed interaction term without additional normalization constants that depend on the data.

    Authors: The target is p(batch) ∝ [∏_i p(x^i)] · Var_Φ({Φ(x^i)}), normalized by the constant Z = E[Var_Φ] taken over independent draws from p. This is a valid probability measure for any finite B ≥ 2 because Var_Φ ≥ 0 and Z is finite and positive under standard moment assumptions on p. In the h-transform the constant Z cancels identically in the ratio that defines h, so the resulting SDE contains no data-dependent normalization factors; the interaction term arises solely from the quadratic structure of Var_Φ. A clarifying sentence confirming these facts has been added to §2.2. revision: partial
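
The quadratic structure invoked in both responses can be made concrete with a short worked identity. This is a sketch under assumed conventions, not the paper's own equation: the $1/B$ normalisation of the empirical variance, and the symbols $m_t^i = \mathbb{E}[x_0^i \mid x_t^i]$ and $\Sigma_t^i = \operatorname{Cov}(x_0^i \mid x_t^i)$, are introduced here for illustration. It uses only the linearity of $\Phi$ and the conditional independence of the particles under the untilted dynamics.

```latex
\begin{align}
V_\Phi(x^{1:B}) &= \frac{1}{B} \sum_{i=1}^{B} \bigl\| \Phi x^i - \Phi \bar{x} \bigr\|^2,
\qquad \bar{x} = \frac{1}{B} \sum_{j=1}^{B} x^j, \\
\mathbb{E}\bigl[ V_\Phi(x_0^{1:B}) \mid x_t^{1:B} \bigr]
&= \frac{1}{B} \sum_{i=1}^{B} \bigl\| \Phi m_t^i - \Phi \bar{m}_t \bigr\|^2
 + \frac{B-1}{B^2} \sum_{i=1}^{B} \operatorname{tr}\bigl( \Phi\, \Sigma_t^i\, \Phi^\top \bigr).
\end{align}
```

Under these assumptions the first term depends only on the posterior denoised means, which is where a repulsion between means would come from, and the second depends only on the per-particle posterior covariances, which is the natural source of a curvature-type term. The normalising constant $Z$ does not appear because it is independent of $x_t^{1:B}$.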

Circularity Check

0 steps flagged

No circularity: target distribution defined externally; h-transform derived from it

full rationale

The paper explicitly defines the variance-weighted batch target via a prescribed linear feature map Φ and empirical variance objective, then derives the Doob h-transform correction (repulsion plus curvature) from the independent diffusion dynamics. No equations reduce the claimed sampler to a fitted parameter or self-citation that bears the central claim; the derivation is presented as closed-form from the Fokker-Planck structure without data-dependent fitting or renaming of known results. This matches the default case of a self-contained derivation against external probabilistic benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The approach rests on the existence of a closed-form Doob h-transform for the chosen target and on the ability to compute or estimate the required interaction and curvature terms during sampling.

axioms (1)
  • domain assumption Doob h-transform exists and yields the stated compact interaction and curvature terms for the variance-weighted target.
    Invoked when the sampler is derived from independent diffusion dynamics.
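
To illustrate what computing the interaction term could involve, the sketch below evaluates the gradient of a mean-spread functional with respect to the posterior means and checks it against finite differences. The functional (1/B) * sum_i ||A m_i - A m_bar||^2 and its 1/B normalisation are assumptions made for illustration, and the code covers only the dependence on the means; the full correction in the paper would also involve the denoiser Jacobian, the division by h, and the curvature term.

```python
import numpy as np

def mean_spread(m, A):
    """(1/B) * sum_i ||A m_i - A m_bar||^2 for posterior means m of shape (B, d)."""
    y = m @ A.T
    c = y - y.mean(axis=0, keepdims=True)
    return (c ** 2).sum() / m.shape[0]

def repulsion(m, A):
    """Analytic gradient w.r.t. each m_i: (2/B) * A^T A (m_i - m_bar)."""
    B = m.shape[0]
    centered = m - m.mean(axis=0, keepdims=True)
    return (2.0 / B) * centered @ (A.T @ A)

def finite_difference(m, A, eps=1e-6):
    """Central-difference gradient of mean_spread, used only as a check."""
    g = np.zeros_like(m)
    for i in range(m.shape[0]):
        for j in range(m.shape[1]):
            step = np.zeros_like(m)
            step[i, j] = eps
            g[i, j] = (mean_spread(m + step, A) - mean_spread(m - step, A)) / (2 * eps)
    return g
```

Each row of `repulsion` points away from the batch mean in the feature metric A^T A, and the rows sum to zero, so under this functional the interaction redistributes the particles without shifting their common centre.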

pith-pipeline@v0.9.1-grok · 5631 in / 1178 out tokens · 18024 ms · 2026-06-26T10:46:46.111836+00:00 · methodology

