On Divergence Measures for Training GFlowNets
Pith reviewed 2026-05-23 18:48 UTC · model grok-4.3
The pith
Minimizing Rényi, Tsallis, reverse KL and forward KL divergences trains GFlowNets correctly and often faster than the standard objective.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Properly minimizing the Rényi-α, Tsallis-α, reverse KL and forward KL divergences between proposal and target distributions yields a provably correct training scheme for GFlowNets. When paired with the designed control variates, the scheme produces low-variance gradients and often converges significantly faster than the conventional log-squared objective.
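For orientation, the four divergences can be written in their standard textbook forms for discrete distributions $p$ and $q$. This is a sketch under assumptions: the symbols $p$, $q$, $R_\alpha$ and $T_\alpha$ are chosen here for illustration, and the paper's own notation and argument order (forward policy as proposal, backward policy as target) may differ.

```latex
% Standard definitions for discrete p, q with alpha > 0, alpha != 1.
% Notation is illustrative and not taken from the paper.
\begin{align*}
D_{\mathrm{KL}}(p \,\|\, q) &= \sum_{x} p(x) \log \frac{p(x)}{q(x)}, \\
R_\alpha(p \,\|\, q) &= \frac{1}{\alpha - 1} \log \sum_{x} p(x)^{\alpha}\, q(x)^{1-\alpha}, \\
T_\alpha(p \,\|\, q) &= \frac{1}{\alpha - 1} \Big( \sum_{x} p(x)^{\alpha}\, q(x)^{1-\alpha} - 1 \Big).
\end{align*}
```

In the usual variational-inference convention, "reverse KL" places the proposal in the first argument and "forward KL" places the target there. Both $R_\alpha$ and $T_\alpha$ recover $D_{\mathrm{KL}}$ in the limit $\alpha \to 1$.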
What carries the argument
Statistically efficient stochastic gradient estimators for the four divergences, each paired with REINFORCE leave-one-out or score-matching control variates that reduce variance while remaining unbiased.
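The leave-one-out idea can be illustrated on a toy problem. The sketch below is not the paper's estimator: it assumes a single categorical distribution with hypothetical `logits` and a hypothetical per-outcome cost `f`, and compares a plain score-function (REINFORCE) gradient with its leave-one-out variant, where each sample's baseline is the mean cost of the other samples in the batch.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def exact_gradient(logits, f):
    """Closed-form gradient of E_{x~p}[f(x)] with respect to the logits."""
    p = softmax(logits)
    return p * (f - p @ f)

def score_function_gradient(logits, f, samples, leave_one_out):
    """Score-function estimate from one batch, optionally with a leave-one-out baseline."""
    p = softmax(logits)
    k = len(samples)
    values = f[samples]
    if leave_one_out:
        # Baseline for sample i is the mean of the other k-1 values, so it
        # does not depend on sample i and leaves the estimator unbiased.
        baseline = (values.sum() - values) / (k - 1)
    else:
        baseline = np.zeros(k)
    # For a softmax parameterization, grad log p(x) = onehot(x) - p.
    scores = np.eye(len(logits))[samples] - p
    return ((values - baseline)[:, None] * scores).mean(axis=0)

def compare(num_batches=4000, k=8, seed=0):
    """Return the exact gradient and per-batch estimates for both estimators."""
    rng = np.random.default_rng(seed)
    logits = np.array([0.5, -1.0, 0.2, 1.5])   # hypothetical toy parameters
    f = np.array([10.0, 11.0, 9.5, 10.5])      # hypothetical toy costs
    p = softmax(logits)
    plain, rloo = [], []
    for _ in range(num_batches):
        samples = rng.choice(len(logits), size=k, p=p)
        plain.append(score_function_gradient(logits, f, samples, False))
        rloo.append(score_function_gradient(logits, f, samples, True))
    return exact_gradient(logits, f), np.array(plain), np.array(rloo)

if __name__ == "__main__":
    exact, plain, rloo = compare()
    print("exact gradient      :", exact)
    print("leave-one-out mean  :", rloo.mean(axis=0))
    print("total variance plain:", plain.var(axis=0).sum())
    print("total variance LOO  :", rloo.var(axis=0).sum())
```

Because the costs share a large common offset, the plain estimator is noisy while the leave-one-out version cancels most of that offset. Whether the same gain carries over to trajectory-level GFlowNet objectives is what the paper's experiments would need to show.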
If this is right
- GFlowNets trained under any of the four divergences still sample exactly from the distribution proportional to the unnormalized target when the objective reaches zero.
- The same control-variate construction can be reused for other divergence families in future GFlowNet variants.
- Training time to a given effective sample size is reduced on tasks where the new objectives reach low error sooner.
- The variational-inference viewpoint supplies a systematic way to derive new objectives for GFlowNets beyond the four examined here.
Where Pith is reading between the lines
- The same estimator-plus-control-variate pattern may transfer directly to other amortized samplers that rely on forward-backward consistency.
- Alpha-parameter schedules could be used to anneal between different divergences during a single training run.
- Because the method narrows the gap with generalized variational approximations, existing VI diagnostics and convergence theory may now apply to GFlowNets with only minor adaptation.
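The annealing idea rests on a standard fact: the Rényi-α and Tsallis-α divergences both approach the KL divergence as α tends to 1. The sketch below checks this numerically on two hypothetical three-outcome distributions `p` and `q`; it uses the textbook definitions and does not reproduce any schedule from the paper.

```python
import numpy as np

def kl(p, q):
    """KL(p || q) for strictly positive discrete distributions."""
    return float(np.sum(p * np.log(p / q)))

def renyi(p, q, alpha):
    """Renyi-alpha divergence, alpha > 0 and alpha != 1."""
    return float(np.log(np.sum(p**alpha * q**(1.0 - alpha))) / (alpha - 1.0))

def tsallis(p, q, alpha):
    """Tsallis-alpha divergence, alpha > 0 and alpha != 1."""
    return float((np.sum(p**alpha * q**(1.0 - alpha)) - 1.0) / (alpha - 1.0))

# Hypothetical toy distributions chosen only for illustration.
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.5, 0.3])

if __name__ == "__main__":
    print("KL:", kl(p, q))
    for alpha in (2.0, 1.5, 1.1, 1.01, 1.001):
        print(alpha, renyi(p, q, alpha), tsallis(p, q, alpha))
```

A schedule that moves α toward 1 would therefore end near a KL objective; which KL direction that is depends on the argument order, which is an assumption here.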
Load-bearing premise
The designed stochastic gradient estimators for the four divergences remain unbiased and, after the control variates are applied, have low enough variance to produce stable and faster convergence.
What would settle it
An experiment on a small, exactly solvable GFlowNet task in which the learned policy is sampled exhaustively and checked for exact match to the target distribution; mismatch or slower convergence than the log-squared baseline would falsify the claim.
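A minimal version of such a check can be sketched as follows. Everything here is hypothetical: the environment builds binary strings of length 3 left to right (a tree, so each terminal has exactly one trajectory), the reward is invented, and two hand-written policies stand in for a learned one. The terminal distribution is computed by exhaustive enumeration and compared with the normalized reward via total variation distance.

```python
import itertools
import numpy as np

LENGTH = 3
# Hypothetical unnormalized reward over all binary strings of length LENGTH.
REWARD = {x: 1.0 + sum(x) ** 2 for x in itertools.product((0, 1), repeat=LENGTH)}

def subtree_flow(prefix):
    """Total reward of all terminal strings that extend `prefix`."""
    return sum(r for x, r in REWARD.items() if x[:len(prefix)] == prefix)

def flow_policy(prefix):
    """Next-bit probabilities proportional to downstream reward (exact sampler on a tree)."""
    flows = np.array([subtree_flow(prefix + (b,)) for b in (0, 1)])
    return flows / flows.sum()

def uniform_policy(prefix):
    """Deliberately mismatched policy used as a negative control."""
    return np.array([0.5, 0.5])

def terminal_distribution(policy):
    """Exact probability of every terminal string under `policy`."""
    dist = {}
    for x in REWARD:
        prob = 1.0
        for t in range(LENGTH):
            prob *= policy(x[:t])[x[t]]
        dist[x] = prob
    return dist

def total_variation(policy):
    """Total variation distance between the policy's terminals and reward / Z."""
    z = sum(REWARD.values())
    dist = terminal_distribution(policy)
    return 0.5 * sum(abs(dist[x] - REWARD[x] / z) for x in REWARD)

if __name__ == "__main__":
    print("TV, flow-proportional policy:", total_variation(flow_policy))
    print("TV, uniform policy          :", total_variation(uniform_policy))
```

In an actual test, `flow_policy` would be replaced by the trained forward policy, and the total variation distance would be tracked over training for each objective.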
Original abstract
Generative Flow Networks (GFlowNets) are amortized inference models designed to sample from unnormalized distributions over composable objects, with applications in generative modeling for tasks in fields such as causal discovery, NLP, and drug discovery. Traditionally, the training procedure for GFlowNets seeks to minimize the expected log-squared difference between a proposal (forward policy) and a target (backward policy) distribution, which enforces certain flow-matching conditions. While this training procedure is closely related to variational inference (VI), directly attempting standard Kullback-Leibler (KL) divergence minimization can lead to proven biased and potentially high-variance estimators. Therefore, we first review four divergence measures, namely, Renyi-$\alpha$'s, Tsallis-$\alpha$'s, reverse and forward KL's, and design statistically efficient estimators for their stochastic gradients in the context of training GFlowNets. Then, we verify that properly minimizing these divergences yields a provably correct and empirically effective training scheme, often leading to significantly faster convergence than previously proposed optimization. To achieve this, we design control variates based on the REINFORCE leave-one-out and score-matching estimators to reduce the variance of the learning objectives' gradients. Our work contributes by narrowing the gap between GFlowNets training and generalized variational approximations, paving the way for algorithmic ideas informed by the divergence minimization viewpoint.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper reviews four divergence measures (Rényi-α, Tsallis-α, reverse KL, forward KL) as alternatives to the standard log-squared objective for training GFlowNets. It designs stochastic gradient estimators for these divergences and introduces control variates based on REINFORCE leave-one-out and score-matching to reduce gradient variance, claiming that proper minimization yields provably correct training with significantly faster convergence than prior methods.
Significance. If the estimators are unbiased and the variance reduction is effective without compromising correctness, the work would usefully connect GFlowNet training to the broader literature on generalized variational approximations, potentially enabling more stable and efficient optimization in applications such as causal discovery and molecular design. The explicit design of control variates is a constructive contribution when accompanied by bias and variance analysis.
major comments (2)
- [Abstract and estimator design section] The central claim that the four divergence estimators plus control variates produce unbiased gradients with sufficiently low variance for provably correct and faster training is load-bearing, yet the manuscript provides no explicit bias proofs, variance bounds, or comparison to the standard objective's known bias (abstract and the section introducing the estimators).
- [Section verifying correctness of the training scheme] The assertion that minimizing these divergences 'yields a provably correct' scheme requires a derivation showing equivalence (or controlled deviation) to the flow-matching conditions; without this, the 'provably correct' guarantee remains unverified.
minor comments (2)
- [Notation and preliminaries] Clarify the precise definitions of the four divergences in the GFlowNet context (forward vs. backward policies) and ensure consistent notation for the control variates across sections.
- [Experiments] The empirical claims of 'significantly faster convergence' would benefit from additional baseline comparisons and reporting of variance across multiple random seeds.
Simulated Author's Rebuttal
We thank the referee for their thoughtful comments. Below we address each major comment, committing to revisions to enhance the rigor of the proofs and derivations as requested.
- Referee: [Abstract and estimator design section] The central claim that the four divergence estimators plus control variates produce unbiased gradients with sufficiently low variance for provably correct and faster training is load-bearing, yet the manuscript provides no explicit bias proofs, variance bounds, or comparison to the standard objective's known bias (abstract and the section introducing the estimators).
  Authors: The manuscript derives the estimators using established techniques from variational inference and control variates, which are known to yield unbiased gradients. However, we agree that formal bias proofs and variance bounds were not explicitly provided. In the revised version, we will include a new section or appendix with detailed bias analysis for the Rényi, Tsallis, and KL estimators, along with a comparison to the bias properties of the standard log-squared objective. This will strengthen the central claim.
  revision: yes
- Referee: [Section verifying correctness of the training scheme] The assertion that minimizing these divergences 'yields a provably correct' scheme requires a derivation showing equivalence (or controlled deviation) to the flow-matching conditions; without this, the 'provably correct' guarantee remains unverified.
  Authors: We believe the verification in the manuscript demonstrates that the minima of these divergences correspond to the desired flow-matching conditions, but we acknowledge that a more explicit derivation would be beneficial. We will revise the relevant section to provide a detailed proof showing how minimizing each of the four divergences enforces the flow-matching conditions, thereby making the 'provably correct' claim fully rigorous.
  revision: yes
Circularity Check
No significant circularity
The paper's central claims rest on designing stochastic gradient estimators for standard divergences (Renyi, Tsallis, forward/reverse KL) using established REINFORCE leave-one-out and score-matching control variates. These techniques are imported from the broader literature rather than defined or fitted within the paper itself. No equations reduce a claimed prediction to a fitted parameter by construction, no uniqueness theorems are invoked via self-citation, and no ansatz is smuggled through prior work by the same authors. The derivation chain therefore remains self-contained against external benchmarks and does not collapse to its own inputs.