pith. sign in

arxiv: 2410.09355 · v2 · submitted 2024-10-12 · 💻 cs.LG · cs.AI· stat.ML

On Divergence Measures for Training GFlowNets

Pith reviewed 2026-05-23 18:48 UTC · model grok-4.3

classification 💻 cs.LG cs.AIstat.ML
keywords GFlowNetsdivergence measuresvariational inferencecontrol variatesREINFORCEscore matchinggenerative modelingstochastic gradients
0
0 comments X

The pith

Minimizing Renyi, Tsallis, reverse KL and forward KL divergences trains GFlowNets correctly and often faster than the standard objective.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that GFlowNets can be trained by minimizing any of four divergences between the forward proposal and backward target policies rather than the usual expected log-squared difference. This change preserves the correctness of the induced sampling distribution while allowing the use of gradient estimators drawn from variational inference. To keep the estimators practical, the authors introduce control variates based on REINFORCE leave-one-out and score-matching terms that lower gradient variance without adding bias. Experiments show that the resulting procedures reach the target distribution with fewer training steps in several tasks.

Core claim

Properly minimizing the Renyi-alpha, Tsallis-alpha, reverse KL and forward KL divergences between proposal and target distributions yields a provably correct training scheme for GFlowNets that, when paired with the designed control variates, produces low-variance gradients and often converges significantly faster than the conventional flow-matching objective.

What carries the argument

Statistically efficient stochastic gradient estimators for the four divergences, each paired with REINFORCE leave-one-out or score-matching control variates that reduce variance while remaining unbiased.

If this is right

  • GFlowNets trained under any of the four divergences still sample exactly from the unnormalized target when the objective reaches zero.
  • The same control-variate construction can be reused for other divergence families in future GFlowNet variants.
  • Training time to a given effective sample size is reduced on tasks where the new objectives reach low error sooner.
  • The variational-inference viewpoint supplies a systematic way to derive new objectives for GFlowNets beyond the four examined here.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same estimator-plus-control-variate pattern may transfer directly to other amortized samplers that rely on forward-backward consistency.
  • Alpha-parameter schedules could be used to anneal between different divergences during a single training run.
  • Because the method narrows the gap with generalized variational approximations, existing VI diagnostics and convergence theory may now apply to GFlowNets with only minor adaptation.

Load-bearing premise

The designed stochastic gradient estimators for the four divergences remain unbiased and, after the control variates are applied, have low enough variance to produce stable and faster convergence.

What would settle it

An experiment on a small, exactly solvable GFlowNet task in which the learned policy is sampled exhaustively and checked for exact match to the target distribution; mismatch or slower convergence than the log-squared baseline would falsify the claim.

Figures

Figures reproduced from arXiv: 2410.09355 by Diego Mesquita, Eliezer de Souza da Silva, Tiago da Silva.

Figure 1
Figure 1. Figure 1: Mode-seeking (α = 2) versus mass-covering (α = −2) behaviour in α-divergences. Importantly, we need only the gradients of Rα and Tα for solving the optimization problem in Equation 3 and, in particular, learning the target distribution’s normalizing constant is unnecessary, as we underline in the lemma below. This property distinguishes such divergence measures from both TB and DB losses in Equation 1 and,… view at source ↗
Figure 2
Figure 2. Figure 2: Variance of the estimated gradients as a function of the trajectories’ batch size. Our control variates greatly reduce the estimator’s variance, even for relatively small batch sizes. Correspondingly, the gradient of DKL[PB||PF ] wrt θ is ∇θDKL[PB||PF ] C= −Eτ∼PF (so,·)  pFθ (τ |so) pB(τ |x)r(x) ∇θs(τ ; θ)  . Crucially, choosing an appropriate learning objective is an empirical question that one should c… view at source ↗
Figure 3
Figure 3. Figure 3: Divergence-based learning objectives often lead to faster training than TB loss. Notably, contrasting with the experiments of [56], there is no single best loss function always conducting to the fastest convergence rate, and minimizing well-known divergence measures is often on par with or better than minimizing the TB loss in terms of convergence speed. Results were averaged across three different seeds. … view at source ↗
Figure 4
Figure 4. Figure 4: Learned distributions for the banana-shaped target. Tsallis-α, Renyi-α and for. KL leads to a better model than TB and Rev. KL, which behave similarly — as predicted by Proposition 1. 5.1 Generative tasks Below, we provide a high-level characterization of the generative tasks used for synthetic data generation and training. For a more rigorous description of Section 2, see Appendix B. Set generation [3, 34… view at source ↗
Figure 5
Figure 5. Figure 5: Learning curves for different objective functions in the task of set generation. The reduced variance of the gradient estimates notably increases training stability and speed. 6 Conclusions, limitations and broader impact We showed in a comprehensive range of experiments that f-divergence measures commonly employed in VI — forward KL, reverse KL, Renyi-α, and Tsallis-α — are effective learning objectives f… view at source ↗
Figure 6
Figure 6. Figure 6: Learning curves for a GFlowNet trained by minimizing the TB loss. The curves’ smoothness highlights the low variance of the optimization steps incurred by the stochastic gradients of LT B, which do not use a score function estimator. C.4 Proof of Proposition 2 We will derive an expression for the optimal baseline of a vector-valued control variate. For this, let f be the averaged function and g : τ 7→ g(τ … view at source ↗
Figure 7
Figure 7. Figure 7: Results for sequence generation with larger batches. Forward KL for sequence generation [PITH_FULL_IMAGE:figures/full_fig_p020_7.png] view at source ↗
read the original abstract

Generative Flow Networks (GFlowNets) are amortized inference models designed to sample from unnormalized distributions over composable objects, with applications in generative modeling for tasks in fields such as causal discovery, NLP, and drug discovery. Traditionally, the training procedure for GFlowNets seeks to minimize the expected log-squared difference between a proposal (forward policy) and a target (backward policy) distribution, which enforces certain flow-matching conditions. While this training procedure is closely related to variational inference (VI), directly attempting standard Kullback-Leibler (KL) divergence minimization can lead to proven biased and potentially high-variance estimators. Therefore, we first review four divergence measures, namely, Renyi-$\alpha$'s, Tsallis-$\alpha$'s, reverse and forward KL's, and design statistically efficient estimators for their stochastic gradients in the context of training GFlowNets. Then, we verify that properly minimizing these divergences yields a provably correct and empirically effective training scheme, often leading to significantly faster convergence than previously proposed optimization. To achieve this, we design control variates based on the REINFORCE leave-one-out and score-matching estimators to reduce the variance of the learning objectives' gradients. Our work contributes by narrowing the gap between GFlowNets training and generalized variational approximations, paving the way for algorithmic ideas informed by the divergence minimization viewpoint.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper reviews four divergence measures (Renyi-α, Tsallis-α, reverse KL, forward KL) as alternatives to the standard log-squared objective for training GFlowNets. It designs stochastic gradient estimators for these divergences and introduces control variates based on REINFORCE leave-one-out and score-matching to reduce gradient variance, claiming that proper minimization yields provably correct training with significantly faster convergence than prior methods.

Significance. If the estimators are unbiased and the variance reduction is effective without compromising correctness, the work would usefully connect GFlowNet training to the broader literature on generalized variational approximations, potentially enabling more stable and efficient optimization in applications such as causal discovery and molecular design. The explicit design of control variates is a constructive contribution when accompanied by bias and variance analysis.

major comments (2)
  1. [Abstract and estimator design section] The central claim that the four divergence estimators plus control variates produce unbiased gradients with sufficiently low variance for provably correct and faster training is load-bearing, yet the manuscript provides no explicit bias proofs, variance bounds, or comparison to the standard objective's known bias (abstract and the section introducing the estimators).
  2. [Section verifying correctness of the training scheme] The assertion that minimizing these divergences 'yields a provably correct' scheme requires a derivation showing equivalence (or controlled deviation) to the flow-matching conditions; without this, the 'provably correct' guarantee remains unverified.
minor comments (2)
  1. [Notation and preliminaries] Clarify the precise definitions of the four divergences in the GFlowNet context (forward vs. backward policies) and ensure consistent notation for the control variates across sections.
  2. [Experiments] The empirical claims of 'significantly faster convergence' would benefit from additional baseline comparisons and reporting of variance across multiple random seeds.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful comments. Below we address each major comment, committing to revisions to enhance the rigor of the proofs and derivations as requested.

read point-by-point responses
  1. Referee: [Abstract and estimator design section] The central claim that the four divergence estimators plus control variates produce unbiased gradients with sufficiently low variance for provably correct and faster training is load-bearing, yet the manuscript provides no explicit bias proofs, variance bounds, or comparison to the standard objective's known bias (abstract and the section introducing the estimators).

    Authors: The manuscript derives the estimators using established techniques from variational inference and control variates, which are known to yield unbiased gradients. However, we agree that formal bias proofs and variance bounds were not explicitly provided. In the revised version, we will include a new section or appendix with detailed bias analysis for the Renyi, Tsallis, and KL estimators, along with a comparison to the bias properties of the standard log-squared objective. This will strengthen the central claim. revision: yes

  2. Referee: [Section verifying correctness of the training scheme] The assertion that minimizing these divergences 'yields a provably correct' scheme requires a derivation showing equivalence (or controlled deviation) to the flow-matching conditions; without this, the 'provably correct' guarantee remains unverified.

    Authors: We believe the verification in the manuscript demonstrates that the minima of these divergences correspond to the desired flow-matching conditions, but we acknowledge that a more explicit derivation would be beneficial. We will revise the relevant section to provide a detailed proof showing how minimizing each of the four divergences enforces the flow-matching conditions, thereby making the 'provably correct' claim fully rigorous. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper's central claims rest on designing stochastic gradient estimators for standard divergences (Renyi, Tsallis, forward/reverse KL) using established REINFORCE leave-one-out and score-matching control variates. These techniques are imported from the broader literature rather than defined or fitted within the paper itself. No equations reduce a claimed prediction to a fitted parameter by construction, no uniqueness theorems are invoked via self-citation, and no ansatz is smuggled through prior work by the same authors. The derivation chain therefore remains self-contained against external benchmarks and does not collapse to its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract, the paper does not introduce new free parameters, domain-specific axioms beyond standard mathematical properties of divergences, or invented entities; it builds on existing divergence families and variance-reduction methods from the literature.

pith-pipeline@v0.9.0 · 5780 in / 1147 out tokens · 50368 ms · 2026-05-23T18:48:13.444607+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

113 extracted references · 113 canonical work pages · 6 internal anchors

  1. [1]

    S. Axler. Measure, Integration &; Real Analysis. Springer International Publishing, 2020. ISBN 9783030331436. doi: 10.1007/978-3-030-33143-6

  2. [2]

    A. G. Baydin, B. A. Pearlmutter, A. A. Radul, and J. M. Siskind. Automatic differentiation in machine learning: a survey. Journal of Marchine Learning Research, 2018

  3. [3]

    Bengio, M

    E. Bengio, M. Jain, M. Korablyov, D. Precup, and Y . Bengio. Flow network based generative models for non-iterative diverse candidate generation. In NeurIPS (NeurIPS), 2021

  4. [4]

    Bengio, S

    Y . Bengio, S. Lahlou, T. Deleu, E. J. Hu, M. Tiwari, and E. Bengio. Gflownet foundations. Journal of Machine Learning Research (JMLR), 2023

  5. [5]

    A Conceptual Introduction to Hamiltonian Monte Carlo

    M. Betancourt. A conceptual introduction to hamiltonian monte carlo. arXiv preprint arXiv:1701.02434, 2017

  6. [6]

    C. M. Bishop. Pattern Recognition and Machine Learning. Springer, 2007

  7. [7]

    D. M. Blei and et al. Variational inference: A review for statisticians. Journal of the American Statistical Association, 2017

  8. [8]

    Bradbury, R

    J. Bradbury, R. Frostig, P. Hawkins, M. J. Johnson, C. Leary, D. Maclaurin, G. Necula, A. Paszke, J. VanderPlas, S. Wanderman-Milne, and Q. Zhang. JAX: composable transforma- tions of Python+NumPy programs, 2018

  9. [9]

    Buesing, N

    L. Buesing, N. Heess, and T. Weber. Approximate inference in discrete distributions with monte carlo tree search and value functions. In AISTATS, pages 624–634. PMLR, 2020

  10. [10]

    Burda, R

    Y . Burda, R. B. Grosse, and R. Salakhutdinov. Importance weighted autoencoders, 2016

  11. [11]

    Carbonetto, M

    P. Carbonetto, M. King, and F. Hamze. A stochastic approximation method for inference in probabilistic graphical models. NeurIPS, 2009

  12. [12]

    Carpenter, A

    B. Carpenter, A. Gelman, M. D. Hoffman, D. Lee, B. Goodrich, M. Betancourt, M. Brubaker, J. Guo, P. Li, and A. Riddell. Stan: A probabilistic programming language. Journal of statistical software, 2017

  13. [13]

    da Silva, E

    T. da Silva, E. Silva, A. Ribeiro, A. Góis, D. Heider, S. Kaski, and D. Mesquita. Human- in-the-loop causal discovery under latent confounding using ancestral gflownets. arXiv preprint:2309.12032, 2023

  14. [14]

    M. P. Deisenroth and C. E. Rasmussen. PILCO: A model-based and data-efficient approach to policy search. In ICML, Proceedings of Machine Learning Research, pages 465–472. PMLR, 2011

  15. [15]

    Deleu, A

    T. Deleu, A. Góis, C. C. Emezue, M. Rankawat, S. Lacoste-Julien, S. Bauer, and Y . Bengio. Bayesian structure learning with generative flow networks. In UAI, 2022. 10

  16. [16]

    Deleu, M

    T. Deleu, M. Nishikawa-Toomey, J. Subramanian, N. Malkin, L. Charlin, and Y . Bengio. Joint Bayesian inference of graphical structure and parameters with a single generative flow network. In Advances in Neural Processing Systems (NeurIPS), 2023

  17. [17]

    Deleu, P

    T. Deleu, P. Nouri, N. Malkin, D. Precup, and Y . Bengio. Discrete probabilistic inference as control in multi-path environments. CoRR, abs/2402.10309, 2024

  18. [18]

    Learning and Policy Search in Stochastic Dynamical Systems with Bayesian Neural Networks

    S. Depeweg, J. M. Hernández-Lobato, F. Doshi-Velez, and S. Udluft. Learning and pol- icy search in stochastic dynamical systems with bayesian neural networks. arXiv preprint arXiv:1605.07127, 2016

  19. [19]

    A. B. Dieng, D. Tran, R. Ranganath, J. Paisley, and D. Blei. Variational inference via χ upper bound minimization. NeurIPS, 2017

  20. [20]

    J. Domke. Provable gradient variance guarantees for black-box variational inference. In NeurIPS, pages 328–337, 2019

  21. [21]

    J. Domke. Provable smoothness guarantees for black-box variational inference. In ICML, volume 119 of Proceedings of Machine Learning Research, pages 2587–2596. PMLR, 2020

  22. [22]

    K. A. Dubey, S. J. Reddi, S. A. Williamson, B. Poczos, A. J. Smola, and E. P. Xing. Variance reduction in stochastic gradient langevin dynamics. In NeurIPS, 2016

  23. [23]

    Dutta, M

    S. Dutta, M. Das, and U. Maulik. Towards causality-based explanation of aerial scene classifiers. IEEE Geoscience and Remote Sensing Letters, 2023

  24. [24]

    Falet, H

    J.-P. Falet, H. B. Lee, N. Malkin, C. Sun, D. Secrieru, D. Zhang, G. Lajoie, and Y . Bengio. Delta-ai: Local objectives for amortized inference in sparse graphical models, 2023

  25. [25]

    Felsenstein

    J. Felsenstein. Evolutionary trees from DNA sequences: A maximum likelihood approach. Journal of Molecular Evolution, 1981

  26. [26]

    Garipov, S

    T. Garipov, S. D. Peuter, G. Yang, V . Garg, S. Kaski, and T. S. Jaakkola. Compositional sculpting of iterative generative processes. InThirty-seventh Conference on Neural Information Processing Systems, 2023

  27. [27]

    C. J. Geyer. Markov chain monte carlo maximum likelihood. 1991

  28. [28]

    Hinton, N

    G. Hinton, N. Srivastava, and K. Swersky. Neural networks for machine learning lecture 6a overview of mini-batch gradient descent. 2012

  29. [29]

    E. J. Hu, M. Jain, E. Elmoznino, Y . Kaddar, and et al. Amortizing intractable inference in large language models, 2023

  30. [30]

    E. J. Hu, N. Malkin, M. Jain, K. E. Everett, A. Graikos, and Y . Bengio. Gflownet-em for learning compositional latent variable models. In International Conference on Machine Learning (ICLR), 2023

  31. [31]

    Huang and S

    Z. Huang and S. Becker. Stochastic gradient langevin dynamics with variance reduction. CoRR, 2021

  32. [32]

    M. Jain, E. Bengio, A. Hernandez-Garcia, J. Rector-Brooks, B. F. P. Dossou, C. A. Ekbote, J. Fu, T. Zhang, M. Kilgour, D. Zhang, L. Simine, P. Das, and Y . Bengio. Biological sequence design with GFlowNets. In International Conference on Machine Learning (ICML), 2022

  33. [33]

    M. Jain, T. Deleu, J. Hartford, C.-H. Liu, A. Hernandez-Garcia, and Y . Bengio. Gflownets for ai-driven scientific discovery. Digital Discovery, 2023

  34. [34]

    H. Jang, M. Kim, and S. Ahn. Learning energy decompositions for partial inference in GFlownets. In The Twelfth International Conference on Learning Representations, 2024

  35. [35]

    Järvenpää and J

    M. Järvenpää and J. Corander. On predictive inference for intractable models via approximate bayesian computation. Statistics and Computing, 33(2), Feb. 2023. ISSN 1573-1375. 11

  36. [36]

    M. I. Jordan, Z. Ghahramani, T. S. Jaakkola, and L. K. Saul. An introduction to variational methods for graphical models. Mach. Learn., 37(2):183–233, 1999

  37. [37]

    T. H. Jukes and C. R. Cantor. Evolution of protein molecules. In Mammalian Protein Metabolism. Elsevier, 1969

  38. [38]

    H. J. Kappen, V . Gómez, and M. Opper. Optimal control as a graphical model inference problem. Mach. Learn., 87(2):159–182, 2012

  39. [39]

    K. Kim, Y . Ma, and J. Gardner. Linear convergence of black-box variational inference: Should we stick the landing? In AISTATS, volume 238 of Proceedings of Machine Learning Research, pages 235–243. PMLR, 2024

  40. [40]

    M. Kim, T. Yun, E. Bengio, D. Zhang, Y . Bengio, S. Ahn, and J. Park. Local search gflownets. arXiv preprint arXiv:2310.02710, 2023

  41. [41]

    D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014

  42. [42]

    D. P. Kingma, S. Mohamed, D. Jimenez Rezende, and M. Welling. Semi-supervised learning with deep generative models. NeurIPS, 2014

  43. [43]

    D. P. Kingma, T. Salimans, B. Poole, and J. Ho. Variational diffusion models, 2023

  44. [44]

    Kinoshita and T

    Y . Kinoshita and T. Suzuki. Improved convergence rate of stochastic gradient langevin dynamics with variance reduction and its application to optimization. In NeurIPS, 2022

  45. [45]

    Knoblauch, J

    J. Knoblauch, J. Jewson, and T. Damoulas. An optimization-centric view on bayes’ rule: Reviewing and generalizing variational inference. J. Mach. Learn. Res., 23:132:1–132:109, 2022

  46. [46]

    Kucukelbir, D

    A. Kucukelbir, D. Tran, R. Ranganath, A. Gelman, and D. M. Blei. Automatic differentiation variational inference. J. Mach. Learn. Res., 18:14:1–14:45, 2017

  47. [47]

    Kullback and R

    S. Kullback and R. A. Leibler. On Information and Sufficiency. The Annals of Mathematical Statistics, 1951

  48. [48]

    Lahlou, T

    S. Lahlou, T. Deleu, P. Lemos, D. Zhang, A. V olokhova, A. Hernández-García, L. N. Ezzine, Y . Bengio, and N. Malkin. A theory of continuous generative flow networks. InICML, volume 202 of Proceedings of Machine Learning Research, pages 18269–18300. PMLR, 2023

  49. [49]

    E. Lau, N. M. Vemgal, D. Precup, and E. Bengio. DGFN: Double generative flow networks. In NeurIPS 2023 Generative AI and Biology (GenBio) Workshop, 2023

  50. [50]

    S. Levine. Reinforcement learning and control as probabilistic inference: Tutorial and review. CoRR, abs/1805.00909, 2018

  51. [51]

    Li and R

    Y . Li and R. E. Turner. Rényi divergence variational inference. NeurIPS, 29, 2016

  52. [52]

    Liu and et al

    D. Liu and et al. Gflowout: Dropout with generative flow networks. InInternational Conference on Machine Learning, ICML’23. JMLR.org, 2023

  53. [53]

    R. Liu, J. Regier, N. Tripuraneni, M. I. Jordan, and J. D. McAuliffe. Rao-blackwellized stochastic gradients for discrete distributions. In ICML, volume 97 of Proceedings of Machine Learning Research, pages 4023–4031. PMLR, 2019

  54. [54]

    Madan, J

    K. Madan, J. Rector-Brooks, M. Korablyov, E. Bengio, M. Jain, A. C. Nica, T. Bosc, Y . Bengio, and N. Malkin. Learning gflownets from partial episodes for improved convergence and stability. In International Conference on Machine Learning, 2022

  55. [55]

    Malkin, M

    N. Malkin, M. Jain, E. Bengio, C. Sun, and Y . Bengio. Trajectory balance: Improved credit assignment in GFlownets. In NeurIPS (NeurIPS), 2022

  56. [56]

    Malkin, S

    N. Malkin, S. Lahlou, T. Deleu, X. Ji, E. Hu, K. Everett, D. Zhang, and Y . Bengio. GFlowNets and variational inference. International Conference on Learning Representations (ICLR) , 2023. 12

  57. [57]

    Mesquita, P

    D. Mesquita, P. Blomstedt, and S. Kaski. Embarrassingly parallel MCMC using deep invertible transformations. In UAI, 2019

  58. [58]

    T. Minka. Divergence measures and message passing. Technical report, Research, Microsoft, 2005. URL https://www.seas.harvard.edu/courses/cs281/papers/ minka-divergence.pdf

  59. [59]

    Mohamed, M

    S. Mohamed, M. Rosca, M. Figurnov, and A. Mnih. Monte carlo gradient estimation in machine learning. J. Mach. Learn. Res., 21:132:1–132:62, 2020

  60. [60]

    Mohammadpour, E

    S. Mohammadpour, E. Bengio, E. Frejinger, and P.-L. Bacon. Maximum entropy gflownets with soft q-learning, 2023

  61. [61]

    R. M. Neal et al. Mcmc using hamiltonian dynamics. Handbook of markov chain monte carlo, 2011

  62. [62]

    A. C. Nica, M. Jain, E. Bengio, C.-H. Liu, M. Korablyov, M. M. Bronstein, and Y . Bengio. Evaluating generalization in gflownets for molecule design. In ICLR2022 Machine Learning for Drug Discovery, 2022

  63. [63]

    A. B. Owen. Monte Carlo theory, methods and examples. 2013

  64. [64]

    L. Pan, N. Malkin, D. Zhang, and Y . Bengio. Better training of gflownets with local credit and incomplete trajectories. arXiv preprint arXiv:2302.01687, 2023

  65. [65]

    L. Pan, D. Zhang, A. Courville, L. Huang, and Y . Bengio. Generative augmented flow networks. In International Conference on Learning Representations (ICLR), 2023

  66. [66]

    L. Pan, D. Zhang, M. Jain, L. Huang, and Y . Bengio. Stochastic generative flow networks. In UAI, volume 216 of Proceedings of Machine Learning Research, pages 1628–1638. PMLR, 2023

  67. [67]

    Papini, D

    M. Papini, D. Binaghi, G. Canonaco, M. Pirotta, and M. Restelli. Stochastic variance-reduced policy gradient. In International Conference on Machine Learning, 2018

  68. [68]

    Paszke, S

    A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Köpf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala. Pytorch: An imperative style, high-performance deep learning library, 2019

  69. [69]

    Poczos and J

    B. Poczos and J. Schneider. On the estimation of α-divergences. In AISTATS. PMLR, 2011

  70. [70]

    Ranganath, S

    R. Ranganath, S. Gerrish, and D. M. Blei. Black box variational inference. In AISTATS, volume 33 of JMLR Workshop and Conference Proceedings, pages 814–822. JMLR.org, 2014

  71. [71]

    Ranganath, D

    R. Ranganath, D. Tran, J. Altosaar, and D. M. Blei. Operator variational inference. InNeurIPS, pages 496–504, 2016

  72. [72]

    Rector-Brooks, K

    J. Rector-Brooks, K. Madan, M. Jain, M. Korablyov, C.-H. Liu, S. Chandar, N. Malkin, and Y . Bengio. Thompson sampling for improved exploration in gflownets, 2023

  73. [73]

    A. Rényi. On measures of entropy and information. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Contributions to the Theory of Statistics. University of California Press, 1961

  74. [74]

    Rezende and S

    D. Rezende and S. Mohamed. Variational inference with normalizing flows. In International conference on machine learning. PMLR, 2015

  75. [75]

    Rhodes and M

    B. Rhodes and M. U. Gutmann. Variational noise-contrastive estimation. In AISTATS, 2019

  76. [76]

    Richter, A

    L. Richter, A. Boustati, N. Nüsken, F. J. R. Ruiz, and Ö. D. Akyildiz. Vargrad: A low-variance gradient estimator for variational inference. 2020

  77. [77]

    C. P. Robert et al. The Bayesian choice: from decision-theoretic foundations to computational implementation, volume 2. Springer, 2007. 13

  78. [78]

    Roeder, Y

    G. Roeder, Y . Wu, and D. Duvenaud. Sticking the landing: simple, lower-variance gradient estimators for variational inference. In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS’17, page 6928–6937, Red Hook, NY , USA,

  79. [79]

    ISBN 9781510860964

    Curran Associates Inc. ISBN 9781510860964

  80. [80]

    T. G. J. Rudner, V . Pong, R. McAllister, Y . Gal, and S. Levine. Outcome-driven reinforcement learning via variational inference. In NeurIPS, pages 13045–13058, 2021

Showing first 80 references.