
arxiv: 2306.10430 · v2 · submitted 2023-06-17 · stat.ML · cs.AI · cs.LG · stat.CO · stat.ME

Variational Sequential Optimal Experimental Design using Reinforcement Learning

Pith reviewed 2026-05-24 08:19 UTC · model grok-4.3

classification stat.ML · cs.AI · cs.LG · stat.CO · stat.ME
keywords variational sequential optimal experimental design · reinforcement learning · expected information gain · Bayesian experimental design · actor-critic methods · Gaussian mixture models · normalizing flows · sequential design

The pith

vsOED uses one-point variational rewards and actor-critic reinforcement learning to optimize sequences of experiments for expected information gain.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents variational sequential optimal experimental design as a method to choose finite sequences of experiments under Bayesian information criteria. It formulates the problem with a one-point reward based on variational posterior approximations that yields a provable lower bound on expected information gain. Actor-critic reinforcement learning then optimizes the design policy by estimating variational and policy gradients, while approximating posteriors with Gaussian mixture models or normalizing flows. The approach supports nuisance parameters, implicit likelihoods, multiple models, and flexible criteria that combine model discrimination with parameter inference or prediction goals.

Core claim

vsOED employs a one-point reward formulation with variational posterior approximations, providing a provable lower bound to the expected information gain. Numerical methods are developed following an actor-critic reinforcement learning approach, including derivation and estimation of variational and policy gradients to optimize the design policy, and posterior approximation using Gaussian mixture models and normalizing flows. vsOED accommodates nuisance parameters, implicit likelihoods, and multiple candidate models, while supporting flexible design criteria that can target designs for model discrimination, parameter inference, goal-oriented prediction, and their weighted combinations.
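
The lower-bound claim can be sketched with the standard variational (Barber-Agakov) argument. The notation here is assumed for illustration and may differ from the paper's: θ denotes the parameters of interest, I_N the full history of designs and observations after N experiments under policy π, p the true prior and posterior, and q_φ a variational posterior with parameters φ.

```latex
\begin{align*}
U(\pi)
  &= \mathbb{E}_{\theta, I_N \mid \pi}\!\left[ \log \frac{p(\theta \mid I_N)}{p(\theta)} \right] \\
  &= \mathbb{E}_{\theta, I_N \mid \pi}\!\left[ \log \frac{q_\phi(\theta \mid I_N)}{p(\theta)} \right]
   + \mathbb{E}_{I_N \mid \pi}\!\left[ D_{\mathrm{KL}}\!\left( p(\cdot \mid I_N) \,\|\, q_\phi(\cdot \mid I_N) \right) \right] \\
  &\ge \mathbb{E}_{\theta, I_N \mid \pi}\!\left[ \log q_\phi(\theta \mid I_N) - \log p(\theta) \right].
\end{align*}
```

The integrand on the last line is evaluated at the single sampled θ that generated the data, which is presumably the sense of "one-point". The inequality follows because the KL term is non-negative, and it becomes an equality when q_φ matches the true posterior.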

What carries the argument

One-point reward formulation with variational posterior approximations that supplies a provable lower bound to expected information gain, optimized by actor-critic reinforcement learning with GMM or normalizing-flow posteriors.

If this is right

  • The method handles nuisance parameters, implicit likelihoods, and multiple candidate models without requiring explicit likelihood evaluations.
  • Flexible weighted combinations of design criteria become available for model discrimination, parameter inference, or goal-oriented prediction.
  • Numerical demonstrations show superior sample efficiency relative to prior sequential experimental design algorithms across engineering and science applications.
  • Posterior approximations via Gaussian mixture models and normalizing flows enable gradient-based policy updates inside the reinforcement learning loop.
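
As a rough illustration of the Gaussian-mixture posterior mentioned in the last bullet, the sketch below evaluates a one-dimensional GMM log-density and uses it in a reward of the form log q(θ) − log p(θ). Everything named here is hypothetical: the function names, the fixed mixture parameters, and the standard normal prior are assumptions for illustration, and a real implementation would presumably produce the mixture parameters from the experiment history rather than hard-code them.

```python
import numpy as np

def gmm_logpdf(theta, weights, means, variances):
    """Log-density of a 1-D Gaussian mixture evaluated at each entry of theta."""
    theta = np.atleast_1d(np.asarray(theta, dtype=float))[:, None]  # shape (n, 1)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    # Per-component log(weight) + log N(theta; mean, variance), shape (n, k).
    log_terms = (np.log(weights)
                 - 0.5 * np.log(2.0 * np.pi * variances)
                 - 0.5 * (theta - means) ** 2 / variances)
    # Log-sum-exp over components, shifted by the row maximum for stability.
    shift = log_terms.max(axis=1, keepdims=True)
    return (shift + np.log(np.exp(log_terms - shift).sum(axis=1, keepdims=True)))[:, 0]

def standard_normal_logpdf(theta):
    """Log-density of the assumed N(0, 1) prior."""
    theta = np.asarray(theta, dtype=float)
    return -0.5 * (np.log(2.0 * np.pi) + theta ** 2)

# Hypothetical mixture parameters standing in for a fitted variational posterior.
weights = np.array([0.3, 0.7])
means = np.array([-1.0, 2.0])
variances = np.array([0.25, 1.0])

# One-point reward at the single parameter sample assumed to have generated the data.
theta_true = 1.5
reward = gmm_logpdf(theta_true, weights, means, variances)[0] - standard_normal_logpdf(theta_true)

# Sanity check: the mixture density should integrate to roughly one.
grid = np.linspace(-10.0, 12.0, 20001)
mass = float(np.exp(gmm_logpdf(grid, weights, means, variances)).sum() * (grid[1] - grid[0]))
```

Because the log-density is a smooth function of the weights, means, and variances, the same expression could be differentiated by an automatic-differentiation library to obtain the variational gradient.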

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the lower bound remains tight in higher dimensions, the same RL loop could support real-time adaptive design in settings where exact information gain is intractable.
  • The framework may connect to active learning loops in which the same variational reward structure is reused for online model updating rather than fixed-horizon sequences.
  • Testing the method on problems with discontinuous design spaces or non-stationary noise would reveal whether the current gradient estimation steps extend without modification.

Load-bearing premise

The variational approximation to the posterior stays accurate enough across the design sequence that the lower bound remains useful for producing a near-optimal policy.
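
A minimal numerical sketch of this premise, using a single-experiment conjugate model chosen here purely for illustration (it is not one of the paper's test cases): θ ~ N(0, 1) and y = d·θ + noise. With the exact Gaussian posterior as q, the one-point estimate should recover the closed-form expected information gain; with a deliberately inflated variance, the estimate should drop by the corresponding KL divergence. All function and variable names are hypothetical.

```python
import numpy as np

def gaussian_logpdf(x, mean, var):
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

def exact_eig(d, noise_var):
    """Closed-form expected information gain for theta ~ N(0, 1), y = d * theta + noise."""
    return 0.5 * np.log1p(d ** 2 / noise_var)

def one_point_bound(d, noise_var, slope, q_var, n_samples=200_000, seed=0):
    """Monte Carlo estimate of E[log q(theta | y) - log p(theta)] with q = N(slope * y, q_var)."""
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal(n_samples)                      # prior samples
    y = d * theta + np.sqrt(noise_var) * rng.standard_normal(n_samples)
    log_q = gaussian_logpdf(theta, slope * y, q_var)            # evaluated at the generating theta
    log_prior = gaussian_logpdf(theta, 0.0, 1.0)
    return float(np.mean(log_q - log_prior))

d, noise_var = 2.0, 0.5
post_var = 1.0 / (1.0 + d ** 2 / noise_var)   # exact posterior variance
post_slope = post_var * d / noise_var         # exact posterior mean is post_slope * y

tight = one_point_bound(d, noise_var, post_slope, post_var)        # q equals the true posterior
loose = one_point_bound(d, noise_var, post_slope, 4.0 * post_var)  # q variance inflated 4x

print(f"exact EIG   : {exact_eig(d, noise_var):.4f}")
print(f"tight bound : {tight:.4f}")
print(f"loose bound : {loose:.4f}")
```

In this toy setting the gap between the two estimates is the KL divergence from the true posterior to the inflated one, which is the quantity the premise requires to stay small along the design sequence.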

What would settle it

A low-dimensional test case with known exact posteriors where the learned vsOED policy yields measurably lower realized information gain than the exact optimal policy computed by dynamic programming.
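
The kind of exact baseline described above can be sketched on a toy problem small enough to enumerate. The example below is an assumption-laden illustration, not a case from the paper: θ is binary, there are two designs and binary outcomes, the likelihood table is made up, and the reward is the terminal KL divergence from prior to posterior. Backward induction over the belief state then gives the optimal expected information gain against which any learned policy could be compared.

```python
import itertools
import math

# Hypothetical toy problem: theta in {0, 1}, designs d in {0, 1}, outcomes y in {0, 1}.
# LIK[theta][d] = P(y = 1 | theta, d); the numbers are made up for illustration.
LIK = [[0.1, 0.4],
       [0.3, 0.9]]
PRIOR = 0.5  # P(theta = 1)

def kl_bernoulli(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    total = 0.0
    for a, b in ((p, q), (1.0 - p, 1.0 - q)):
        if a > 0.0:
            total += a * math.log(a / b)
    return total

def update(belief, d, y):
    """Bayes update of P(theta = 1); also returns the evidence P(y | belief, d)."""
    like1 = LIK[1][d] if y == 1 else 1.0 - LIK[1][d]
    like0 = LIK[0][d] if y == 1 else 1.0 - LIK[0][d]
    evidence = belief * like1 + (1.0 - belief) * like0
    return belief * like1 / evidence, evidence

def optimal_value(belief, steps_left):
    """Backward induction: best expected terminal KL and the design achieving it."""
    if steps_left == 0:
        return kl_bernoulli(belief, PRIOR), None
    best_value, best_design = -math.inf, None
    for d in (0, 1):
        value = 0.0
        for y in (0, 1):
            new_belief, evidence = update(belief, d, y)
            value += evidence * optimal_value(new_belief, steps_left - 1)[0]
        if value > best_value:
            best_value, best_design = value, d
    return best_value, best_design

def fixed_sequence_value(designs, belief=PRIOR):
    """Expected terminal KL of a non-adaptive design sequence."""
    if not designs:
        return kl_bernoulli(belief, PRIOR)
    value = 0.0
    for y in (0, 1):
        new_belief, evidence = update(belief, designs[0], y)
        value += evidence * fixed_sequence_value(designs[1:], new_belief)
    return value

HORIZON = 2
best_value, first_design = optimal_value(PRIOR, HORIZON)
open_loop = {seq: fixed_sequence_value(seq)
             for seq in itertools.product((0, 1), repeat=HORIZON)}
```

Here `best_value` plays the role of the exact optimum, and `open_loop` lists what each non-adaptive sequence achieves. A learned policy's realized information gain would be evaluated the same way and compared against `best_value`.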

Figures

Figures reproduced from arXiv: 2306.10430 by Jiayuan Dong, Wanggang Shen, Xun Huan.

Figure 1: Case 1a. Expected utility comparisons using policies resulting from different algorithms. The shaded [...]
Figure 2: Case 1a. Examples of GMM and NFs approximate posterior and true posterior for PoIs. The red [...]
Figure 3: Case 1a. Average Ũ over four training replicates for ‘OED for QoIs’. The shaded regions represent the standard error.
Figure 4: Case 1a. Examples of policy trajectory for [...]
Figure 5: Case 1a. Design and posterior comparisons for ‘OED for QoIs’.
Figure 6: Case 1b. Average expected utility or Ũ over two training replicates versus design horizon N using policies resulting from the five OED scenarios.
Figure 7: Case 1b. Examples of policy trajectory for [...]
Figure 8: Case 1b. Examples of approximate posterior and true posterior for model indicator using policy [...]
Figure 9: Case 2. Expected utility comparisons using policies resulting from different algorithms. The shaded [...]
Figure 10: Case 2. Examples of GMM approximate posterior and true posterior at [...]
Figure 11: Case 3. Average Ũ over four training replicates versus design horizon N. The shaded regions represent the standard error.
[Plot panels: (a) I(t), # infected people versus time; (b) ξ_k, stage versus time; curves for R = 2.18, 5.42, 19.34]
Figure 12: Case 3. Examples of infected state trajectory [...]
Figure 13: Case 4. Examples of the true and surrogate concentration fields [...]
Figure 14: Case 4. Examples of approximate posterior and true posterior for model indicator and PoIs using [...]
Figure 15: Case 4. Examples of policy trajectory for [...]
Original abstract

We present variational sequential optimal experimental design (vsOED), a novel method for optimally designing a finite sequence of experiments within a Bayesian framework with information-theoretic criteria. vsOED employs a one-point reward formulation with variational posterior approximations, providing a provable lower bound to the expected information gain. Numerical methods are developed following an actor-critic reinforcement learning approach, including derivation and estimation of variational and policy gradients to optimize the design policy, and posterior approximation using Gaussian mixture models and normalizing flows. vsOED accommodates nuisance parameters, implicit likelihoods, and multiple candidate models, while supporting flexible design criteria that can target designs for model discrimination, parameter inference, goal-oriented prediction, and their weighted combinations. We demonstrate vsOED across various engineering and science applications, illustrating its superior sample efficiency compared to existing sequential experimental design algorithms.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The manuscript introduces variational sequential optimal experimental design (vsOED) for optimally designing finite sequences of experiments under Bayesian information-theoretic criteria. It employs a one-point reward formulation based on variational posterior approximations (Gaussian mixture models or normalizing flows) that yields a provable lower bound on expected information gain, and it optimizes the resulting design policy via actor-critic reinforcement learning with derived variational and policy gradients. The method accommodates nuisance parameters, implicit likelihoods, and multiple candidate models, and it supports flexible weighted criteria (model discrimination, parameter inference, goal-oriented prediction). The manuscript reports superior sample efficiency on engineering and science applications.

Significance. If the lower-bound property and gradient derivations hold, the work provides a scalable, theoretically grounded framework for sequential OED that extends standard variational inference and policy-gradient methods to handle sequential information gain while supporting practical model complexities; the explicit use of a provable ELBO-style bound and standard RL machinery for the surrogate objective is a strength.

minor comments (2)
  1. [Abstract] The statement that the one-point reward 'provides a provable lower bound' would benefit from an explicit forward reference to the section containing the tower-property argument or derivation that preserves the bound across the sequence.
  2. The manuscript would be strengthened by adding a short paragraph in the methods section clarifying how the variational approximation error is controlled or monitored across design steps, as this directly affects whether the lower bound remains informative in practice.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the thorough summary of our work and the positive recommendation for minor revision. No specific major comments were provided in the report, so we have no points requiring direct response or revision at this stage.

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The derivation relies on standard variational lower bounds (provable via KL divergence properties) applied to mutual information terms, combined with off-the-shelf actor-critic RL for policy optimization. No load-bearing step reduces by construction to a fitted parameter, self-defined quantity, or self-citation chain from the same authors; the one-point reward and sequential decomposition preserve the bound via the tower property without internal redefinition. The GMM/NF approximations are treated as practical choices, not foundational inputs that force the result. This is the common case of a self-contained application of existing machinery.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The method relies on standard assumptions from variational inference and reinforcement learning; no new free parameters, axioms, or invented entities are introduced beyond those already present in the cited literature on OED, VI, and RL.

axioms (2)
  • domain assumption: Variational family (GMM or normalizing flow) can approximate the true posterior sufficiently well for the lower bound to be useful.
    Invoked when the one-point reward is defined via the variational posterior.
  • standard math: The policy gradient and variational gradient estimators are unbiased or have controlled bias.
    Standard assumption when deriving gradients for actor-critic optimization.
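
To make the unbiasedness assumption concrete, the sketch below checks a score-function (REINFORCE-style) gradient estimate against a closed-form gradient. This is a swapped-in textbook estimator on a made-up one-dimensional problem, not necessarily the estimator the paper derives; the Gaussian design policy, the quadratic reward, and all names are assumptions for illustration.

```python
import numpy as np

def reward(d):
    # Hypothetical reward, peaked at d = 2; chosen so the exact gradient is known.
    return -(d - 2.0) ** 2

def score_function_gradient(mu, sigma, n_samples, rng):
    """Monte Carlo estimate of d/dmu E[reward(d)] for d ~ N(mu, sigma^2)."""
    d = mu + sigma * rng.standard_normal(n_samples)
    score = (d - mu) / sigma ** 2          # d/dmu log N(d; mu, sigma^2)
    return float(np.mean(reward(d) * score))

def exact_gradient(mu):
    # E[reward] = -((mu - 2)^2 + sigma^2), so the derivative in mu is -2 (mu - 2).
    return -2.0 * (mu - 2.0)

rng = np.random.default_rng(0)
mu, sigma = 0.5, 0.5
estimate = score_function_gradient(mu, sigma, 400_000, rng)

print(f"score-function estimate: {estimate:.3f}")
print(f"exact gradient         : {exact_gradient(mu):.3f}")
```

The estimate should agree with the exact value up to Monte Carlo noise. In a full actor-critic loop, a learned critic would typically replace the raw reward to reduce variance, at the cost of the "controlled bias" the ledger entry mentions.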


