Variational Sequential Optimal Experimental Design using Reinforcement Learning

Jiayuan Dong; Wanggang Shen; Xun Huan

arxiv: 2306.10430 · v2 · submitted 2023-06-17 · stat.ML · cs.AI · cs.LG · stat.CO · stat.ME

Pith reviewed 2026-05-24 08:19 UTC · model grok-4.3

classification stat.ML · cs.AI · cs.LG · stat.CO · stat.ME

keywords variational sequential optimal experimental design · reinforcement learning · expected information gain · Bayesian experimental design · actor-critic methods · Gaussian mixture models · normalizing flows · sequential design

The pith

vsOED uses one-point variational rewards and actor-critic reinforcement learning to optimize sequences of experiments for expected information gain.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents variational sequential optimal experimental design as a method to choose finite sequences of experiments under Bayesian information criteria. It formulates the problem with a one-point reward based on variational posterior approximations that yields a provable lower bound on expected information gain. Actor-critic reinforcement learning then optimizes the design policy by estimating variational and policy gradients, while approximating posteriors with Gaussian mixture models or normalizing flows. The approach supports nuisance parameters, implicit likelihoods, multiple models, and flexible criteria that combine model discrimination with parameter inference or prediction goals.

Core claim

vsOED employs a one-point reward formulation with variational posterior approximations, providing a provable lower bound to the expected information gain. Numerical methods are developed following an actor-critic reinforcement learning approach, including derivation and estimation of variational and policy gradients to optimize the design policy, and posterior approximation using Gaussian mixture models and normalizing flows. vsOED accommodates nuisance parameters, implicit likelihoods, and multiple candidate models, while supporting flexible design criteria that can target designs for model discrimination, parameter inference, goal-oriented prediction, and their weighted combinations.
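The lower-bound property can be sketched in a few lines. The notation here is assumed for illustration and may differ from the paper's: θ denotes the parameters of interest, I_N the full history of designs and observations after N experiments under policy π, p(θ | I_N) the true posterior, and q(θ | I_N) any variational approximation to it.

```latex
\begin{aligned}
U(\pi)
  &= \mathbb{E}_{\theta, I_N \mid \pi}\!\left[ \ln \frac{p(\theta \mid I_N)}{p(\theta)} \right] \\
  &= \mathbb{E}_{\theta, I_N \mid \pi}\!\left[ \ln \frac{q(\theta \mid I_N)}{p(\theta)} \right]
   + \mathbb{E}_{I_N \mid \pi}\!\left[ D_{\mathrm{KL}}\big( p(\cdot \mid I_N) \,\|\, q(\cdot \mid I_N) \big) \right] \\
  &\ge \mathbb{E}_{\theta, I_N \mid \pi}\!\left[ \ln \frac{q(\theta \mid I_N)}{p(\theta)} \right].
\end{aligned}
```

The gap is the expected KL divergence from the true posterior to q, which is nonnegative and vanishes only when q matches the true posterior. Maximizing the right-hand side over q therefore tightens the bound, while maximizing it over π improves the design policy with respect to the surrogate objective.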

What carries the argument

One-point reward formulation with variational posterior approximations that supplies a provable lower bound to expected information gain, optimized by actor-critic reinforcement learning with GMM or normalizing-flow posteriors.

If this is right

The method handles nuisance parameters, implicit likelihoods, and multiple candidate models without requiring explicit likelihood evaluations.
Flexible weighted combinations of design criteria become available for model discrimination, parameter inference, or goal-oriented prediction.
Numerical demonstrations show superior sample efficiency relative to prior sequential experimental design algorithms across engineering and science applications.
Posterior approximations via Gaussian mixture models and normalizing flows enable gradient-based policy updates inside the reinforcement learning loop.
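To make the last point concrete, the sketch below evaluates a univariate Gaussian mixture log-density and a terminal reward of the form log q(θ | history) − log p(θ). It is a minimal illustration, not the paper's implementation: the function names are hypothetical, the mixture parameters are passed in directly (in a setup like the one described they would presumably be produced by a network conditioned on the experiment history), and only the standard library is used.

```python
import math


def gaussian_logpdf(x, mean, std):
    """Log density of a univariate normal N(mean, std^2) at x."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def gmm_logpdf(x, weights, means, stds):
    """Log density of a univariate Gaussian mixture, computed with log-sum-exp."""
    terms = [math.log(w) + gaussian_logpdf(x, m, s)
             for w, m, s in zip(weights, means, stds)]
    top = max(terms)  # subtract the largest term for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))


def one_point_reward(theta, weights, means, stds, prior_logpdf):
    """Hypothetical terminal reward: log q(theta | history) - log p(theta)."""
    return gmm_logpdf(theta, weights, means, stds) - prior_logpdf(theta)


# Usage: standard-normal prior and a two-component approximate posterior.
prior = lambda t: gaussian_logpdf(t, 0.0, 1.0)
reward = one_point_reward(0.3, [0.6, 0.4], [0.2, 0.5], [0.3, 0.2], prior)
print(reward)
```

Because every operation is a smooth function of the mixture weights, means, and standard deviations, the same computation written in an automatic-differentiation framework would yield gradients of the reward with respect to those parameters.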

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If the lower bound remains tight in higher dimensions, the same RL loop could support real-time adaptive design in settings where exact information gain is intractable.
The framework may connect to active learning loops in which the same variational reward structure is reused for online model updating rather than fixed-horizon sequences.
Testing the method on problems with discontinuous design spaces or non-stationary noise would reveal whether the current gradient estimation steps extend without modification.

Load-bearing premise

The variational approximation to the posterior stays accurate enough across the design sequence that the lower bound remains useful for producing a near-optimal policy.

What would settle it

A low-dimensional test case with known exact posteriors where the learned vsOED policy yields measurably lower realized information gain than the exact optimal policy computed by dynamic programming.
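A much simpler relative of such a test can be written down for a single experiment. The sketch below assumes a toy linear-Gaussian model (θ ~ N(0, 1), y = d·θ + noise with standard deviation sigma), which is not taken from the paper; for this model the expected information gain is known in closed form, so the variational bound can be compared against it directly. All names are hypothetical, and the full sequential comparison against dynamic programming is not attempted here.

```python
import math
import random


def exact_eig(d, sigma):
    """Closed-form EIG for theta ~ N(0, 1), y = d * theta + N(0, sigma^2)."""
    return 0.5 * math.log(1.0 + (d / sigma) ** 2)


def variational_bound(d, sigma, a, v):
    """E[log q(theta | y) - log p(theta)] for q(theta | y) = N(a * y, v), in closed form."""
    mse = (1.0 - a * d) ** 2 + (a * sigma) ** 2  # E[(theta - a * y)^2]
    return -0.5 * math.log(v) - mse / (2.0 * v) + 0.5


def variational_bound_mc(d, sigma, a, v, n=100_000, seed=0):
    """Monte Carlo estimate of the same bound from joint samples (theta, y)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        theta = rng.gauss(0.0, 1.0)
        y = d * theta + rng.gauss(0.0, sigma)
        log_q = -0.5 * math.log(2.0 * math.pi * v) - (theta - a * y) ** 2 / (2.0 * v)
        log_p = -0.5 * math.log(2.0 * math.pi) - 0.5 * theta ** 2
        total += log_q - log_p
    return total / n


d, sigma = 2.0, 1.0
a_star = d / (d ** 2 + sigma ** 2)           # slope of the exact posterior mean
v_star = sigma ** 2 / (d ** 2 + sigma ** 2)  # exact posterior variance

print("exact EIG:          ", exact_eig(d, sigma))
print("bound, exact q:     ", variational_bound(d, sigma, a_star, v_star))
print("bound, mismatched q:", variational_bound(d, sigma, 0.3, 0.5))
print("bound, Monte Carlo: ", variational_bound_mc(d, sigma, 0.3, 0.5))
```

With the exact posterior parameters the bound coincides with the closed-form value, and any other choice of `a` and `v` gives something smaller. A policy trained against a loose bound could in principle prefer different designs than one trained against the exact quantity, which is the failure mode the proposed test would probe.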

Figures

Figures reproduced from arXiv: 2306.10430 by Jiayuan Dong, Wanggang Shen, Xun Huan.

**Figure 2.** Case 1a. Examples of GMM and NFs approximate posterior and true posterior for PoIs. The red …

**Figure 3.** Case 1a. Average Ũ over four training replicates for ‘OED for QoIs’. The shaded regions represent the standard error.

**Figure 4.** Case 1a. Examples of policy trajectory for …

**Figure 5.** Case 1a. Design and posterior comparisons for ‘OED for QoIs’.

**Figure 6.** Case 1b. Average expected utility or Ũ over two training replicates versus design horizon N using policies resulting from the five OED scenarios.

**Figure 7.** Case 1b. Examples of policy trajectory for …

**Figure 8.** Case 1b. Examples of approximate posterior and true posterior for model indicator using policy …

**Figure 9.** Case 2. Expected utility comparisons using policies resulting from different algorithms. The shaded …

**Figure 10.** Case 2. Examples of GMM approximate posterior and true posterior at …

**Figure 11.** Case 3. Average Ũ over four training replicates versus design horizon N. The shaded regions represent the standard error.

**Figure 12.** Case 3. Examples of infected state trajectory …

**Figure 13.** Case 4. Examples of the true and surrogate concentration fields …

**Figure 14.** Case 4. Examples of approximate posterior and true posterior for model indicator and PoIs using …

**Figure 15.** Case 4. Examples of policy trajectory for …

read the original abstract

We present variational sequential optimal experimental design (vsOED), a novel method for optimally designing a finite sequence of experiments within a Bayesian framework with information-theoretic criteria. vsOED employs a one-point reward formulation with variational posterior approximations, providing a provable lower bound to the expected information gain. Numerical methods are developed following an actor-critic reinforcement learning approach, including derivation and estimation of variational and policy gradients to optimize the design policy, and posterior approximation using Gaussian mixture models and normalizing flows. vsOED accommodates nuisance parameters, implicit likelihoods, and multiple candidate models, while supporting flexible design criteria that can target designs for model discrimination, parameter inference, goal-oriented prediction, and their weighted combinations. We demonstrate vsOED across various engineering and science applications, illustrating its superior sample efficiency compared to existing sequential experimental design algorithms.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

vsOED gives a clean variational lower bound plus RL policy for sequential OED that handles implicit likelihoods, but the bound's practical tightness rests on the quality of the GMM or flow approximations over multiple steps.

read the letter

The main point is that this paper puts together a one-point reward based on a variational lower bound to expected information gain, then optimizes the design policy with actor-critic RL. The bound follows from standard variational inference on the posterior and carries through the sequential decomposition by the usual tower property for mutual information. That construction is new in this combination and lets the method work with implicit likelihoods, nuisance parameters, and weighted combinations of model discrimination and parameter inference goals. The demonstrations claim better sample efficiency than earlier sequential algorithms, which is the practical payoff they emphasize.

Posterior approximation uses GMMs and normalizing flows, which are standard choices here. The RL gradients are derived in the usual way for the surrogate objective. One soft spot is that the lower bound only stays informative if the variational approximation does not degrade too much as the sequence of designs proceeds; if the GMM or flow fit loosens, the policy is optimizing something farther from the true information gain. The paper does not appear to include extensive checks on bound tightness across steps, so that remains a practical question rather than a theoretical one.

The citation pattern looks normal for the subfield and does not lean on self-referential fitting. This work is aimed at people already doing Bayesian experimental design in engineering or scientific settings where models are expensive or black-box. A reader who needs to run sequential designs with mixed objectives would find the algorithm and the reported efficiency gains worth looking at. It is grounded enough and the claims are specific enough to deserve referee time rather than a desk reject.

Referee Report

0 major / 2 minor

Summary. The manuscript introduces variational sequential optimal experimental design (vsOED) for optimally designing finite sequences of experiments under Bayesian information-theoretic criteria. It employs a one-point reward formulation based on variational posterior approximations (via GMMs or normalizing flows) that yields a provable lower bound on expected information gain, and it optimizes the resulting design policy via actor-critic reinforcement learning with derived variational and policy gradients. The method accommodates nuisance parameters, implicit likelihoods, and multiple candidate models; supports flexible weighted criteria (model discrimination, parameter inference, goal-oriented prediction); and demonstrates superior sample efficiency on engineering and science applications.

Significance. If the lower-bound property and gradient derivations hold, the work provides a scalable, theoretically grounded framework for sequential OED that extends standard variational inference and policy-gradient methods to handle sequential information gain while supporting practical model complexities; the explicit use of a provable ELBO-style bound and standard RL machinery for the surrogate objective is a strength.

minor comments (2)

[Abstract] The statement that the one-point reward 'provides a provable lower bound' would benefit from an explicit forward reference to the section containing the tower-property argument or derivation that preserves the bound across the sequence.
The manuscript would be strengthened by adding a short paragraph in the methods section clarifying how the variational approximation error is controlled or monitored across design steps, as this directly affects whether the lower bound remains informative in practice.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the thorough summary of our work and the positive recommendation for minor revision. No specific major comments were provided in the report, so we have no points requiring direct response or revision at this stage.

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The derivation relies on standard variational lower bounds (provable via KL divergence properties) applied to mutual information terms, combined with off-the-shelf actor-critic RL for policy optimization. No load-bearing step reduces by construction to a fitted parameter, self-defined quantity, or self-citation chain from the same authors; the one-point reward and sequential decomposition preserve the bound via the tower property without internal redefinition. The GMM/NF approximations are treated as practical choices, not foundational inputs that force the result. This is the common case of a self-contained application of existing machinery.
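The sequential step mentioned in this rationale can be written out as a short identity. The notation is assumed for illustration rather than taken from the paper: I_k is the history of designs and observations after k experiments, I_0 is empty so that p(θ | I_0) = p(θ), and every expectation is over the joint distribution of θ and I_N induced by the policy.

```latex
\sum_{k=0}^{N-1} \mathbb{E}\!\left[ \ln \frac{p(\theta \mid I_{k+1})}{p(\theta \mid I_k)} \right]
= \mathbb{E}\!\left[ \ln \prod_{k=0}^{N-1} \frac{p(\theta \mid I_{k+1})}{p(\theta \mid I_k)} \right]
= \mathbb{E}\!\left[ \ln \frac{p(\theta \mid I_N)}{p(\theta)} \right]
```

By the tower property, each term on the left equals the expected KL divergence from the stage-(k+1) posterior to the stage-k posterior, so summing incremental information gains and evaluating a single terminal reward give the same expected utility when exact posteriors are used. The one-point form needs only the final posterior, which is the only place a variational approximation has to be substituted.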

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The method relies on standard assumptions from variational inference and reinforcement learning; no new free parameters, axioms, or invented entities are introduced beyond those already present in the cited literature on OED, VI, and RL.

axioms (2)

domain assumption: Variational family (GMM or normalizing flow) can approximate the true posterior sufficiently well for the lower bound to be useful.
Invoked when the one-point reward is defined via the variational posterior.
standard math: The policy gradient and variational gradient estimators are unbiased or have controlled bias.
Standard assumption when deriving gradients for actor-critic optimization.

Reference graph

Works this paper leans on

81 extracted references · 81 canonical work pages · 8 internal anchors

[1]

X. Huan, J. Jagalur, Y. Marzouk, Optimal experimental design: Formulations and com- putations, Acta Numerica 33 (2024) 715–840. doi:10.1017/S0962492924000023

work page doi:10.1017/s0962492924000023 2024
[2]

Chaloner, I

K. Chaloner, I. Verdinelli, Bayesian experimental design: A review, Statistical Science 10 (3) (1995) 273–304. doi:10.1214/ss/1177009939

work page doi:10.1214/ss/1177009939 1995
[3]

E. G. Ryan, C. C. Drovandi, J. M. Mcgree, A. N. Pettitt, A review of modern compu- tational algorithms for Bayesian optimal design, International Statistical Review 84 (1) (2016) 128–154. doi:10.1111/insr.12107

work page doi:10.1111/insr.12107 2016
[4]

Alexanderian, Optimal experimental design for infinite-dimensional Bayesian inverse problems governed by PDEs: A review, Inverse Problems 37 (4) (2021) 043001

A. Alexanderian, Optimal experimental design for infinite-dimensional Bayesian inverse problems governed by PDEs: A review, Inverse Problems 37 (4) (2021) 043001. doi: 10.1088/1361-6420/abe10c

work page doi:10.1088/1361-6420/abe10c 2021
[5]

Rainforth, A

T. Rainforth, A. Foster, D. R. Ivanova, F. B. Smith, Modern Bayesian experimental design, Statistical Science 39 (1) (2024) 100–114. doi:10.1214/23-STS915. 30

work page doi:10.1214/23-sts915 2024
[6]

Strutz, A

D. Strutz, A. Curtis, Variational Bayesian experimental design for geophysical applica- tions: Seismic source location, amplitude versus offset inversion, and estimating CO 2 saturations in a subsurface reservoir, Geophysical Journal International 236 (3) (2024) 1309–1331. doi:10.1093/gji/ggad492

work page doi:10.1093/gji/ggad492 2024
[7]

D. V. Lindley, On a measure of the information provided by an experiment, The Annals of Mathematical Statistics 27 (4) (1956) 986–1005. doi:10.1214/aoms/1177728069

work page doi:10.1214/aoms/1177728069 1956
[8]

G. E. P. Box, Sequential experimentation and sequential assembly of designs, Quality Engineering 5 (2) (1992) 321–330. doi:10.1080/08982119208918971

work page doi:10.1080/08982119208918971 1992
[9]

H. A. Dror, D. M. Steinberg, Sequential experimental designs for generalized linear models, Journal of the American Statistical Association 103 (481) (2008) 288–298. doi:10.1198/016214507000001346

work page doi:10.1198/016214507000001346 2008
[10]

D. R. Cavagnaro, J. I. Myung, M. A. Pitt, J. V. Kujala, Adaptive design optimization: A mutual information-based approach to model discrimination in cognitive science, Neural Computation 22 (4) (2010) 887–905. doi:10.1162/neco.2009.02-09-959

work page doi:10.1162/neco.2009.02-09-959 2010
[11]

Solonen, H

A. Solonen, H. Haario, M. Laine, Simulation-based optimal design using a response vari- ance criterion, Journal of Computational and Graphical Statistics 21 (1) (2012) 234–252. doi:10.1198/jcgs.2011.10070

work page doi:10.1198/jcgs.2011.10070 2012
[12]

C. C. Drovandi, J. M. McGree, A. N. Pettitt, Sequential Monte Carlo for Bayesian sequen- tially designed experiments for discrete data, Computational Statistics & Data Analysis 57 (1) (2013) 320–335. doi:10.1016/j.csda.2012.05.014

work page doi:10.1016/j.csda.2012.05.014 2013
[13]

C. C. Drovandi, J. M. McGree, A. N. Pettitt, A sequential Monte Carlo algorithm to incorporate model uncertainty in Bayesian sequential design, Journal of Computational and Graphical Statistics 23 (1) (2014) 3–24. doi:10.1080/10618600.2012.730083

work page doi:10.1080/10618600.2012.730083 2014
[14]

W. Kim, M. A. Pitt, Z.-L. Lu, M. Steyvers, J. I. Myung, A hierarchical adaptive approach to optimal experimental design, Neural Computation 26 (2014) 2565–2492.doi:10.1162/ NECO_a_00654

work page 2014
[15]

Hainy, C

M. Hainy, C. C. Drovandi, J. M. McGree, Likelihood-free extensions for Bayesian sequen- tially designed experiments, in: J. Kunert, C. M¨ uller, A. Atkinson (Eds.), mODa 11: Advances in Model-Oriented Design and Analysis, Contributions to Statistics, Springer, 2016, pp. 153–161

work page 2016
[16]

Kleinegesse, C

S. Kleinegesse, C. Drovandi, M. U. Gutmann, Sequential Bayesian experimental design for implicit models via mutual information, Bayesian Analysis 16 (3) (2021) 773–802. doi:10.1214/20-BA1225

work page doi:10.1214/20-ba1225 2021
[17]

M¨ uller, D

P. M¨ uller, D. A. Berry, A. P. Grieve, M. Smith, M. Krams, Simulation-based sequential Bayesian design, Journal of Statistical Planning and Inference 137 (10) (2007) 3140–3150. doi:10.1016/j.jspi.2006.05.021

work page doi:10.1016/j.jspi.2006.05.021 2007
[18]

Von Toussaint, Bayesian inference in physics, Reviews of Modern Physics 83 (2011) 943–999

U. Von Toussaint, Bayesian inference in physics, Reviews of Modern Physics 83 (2011) 943–999. doi:10.1103/RevModPhys.83.943. 31

work page doi:10.1103/revmodphys.83.943 2011
[19]

Huan, Numerical approaches for sequential Bayesian optimal experimental design, Ph.D

X. Huan, Numerical approaches for sequential Bayesian optimal experimental design, Ph.D. thesis, Massachusetts Institute of Technology (2015)

work page 2015
[20]

X. Huan, Y. M. Marzouk, Sequential Bayesian optimal experimental design via approxi- mate dynamic programming (2016). arXiv:1604.08320

work page internal anchor Pith review Pith/arXiv arXiv 2016
[21]

W. Shen, X. Huan, Bayesian sequential optimal experimental design for nonlinear models using policy gradient reinforcement learning, Computer Methods in Applied Mechanics and Engineering 416 (2023) 116304. doi:10.1016/j.cma.2023.116304

work page doi:10.1016/j.cma.2023.116304 2023
[22]

B. P. Carlin, J. B. Kadane, A. E. Gelfand, Approaches for optimal sequential decision analysis in clinical trials, Biometrics 54 (3) (1998) 964–975. doi:10.2307/2533849

work page doi:10.2307/2533849 1998
[23]

Gautier, L

R. Gautier, L. Pronzato, Adaptive control for sequential design, Discussiones Mathemat- icae Probability and Statistics 20 (1) (2000) 97–113. doi:10.7151/dmps.1006

work page doi:10.7151/dmps.1006 2000
[24]

Pronzato, ´E

L. Pronzato, ´E. Thierry, Sequential experimental design and response optimisation, Sta- tistical Methods and Applications 11 (3) (2002) 277–292. doi:10.1007/BF02509828

work page doi:10.1007/bf02509828 2002
[25]

A. E. Brockwell, J. B. Kadane, A gridding method for Bayesian sequential decision problems, Journal of Computational and Graphical Statistics 12 (3) (2003) 566–584. doi:10.1198/1061860032274

work page doi:10.1198/1061860032274 2003
[26]

J. A. Christen, M. Nakamura, Sequential stopping rules for species accumulation, Journal of Agricultural, Biological & Environmental Statistics 8 (2) (2003) 184–195. doi:10. 1198/108571103322161540

work page 2003
[27]

S. A. Murphy, Optimal dynamic treatment regimes, Journal of the Royal Statistical Soci- ety: Series B (Statistical Methodology) 65 (2) (2003) 331–355. doi:10.1111/1467-9868. 00389

work page doi:10.1111/1467-9868 2003
[28]

J. K. Wathen, J. A. Christen, Implementation of backward induction for sequentially adaptive clinical trials, Journal of Computational and Graphical Statistics 15 (2) (2006) 398–413. doi:10.1198/016214506X113406

work page doi:10.1198/016214506x113406 2006
[29]

M¨ uller, Y

P. M¨ uller, Y. Duan, M. Garcia Tec, Simulation-based sequential design, Pharmaceutical Statistics 21 (4) (2022) 729–739. doi:10.1002/pst.2216

work page doi:10.1002/pst.2216 2022
[30]

M. Tec, Y. Duan, P. M¨ uller, A comparative tutorial of Bayesian sequential design and reinforcement learning, The American Statistician 77 (2) (2023) 223–233. doi:10.1080/ 00031305.2022.2129787

work page internal anchor Pith review Pith/arXiv arXiv 2023
[31]

W. Shen, X. Huan, Bayesian sequential optimal experimental design for nonlinear models using policy gradient reinforcement learning (2021). arXiv:2110.15335

work page arXiv 2021
[32]

Foster, D

A. Foster, D. R. Ivanova, I. Malik, T. Rainforth, Deep adaptive design: Amortizing sequential Bayesian experimental design, in: M. Meila, T. Zhang (Eds.), Proceedings of the 38th International Conference on Machine Learning (ICML 2021), Vol. 139 of Proceedings of Machine Learning Research, PMLR, 2021, pp. 3384–3395. 32

work page 2021
[33]

D. R. Ivanova, A. Foster, S. Kleinegesse, M. U. Gutmann, T. Rainforth, Implicit deep adaptive design: Policy-based experimental design without likelihoods, in: M. Ranzato, A. Beygelzimer, Y. Dauphin, P. Liang, J. W. Vaughan (Eds.), Advances in Neural Infor- mation Processing Systems 34, Curran Associates, 2021, pp. 25785–25798

work page 2021
[34]

T. Blau, E. V. Bonilla, I. Chades, A. Dezfouli, Optimizing sequential experimental design with deep reinforcement learning, in: K. Chaudhuri, S. Jegelka, L. Song, C. Szepesvari, G. Niu, S. Sabato (Eds.), Proceedings of the 39th International Conference on Machine Learning (ICML 2022), Vol. 162 of Proceedings of Machine Learning Research, PMLR, 2022, pp. 2107–2128

work page 2022
[35]

X. Chen, C. Wang, Z. Zhou, K. Ross, Randomized ensembled double Q-learning: Learn- ing fast without a model, in: 9th International Conference on Learning Representations (ICLR 2021), 2021, available at https://openreview.net/forum?id=AY8zfZm0tDd

work page 2021
[36]

Poole, S

B. Poole, S. Ozair, A. Van Den Oord, A. Alemi, G. Tucker, On variational bounds of mutual information, in: Proceedings of the 36th International Conference on Machine Learning (ICML 2019), Vol. 97 of Proceedings of Machine Learning Research, PMLR, 2019, pp. 5171–5180

work page 2019
[37]

Nguyen, M

X. Nguyen, M. J. Wainwright, M. I. Jordan, Estimating divergence functionals and the likelihood ratio by convex risk minimization, IEEE Transactions on Information Theory 56 (11) (2010) 5847–5861. doi:10.1109/TIT.2010.2068870

work page doi:10.1109/tit.2010.2068870 2010
[38]

M. I. Belghazi, A. Baratin, S. Rajeswar, S. Ozair, Y. Bengio, A. Courville, R. D. Hjelm, Mutual information neural estimation, in: Proceedings of the 35th International Con- ference on Machine Learning (ICML 2018), Vol. 80 of Proceedings of Machine Learning Research, PMLR, 2018, pp. 531–540

work page 2018
[39]

Kleinegesse, M

S. Kleinegesse, M. U. Gutmann, Bayesian Experimental Design for Implicit Models by Mutual Information Neural Estimation (2020). arXiv:2002.08129

work page arXiv 2020
[40]

Representation Learning with Contrastive Predictive Coding

A. van den Oord, Y. Li, O. Vinyals, Representation learning with contrastive predictive coding (2018). arXiv:1807.03748

work page internal anchor Pith review Pith/arXiv arXiv 2018
[41]

Barber, F

D. Barber, F. Agakov, The IM algorithm: A variational approach to information maxi- mization, in: Advances in Neural Information Processing Systems 16, MIT Press, 2003, pp. 201–208

work page 2003
[42]

Foster, M

A. Foster, M. Jankowiak, E. Bingham, P. Horsfall, Y. W. Teh, T. Rainforth, N. Good- man, Variational Bayesian optimal experimental design, in: H. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alch´ e Buc, E. Fox, R. Garnett (Eds.), Advances in Neural Informa- tion Processing Systems 32, Curran Associates, 2019, pp. 14036–14047

work page 2019
[43]

J. Dong, C. Jacobsen, M. Khalloufi, M. Akram, W. Liu, K. Duraisamy, X. Huan, Varia- tional Bayesian optimal experimental design with normalizing flows, Computer Methods in Applied Mechanics and Engineering 433 (2025) 117457. doi:10.1016/j.cma.2024. 117457. 33

work page doi:10.1016/j.cma.2024 2025
[44]

Papamakarios, E

G. Papamakarios, E. Nalisnick, D. J. Rezende, S. Mohamed, B. Lakshminarayanan, Nor- malizing flows for probabilistic modeling and inference, Journal of Machine Learning Research 22 (1) (2021) 2617–2680

work page 2021
[45]

Kobyzev, S

I. Kobyzev, S. J. Prince, M. A. Brubaker, Normalizing flows: An introduction and review of current methods, IEEE Transactions on Pattern Analysis and Machine Intelligence 43 (11) (2020) 3964–3979. doi:10.1109/TPAMI.2020.2992934

work page doi:10.1109/tpami.2020.2992934 2020
[46]

A. C. Atkinson, A. N. Donev, R. D. Tobias, Optimum Experimental Designs, with SAS, Oxford University Press, 2007

work page 2007
[47]

Attia, A

A. Attia, A. Alexanderian, A. K. Saibaba, Goal-oriented optimal design of experiments for large-scale Bayesian linear inverse problems, Inverse Problems 34 (9) (2018) 095009. doi:10.1088/1361-6420/aad210

work page doi:10.1088/1361-6420/aad210 2018
[48]

K. Wu, P. Chen, O. Ghattas, An offline-online decomposition method for efficient lin- ear Bayesian goal-oriented optimal experimental design: Application to optimal sen- sor placement, SIAM Journal on Scientific Computing 45 (1) (2023) B57–B77. doi: 10.1137/21M1466542

work page doi:10.1137/21m1466542 2023
[49]

J. M. Bernardo, Expected information as expected utility, The Annals of Statistics 7 (3) (1979) 686–690. doi:10.1214/aos/1176344689

work page doi:10.1214/aos/1176344689 1979
[50]

Butler, J

T. Butler, J. D. Jakeman, T. Wildey, Optimal experimental design for prediction based on push-forward probability measures, Journal of Computational Physics 416 (2020) 109518. doi:10.1016/j.jcp.2020.109518

work page doi:10.1016/j.jcp.2020.109518 2020
[51]

Butler, J

T. Butler, J. Jakeman, T. Wildey, Combining push-forward measures and Bayes’ rule to construct consistent solutions to stochastic inverse problems, SIAM Journal on Scientific Computing 40 (2) (2018) A984–A1011. doi:10.1137/16M1087229

work page doi:10.1137/16m1087229 2018
[52]

Butler, J

T. Butler, J. Jakeman, T. Wildey, Convergence of probability densities using approximate models for forward and inverse problems in uncertainty quantification, SIAM Journal on Scientific Computing 40 (5) (2018) A3523–A3548. doi:10.1137/18M1181675

work page doi:10.1137/18m1181675 2018
[53]

Bickford Smith, A

F. Bickford Smith, A. Kirsch, S. Farquhar, Y. Gal, A. Foster, T. Rainforth, Prediction- oriented bayesian active learning, in: F. Ruiz, J. Dy, J.-W. van de Meent (Eds.), Proceed- ings of the 26th International Conference on Artificial Intelligence and Statistics, Vol. 206 of Proceedings of Machine Learning Research, PMLR, 2023, pp. 7331–7348

work page 2023
[54]

Goal-Oriented Bayesian Optimal Experimental Design for Nonlinear Models using Markov Chain Monte Carlo

S. Zhong, W. Shen, T. Catanach, X. Huan, Goal-oriented Bayesian optimal experimental design for nonlinear models using Markov chain Monte Carlo (2024). arXiv:2403.18072

work page internal anchor Pith review Pith/arXiv arXiv 2024
[55]

Kleinegesse, M

S. Kleinegesse, M. U. Gutmann, Gradient-based Bayesian experimental design for implicit models using mutual information lower bounds (2021). arXiv:2105.04379

work page arXiv 2021
[56]

Ginebra, On the measure of the information in a statistical experiment, Bayesian Analysis 2 (1) (2007) 167–212

J. Ginebra, On the measure of the information in a statistical experiment, Bayesian Analysis 2 (1) (2007) 167–212. doi:10.1214/07-BA207

work page doi:10.1214/07-ba207 2007
[57]

R. J. Williams, Simple statistical gradient-following algorithms for connectionist rein- forcement learning, Machine learning 8 (3) (1992) 229–256. doi:10.1007/BF00992696. 34

work page doi:10.1007/bf00992696 1992
[58]

D. P. Kingma, J. Ba, Adam: A method for stochastic optimization (2014). arXiv: 1412.6980

work page internal anchor Pith review Pith/arXiv arXiv 2014
[59]

T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, D. Wierstra, Continuous control with deep reinforcement learning (2015). arXiv:1509.02971

work page internal anchor Pith review Pith/arXiv arXiv 2015
[60]

C. J. Watkins, P. Dayan, Q-learning, Machine learning 8 (3-4) (1992) 279–292. doi: 10.1007/BF00992698

work page doi:10.1007/bf00992698 1992
[61]

V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, D. Hassabis, Human- level control through deep reinforcement learning, Nature 518 (2015) 529–533. doi: 10.1038/nature14236

work page doi:10.1038/nature14236 2015
[62]

Foster, M

A. Foster, M. Jankowiak, M. O’Meara, Y. W. Teh, T. Rainforth, A unified stochastic gradient approach to designing Bayesian-optimal experiments, in: Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, Vol. 108 of Proceedings of Machine Learning Research, PMLR, 2020, pp. 2959–2969

work page 2020
[63]

K. J. Arrow, H. B. Chenery, B. S. Minhas, R. M. Solow, Capital-labor substitution and economic efficiency, The Review of Economics and Statistics 43 (3) (1961) 225–250. doi:10.2307/1927286

work page doi:10.2307/1927286 1961
[64]

A. R. Cook, G. J. Gibson, C. A. Gilligan, Optimal observation times in experimental epidemic processes, Biometrics 64 (3) (2008) 860–868. doi:10.1111/j.1541-0420.2007. 00931.x

work page doi:10.1111/j.1541-0420.2007 2008
[65]

L. J. Allen, A primer on stochastic epidemic models: Formulation, numerical simulation, and analysis, Infectious Disease Modelling 2 (2) (2017) 128–142. doi:10.1016/j.idm. 2017.03.001

work page doi:10.1016/j.idm 2017
[66]

Proximal Policy Optimization Algorithms

J. Schulman, F. Wolski, P. Dhariwal, A. Radford, O. Klimov, Proximal policy optimiza- tion algorithms (2017). arXiv:1707.06347

work page internal anchor Pith review Pith/arXiv arXiv 2017
[67]

Schulman, S

J. Schulman, S. Levine, P. Abbeel, M. Jordan, P. Moritz, Trust region policy optimization, in: F. Bach, D. Blei (Eds.), Proceedings of the 32nd International Conference on Machine Learning, Vol. 37 of Proceedings of Machine Learning Research, PMLR, 2015, pp. 1889– 1897

work page 2015
[68]

Haarnoja, A

T. Haarnoja, A. Zhou, P. Abbeel, S. Levine, Soft actor-critic: Off-policy maximum en- tropy deep reinforcement learning with a stochastic actor, in: J. Dy, A. Krause (Eds.), Proceedings of the 35th International Conference on Machine Learning, Vol. 80 of Pro- ceedings of Machine Learning Research, PMLR, 2018, pp. 1861–1870

[69]

S. Fujimoto, H. van Hoof, D. Meger, Addressing function approximation error in actor-critic methods, in: J. Dy, A. Krause (Eds.), Proceedings of the 35th International Conference on Machine Learning, Vol. 80 of Proceedings of Machine Learning Research, PMLR, 2018, pp. 1587–1596

[70]

D. M. Borth, A total entropy criterion for the dual problem of model discrimination and parameter estimation, Journal of the Royal Statistical Society: Series B (Methodological) 37 (1) (1975) 77–87. doi:10.1111/j.2517-6161.1975.tb01032.x

[71]

J. Burkardt, The truncated normal distribution, Tech. rep., Florida State University, available at https://people.sc.fsu.edu/~jburkardt/presentations/truncated_normal.pdf (2023)

[72]

D. Rezende, S. Mohamed, Variational inference with normalizing flows, in: F. Bach, D. Blei (Eds.), Proceedings of the 32nd International Conference on Machine Learning, Vol. 37 of Proceedings of Machine Learning Research, PMLR, 2015, pp. 1530–1538

[73]

E. G. Tabak, E. Vanden-Eijnden, Density estimation by dual ascent of the log-likelihood, Communications in Mathematical Sciences 8 (1) (2010) 217–233

[74]

L. Dinh, J. Sohl-Dickstein, S. Bengio, Density estimation using Real NVP (2016). arXiv:1605.08803

[75]

J. Kruse, G. Detommaso, U. Köthe, R. Scheichl, HINT: Hierarchical invertible neural transport for density estimation and Bayesian inference, in: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35, 2021, pp. 8191–8199. doi:10.1609/aaai.v35i9.16997

[76]

S. T. Radev, U. K. Mertens, A. Voss, L. Ardizzone, U. Köthe, BayesFlow: Learning complex stochastic models with invertible neural networks, IEEE Transactions on Neural Networks and Learning Systems 33 (4) (2020) 1452–1466. doi:10.1109/TNNLS.2020.3042395

[77]

D. P. Kingma, P. Dhariwal, Glow: Generative flow with invertible 1x1 convolutions, in: S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, R. Garnett (Eds.), Advances in Neural Information Processing Systems, Vol. 31, Curran Associates, Inc., 2018

[78]

L. Ardizzone, J. Kruse, C. Rother, U. Köthe, Analyzing inverse problems with invertible neural networks, in: 7th International Conference on Learning Representations (ICLR 2019), 2019, available at https://openreview.net/forum?id=rJed6j0cKX

[79]

F. Draxler, S. Wahl, C. Schnörr, U. Köthe, On the universality of coupling-based normalizing flows (2024). arXiv:2402.06578

[80]

G. A. Padmanabha, N. Zabaras, Solving inverse problems using conditional invertible neural networks, Journal of Computational Physics 433 (2021) 110194. doi:10.1016/j.jcp.2021.110194

