Optimal Stopping for Sequential Bayesian Experimental Design
Sequential Bayesian experimental design typically assumes that the number of experiments is fixed before data collection begins. In practical campaigns, however, experimentation may need to terminate early because additional measurements can provide diminishing information relative to their cost, raising the central decision question: when should one stop? Common threshold-based stopping rules are easy to implement but myopic, because they compare the current state with a fixed criterion without accounting for the expected value of future experiments. This work develops a Bayesian optimal stopping framework for sequential experimental design by formulating stopping and design as coupled decisions in a Markov decision process. We prove that, for any design policy, the optimal stopping rule terminates exactly when the immediate terminal reward exceeds the expected continuation value. We then derive a policy gradient method for learning value-based stopping and design policies. Naive joint training can create a circular dependency that traps learning in early-stopping local optima. We address this difficulty with a curriculum learning strategy that gradually transitions from forced continuation to adaptive stopping during training. Numerical studies on a linear-Gaussian benchmark, a one-dimensional nonlinear test problem, and a contaminant source detection problem show that the proposed approach learns stable design-stopping policies and improves resource-aware performance, with the largest gains in settings with strong sequential dependence.
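The stopping rule stated in the abstract (stop when the immediate terminal reward is at least the continuation value) can be illustrated with a minimal sketch. This is not the paper's method. It is a toy backward induction under assumptions chosen here for illustration: a scalar linear-Gaussian model with a fixed design, a terminal reward equal to information gain minus a per-experiment cost, and a finite horizon. The function name, parameters, and reward definition are all hypothetical. Because the information gain in this toy model does not depend on the observed data, the expected continuation value reduces to a deterministic quantity.

```python
import math


def optimal_stopping_linear_gaussian(prior_var=1.0, noise_var=1.0,
                                     cost=0.1, horizon=20):
    """Toy backward induction for a stopping rule (illustrative assumptions only).

    Assumes theta ~ N(0, prior_var) and observations y = theta + noise with
    noise ~ N(0, noise_var), so the posterior variance after k experiments
    is known in advance and does not depend on the data.
    """
    # Conjugate update: posterior precision grows by 1/noise_var per experiment.
    post_var = [1.0 / (1.0 / prior_var + k / noise_var)
                for k in range(horizon + 1)]

    # Assumed terminal reward for stopping after k experiments:
    # information gain (in nats) minus cumulative experiment cost.
    terminal = [0.5 * math.log(prior_var / post_var[k]) - cost * k
                for k in range(horizon + 1)]

    # Backward induction: V_k = max(terminal_k, V_{k+1}), with a forced stop
    # at the horizon.
    value = [0.0] * (horizon + 1)
    stop = [False] * (horizon + 1)
    value[horizon] = terminal[horizon]
    stop[horizon] = True
    for k in range(horizon - 1, -1, -1):
        continuation = value[k + 1]
        # Stop when the immediate terminal reward is at least the
        # continuation value; otherwise run another experiment.
        stop[k] = terminal[k] >= continuation
        value[k] = max(terminal[k], continuation)

    first_stop = stop.index(True)
    return first_stop, value, terminal


if __name__ == "__main__":
    k_stop, value, terminal = optimal_stopping_linear_gaussian()
    print("stop after", k_stop, "experiments; value at start:", value[0])
```

With the default values assumed above, the rule stops after 4 experiments, which is where the marginal information gain of one more measurement first falls below its cost. In the setting the abstract describes, the continuation value is an expectation over future outcomes and is learned, so a closed-form recursion like this one would generally not be available.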
Forward citations
Cited by 1 Pith paper
- Optimizing Social Utility in Sequential Experiments
A subsidy-based sequential RCT protocol modeled as a belief MDP increases social utility by more than 35% compared to standard trials.