Adaptive clinical trials based on design-optimal e-values with automatic curtailment: An application to single-arm trials with binary data
Pith reviewed 2026-06-29 10:25 UTC · model grok-4.3
The pith
E-value designs optimized via dynamic programming for finite samples match standard adaptive designs in single-arm binary trials while guaranteeing anytime validity.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We investigate e-value-based designs with finite-horizon optimality for single-arm multi-stage clinical trials with binary data. We construct these designs through constrained dynamic programming based on the currently observed e-value, the maximum sample size, and the pre-specified significance level. Using exact calculations, we show that e-value-based designs can provide operating characteristics competitive with standard designs and can outperform growth-rate-optimal e-values in finite samples. In addition, small e-values automatically indicate that continuing the trial is futile.
What carries the argument
Constrained dynamic programming to build design-optimal e-values that maximize power or minimize expected sample size under a finite horizon and significance level constraint.
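A minimal sketch of such a backward induction for the power-maximizing case is given here. It assumes a point null `theta0`, a point alternative `theta1`, and an efficacy threshold of 1/alpha; the function name `power_max_dp`, the grid size, and the choice to round e-values down to the grid are illustrative assumptions, not the paper's implementation. Rounding down discards evidence, so the discretized process stays a supermartingale under the null.

```python
import numpy as np

def power_max_dp(n_max, theta0, theta1, alpha, n_grid=201):
    """Approximate maximal power for each starting e-value (illustrative sketch)."""
    target = 1.0 / alpha                      # efficacy threshold on the e-value
    step = target / (n_grid - 1)
    wealth = np.arange(n_grid) * step         # discretized e-value states
    # Terminal value: efficacy is concluded only if the threshold was reached.
    value = np.zeros(n_grid)
    value[-1] = 1.0
    for _ in range(n_max):                    # backward induction over patients
        new_value = np.empty(n_grid)
        new_value[-1] = 1.0                   # threshold crossed: stop for efficacy
        new_value[0] = 0.0                    # e-value of zero: efficacy impossible
        for i in range(1, n_grid - 1):
            down = np.arange(i + 1)           # grid index of the e-value after a failure
            # Martingale constraint under the null:
            # theta0 * up + (1 - theta0) * down = current e-value.
            up = (wealth[i] - (1 - theta0) * wealth[down]) / theta0
            # Round down to the grid (conservative) and cap at the threshold.
            up_idx = np.minimum(np.floor(up / step + 1e-9).astype(int), n_grid - 1)
            # Probability of eventually crossing the threshold under theta1.
            cont = theta1 * value[up_idx] + (1 - theta1) * value[down]
            new_value[i] = cont.max()
        value = new_value
    return wealth, value
```

Calling `power_max_dp(20, 0.2, 0.4, 0.05)` returns the grid together with the approximate power attainable from each starting e-value; the entry at e-value 1 corresponds to the start of the trial. The expected-sample-size objective and the power constraint described in the paper are not covered by this sketch.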
If this is right
- E-value designs guarantee type I error control at any stopping time, enabling flexible interim analyses.
- Designs can be tuned to maximize power or minimize expected sample size with minimum power constraints.
- Futility stopping is automatic: an e-value of zero makes an efficacy conclusion impossible, and small e-values indicate that continuing the trial is futile.
- These designs provide operating characteristics that are competitive with, or better than, those of non-adaptive or standard adaptive designs with futility stopping.
- Outperformance over growth-rate-optimal e-values holds specifically in finite samples.
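One way to make the futility point concrete is a reachability check, sketched here under the assumption of a point null `theta0` and an efficacy threshold of 1/alpha. Under that null, the largest fair payoff on a success is a factor 1/theta0 (obtained by staking everything, so a failure sends the e-value to zero). The function name and arguments are hypothetical and are not taken from the paper.

```python
def efficacy_still_reachable(e_value, n_seen, n_max, theta0, alpha):
    """Return True if the threshold 1/alpha can still be reached (illustrative)."""
    if e_value <= 0.0:
        return False                          # zero e-value: nothing left to bet
    remaining = n_max - n_seen
    # Best case: every remaining patient responds and each bet is all-in,
    # multiplying the e-value by 1/theta0 per success.
    best_case = e_value / theta0 ** remaining
    return best_case >= 1.0 / alpha
```

With `theta0 = 0.2` and `alpha = 0.05`, an e-value of 1 with a single patient remaining can reach at most 5, which is below the threshold of 20, so the check signals that continuation cannot lead to an efficacy conclusion.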
Where Pith is reading between the lines
- The approach may generalize to multi-arm or response-adaptive designs as hinted in the conclusion.
- Clinicians could use the betting interpretation to communicate evidence more intuitively to stakeholders.
- Integration with existing trial software could allow routine use in early-phase cancer studies.
Load-bearing premise
The dynamic programming procedure based on the currently observed e-value, the maximum sample size, and the pre-specified significance level produces designs that are optimal for the finite-horizon binary-data setting.
What would settle it
Exact enumeration of type I error, power, and expected sample size for a given maximum sample size and alpha level, comparing the proposed e-value designs against group sequential designs with futility stopping.
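The kind of exact enumeration described above can be sketched for a simple reference case. The example uses the likelihood-ratio e-value, which is the growth-rate-optimal choice for a simple null against a simple alternative, and stops for efficacy the first time it reaches 1/alpha. It is a stand-in to show the mechanics, not one of the paper's optimized designs, and it has no futility rule; `exact_oc` and its arguments are hypothetical names.

```python
from math import log

def exact_oc(n_max, theta0, theta1, alpha, theta_true):
    """Exact rejection probability and expected sample size (illustrative)."""
    log_up = log(theta1 / theta0)              # log-factor after a success
    log_down = log((1 - theta1) / (1 - theta0))  # log-factor after a failure
    threshold = log(1.0 / alpha)
    alive = {0: 1.0}   # successes so far -> probability of not having stopped
    reject, ess = 0.0, 0.0
    for t in range(1, n_max + 1):
        nxt = {}
        for s, p in alive.items():
            for ds, q in ((1, theta_true), (0, 1 - theta_true)):
                s2, pr = s + ds, p * q
                if s2 * log_up + (t - s2) * log_down >= threshold - 1e-12:
                    reject += pr               # e-value reached 1/alpha: stop
                    ess += t * pr
                else:
                    nxt[s2] = nxt.get(s2, 0.0) + pr
        alive = nxt
    ess += n_max * sum(alive.values())         # paths that run to the maximum
    return reject, ess
```

Evaluating `exact_oc` with `theta_true = theta0` gives the exact type I error rate, and with `theta_true = theta1` the exact power; the second return value is the expected sample size under the chosen truth.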
Original abstract
The e-value is gaining traction as a robust alternative to p-values and Bayes factors for quantifying statistical evidence. e-values are a promising method for adaptive clinical trials due to their anytime-validity: e-values ensure type I error rate control at any stopping time, facilitating repeated interim analyses, complex stopping rules, and valid inference under protocol deviations. The e-value literature focuses mostly on asymptotic optimality; however, sample sizes in clinical trials are often limited. To this end, we investigate e-value-based designs with finite-horizon optimality for single-arm multi-stage clinical trials with binary data. This setting is relevant in early-phase cancer trials, but it also facilitates an accessible introduction to the betting interpretation of e-values, which we use to construct e-values that either (1) maximize statistical power, or (2) minimize the expected sample size, with or without constraints on the minimum power. We construct these designs through (constrained) dynamic programming based on the currently observed e-value, the maximum sample size, and the pre-specified significance level. Using exact calculations, we show that, next to robustness, e-value-based designs can provide competitive operating characteristics to standard (non-)adaptive designs with and without futility stopping and outperform growth-rate-optimal e-values in finite samples. In addition, small e-values automatically indicate trial continuation is futile, e.g., an e-value of zero indicates the impossibility of an efficacy conclusion. Hence, e-value-based designs provide a viable alternative to the current state-of-the-art in single-arm binary trials, warranting extension to other adaptive clinical trial settings such as multi-arm multi-stage and response-adaptive designs.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper develops e-value-based adaptive designs for single-arm multi-stage clinical trials with binary data. Designs are constructed via constrained dynamic programming that optimizes either power or expected sample size (with power constraints) over a finite horizon, using the current e-value as the state variable. Exact backward induction and enumeration are used to compute operating characteristics, which are reported as competitive with or superior to standard (non-)adaptive designs with/without futility stopping, and superior to growth-rate-optimal e-values in finite samples. Automatic curtailment occurs when the e-value reaches zero or other small values indicating futility.
Significance. If the exact calculations and optimality claims hold, the work supplies a concrete, robust alternative to p-value-based adaptive designs that inherits anytime-valid type-I-error control from the e-value martingale property. The finite-horizon optimality via dynamic programming on the discrete e-value state, together with explicit comparisons on power, type-I error, and expected sample size, addresses a practical gap in early-phase binary trials. The automatic futility indication is an additional operational advantage.
minor comments (3)
- §3 (or wherever the Bellman recursion is stated): the transition probabilities for the binary e-value process should be written explicitly as functions of the success probability p under both null and alternative; this would make the exact enumeration reproducible without re-deriving the betting kernel.
- Table 1 or the operating-characteristics tables: report the exact maximum sample size N_max and the grid of attainable e-values used in the DP; without these the reader cannot verify that the reported ESS and power figures are obtained from the claimed finite enumeration.
- The comparison to growth-rate-optimal e-values would be strengthened by stating the precise objective function (e.g., expected log-growth) that is being optimized in the DP versus the asymptotic growth-rate criterion.
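For concreteness, the quantities requested in these comments could be written as follows. The notation ($E_t$ for the e-value, $\lambda_t$ for the bet, $\theta_0$ and $\theta_1$ for the null and alternative success probabilities, $N$ for the maximum sample size) is assumed here for illustration and may differ from the manuscript's; the recursion shown is for the power-maximizing objective without futility or sample-size terms.

```latex
% Betting update for a binary outcome X_t \in \{0,1\}, with 0 \le \lambda_t \le 1/\theta_0:
\[
  E_t = E_{t-1}\bigl(1 + \lambda_t (X_t - \theta_0)\bigr), \qquad E_0 = 1 .
\]
% Transition probabilities when the true success probability is p:
\[
  P_p\bigl(E_t = E_{t-1}(1 + \lambda_t (1 - \theta_0))\bigr) = p, \qquad
  P_p\bigl(E_t = E_{t-1}(1 - \lambda_t \theta_0)\bigr) = 1 - p .
\]
% Finite-horizon recursion for the probability of reaching 1/\alpha under \theta_1:
\[
  V_N(e) = \mathbf{1}\{e \ge 1/\alpha\}, \qquad
  V_t(e) = \max_{0 \le \lambda \le 1/\theta_0}
    \Bigl[ \theta_1 V_{t+1}\bigl(e(1 + \lambda(1 - \theta_0))\bigr)
         + (1 - \theta_1) V_{t+1}\bigl(e(1 - \lambda\theta_0)\bigr) \Bigr]
  \quad \text{for } e < 1/\alpha .
\]
% Growth-rate criterion (expected log-growth per observation) and its maximizer:
\[
  \lambda^{*} = \arg\max_{\lambda}\;
    \mathbb{E}_{\theta_1}\bigl[\log\bigl(1 + \lambda (X - \theta_0)\bigr)\bigr]
  = \frac{\theta_1 - \theta_0}{\theta_0 (1 - \theta_0)} .
\]
```

With $\lambda^{*}$ the per-observation factors become $\theta_1/\theta_0$ after a success and $(1-\theta_1)/(1-\theta_0)$ after a failure, which is the likelihood ratio. The contrast the comment asks for is then visible directly: the growth-rate bet does not depend on $t$, $N$, or the current e-value, whereas the maximizer in the recursion generally depends on all three.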
Simulated Author's Rebuttal
We thank the referee for their positive summary of our manuscript on finite-horizon optimal e-value designs for adaptive single-arm binary trials and for recommending minor revision. The referee's description accurately reflects the paper's contributions, including the use of constrained dynamic programming on the e-value state, exact operating characteristic calculations, and the automatic futility indication property.
Circularity Check
No significant circularity
The paper constructs finite-horizon optimal designs via explicit constrained dynamic programming whose state is the current e-value (a non-negative martingale), with the Bellman recursion, transition probabilities for binary data, and exact backward induction all stated directly in the manuscript. Operating characteristics (power, type-I error, expected sample size) are obtained by separate exact enumeration over the same state space, independent of the optimality objective. No load-bearing step reduces by definition or by self-citation chain to the reported performance numbers; the e-value betting construction and DP optimality criterion are external to the numerical comparisons.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: e-values ensure type I error rate control at any stopping time
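The standard result behind this assumption is Ville's inequality, stated here in assumed notation ($E_t$ for the e-value process, $\alpha$ for the significance level) as a reminder of why the stopping rule does not affect error control.

```latex
% If (E_t)_{t \ge 0} is a nonnegative supermartingale under the null with E_0 = 1, then
\[
  P_{H_0}\bigl(\exists\, t \ge 0 : E_t \ge 1/\alpha\bigr) \le \alpha ,
\]
% and consequently, for any stopping time \tau,
\[
  P_{H_0}\bigl(E_\tau \ge 1/\alpha\bigr) \le \alpha .
\]
```

The ledger entry therefore rests on the e-value process being a nonnegative (super)martingale under every distribution in the null, which is the condition a reader would need to check for the betting construction.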