
arxiv: 2605.28653 · v1 · pith:QVGDYG7Inew · submitted 2026-05-27 · 📊 stat.ME

Adaptive clinical trials based on design-optimal e-values with automatic curtailment: An application to single-arm trials with binary data

Pith reviewed 2026-06-29 10:25 UTC · model grok-4.3

classification 📊 stat.ME
keywords e-values · adaptive designs · clinical trials · binary outcomes · dynamic programming · futility stopping · single-arm trials

The pith

E-value designs optimized via dynamic programming for finite samples match standard adaptive designs in single-arm binary trials while guaranteeing anytime validity.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper constructs e-value based designs for adaptive single-arm trials with binary data that are optimal for a finite maximum sample size. It uses constrained dynamic programming to either maximize power or minimize expected sample size while respecting a significance level. These designs maintain anytime validity for type I error control under any stopping rule. They also feature automatic curtailment when e-values become too small to reach significance. A reader would care because clinical trials often have small samples where asymptotic optimality does not guarantee good performance.
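A minimal sketch of the betting interpretation, assuming a parametrization that this page does not spell out: before patient t is observed, the trial stakes a fraction B_t ∈ [0, 1] of its current e-value on a response, so the e-value is multiplied by 1 + B_t(X_t − θ0)/θ0. This choice is consistent with the Kelly bet B^{Kelly} ≈ 0.158 quoted in the Figure 2 caption for θ0 = 0.1 and θ1 = 0.242, but it is an assumption, and the function names are hypothetical. The second function shows the simplest form of automatic curtailment: no single patient can multiply the e-value by more than 1/θ0, so once E_t·(1/θ0)^(n−t) < 1/α the trial can no longer reach significance, and an e-value of zero can never recover. The paper's own futility rule may stop earlier on small but nonzero e-values.

```python
def update_e_value(e, x, bet, theta0):
    """Multiply the running e-value by the (assumed) betting factor for outcome x in {0, 1}.

    Hypothetical parametrisation: a fraction `bet` of the current e-value is staked
    on a response at fair odds under H0, giving factor 1 + bet*(x - theta0)/theta0.
    """
    if not 0.0 <= bet <= 1.0:
        raise ValueError("bet must lie in [0, 1]")
    return e * (1.0 + bet * (x - theta0) / theta0)


def efficacy_still_possible(e, t, n, theta0, alpha):
    """True if some choice of future bets could still lift e to 1/alpha by patient n.

    The largest one-patient factor is 1/theta0 (bet = 1 followed by a response),
    so e * (1/theta0)**(n - t) is the best e-value reachable from time t.
    """
    return e * (1.0 / theta0) ** (n - t) >= 1.0 / alpha


# Usage: three patients with a constant bet, then a curtailment check at t = 3.
e = 1.0
for x in [1, 0, 0]:
    e = update_e_value(e, x, 0.158, 0.1)
print(e, efficacy_still_possible(e, 3, 50, 0.1, 0.05))
```

Betting the whole e-value (bet = 1) and observing a non-response sets the e-value to zero, which is the "impossibility of an efficacy conclusion" case mentioned in the abstract.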

Core claim

We investigate e-value-based designs with finite-horizon optimality for single-arm multi-stage clinical trials with binary data. We construct these designs through constrained dynamic programming based on the currently observed e-value, the maximum sample size, and the pre-specified significance level. Using exact calculations, we show that e-value-based designs can provide competitive operating characteristics to standard designs and outperform growth-rate-optimal e-values in finite samples. In addition, small e-values automatically indicate trial continuation is futile.

What carries the argument

Constrained dynamic programming to build design-optimal e-values that maximize power or minimize expected sample size under a finite horizon and significance level constraint.
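The backward induction for the power-maximizing variant can be sketched as follows. This is a hypothetical illustration, not the authors' code. It assumes the multiplier 1 + b(X_t − θ0)/θ0 with b ∈ [0, 1], a uniform e-value grid and a uniform bet grid standing in for what we take to be the paper's sets M̂ and B̂ (their actual construction is not described on this page), and it rounds every e-value down to the grid. Rounding down only discards capital, so the process remains a nonnegative supermartingale under H0 and anytime validity is kept, at the price of a slightly conservative design. V_t(m) is the largest probability under θ1 of reaching 1/α by patient n from e-value m at time t; the explicit transition probabilities are θ1 for a response and 1 − θ1 for a non-response.

```python
import numpy as np


def power_max_dp(n, theta0, theta1, alpha, n_grid=2001, n_bets=101):
    """Power-maximising bets by backward induction on a discretised e-value grid.

    Hypothetical setup: the e-value is multiplied by 1 + b*(1/theta0 - 1) after a
    response and by 1 - b after a non-response, with b on a grid in [0, 1].
    E-values are rounded down to the grid; this only discards capital, so the
    process stays a nonnegative supermartingale under H0.
    Returns (power from e-value 1 at t = 0, grid, list of optimal bets per t).
    """
    threshold = 1.0 / alpha
    tol = 1e-12
    grid = np.unique(np.concatenate([np.linspace(0.0, threshold, n_grid), [1.0]]))
    bets = np.linspace(0.0, 1.0, n_bets)
    rejected = grid >= threshold - tol

    def lookup(values, m):
        # Value of moving to e-value m: 1 if m reaches 1/alpha, else value at floor(m).
        idx = np.searchsorted(grid, np.minimum(m, threshold), side="right") - 1
        return np.where(m >= threshold - tol, 1.0, values[idx])

    values = rejected.astype(float)  # V_n(m) = 1 if H0 is rejected, else 0
    policy = [None] * n
    for t in range(n - 1, -1, -1):
        # Row j: P_theta1(reject by n) when betting bets[j] now and optimally later.
        cand = np.array([
            theta1 * lookup(values, grid * (1.0 + b * (1.0 / theta0 - 1.0)))
            + (1.0 - theta1) * lookup(values, grid * (1.0 - b))
            for b in bets
        ])
        best = cand.argmax(axis=0)
        values = np.where(rejected, 1.0, cand[best, np.arange(grid.size)])
        policy[t] = bets[best]  # optimal bet at time t for every grid state
    start = np.searchsorted(grid, 1.0, side="right") - 1
    return values[start], grid, policy
```

For example, `power, grid, policy = power_max_dp(50, 0.1, 0.242, 0.05)` returns the attained power together with, for each t, the optimal bet at every grid state; a finer grid brings the result closer to the continuous-state optimum.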

If this is right

  • E-value designs guarantee type I error control at any stopping time, enabling flexible interim analyses.
  • Designs can be tuned to maximize power or minimize expected sample size with minimum power constraints.
  • Automatic futility stopping occurs when the e-value drops to zero, which makes an efficacy conclusion impossible, or to values so small that continuation is futile.
  • These designs provide operating characteristics competitive with, or better than, standard non-adaptive and adaptive designs with and without futility stopping.
  • Outperformance over growth-rate-optimal e-values holds specifically in finite samples.
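For the expected-sample-size variant with a minimum-power requirement, one standard device for constrained dynamic programs is a Lagrangian relaxation: maximize λ·P_θ1(reject) − E_θ1[N] and tune λ until the power constraint is just met. Whether the authors use this relaxation, and under which θ they evaluate the expected sample size, is not stated on this page. The sketch below assumes both quantities are evaluated under θ1, reuses the assumed multiplier 1 + b(X_t − θ0)/θ0, and lets the design stop without rejecting at any interim. The function name is hypothetical.

```python
import numpy as np


def lagrangian_ess_dp(n, theta0, theta1, alpha, lam, n_grid=2001, n_bets=101):
    """Maximise lam * P_theta1(reject) - E_theta1[N] by backward induction.

    Hypothetical Lagrangian relaxation of 'minimise expected sample size subject
    to a minimum power'. At each interim the design either stops without
    rejecting (reward 0) or enrols one more patient (cost 1) and bets b in [0, 1];
    the e-value is multiplied by 1 + b*(1/theta0 - 1) after a response and by
    1 - b otherwise, and is rounded down to the grid.
    """
    threshold = 1.0 / alpha
    tol = 1e-12
    grid = np.unique(np.concatenate([np.linspace(0.0, threshold, n_grid), [1.0]]))
    bets = np.linspace(0.0, 1.0, n_bets)
    rejected = grid >= threshold - tol

    def lookup(values, m):
        # Reward lam on reaching 1/alpha, otherwise the value at floor(m).
        idx = np.searchsorted(grid, np.minimum(m, threshold), side="right") - 1
        return np.where(m >= threshold - tol, lam, values[idx])

    values = np.where(rejected, lam, 0.0)  # at t = n: lam if rejected, else 0
    for _ in range(n):
        cont = np.max([
            theta1 * lookup(values, grid * (1.0 + b * (1.0 / theta0 - 1.0)))
            + (1.0 - theta1) * lookup(values, grid * (1.0 - b))
            for b in bets
        ], axis=0) - 1.0  # enrol one patient, then bet optimally
        # Stop for futility (value 0) whenever continuing is not worth its cost.
        values = np.where(rejected, lam, np.maximum(cont, 0.0))
    start = np.searchsorted(grid, 1.0, side="right") - 1
    return values[start]
```

To target a power of, say, 1 − β = 0.8, one would raise `lam` until the policy's power reaches 0.8; recovering that policy and its power requires storing the argmax bets and stop decisions, which this sketch omits for brevity.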

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach may generalize to multi-arm or response-adaptive designs as hinted in the conclusion.
  • Clinicians could use the betting interpretation to communicate evidence more intuitively to stakeholders.
  • Integration with existing trial software could allow routine use in early-phase cancer studies.

Load-bearing premise

The dynamic programming procedure based on the currently observed e-value, the maximum sample size, and the pre-specified significance level produces designs that are optimal for the finite-horizon binary-data setting.

What would settle it

Exact enumeration of type I error, power, and expected sample size for a given maximum sample size and alpha level, comparing the proposed e-value designs against group sequential designs with futility stopping.
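Exact enumeration is straightforward in this setting because, for a constant bet, the e-value depends only on the number of responses observed so far. The sketch below (hypothetical function name; constant bet and the assumed multiplier 1 + b(X_t − θ0)/θ0) propagates the probability of every still-running state forward, stops for efficacy once the e-value reaches 1/α, and curtails as soon as even the largest remaining multipliers under that bet cannot reach 1/α. A design whose bets depend on the current e-value would need the e-value itself as the state, but the bookkeeping is the same.

```python
def operating_characteristics(n, theta, theta0, alpha, bet):
    """Exact P(reject H0) and expected sample size under response rate theta."""
    up = 1.0 + bet * (1.0 / theta0 - 1.0)  # multiplier after a response
    down = 1.0 - bet                       # multiplier after a non-response
    threshold = 1.0 / alpha
    tol = 1e-12
    alive = {0: 1.0}  # responses so far -> probability the trial is still running
    p_reject, ess = 0.0, 0.0
    for t in range(n + 1):
        nxt = {}
        for s, prob in alive.items():
            e = up ** s * down ** (t - s)
            if e >= threshold - tol:       # efficacy: e-value reached 1/alpha
                p_reject += prob
                ess += t * prob
            elif t == n or e * up ** (n - t) < threshold - tol:
                ess += t * prob            # final analysis or automatic curtailment
            else:                          # enrol one more patient
                nxt[s + 1] = nxt.get(s + 1, 0.0) + prob * theta
                nxt[s] = nxt.get(s, 0.0) + prob * (1.0 - theta)
        alive = nxt
    return p_reject, ess


type1, ess0 = operating_characteristics(50, 0.1, 0.1, 0.05, 0.158)
power, ess1 = operating_characteristics(50, 0.242, 0.1, 0.05, 0.158)
print(type1, ess0, power, ess1)
```

By Ville's inequality the exact type I error computed this way cannot exceed α, whatever bet is used; comparing against group sequential designs would require enumerating their rejection and futility boundaries in the same way.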

Figures

Figures reproduced from arXiv: 2605.28653 by Joost van Rosmalen, Judith ter Schure, Stef Baas.

Figure 1. Cumulative type I error rate (first row), power (second row), probability of futility stopping when …

Figure 2. n = 50, θ0 = 0.1, θ1 = 0.242, α = 0.05. Left: optimal bets B_t^{P-max} for the power-maximizing e-value. The (green) line with triangle markers indicates the e-values that exactly match the bound 1/α at the final interim under the growth rate realized under Kelly betting, calculated through (16), which corresponds to a bet equal to B^{Kelly} ≈ 0.158. The (purple) line with diamond markers indicates e-values unde…

Figure 3. The e-value-based design found under this procedure (the sets M̂, B̂ were chosen the same as in the last section). We see that the initial bet is quite close to the Kelly bet (B_1 = 0.161). In contrast to the ESS-min betting strategy, the betting strategy seems less time-homogeneous, and there are more settings in which it is optimal to bet more conservatively than Kelly betting. In comparison to the…

Figure 4. Distribution over time of the e-processes and bets for the P-max, ESS-min, and GROW betting strategies, as well as the e-value-based design. n = 50, θ0 = 0.1, θ1 = 0.242, α = 0.05, and β = 0.2.
Original abstract

The e-value is gaining traction as a robust alternative to p-values and Bayes factors for quantifying statistical evidence. e-values are a promising method for adaptive clinical trials due to their anytime-validity: e-values ensure type I error rate control at any stopping time, facilitating repeated interim analyses, complex stopping rules, and valid inference under protocol deviations. The e-value literature focuses mostly on asymptotic optimality; however, sample sizes in clinical trials are often limited. To this end, we investigate e-value-based designs with finite-horizon optimality for single-arm multi-stage clinical trials with binary data. This setting is relevant in early-phase cancer trials, but it also facilitates an accessible introduction to the betting interpretation of e-values, which we use to construct e-values that either (1) maximize statistical power, or (2) minimize the expected sample size, with or without constraints on the minimum power. We construct these designs through (constrained) dynamic programming based on the currently observed e-value, the maximum sample size, and the pre-specified significance level. Using exact calculations, we show that, next to robustness, e-value-based designs can provide competitive operating characteristics to standard (non-)adaptive designs with and without futility stopping and outperform growth-rate-optimal e-values in finite samples. In addition, small e-values automatically indicate trial continuation is futile, e.g., an e-value of zero indicates the impossibility of an efficacy conclusion. Hence, e-value-based designs provide a viable alternative to the current state-of-the-art in single-arm binary trials, warranting extension to other adaptive clinical trial settings such as multi-arm multi-stage and response-adaptive designs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper develops e-value-based adaptive designs for single-arm multi-stage clinical trials with binary data. Designs are constructed via constrained dynamic programming that optimizes either power or expected sample size (with power constraints) over a finite horizon, using the current e-value as the state variable. Exact backward induction and enumeration are used to compute operating characteristics, which are reported as competitive with or superior to standard (non-)adaptive designs with/without futility stopping, and superior to growth-rate-optimal e-values in finite samples. Automatic curtailment occurs when the e-value reaches zero or other small values indicating futility.

Significance. If the exact calculations and optimality claims hold, the work supplies a concrete, robust alternative to p-value-based adaptive designs that inherits anytime-valid type-I-error control from the e-value martingale property. The finite-horizon optimality via dynamic programming on the discrete e-value state, together with explicit comparisons on power, type-I error, and expected sample size, addresses a practical gap in early-phase binary trials. The automatic futility indication is an additional operational advantage.

minor comments (3)
  1. §3 (or wherever the Bellman recursion is stated): the transition probabilities for the binary e-value process should be written explicitly as functions of the success probability p under both null and alternative; this would make the exact enumeration reproducible without re-deriving the betting kernel.
  2. Table 1 or the operating-characteristics tables: report the exact maximum sample size N_max and the grid of possible e-value values used in the DP; without these the reader cannot verify that the reported ESS and power figures are obtained from the claimed finite enumeration.
  3. The comparison to growth-rate-optimal e-values would be strengthened by stating the precise objective functions side by side: power or expected sample size over the finite horizon for the DP designs, versus expected log-growth per observation (the asymptotic growth-rate criterion) for the GROW e-value.
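To make comment 3 concrete, here is a sketch under an assumed parametrization: a bet B multiplies the e-value by 1 + B(X_t − θ0)/θ0. The growth-rate-optimal (GROW) criterion then maximizes the expected log-growth per patient under θ1, whose maximizer is the Kelly bet B^{Kelly} = (θ1 − θ0)/(1 − θ0). For θ0 = 0.1 and θ1 = 0.242 this is about 0.158, matching the value in the Figure 2 caption, and at that bet the growth rate equals the Kullback-Leibler divergence KL(θ1 ‖ θ0). The DP designs optimize power or expected sample size over a finite horizon instead, so their bets can deviate from Kelly, as the Figure 3 caption notes. Function names are hypothetical.

```python
import math


def expected_log_growth(b, theta0, theta1):
    """E_theta1[log multiplier] for a bet fraction b in [0, 1) (assumed parametrisation)."""
    up = 1.0 + b * (1.0 / theta0 - 1.0)  # multiplier after a response
    down = 1.0 - b                       # multiplier after a non-response
    return theta1 * math.log(up) + (1.0 - theta1) * math.log(down)


def kelly_bet(theta0, theta1):
    """Closed-form maximiser of expected_log_growth when theta1 > theta0."""
    return (theta1 - theta0) / (1.0 - theta0)


def kl_bernoulli(theta1, theta0):
    """Kullback-Leibler divergence KL(Bernoulli(theta1) || Bernoulli(theta0))."""
    return (theta1 * math.log(theta1 / theta0)
            + (1.0 - theta1) * math.log((1.0 - theta1) / (1.0 - theta0)))


theta0, theta1 = 0.1, 0.242
b_kelly = kelly_bet(theta0, theta1)
# Brute-force check of the closed form on a fine grid of bets in [0, 0.99].
bet_grid = [i / 10000 for i in range(9901)]
b_grid = max(bet_grid, key=lambda b: expected_log_growth(b, theta0, theta1))
print(b_kelly, b_grid, expected_log_growth(b_kelly, theta0, theta1))
```

At the Kelly bet the multipliers become θ1/θ0 and (1 − θ1)/(1 − θ0), so the GROW e-value is the likelihood ratio of θ1 against θ0; it is optimal for long-run growth, not for power at a fixed n.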

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary of our manuscript on finite-horizon optimal e-value designs for adaptive single-arm binary trials and for recommending minor revision. The referee's description accurately reflects the paper's contributions, including the use of constrained dynamic programming on the e-value state, exact operating characteristic calculations, and the automatic futility indication property.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper constructs finite-horizon optimal designs via explicit constrained dynamic programming whose state is the current e-value (a non-negative martingale), with the Bellman recursion, transition probabilities for binary data, and exact backward induction all stated directly in the manuscript. Operating characteristics (power, type-I error, expected sample size) are obtained by separate exact enumeration over the same state space, independent of the optimality objective. No load-bearing step reduces by definition or by self-citation chain to the reported performance numbers; the e-value betting construction and DP optimality criterion are external to the numerical comparisons.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The paper relies on the standard anytime-validity property of e-values and the binary data model for clinical outcomes; no free parameters are fitted to data in the described approach, and no new entities are postulated.

axioms (1)
  • Domain assumption: e-values ensure type I error rate control at any stopping time.
    Invoked as the basis for allowing repeated interim analyses and valid inference under protocol deviations.
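Assuming the per-patient multiplier 1 + B_t(X_t − θ0)/θ0, with a bet B_t ∈ [0, 1] fixed before X_t is observed (a parametrization consistent with B^{Kelly} ≈ 0.158 in the Figure 2 caption, but not confirmed on this page), the axiom follows from two standard steps. First, the e-value is a nonnegative supermartingale under every null response rate θ ≤ θ0. Second, Ville's inequality bounds the probability that such a process ever reaches 1/α, so no stopping rule, interim schedule, or protocol deviation can push the type I error above α.

```latex
\begin{aligned}
E_t &= E_{t-1}\Bigl(1 + B_t\,\tfrac{X_t-\theta_0}{\theta_0}\Bigr), \qquad E_0 = 1,\; B_t \in [0,1],\\
\mathbb{E}_{\theta}\bigl[E_t \mid \mathcal{F}_{t-1}\bigr]
  &= E_{t-1}\Bigl(1 + B_t\,\tfrac{\theta-\theta_0}{\theta_0}\Bigr) \le E_{t-1}
  \quad \text{for every } \theta \le \theta_0,\\
\Pr_{\theta}\bigl(\exists\, t \le n : E_t \ge 1/\alpha\bigr) &\le \alpha
  \quad \text{(Ville's inequality for nonnegative supermartingales).}
\end{aligned}
```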

pith-pipeline@v0.9.1-grok · 5839 in / 1214 out tokens · 38342 ms · 2026-06-29T10:25:54.620325+00:00 · methodology

