A Multi-Stage Drop-the-Loser Design with Superiority Boundaries
Pith reviewed 2026-05-10 16:49 UTC · model grok-4.3
The pith
A multi-stage drop-the-loser design with superiority boundaries reduces expected sample size compared to standard drop-the-loser designs while lowering maximum sample size relative to traditional MAMS trials or separate trials.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We propose a multi-stage drop-the-loser design that also allows early stopping of the entire trial for superiority. Analytical expressions are derived for the type I error rate, power, and expected sample size. In the motivating atrial fibrillation trial, this design substantially reduces the expected sample size compared to a standard drop-the-loser design while lowering the maximum sample size relative to running a traditional MAMS trial or multiple separate trials.
What carries the argument
The multi-stage drop-the-loser design with superiority boundaries: a fixed number of treatments is dropped at each interim analysis, which caps the maximum sample size, and the whole trial can stop early for superiority, which lowers the expected sample size.
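A minimal sketch of one interim analysis under this kind of rule may make the mechanism concrete. It is an illustration, not the paper's specification: the function name `interim_decision`, the strict `>` comparison, and the choice to check the superiority boundary before dropping arms are all assumptions, and the z-statistics and boundary values are hypothetical.

```python
from typing import Dict, List, Tuple


def interim_decision(
    z_stats: Dict[str, float],
    n_keep: int,
    superiority_bound: float,
) -> Tuple[str, List[str]]:
    """Apply one interim look of a drop-the-loser rule with a superiority bound.

    z_stats maps each active arm to its treatment-vs-control z-statistic.
    Assumption: the superiority check comes first, then the fixed-number drop.
    """
    # Rank active arms from the largest to the smallest test statistic.
    ranked = sorted(z_stats, key=z_stats.get, reverse=True)
    best = ranked[0]
    # Stop the whole trial if the best arm crosses the superiority boundary.
    if z_stats[best] > superiority_bound:
        return "stop_for_superiority", [best]
    # Otherwise keep a fixed number of arms, whatever their statistics are.
    return "continue", ranked[:n_keep]


# Hypothetical interim data for four active arms.
z = {"A": 1.2, "B": 0.4, "C": 2.9, "D": -0.3}
print(interim_decision(z, n_keep=2, superiority_bound=2.5))
print(interim_decision(z, n_keep=2, superiority_bound=3.0))
```

In the first call arm C crosses the hypothetical boundary of 2.5, so the trial stops; in the second no arm crosses 3.0, so the two best-ranked arms continue.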
Load-bearing premise
The reported reductions in expected and maximum sample sizes hold only under the specific treatment effect assumptions, trial parameters, and boundary values chosen for the atrial fibrillation example.
What would settle it
A recalculation or simulation of the atrial fibrillation trial parameters under the proposed design that shows no substantial drop in expected sample size or no lowering of maximum sample size would falsify the performance claim.
Original abstract
Multi-arm multi-stage (MAMS) trials have gained popularity due to their improved efficiency in evaluating multiple treatments. A traditional MAMS trial often decreases the expected sample size of the trial compared to just running a multi-arm approach, but with the drawback of an increase in maximum sample size. For academic-led trials this poses a particular challenge, as funding is typically based on the maximum required sample size. To address this, drop-the-loser designs were introduced, where a fixed number of treatments are dropped at each interim stage, thereby reducing the maximum sample size. In this work, we propose an enhanced multi-stage drop-the-loser design that also allows for early stopping of the entire trial for superiority. This approach aims to retain the benefits of a reduced maximum sample size while also lowering the expected sample size. The proposed design is motivated by a trial in atrial fibrillation. We derive analytical expressions for the type I error rate, power, and expected sample size, and compare the proposed design's performance to alternative methods. We outline the key requirements for implementing the proposed design and discuss the contexts in which it should be considered. For the motivating example, the results show that the proposed design substantially reduces the expected sample size compared to a standard drop-the-loser design, while lowering the maximum sample size relative to running a traditional MAMS trial or multiple separate trials.
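The maximum-sample-size argument in the abstract can be illustrated with simple arithmetic. The sketch below assumes, purely for illustration, that every stage recruits the same number of patients to each active arm and to the control, and that both designs use the same per-stage group size. In practice the group sizes differ between designs once boundaries are calibrated, so the numbers are hypothetical and are not those of the atrial fibrillation trial. Both function names are invented for this example.

```python
def dtl_max_sample_size(active_arms_per_stage, n_per_arm_per_stage):
    """Maximum sample size when the number of active arms per stage is fixed.

    Each stage recruits n patients to every active arm plus the control.
    """
    return sum((k + 1) * n_per_arm_per_stage for k in active_arms_per_stage)


def mams_max_sample_size(n_arms, n_stages, n_per_arm_per_stage):
    """Maximum sample size when no arm is ever dropped (worst case for MAMS)."""
    return (n_arms + 1) * n_stages * n_per_arm_per_stage


# Hypothetical design: 4 active arms, 3 stages, 50 patients per arm per stage.
# The drop-the-loser schedule keeps 4, then 2, then 1 active arm.
print(dtl_max_sample_size([4, 2, 1], 50))   # (5 + 3 + 2) * 50 = 500
print(mams_max_sample_size(4, 3, 50))       # 5 * 3 * 50 = 750
```

Adding superiority boundaries to the drop-the-loser schedule leaves this maximum unchanged, because early stopping can only shorten a trial.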
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a multi-stage drop-the-loser (DTL) design for multi-arm multi-stage (MAMS) trials that incorporates superiority boundaries allowing early stopping of the entire trial. Analytical expressions are derived for the type I error rate, power, and expected sample size under this adaptive design. Performance is compared to standard DTL, traditional MAMS, and separate trials using a motivating atrial fibrillation example, with claims of substantially reduced expected sample size versus standard DTL while maintaining a lower maximum sample size than alternatives.
Significance. If the derivations hold, the design addresses a key practical constraint in academic trials (funding tied to maximum sample size) by combining DTL's fixed dropping rule with early superiority stopping to reduce expected sample size. The analytical approach, if reproducible, would enable exact operating characteristic calculations without reliance on simulation for design optimization.
major comments (2)
- [Methods and Results (atrial fibrillation example)] The central performance claims for the atrial fibrillation example rest on the derived analytical expressions for type I error, power, and expected sample size. These must correctly integrate the multivariate normal joint distribution of test statistics across stages with both the fixed-number arm-dropping rule and the superiority stopping boundaries; any omission in the recursive probability calculations or boundary adjustments would invalidate the reported reductions in expected sample size (see the methods section on operating characteristic derivations and the results for the motivating example).
- [Results (atrial fibrillation example)] The weakest assumption is that the expressions remain valid under the specific trial parameters, treatment effect assumptions, and boundary calculations chosen for the example. The manuscript should include explicit verification (e.g., via simulation checks or boundary sensitivity analysis) to confirm the expressions do not under- or over-count early stopping probabilities when arms are adaptively dropped.
minor comments (2)
- [Discussion] Ensure the discussion section clearly outlines the key requirements for implementation, including how superiority boundaries are calibrated relative to the dropping rule.
- [Methods] Clarify notation for stage-specific superiority boundaries and their relation to standard MAMS boundaries to avoid ambiguity in the analytical setup.
Simulated Author's Rebuttal
We thank the referee for their thoughtful review and constructive comments. We respond to each major comment below.
- Referee: [Methods and Results (atrial fibrillation example)] The central performance claims for the atrial fibrillation example rest on the derived analytical expressions for type I error, power, and expected sample size. These must correctly integrate the multivariate normal joint distribution of test statistics across stages with both the fixed-number arm-dropping rule and the superiority stopping boundaries; any omission in the recursive probability calculations or boundary adjustments would invalidate the reported reductions in expected sample size (see the methods section on operating characteristic derivations and the results for the motivating example).
Authors: The analytical expressions are constructed by recursively integrating the joint multivariate normal distribution of the test statistics, conditioning at each stage on the outcomes for the remaining arms after the fixed-number dropping rule is applied and checking against the superiority boundaries. Boundary adjustments are made at each stage to reflect the reduced number of arms, and all possible paths (early stopping or continuation) are enumerated in the probability calculations. This structure ensures the reported operating characteristics are exact under the stated assumptions.
Revision: no
- Referee: [Results (atrial fibrillation example)] The weakest assumption is that the expressions remain valid under the specific trial parameters, treatment effect assumptions, and boundary calculations chosen for the example. The manuscript should include explicit verification (e.g., via simulation checks or boundary sensitivity analysis) to confirm the expressions do not under- or over-count early stopping probabilities when arms are adaptively dropped.
Authors: We agree that explicit verification strengthens the presentation. In the revised version we will add Monte Carlo simulation results for the atrial fibrillation example that compare the analytical type I error, power, and expected sample size against simulated values under the same parameters and boundaries, with particular attention to early-stopping probabilities.
Revision: yes
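The Monte Carlo check described in this response could look roughly like the sketch below. It is not the authors' code and does not use the atrial fibrillation parameters. The effects, group size, boundaries, and keep schedule are hypothetical and uncalibrated, and the design details are assumptions made for illustration: normal outcomes with known variance, equal allocation, and a superiority check before arms are dropped. The names `simulate_trial` and `operating_characteristics` are invented here.

```python
import math
import random


def simulate_trial(effects, keep, bounds, n, sigma, rng):
    """Simulate one trial; return (rejected, total_sample_size).

    effects: true mean difference from control for each active arm.
    keep[j]: number of arms retained after stage j+1 (unused at the last stage).
    bounds[j]: superiority boundary at stage j+1 (math.inf disables stopping).
    """
    active = list(range(len(effects)))
    sums = [0.0] * len(effects)          # running sums of stage-wise arm means
    ctrl_sum = 0.0                       # running sum of stage-wise control means
    total = 0
    sd_stage_mean = sigma / math.sqrt(n)
    for stage, (n_keep, bound) in enumerate(zip(keep, bounds), start=1):
        ctrl_sum += rng.gauss(0.0, sd_stage_mean)
        for k in active:
            sums[k] += rng.gauss(effects[k], sd_stage_mean)
        total += (len(active) + 1) * n   # active arms plus control
        # Standard error of a difference of cumulative means with n*stage per arm.
        se = math.sqrt(2.0 * sigma ** 2 / (n * stage))
        z = {k: (sums[k] - ctrl_sum) / stage / se for k in active}
        ranked = sorted(active, key=z.get, reverse=True)
        if z[ranked[0]] > bound:         # whole trial stops for superiority
            return True, total
        active = ranked[:n_keep]         # fixed-number dropping rule
    return False, total


def operating_characteristics(effects, keep, bounds, n, sigma, n_sims, seed):
    """Estimate the rejection rate and expected sample size by simulation."""
    rng = random.Random(seed)
    results = [simulate_trial(effects, keep, bounds, n, sigma, rng)
               for _ in range(n_sims)]
    reject_rate = sum(r for r, _ in results) / n_sims
    expected_n = sum(t for _, t in results) / n_sims
    return reject_rate, expected_n


# Hypothetical comparison: interim stopping enabled vs. disabled.
with_stop = operating_characteristics([0.5, 0, 0, 0], [2, 1, 1],
                                      [3.0, 2.6, 2.2], 50, 1.0, 2000, 1)
no_stop = operating_characteristics([0.5, 0, 0, 0], [2, 1, 1],
                                    [math.inf, math.inf, 2.2], 50, 1.0, 2000, 1)
print(with_stop, no_stop)
```

Setting the interim boundaries to `math.inf` recovers a drop-the-loser schedule with a fixed sample size, so the difference between the two runs gives a simulated estimate of the expected-sample-size reduction for these hypothetical inputs. Setting all effects to zero gives a simulated type I error rate that could be compared against an analytical value.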
Circularity Check
No circularity: analytical expressions for operating characteristics are newly derived from first principles
The paper states it derives analytical expressions for type I error, power, and expected sample size under the proposed multi-stage drop-the-loser design with superiority boundaries, motivated by the atrial fibrillation example. These derivations integrate the joint multivariate normal distribution of test statistics, early stopping rules, and fixed dropping at stages. No load-bearing step reduces by construction to fitted inputs, self-definitional loops, or self-citation chains; the expressions are presented as independent calculations compared against standard MAMS and drop-the-loser benchmarks. This is self-contained and matches the expected non-finding for papers with explicit new derivations.
Axiom & Free-Parameter Ledger
free parameters (2)
- Stage-specific superiority boundaries
- Number of treatments dropped per stage
Reference graph
Works this paper leans on
- [1] Abbas, R., Wason, J., Michiels, S., and Le Teuff, G. (2022). A two-stage drop-the-losers design for time-to-event outcome using a historical control arm. Pharmaceutical Statistics, 21(1):268–288. Albini, A., Malavasi, V. L., Vitolo, M., Imberti, J. F., Marietta, M., Lip, G. Y., and Boriani, G. (2021). Long-term outcomes of postoperative atrial fibrill...
- [2] Frendl, G., Sodickson, A. C., Chung, M. K., Waldo, A. L., Gersh, B. J., Tisdale, J. E., Calkins, H., Aranki, S., Kaneko, T., Cassivi, S., et al. (2014). 2014 AATS guidelines for the prevention and management of peri-operative atrial fibrillation and flutter (POAF) for thoracic surgical procedures. The Journal of Thoracic and Cardiovascular Surgery, 148(...
- [3] Urach, S. and Posch, M. (2016). Multi-arm group sequential designs with a simultaneous stopping rule. Statistics in Medicine, 35(30):5536–5550. Vaporciyan, A. A., Correa, A. M., Rice, D. C., Roth, J. A., Smythe, W., Swisher, S. G., Walsh, G. L., and Putnam Jr, J. B. (2004). Risk factors associated with atrial fibrillation after noncardiac thoracic surg...
- [4] Wason, J. M. S. and Jaki, T. (2012). Optimal design of multi-arm multi-stage trials. Statistics in Medicine, 31(30):4269–4279. Wason, J. M. S., Stecher, L., and Mander, A. P. (2014). Correcting for multiple-testing in multi-arm trials: is it necessary and is it done? Trials, 15(1). Wassmer, G., Pahlke, F., Jensen, T., Bove, D. S., Schueuerhuis, S., an...
- [5] The power under the LFC for the motivating example is therefore $\sum_{j=1}^{3} P(\Phi_j)$. 5.3.1 General equation for covariance matrix. Under the same assumptions as used in Wason et al. (2017) of $V_{k,j} = \sigma^2 (n_{k,j}^{-1} + n_{0,j}^{-1})$, where $n_{k,j} = n_{k^\star,j}$ for all $k, k^\star$, the covariance between the events $B_{k,j}$ and $B_{k^\star,j^\star}$; or $B_{k,j}$ and $A_{k^\star,j^\star}$; or $A_{k,j}$ and $B_{k^\star,j^\star}$; or...
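The variance expression quoted in this excerpt can be evaluated directly. The sketch below computes V for one arm and stage, then adds one standard textbook consequence that is not taken from the truncated excerpt: two treatment-versus-control statistics at the same stage share the control arm, so their covariance is sigma^2 / n_0 and their correlation is (1/n_0) / (1/n_k + 1/n_0) when both arms have the same sample size. The function names and the numbers used are hypothetical.

```python
def variance_of_difference(sigma, n_treat, n_ctrl):
    """V = sigma^2 * (1/n_treat + 1/n_ctrl), as in the quoted expression."""
    return sigma ** 2 * (1.0 / n_treat + 1.0 / n_ctrl)


def shared_control_correlation(n_treat, n_ctrl):
    """Correlation of two same-stage statistics that share one control arm.

    Assumption (standard result, not the paper's truncated formula): both
    treatment arms have n_treat patients, so the shared-control covariance
    sigma^2 / n_ctrl is divided by the common variance V.
    """
    return (1.0 / n_ctrl) / (1.0 / n_treat + 1.0 / n_ctrl)


# Hypothetical values: sigma = 1, 50 patients per arm.
print(variance_of_difference(1.0, 50, 50))
print(shared_control_correlation(50, 50))
```

With equal allocation the correlation is one half, which is why arm-level test statistics in multi-arm designs cannot be treated as independent when computing error rates.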