A Multi-Stage Drop-the-Loser Design with Superiority Boundaries

Andrew Seely; Manel Khan; Peter Greenstreet; Pouya Motazedian; Salmaan Kanji; Stephanie Sibley; Tim Ramsay

arxiv: 2604.09467 · v1 · submitted 2026-04-10 · 📊 stat.ME · stat.AP


Pith reviewed 2026-05-10 16:49 UTC · model grok-4.3

classification 📊 stat.ME stat.AP

keywords multi-arm multi-stage trials · drop-the-loser design · early stopping for superiority · sample size reduction · clinical trial design · type I error control · atrial fibrillation · expected sample size


The pith

A multi-stage drop-the-loser design with superiority boundaries reduces expected sample size compared to standard drop-the-loser designs while lowering maximum sample size relative to traditional MAMS trials or separate trials.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops an enhanced drop-the-loser design for multi-arm multi-stage trials that incorporates early stopping of the entire trial when one treatment shows superiority. Standard drop-the-loser approaches already limit the maximum number of patients by dropping a fixed number of arms at each stage, but they do not stop early for clear benefit. The new design keeps that fixed maximum while adding the early-stopping option, which lowers the average number of patients required. Analytical formulas are given for type I error, power, and expected sample size, and these are evaluated in an atrial fibrillation example where the approach improves efficiency over basic drop-the-loser, full MAMS, or multiple separate trials.

Core claim

We propose a multi-stage drop-the-loser design that also allows early stopping of the entire trial for superiority. Analytical expressions are derived for the type I error rate, power, and expected sample size. In the motivating atrial fibrillation trial, this design substantially reduces the expected sample size compared to a standard drop-the-loser design while lowering the maximum sample size relative to running a traditional MAMS trial or multiple separate trials.
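The relationship between maximum and expected sample size in this claim can be written out as a short worked equation. This is a minimal sketch under assumed notation that is not taken from the paper: $n$ patients per arm per stage, $K_j$ active treatment arms in stage $j$ (plus one control), $J$ stages, and $T$ the stage at which the trial stops. The paper's own expressions are not reproduced in this summary.

```latex
% Assumed notation (not the paper's): n, K_j, J, T as described in the lead-in.
N_j = n \sum_{i=1}^{j} (K_i + 1), \qquad
N_{\max} = N_J, \qquad
\mathbb{E}(N) = \sum_{j=1}^{J} N_j \, \Pr(T = j).
```

Because the stopping probabilities sum to one and $N_j \le N_J$ for every $j$, it follows that $\mathbb{E}(N) \le N_{\max}$. A drop-the-loser design with no early stopping has $\Pr(T = J) = 1$, so its expected and maximum sample sizes coincide; any positive probability of crossing a superiority boundary before stage $J$ pulls $\mathbb{E}(N)$ below $N_{\max}$ without changing $N_{\max}$ for a given $n$ and $K_j$.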

What carries the argument

The multi-stage drop-the-loser design with superiority boundaries, which drops a fixed number of treatments at each interim analysis and stops the whole trial early for superiority to control both maximum and expected sample sizes.
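The decision rule described above can be sketched in a few lines of Python. This is an illustrative reading of the design, not the authors' implementation: the function name `run_trial`, the arm labels, the z-statistics, and the boundary values are all hypothetical, and the rule "test the best-performing active arm against the boundary, otherwise keep the top-ranked arms" is an assumption about how the two mechanisms combine.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def run_trial(
    z_by_stage: Sequence[Dict[str, float]],
    arms_per_stage: Sequence[int],
    bounds: Sequence[float],
) -> Tuple[Optional[str], int]:
    """Apply the stopping and dropping rules to one trial's z-statistics.

    z_by_stage[j][arm]: cumulative z-statistic of `arm` vs control at analysis j+1.
    arms_per_stage[j]: number of active arms during stage j+1 (fixed in advance).
    bounds[j]: superiority boundary at analysis j+1 (hypothetical values).
    Returns (winning arm or None, 1-based stage at which the trial stopped).
    """
    active: List[str] = list(z_by_stage[0])
    for stage, (z, bound) in enumerate(zip(z_by_stage, bounds), start=1):
        # Rank the arms still in the trial, best first.
        ranked = sorted(active, key=lambda arm: z[arm], reverse=True)
        if z[ranked[0]] > bound:
            return ranked[0], stage  # whole trial stops for superiority
        if stage == len(bounds):
            return None, stage  # final analysis, no arm declared superior
        # Fixed dropping rule: carry only the pre-specified number of arms forward.
        active = ranked[: arms_per_stage[stage]]
    raise ValueError("z_by_stage and bounds must be non-empty and of equal length")


# Made-up statistics for three arms over three stages.
z = [
    {"A": 1.0, "B": 0.5, "C": -0.2},
    {"A": 1.8, "B": 2.9},
    {"B": 3.0},
]
result = run_trial(z, arms_per_stage=[3, 2, 1], bounds=[3.0, 2.5, 2.0])
print(result)  # ('B', 2)
```

In this made-up example arm C is dropped after stage 1, and arm B crosses the stage 2 boundary, so the third stage is never recruited.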

Load-bearing premise

The reported reductions in expected and maximum sample sizes hold only under the specific treatment effect assumptions, trial parameters, and boundary values chosen for the atrial fibrillation example.

What would settle it

A recalculation or simulation of the atrial fibrillation trial parameters under the proposed design that shows no substantial drop in expected sample size or no lowering of maximum sample size would falsify the performance claim.

Figures

Figures reproduced from arXiv: 2604.09467 by Andrew Seely, Manel Khan, Peter Greenstreet, Pouya Motazedian, Salmaan Kanji, Stephanie Sibley, Tim Ramsay.

Original abstract

Multi-arm multi-stage (MAMS) trials have gained popularity due to their improved efficiency in evaluating multiple treatments. A traditional MAMS trial often decreases the expected sample size of the trial compared to just running a multi-arm approach, but with the drawback of an increase in maximum sample size. For academic-led trials this poses a particular challenge, as funding is typically based on the maximum required sample size. To address this, drop-the-loser designs were introduced, where a fixed number of treatments are dropped at each interim stage, thereby reducing the maximum sample size. In this work, we propose an enhanced multi-stage drop-the-loser design that also allows for early stopping of the entire trial for superiority. This approach aims to retain the benefits of a reduced maximum sample size while also lowering the expected sample size. The proposed design is motivated by a trial in atrial fibrillation. We derive analytical expressions for the type I error rate, power, and expected sample size, and compare the proposed design's performance to alternative methods. We outline the key requirements for implementing the proposed design and discuss the contexts in which it should be considered. For the motivating example, the results show that the proposed design substantially reduces the expected sample size compared to a standard drop-the-loser design, while lowering the maximum sample size relative to running a traditional MAMS trial or multiple separate trials.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper adds early superiority stopping to a drop-the-loser MAMS design and supplies closed-form expressions for the operating characteristics, but the derivations need direct verification before the claimed sample-size savings can be trusted.


The core idea is straightforward: keep the fixed dropping rule that caps the maximum sample size in multi-arm trials, and add the option to stop the entire study early for superiority. For the atrial fibrillation example this produces a lower expected sample size than a plain drop-the-loser design while still avoiding the higher maximum sample size of a standard MAMS trial. That combination is the actual novelty: prior drop-the-loser work did not include the superiority boundary, and the authors derive new analytical expressions for type I error, power, and expected sample size rather than relying solely on simulation. They also spell out the practical requirements for implementation and the settings where the design makes sense. Those pieces are useful for anyone planning academic multi-arm studies where funding is tied to the worst-case enrollment.

The main soft spot is that the reported gains rest entirely on the accuracy of those analytical expressions. The stress-test note rightly flags how hard it is to integrate the joint multivariate normal distributions across stages while accounting for the adaptive arm selection and the fixed dropping rule. Without seeing the explicit formulas, the recursive probability steps, or any simulation cross-checks, it is impossible to tell whether the boundary adjustments and the handling of the remaining arms are exact for the chosen number of stages and arms. Minor implementation details, such as how the stage-specific superiority boundaries are calibrated, could also affect the results.

This is a targeted methods paper aimed at trial statisticians who already work with MAMS or drop-the-loser designs. A reader who needs the closed-form expressions or the worked example will find it useful. It is worth sending for peer review so that the derivations can be checked in detail: the idea is narrow, and the analytical route deserves confirmation rather than an outright desk rejection.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a multi-stage drop-the-loser (DTL) design for multi-arm multi-stage (MAMS) trials that incorporates superiority boundaries allowing early stopping of the entire trial. Analytical expressions are derived for the type I error rate, power, and expected sample size under this adaptive design. Performance is compared to standard DTL, traditional MAMS, and separate trials using a motivating atrial fibrillation example, with claims of substantially reduced expected sample size versus standard DTL while maintaining a lower maximum sample size than alternatives.

Significance. If the derivations hold, the design addresses a key practical constraint in academic trials (funding tied to maximum sample size) by combining DTL's fixed dropping rule with early superiority stopping to reduce expected sample size. The analytical approach, if reproducible, would enable exact operating characteristic calculations without reliance on simulation for design optimization.

major comments (2)

[Methods and Results (atrial fibrillation example)] The central performance claims for the atrial fibrillation example rest on the derived analytical expressions for type I error, power, and expected sample size. These must correctly integrate the multivariate normal joint distribution of test statistics across stages with both the fixed-number arm-dropping rule and the superiority stopping boundaries; any omission in the recursive probability calculations or boundary adjustments would invalidate the reported reductions in expected sample size (see the methods section on operating characteristic derivations and the results for the motivating example).
[Results (atrial fibrillation example)] The weakest assumption is that the expressions remain valid under the specific trial parameters, treatment effect assumptions, and boundary calculations chosen for the example. The manuscript should include explicit verification (e.g., via simulation checks or boundary sensitivity analysis) to confirm the expressions do not under- or over-count early stopping probabilities when arms are adaptively dropped.

minor comments (2)

[Discussion] Ensure the discussion section clearly outlines the key requirements for implementation, including how superiority boundaries are calibrated relative to the dropping rule.
[Methods] Clarify notation for stage-specific superiority boundaries and their relation to standard MAMS boundaries to avoid ambiguity in the analytical setup.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful review and constructive comments. We respond to each major comment below.


Referee: [Methods and Results (atrial fibrillation example)] The central performance claims for the atrial fibrillation example rest on the derived analytical expressions for type I error, power, and expected sample size. These must correctly integrate the multivariate normal joint distribution of test statistics across stages with both the fixed-number arm-dropping rule and the superiority stopping boundaries; any omission in the recursive probability calculations or boundary adjustments would invalidate the reported reductions in expected sample size (see the methods section on operating characteristic derivations and the results for the motivating example).

Authors: The analytical expressions are constructed by recursively integrating the joint multivariate normal distribution of the test statistics, conditioning at each stage on the outcomes for the remaining arms after the fixed-number dropping rule is applied and checking against the superiority boundaries. Boundary adjustments are made at each stage to reflect the reduced number of arms, and all possible paths (early stopping or continuation) are enumerated in the probability calculations. This structure ensures the reported operating characteristics are exact under the stated assumptions.
Revision: no
Referee: [Results (atrial fibrillation example)] The weakest assumption is that the expressions remain valid under the specific trial parameters, treatment effect assumptions, and boundary calculations chosen for the example. The manuscript should include explicit verification (e.g., via simulation checks or boundary sensitivity analysis) to confirm the expressions do not under- or over-count early stopping probabilities when arms are adaptively dropped.

Authors: We agree that explicit verification strengthens the presentation. In the revised version we will add Monte Carlo simulation results for the atrial fibrillation example that compare the analytical type I error, power, and expected sample size against simulated values under the same parameters and boundaries, with particular attention to early-stopping probabilities.
Revision: yes
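A Monte Carlo cross-check of the kind promised in this response could look roughly like the sketch below. It is not the authors' code and does not use the trial's actual parameters: the function `simulate_design`, the normally distributed outcome with known variance, the equal allocation, and the boundary values are all assumptions made for illustration. The first returned value is the probability of declaring any arm superior, which equals the family-wise type I error only when every effect is zero; it is not necessarily the paper's definition of power.

```python
import math
import random
from typing import Sequence, Tuple


def simulate_design(
    effects: Sequence[float],
    arms_per_stage: Sequence[int],
    bounds: Sequence[float],
    n_per_stage: int,
    sigma: float = 1.0,
    n_sims: int = 20000,
    seed: int = 1,
) -> Tuple[float, float]:
    """Return (P(any arm declared superior), expected sample size)."""
    rng = random.Random(seed)
    sd_stage_mean = sigma / math.sqrt(n_per_stage)  # SD of one stage's sample mean
    rejections = 0
    total_patients = 0
    for _ in range(n_sims):
        active = list(range(len(effects)))
        sums = [0.0] * len(effects)  # running sums of stage-wise means per arm
        control_sum = 0.0
        patients = 0
        for stage, bound in enumerate(bounds, start=1):
            control_sum += rng.gauss(0.0, sd_stage_mean)
            for k in active:
                sums[k] += rng.gauss(effects[k], sd_stage_mean)
            patients += (len(active) + 1) * n_per_stage  # +1 for control
            # Cumulative z-statistic: mean difference over sqrt(2 sigma^2 / (stage * n)).
            scale = math.sqrt(n_per_stage / (2.0 * sigma**2 * stage))
            z = {k: (sums[k] - control_sum) * scale for k in active}
            ranked = sorted(active, key=lambda k: z[k], reverse=True)
            if z[ranked[0]] > bound:
                rejections += 1
                break  # whole trial stops for superiority
            if stage < len(bounds):
                active = ranked[: arms_per_stage[stage]]  # fixed dropping rule
        total_patients += patients
    return rejections / n_sims, total_patients / n_sims


# Global null with illustrative (not calibrated) boundaries.
error_rate, ess = simulate_design(
    effects=[0.0, 0.0, 0.0],
    arms_per_stage=[3, 2, 1],
    bounds=[3.0, 2.5, 2.0],
    n_per_stage=50,
)
print(f"simulated type I error: {error_rate:.4f}, expected sample size: {ess:.1f}")
```

Comparing these simulated values, with their Monte Carlo standard errors, against the analytical type I error and expected sample size for the same boundaries would show whether the closed-form expressions over- or under-count early stopping.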

Circularity Check

0 steps flagged

No circularity: analytical expressions for operating characteristics are newly derived from first principles


The paper states it derives analytical expressions for type I error, power, and expected sample size under the proposed multi-stage drop-the-loser design with superiority boundaries, motivated by the atrial fibrillation example. These derivations integrate the joint multivariate normal distribution of test statistics, early stopping rules, and fixed dropping at stages. No load-bearing step reduces by construction to fitted inputs, self-definitional loops, or self-citation chains; the expressions are presented as independent calculations compared against standard MAMS and drop-the-loser benchmarks. This is self-contained and matches the expected non-finding for papers with explicit new derivations.
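For orientation, the joint multivariate normal structure referred to here has a well-known form in the simplest setting. The block below is a sketch under stated assumptions, not the paper's general covariance expression (which is not reproduced in this summary): normally distributed outcomes with known variance $\sigma^2$, a shared control arm, and equal allocation, so that every arm has a cumulative sample size of $jn$ at analysis $j$.

```latex
% Assumptions: known variance, shared control (arm 0), equal allocation,
% cumulative sample size n_{k,j} = n_{0,j} = jn at analysis j.
Z_{k,j} = \frac{\bar{X}_{k,j} - \bar{X}_{0,j}}{\sqrt{V_{k,j}}}, \qquad
V_{k,j} = \sigma^2 \left( n_{k,j}^{-1} + n_{0,j}^{-1} \right) = \frac{2\sigma^2}{jn},

\operatorname{Corr}\left( Z_{k,j}, Z_{k,j^\star} \right) = \sqrt{j / j^\star}, \qquad
\operatorname{Corr}\left( Z_{k,j}, Z_{k^\star,j^\star} \right) = \tfrac{1}{2} \sqrt{j / j^\star}
\quad (k \ne k^\star,\; j \le j^\star).
```

The factor $\tfrac{1}{2}$ comes from the shared control data, and $\sqrt{j / j^\star}$ from the overlap between cumulative samples at the two analyses. The type I error, power, and stopping probabilities are then integrals of this joint density over the regions defined by the dropping rule and the superiority boundaries.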

Axiom & Free-Parameter Ledger

2 free parameters · 0 axioms · 0 invented entities

Based on the abstract alone, the design relies on standard frequentist assumptions for sequential testing (e.g., independent increments of test statistics and normal approximations) but introduces no new free parameters or invented entities beyond conventional design parameters such as stage-specific boundaries and dropping counts.

free parameters (2)

Stage-specific superiority boundaries
Chosen to achieve desired type I error and power; values are design parameters rather than data-fitted constants.
Number of treatments dropped per stage
Fixed by design choice to control maximum sample size.
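The effect of the dropping schedule on the maximum sample size is simple arithmetic, sketched below. The helper `max_sample_size`, the schedules, and the per-stage size of 50 are hypothetical; holding the per-stage size equal across designs is a simplification, since in practice each design's sample size is solved separately to meet its own error and power targets.

```python
from typing import Sequence


def max_sample_size(arms_per_stage: Sequence[int], n_per_stage: int) -> int:
    """Patients recruited if the trial runs through every stage.

    arms_per_stage[j] is the number of active treatment arms in stage j+1;
    one is added per stage for the shared control arm.
    """
    return sum((k + 1) * n_per_stage for k in arms_per_stage)


# Three treatments, three stages, 50 patients per arm per stage (made-up values).
dtl_max = max_sample_size([3, 2, 1], n_per_stage=50)   # drop one arm per stage
mams_max = max_sample_size([3, 3, 3], n_per_stage=50)  # no arm forced to drop
print(dtl_max, mams_max)  # 450 600
```

Adding superiority boundaries leaves the drop-the-loser figure unchanged for a given per-stage size; it only changes how often the later stages are actually recruited.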

pith-pipeline@v0.9.0 · 5561 in / 1245 out tokens · 44578 ms · 2026-05-10T16:49:37.406396+00:00 · methodology


Reference graph

Works this paper leans on

5 extracted references · 5 canonical work pages

[1]

Abbas, R., Wason, J., Michiels, S., and Le Teuff, G. (2022). A two-stage drop-the-losers design for time-to-event outcome using a historical control arm. Pharmaceutical Statistics, 21(1):268–288. Albini, A., Malavasi, V. L., Vitolo, M., Imberti, J. F., Marietta, M., Lip, G. Y., and Boriani, G. (2021). Long-term outcomes of postoperative atrial fibrill...

work page 2022
[2]

Frendl, G., Sodickson, A. C., Chung, M. K., Waldo, A. L., Gersh, B. J., Tisdale, J. E., Calkins, H., Aranki, S., Kaneko, T., Cassivi, S., et al. (2014). 2014 AATS guidelines for the prevention and management of peri-operative atrial fibrillation and flutter (POAF) for thoracic surgical procedures. The Journal of Thoracic and Cardiovascular Surgery, 148(...

work page arXiv 2014
[3]

Urach, S. and Posch, M. (2016). Multi-arm group sequential designs with a simultaneous stopping rule. Statistics in Medicine, 35(30):5536–5550. Vaporciyan, A. A., Correa, A. M., Rice, D. C., Roth, J. A., Smythe, W., Swisher, S. G., Walsh, G. L., and Putnam Jr, J. B. (2004). Risk factors associated with atrial fibrillation after noncardiac thoracic surg...

work page 2016
[4]

Wason, J. M. S. and Jaki, T. (2012). Optimal design of multi-arm multi-stage trials. Statistics in Medicine, 31(30):4269–4279. Wason, J. M. S., Stecher, L., and Mander, A. P. (2014). Correcting for multiple-testing in multi-arm trials: is it necessary and is it done? Trials, 15(1). Wassmer, G., Pahlke, F., Jensen, T., Bove, D. S., Schueuerhuis, S., an...

work page 2012
[5]

The power under the LFC for the motivating example is therefore $\sum_{j=1}^{3} P(\Phi_j)$. 5.3.1 General equation for covariance matrix. Under the same assumptions as used in Wason et al. (2017) of $V_{k,j} = \sigma^2 (n_{k,j}^{-1} + n_{0,j}^{-1})$ where $n_{k,j} = n_{k^\star,j}$ for all $k, k^\star$ the covariance between the events $B_{k,j}$ and $B_{k^\star,j^\star}$; or $B_{k,j}$ and $A_{k^\star,j^\star}$; or $A_{k,j}$ and $B_{k^\star,j^\star}$; or...

work page 2017
