
arxiv: 1906.09178 · v1 · pith:MZKB5DIFnew · submitted 2019-06-21 · 📊 stat.CO

A web application for the design of multi-arm clinical trials

Pith reviewed 2026-05-25 18:19 UTC · model grok-4.3

keywords multi-arm clinical trials · sample size calculation · multiple comparison corrections · power calculation · clinical trial design · web application · allocation ratios

The pith

A free web application performs sample size calculations for multi-arm clinical trials under multiple comparison corrections.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a web application developed to simplify the design of multi-arm clinical trials by handling sample size calculations. It incorporates a range of popular methods to adjust for multiple comparisons and supports calculations that control different varieties of power. The tool also determines optimized allocation ratios across treatment arms. This addresses the challenge of selecting among numerous possible multi-arm designs when suitable software has been limited. The application is intended to make these designs more practical for statisticians and clinicians by providing key operating characteristics without requiring specialized programming skills.

Core claim

The authors have developed a web application for sample size calculation when using a variety of popular multiple comparison corrections. The application supports sample size calculation to control several varieties of power, as well as the determination of optimised arm-wise allocation ratios. It is free to access on any device with an internet browser and requires no programming knowledge to use. The application provides the core information required by statisticians and clinicians to review the operating characteristics of a chosen multi-arm clinical trial design.
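To make "several varieties of power" concrete, the sketch below estimates two standard definitions by simulation: disjunctive power (the probability of rejecting at least one false null hypothesis) and conjunctive power (the probability of rejecting all of them). It is an illustration only, not the application's code. The function name `simulate_power`, the normal outcomes with known variance, the equal allocation, the one-sided Bonferroni threshold, and the common effect `delta` in every arm are all assumptions made here.

```python
import random
from statistics import NormalDist


def simulate_power(k, n_per_arm, delta, sigma, alpha=0.05, n_sims=20000, seed=1):
    """Monte Carlo estimate of (disjunctive, conjunctive) power.

    Assumed setting: k treatment arms vs one shared control, equal
    allocation, known standard deviation sigma, the same true effect
    delta in every arm, one-sided Bonferroni tests at level alpha / k.
    """
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / k)
    # Mean of each z-statistic under the alternative.
    shift = delta * (n_per_arm / 2) ** 0.5 / sigma
    any_reject = 0
    all_reject = 0
    for _ in range(n_sims):
        control = rng.gauss(0, 1)  # shared control noise induces correlation 0.5
        rejects = [
            (rng.gauss(0, 1) - control) / 2 ** 0.5 + shift > crit
            for _ in range(k)
        ]
        any_reject += any(rejects)
        all_reject += all(rejects)
    return any_reject / n_sims, all_reject / n_sims


disjunctive, conjunctive = simulate_power(k=3, n_per_arm=71, delta=0.5, sigma=1.0)
print(f"disjunctive ~ {disjunctive:.3f}, conjunctive ~ {conjunctive:.3f}")
```

Because rejecting every hypothesis implies rejecting at least one, the conjunctive estimate can never exceed the disjunctive one, which is why the choice of power definition changes the required sample size.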

What carries the argument

The web application that implements sample size calculations for multi-arm designs while applying multiple comparison corrections and optimizing arm allocations.

If this is right

  • Users can more readily evaluate operating characteristics such as power and error rates for chosen multi-arm designs.
  • Optimized arm-wise allocation ratios can be identified to meet trial objectives efficiently.
  • Sample sizes can be calculated while controlling per-comparison error, family-wise error, or disjunctive power as needed.
  • The tool may facilitate greater use of multi-arm designs by reducing the software barrier to their planning.
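As a rough illustration of how a multiple comparison correction feeds into sample size, the sketch below uses the textbook normal-approximation formula n = 2σ²(z₁₋α' + z₁₋β)²/δ² per arm, with the Bonferroni-adjusted level α' = α/K. It is not taken from the application. The function name `bonferroni_n_per_arm`, one-sided testing, known variance, equal allocation, and power defined for a single pairwise comparison are assumptions made for this example.

```python
import math
from statistics import NormalDist


def bonferroni_n_per_arm(k, delta, sigma, alpha=0.05, power=0.8):
    """Per-arm sample size for k treatment-vs-control comparisons.

    Assumed setting: one-sided z-tests, known standard deviation sigma,
    equal allocation, Bonferroni-adjusted level alpha / k, and the stated
    power for one pairwise comparison with true difference delta.
    """
    z = NormalDist().inv_cdf
    alpha_adj = alpha / k
    n = 2 * sigma ** 2 * (z(1 - alpha_adj) + z(power)) ** 2 / delta ** 2
    return math.ceil(n)


for arms in (1, 2, 3, 4):
    print(arms, bonferroni_n_per_arm(arms, delta=0.5, sigma=1.0))
```

The per-arm requirement grows with the number of comparisons because the adjusted significance level shrinks, which is the trade-off a design tool has to surface.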

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Broader adoption could allow more simultaneous testing of treatments within single trials, potentially conserving patient resources across disease areas.
  • The interface design implies that non-statisticians might now participate more directly in reviewing and adjusting multi-arm trial parameters.
  • The focus on multiple power types suggests the application could support both confirmatory and exploratory goals within the same design framework.

Load-bearing premise

The statistical procedures for sample size calculation, multiple comparison corrections, and power control are correctly implemented and suitable for real clinical trial planning.

What would settle it

An independent manual calculation or simulation for a specific multi-arm design with a known multiple comparison method that produces sample sizes or allocation ratios differing from those output by the application.
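One form such an independent check could take is a small simulation of the family-wise error rate under the global null hypothesis. The sketch below is a hypothetical example rather than a test of the application itself: it assumes normal outcomes, equal allocation, a shared control, and one-sided Bonferroni tests, and the function name `simulate_fwer` is introduced here for illustration.

```python
import random
from statistics import NormalDist


def simulate_fwer(k, alpha=0.05, n_sims=20000, seed=1):
    """Monte Carlo family-wise error rate under the global null.

    Assumed setting: k treatment arms share one control arm with equal
    allocation, so the z-statistics have pairwise correlation 0.5. Each
    comparison is tested one-sided at the Bonferroni level alpha / k.
    """
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / k)
    rejections = 0
    for _ in range(n_sims):
        control = rng.gauss(0, 1)
        z_stats = [(rng.gauss(0, 1) - control) / 2 ** 0.5 for _ in range(k)]
        if max(z_stats) > crit:  # at least one false rejection
            rejections += 1
    return rejections / n_sims


print(f"estimated FWER: {simulate_fwer(k=3):.4f}")
```

Bonferroni is conservative, so the estimate should sit at or slightly below the nominal level, up to Monte Carlo error. A clear and repeatable departure from the error rate a tool reports for the same design would be the kind of discrepancy described above.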

Figures

Figures reproduced from arXiv: 1906.09178 by James MS Wason, Michael J Grayling.

Figure 1: Design parameters box. The box in which input parameters are specified
Figure 2: Design summary box. The box in which a summary of the input parameters
Figure 3: Operating characteristics summary. The boxes in which a summary of the
Figure 4: Operating characteristics plots. The boxes in which plots of the identified
Original abstract

Multi-arm designs provide an effective means of evaluating several treatments within the same clinical trial. Given the large number of treatments now available for testing in many disease areas, it has been argued that their utilisation should increase. However, for any given clinical trial there are numerous possible multi-arm designs that could be used, and choosing between them can be a difficult task. This task is complicated further by a lack of available easy-to-use software for designing multi-arm trials. To aid the wider implementation of multi-arm clinical trial designs, we have developed a web application for sample size calculation when using a variety of popular multiple comparison corrections. Furthermore, the application supports sample size calculation to control several varieties of power, as well as the determination of optimised arm-wise allocation ratios. It is built using the Shiny package in the R programming language, is free to access on any device with an internet browser, and requires no programming knowledge to use. The application provides the core information required by statisticians and clinicians to review the operating characteristics of a chosen multi-arm clinical trial design. We hope that it will assist with the future utilisation of such designs in practice.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript describes the development of a free Shiny web application in R for designing multi-arm clinical trials. The tool performs sample size calculations that incorporate a variety of popular multiple comparison corrections, supports control of several power definitions, and determines optimized arm-wise allocation ratios. It is presented as accessible via any web browser with no programming required, with the goal of supplying the core operating characteristics information needed by statisticians and clinicians to evaluate such designs.

Significance. If the underlying calculations are correctly implemented, the application could meaningfully lower the barrier to using multi-arm trial designs in practice by providing an accessible interface for complex sample-size and power calculations. The work is primarily a software contribution rather than a methodological advance, so its significance is tied directly to demonstrated reliability and usability rather than novel statistical results.

major comments (2)
  1. [Abstract] Abstract: the central claim that the application 'provides the core information required by statisticians and clinicians' rests on an unverified implementation; the manuscript supplies neither explicit formulas for the supported multiple-comparison corrections and power types nor any numerical checks against published tables or analytic results.
  2. [Implementation / functionality description] The description of the application's functionality (throughout the manuscript) contains no R code snippets, pseudocode, or verification examples that would allow independent confirmation that the sample-size calculations match standard methods for the listed corrections and power definitions.
minor comments (2)
  1. A table or appendix listing the exact multiple-comparison procedures, power definitions, and allocation optimization methods supported by the app would improve clarity and allow readers to assess coverage without running the application.
  2. Consider adding a short reproducibility statement indicating whether the source code for the Shiny app is publicly available (e.g., on GitHub) and whether any unit tests or validation scripts are included.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments and recommendation. We agree that verification is important for a software contribution and will revise the manuscript accordingly to include formulas, pseudocode, and numerical checks. We respond to each major comment below.

  1. Referee: [Abstract] Abstract: the central claim that the application 'provides the core information required by statisticians and clinicians' rests on an unverified implementation; the manuscript supplies neither explicit formulas for the supported multiple-comparison corrections and power types nor any numerical checks against published tables or analytic results.

    Authors: We accept the point that the manuscript did not provide explicit formulas or numerical verification to support the abstract claim. The calculations rely on standard methods (e.g., Dunnett, Bonferroni corrections; disjunctive and conjunctive power), but these were not detailed. In revision we will add a dedicated section with the formulas for all supported corrections and power types, plus numerical checks against published tables or analytic results to substantiate the claim.

    Revision: yes

  2. Referee: [Implementation / functionality description] The description of the application's functionality (throughout the manuscript) contains no R code snippets, pseudocode, or verification examples that would allow independent confirmation that the sample-size calculations match standard methods for the listed corrections and power definitions.

    Authors: We agree that the absence of pseudocode or verification examples limits independent confirmation. We will add pseudocode for the core sample-size algorithm (including allocation optimization) and specific verification examples demonstrating agreement with standard methods. We will also note the R functions or packages used for the computations.

    Revision: yes
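For readers unfamiliar with allocation optimization, one classical result is the square-root rule: with K treatment arms, a shared control, and equal variances, allocating the control arm √K times as many patients as each treatment arm minimizes the variance of every treatment-versus-control difference for a fixed total sample size. The sketch below illustrates that rule only. This page does not say which optimality criteria the application uses, so the criterion and the function names `sqrt_rule_allocation` and `pairwise_variance_factor` are assumptions.

```python
import math


def sqrt_rule_allocation(k):
    """Allocation proportions [control, arm_1, ..., arm_k].

    Assumed criterion: minimise 1/n_0 + 1/n_j subject to a fixed total,
    which gives n_0 = sqrt(k) * n_j (the square-root rule).
    """
    control_weight = math.sqrt(k)
    total = control_weight + k
    return [control_weight / total] + [1 / total] * k


def pairwise_variance_factor(props):
    """Variance of one treatment-vs-control difference, in units of
    sigma^2 / N, for allocation proportions [control, arm_1, ...]."""
    return 1 / props[0] + 1 / props[1]


k = 4
optimal = sqrt_rule_allocation(k)
equal = [1 / (k + 1)] * (k + 1)
print("sqrt rule:", [round(p, 3) for p in optimal], pairwise_variance_factor(optimal))
print("equal:    ", [round(p, 3) for p in equal], pairwise_variance_factor(equal))
```

With four treatment arms the rule places one third of patients on control and lowers the variance factor relative to equal allocation, which translates into a smaller total sample size for the same pairwise power.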

Circularity Check

0 steps flagged

No circularity: software description with no derivations or fitted predictions


The paper presents a web application (Shiny/R) for multi-arm trial sample-size calculations under various multiple-comparison corrections, power definitions, and allocation ratios. No equations, derivations, parameter fits, or 'predictions' are claimed; the work is purely descriptive of an implementation. No self-citation load-bearing steps, uniqueness theorems, or ansatzes appear. The central claim reduces only to the existence of the tool and its menu of standard methods, which is independent of any internal reduction to its own inputs. This matches the default non-circular case for implementation papers.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities because the paper describes software implementation rather than a mathematical or theoretical derivation.


