arxiv: 2511.17292 · v3 · submitted 2025-11-21 · 📊 stat.ME

Balancing Evidentiary Value and Sample Size of Adaptive Designs with Application to Animal Experiments

Leonhard Held, Fadoua Balabdaoui, Saverio Fontana, Samuel Pawel

Pith reviewed 2026-05-17 20:20 UTC · model grok-4.3

classification 📊 stat.ME

keywords: experimental unit information index · adaptive designs · animal experiments · evidentiary value · group-sequential designs · diagnostic odds ratio · sample size reduction · 3R principles


The pith

The experimental unit information index quantifies the evidentiary value of each experimental unit to balance sample size and statistical reliability in adaptive designs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces the experimental unit information index to measure how much evidence one experimental unit contributes to a statistical test. The measure combines power, Type I error rate, and sample size into a sample-size-adjusted diagnostic odds ratio that also has a Bayesian interpretation. A sympathetic reader would care because it directly addresses the goal of reducing animal use in research while preserving the ability to draw sound inferences. The index is first defined for standard tests and then extended to adaptive designs with early stopping for efficacy or futility. A reanalysis of 2738 animal experiments with simulated interim analyses illustrates possible reductions in the number of subjects required.

Core claim

The authors propose the experimental unit information index (EUII) as a novel measure of evidentiary value per experimental unit, obtained by adjusting diagnostic likelihood ratios and the diagnostic odds ratio for sample size. The EUII has interpretations in terms of frequentist error rates and Bayesian posterior odds. Its asymptotic value depends only on the relative effect size under the alternative. The definition is extended to adaptive designs, and application to group-sequential designs demonstrates its use for maximizing evidentiary value per unit. A reanalysis of 2738 animal experiments illustrates possible sample size savings.
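The Bayesian reading rests on the standard diagnostic-testing analogy: if only the binary outcome of a test (significant or not) is used, the prior odds of the alternative are multiplied by the positive likelihood ratio power/α after a significant result, or by the negative likelihood ratio (1 − power)/(1 − α) after a non-significant one. A minimal sketch, with illustrative numbers (80% power, one-sided α = 0.025, prior odds 1:1) that are not taken from the paper; the function names are hypothetical:

```python
def likelihood_ratios(power: float, alpha: float) -> tuple[float, float]:
    """Positive and negative diagnostic likelihood ratios of a hypothesis test."""
    lr_pos = power / alpha               # P(significant | H1) / P(significant | H0)
    lr_neg = (1 - power) / (1 - alpha)   # P(not significant | H1) / P(not significant | H0)
    return lr_pos, lr_neg


def posterior_odds(prior_odds: float, power: float, alpha: float, significant: bool) -> float:
    """Posterior odds of H1 given only whether the test was significant (Bayes' rule in odds form)."""
    lr_pos, lr_neg = likelihood_ratios(power, alpha)
    return prior_odds * (lr_pos if significant else lr_neg)


print(round(posterior_odds(1.0, 0.8, 0.025, significant=True), 3))   # 32.0
print(round(posterior_odds(1.0, 0.8, 0.025, significant=False), 3))  # 0.205
```

The ratio of the two likelihood ratios, LR+/LR−, is the diagnostic odds ratio that the EUII builds on.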

What carries the argument

The experimental unit information index (EUII), which is the sample-size-adjusted diagnostic odds ratio that quantifies the evidentiary contribution of one experimental unit.
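A minimal sketch of how such an index could be computed. The DOR is the ratio of the two diagnostic likelihood ratios; the sample-size adjustment used here, taking the n-th root so that the total DOR is spread evenly over the n units, is an assumption made for illustration (it is consistent with the claim that the asymptotic value depends only on the effect size, but the paper's exact definition should be checked). The function `euii` is hypothetical. The sample sizes 32 and 43 are roughly what a one-sided one-sample z-test with standardised effect 0.5 needs for 80% and 90% power:

```python
import math


def diagnostic_odds_ratio(power: float, alpha: float) -> float:
    """DOR = LR+ / LR- = [power / (1 - power)] / [alpha / (1 - alpha)]."""
    return (power / (1 - power)) / (alpha / (1 - alpha))


def euii(power: float, alpha: float, n: int) -> float:
    """Per-unit evidentiary value, ASSUMING EUII = DOR ** (1 / n).

    The n-th root spreads the total DOR evenly over the n experimental units;
    this normalisation is an illustrative assumption, not a quoted definition.
    """
    return math.exp(math.log(diagnostic_odds_ratio(power, alpha)) / n)


# One-sided alpha = 0.025; two illustrative designs for the same effect size.
print(round(euii(0.80, 0.025, 32), 3))
print(round(euii(0.90, 0.025, 43), 3))
```

Under this assumption the smaller design has the larger per-unit value (about 1.17 versus 1.15): the extra power of the larger design is bought with more animals than the gain in DOR justifies on a per-unit scale.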

If this is right

Group-sequential adaptive designs can be evaluated and optimized using the EUII to achieve smaller sample sizes while controlling error rates.
The asymptotic EUII value depends solely on the assumed relative effect size under the alternative hypothesis.
EUII provides interpretations both for frequentist power and type I error and for Bayesian posterior odds.
Post-hoc interim analyses on existing animal experiment data can identify opportunities for reduced sample sizes in future studies.
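The asymptotic statement can be checked numerically for the one-sided one-sample z-test with known variance, again under the assumed form EUII = DOR^{1/n}. Under that assumption the limit is exp(δ²/2), independent of α. The sketch works on the log scale with `math.erfc`, because computing the Type II error as `1 - NormalDist().cdf(...)` loses all precision once n is large; the helper names are hypothetical:

```python
import math
from statistics import NormalDist


def log_phi(x: float) -> float:
    """log of the standard normal CDF, via erfc so the lower tail stays accurate."""
    return math.log(0.5 * math.erfc(-x / math.sqrt(2)))


def log_euii_one_sample(delta: float, n: int, alpha: float = 0.025) -> float:
    """log EUII of a one-sided one-sample z-test, ASSUMING EUII = DOR ** (1 / n)."""
    z = NormalDist().inv_cdf(1 - alpha)
    log_beta = log_phi(z - math.sqrt(n) * delta)     # log Type II error = log(1 - power)
    log_power = math.log1p(-math.exp(log_beta))      # log power
    log_dor = (log_power - log_beta) + (math.log1p(-alpha) - math.log(alpha))
    return log_dor / n


delta = 0.5
limit = math.exp(delta ** 2 / 2)                     # claimed limit under the assumption
for n in (400, 1600, 4900):
    print(n, round(math.exp(log_euii_one_sample(delta, n)), 4), round(limit, 4))
```

Convergence is slow: the leading correction to log EUII is of order 1/√n, so at n = 4900 (δ = 0.5) the log-EUII is still only about 90% of its limit δ²/2, approaching it from below.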

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Researchers in other fields using human subjects or costly experiments could adopt the EUII to similarly optimize resource allocation.
The approach might be generalized to other types of sequential designs or more complex statistical models beyond group-sequential tests.
Integration into software for trial design could make it easier to plan studies that maximize evidence per unit.

Load-bearing premise

The relative effect size under the alternative hypothesis is known or can be prespecified so that power calculations stay accurate despite sample size reductions from early stopping.

What would settle it

Conduct a simulation where the true effect size is set differently from the value assumed in the EUII calculation, and check whether the observed error rates or evidentiary strength deviate from what the index predicts.
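A minimal Monte Carlo version of that check, assuming a one-sided one-sample z-test with σ = 1, α = 0.025, and a design planned for 80% power at δ = 0.5 while the true effect is only 0.3 (all values chosen for illustration; function names are hypothetical). Because the data are normal with known variance, the sketch draws the sample mean directly from its exact N(δ, 1/n) distribution instead of simulating individual animals:

```python
import math
import random
from statistics import NormalDist


def planned_n(delta_assumed: float, alpha: float = 0.025, power: float = 0.8) -> int:
    """Sample size of a one-sided one-sample z-test (sigma = 1) for the assumed effect."""
    z_a = NormalDist().inv_cdf(1 - alpha)
    z_b = NormalDist().inv_cdf(power)
    return math.ceil(((z_a + z_b) / delta_assumed) ** 2)


def rejection_rate(delta_true: float, n: int, alpha: float = 0.025,
                   n_sim: int = 20_000, seed: int = 1) -> float:
    """Monte Carlo rejection rate; the sample mean is drawn from its exact N(delta, 1/n) law."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)
    hits = 0
    for _ in range(n_sim):
        mean = rng.gauss(delta_true, 1 / math.sqrt(n))
        if mean * math.sqrt(n) > z_crit:
            hits += 1
    return hits / n_sim


n = planned_n(delta_assumed=0.5)             # planned for delta = 0.5
alpha_hat = rejection_rate(0.0, n)           # empirical Type I error rate
power_hat = rejection_rate(0.3, n)           # empirical power if the true delta is 0.3
mcse = math.sqrt(power_hat * (1 - power_hat) / 20_000)  # Monte Carlo standard error
print(n, alpha_hat, power_hat, round(mcse, 4))
```

With these inputs the planned sample size is 32 and the analytic power at δ = 0.3 is about 0.40, so the simulated power should land near 0.40 rather than the nominal 0.80, while the Type I error rate stays near 0.025. Plugging the realized rather than the nominal rates into the EUII shows how far the index moves.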

Figures

Figures reproduced from arXiv: 2511.17292 by Fadoua Balabdaoui, Leonhard Held, Samuel Pawel, Saverio Fontana.

**Figure 2.** The experimental unit information index for unequal randomisation of …

**Figure 3.** The experimental unit information index for the standard one-sample one-sided …

**Figure 4.** The EUII (top) of different group-sequential methods with four interim analyses in comparison to a fixed design with n = 50. The maximum sample size nMax of O'Brien-Fleming and Pocock is chosen so that their power (top axis) and Type-I error rate (2.5%) match those of the fixed design, whereas nMax for Haybittle-Peto is always n = 50, leading to slightly higher power and Type-I error rate. Middle: Improvement…

**Figure 5.** The experimental unit information index (first-order) of several group-sequential …

**Figure 6.** The experimental unit information index (second-order) of several group …

**Figure 7.** Futility bounds on p-values based on predictive power.

… or the Haybittle-Peto method now performing best. Pocock has fairly constant values of EUII and is always among the top two methods. Most other methods also have a fairly constant EUII with varying nMax, with the exception of Haybittle-Peto, which increases from nMax = 16 to nMax = 32. As a result, Haybittle-Peto is even better than Pocock if the a…
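Futility bounds of this kind have a closed form in the simplest setting. Assuming a one-sided z-test with known variance, a flat prior on the effect, and information fraction t at the interim look, the predictive power is Φ((z_t − z_{1−α}√t)/√(1 − t)); stopping for futility when it drops below a threshold γ translates into an upper bound on the interim one-sided p-value. The threshold γ = 0.1 and the function names are illustrative assumptions, not the paper's settings:

```python
import math
from statistics import NormalDist

N01 = NormalDist()


def predictive_power(z_interim: float, t: float, alpha: float = 0.025) -> float:
    """Predictive power of a one-sided z-test at information fraction t (flat prior)."""
    c = N01.inv_cdf(1 - alpha)
    return N01.cdf((z_interim - c * math.sqrt(t)) / math.sqrt(1 - t))


def futility_p_bound(t: float, gamma: float = 0.1, alpha: float = 0.025) -> float:
    """Interim one-sided p-value above which predictive power falls below gamma."""
    c = N01.inv_cdf(1 - alpha)
    z_bound = c * math.sqrt(t) + math.sqrt(1 - t) * N01.inv_cdf(gamma)
    return 1 - N01.cdf(z_bound)


for t in (0.25, 0.5, 0.75):
    print(t, round(futility_p_bound(t), 3))
```

The p-value needed to continue becomes smaller as t grows: late in the experiment, little remaining data can rescue a weak interim result, so the futility bound on the p-value tightens.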

Original abstract

Reducing the number of experimental units is one of the three pillars of the 3R principles (Replace, Reduce, Refine) in animal research. At the same time, statistical error rates need to be controlled to enable reliable inferences and decisions. This paper proposes to adapt diagnostic likelihood ratios and the diagnostic odds ratio to statistical hypothesis tests and to adjust them for sample size, yielding a novel measure that quantifies the evidentiary value of one experimental unit. The experimental unit information index (EUII) is based on power, Type-I error and sample size, and has attractive interpretations both in terms of frequentist error rates and Bayesian posterior odds. We introduce the EUII in simple statistical test settings and show that its asymptotic value depends only on the assumed relative effect size under the alternative. We then extend the definition to adaptive designs where early stopping for efficacy or futility may cause reductions in sample size. Application to group-sequential designs shows the usefulness of the approach when the goal is to maximize the evidentiary value of one experimental unit. A reanalysis of 2738 animal experiments with simulated results from (post-hoc) interim analyses illustrates the possible savings in sample size.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note · private letter to a colleague

The paper defines the EUII from power, Type I error and sample size, then extends it to group-sequential designs, but the whole construction requires accurate advance knowledge of the relative effect size.


The main thing to know is that this paper introduces the Experimental Unit Information Index to quantify evidentiary value per experimental unit and shows how it behaves in adaptive group-sequential settings for animal experiments. The asymptotic value depends only on the assumed relative effect size under the alternative, and they illustrate possible sample-size savings via a reanalysis of 2738 experiments with simulated interim looks.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces the Experimental Unit Information Index (EUII), defined from the power, Type I error rate, and sample size of a statistical test, to measure the evidentiary value per experimental unit. It demonstrates that the asymptotic value of the EUII depends solely on the assumed relative effect size under the alternative hypothesis. The definition is extended to adaptive designs, specifically group-sequential designs with early stopping for efficacy or futility, and applied to a reanalysis of 2738 animal experiments to show potential reductions in sample size while maintaining evidentiary value.

Significance. If the EUII and its extension to adaptive designs are valid, this work could contribute to more efficient animal experimentation by allowing smaller sample sizes without compromising the ability to make reliable inferences, in line with the 3R principles. The dual frequentist and Bayesian interpretations of the EUII are a strength. The large-scale reanalysis provides practical illustration of the method's potential impact on reducing animal use in experiments.

major comments (3)

The claim that the asymptotic EUII depends only on the assumed relative effect size is presented without an explicit derivation or limiting argument. Since the EUII is constructed directly from power, Type I error, and sample size, and the relative effect size is an input to the power calculation, it is important to show that the limit is indeed independent of the other parameters to support the interpretation as a measure of evidentiary value per unit.
In the extension to group-sequential designs, the manuscript does not provide the adjusted formulas for power and Type-I error that account for the stopping boundaries. Without these, it is unclear whether the EUII correctly reflects the evidentiary value when early stopping reduces the realized sample size, which is central to the claim of balancing evidentiary value and sample size.
The reanalysis of 2738 experiments simulates post-hoc interim analyses but lacks both a sensitivity analysis with respect to the assumed relative effect size and error bars on the estimated savings. This weakens the illustration of possible sample size reductions.

minor comments (2)

The notation for the EUII formula could be clarified to distinguish between the finite-sample and asymptotic versions.
Some figures in the application section would benefit from clearer labeling of the adaptive vs non-adaptive cases.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive comments on our manuscript. We are pleased that the referee recognizes the potential contribution to more efficient animal experimentation in line with the 3R principles. Below, we provide point-by-point responses to the major comments and outline the revisions we plan to make.


Referee: The claim that the asymptotic EUII depends only on the assumed relative effect size is presented without the explicit derivation or limiting argument. Since the EUII is constructed directly from power, Type-I error, and sample size, and the relative effect size is an input to the power calculation, it is important to show that the limit is indeed independent of other parameters to support the interpretation as a measure of evidentiary value per unit.

Authors: We agree that an explicit derivation would strengthen the presentation. In the revised manuscript, we will add a dedicated subsection deriving the asymptotic limit. Under the usual normal approximation for the test statistic, as n tends to infinity the power tends to 1 at a rate governed by the relative effect size δ; after the sample-size normalization built into the EUII definition, the limit simplifies to a closed-form expression that depends only on δ, not on the fixed α or on the particular choice of target power. This limiting argument directly supports the per-unit evidentiary interpretation. revision: yes
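A sketch of such a limiting argument for the one-sided one-sample z-test with known variance and δ = μ/σ > 0, assuming the sample-size adjustment takes the form EUII_n = DOR_n^{1/n} (an assumption for illustration) and writing β_n for the Type II error rate at sample size n. The Mills-ratio tail approximation Φ(−x) ∼ φ(x)/x does the work; under this assumed form α drops out of the limit:

```latex
\begin{aligned}
1-\beta_n &= \Phi\!\left(\sqrt{n}\,\delta - z_{1-\alpha}\right), \qquad
\mathrm{DOR}_n = \frac{(1-\beta_n)/\beta_n}{\alpha/(1-\alpha)},\\
-\log \beta_n &= \tfrac{1}{2}\left(\sqrt{n}\,\delta - z_{1-\alpha}\right)^2 + O(\log n)
  \qquad \text{(Mills ratio)},\\
\frac{1}{n}\log \mathrm{DOR}_n &= \frac{-\log\beta_n + \log(1-\beta_n) + \log\frac{1-\alpha}{\alpha}}{n}
  \;\longrightarrow\; \frac{\delta^2}{2},\\
\mathrm{EUII}_\infty &= \lim_{n\to\infty} \mathrm{DOR}_n^{1/n} = \exp\!\left(\delta^2/2\right).
\end{aligned}
```

The cross term −√n δ z_{1−α} in the expansion of −log β_n is what makes the convergence slow (order 1/√n on the log scale).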
Referee: In the extension to group-sequential designs, the manuscript does not provide the adjusted formulas for power and Type-I error that account for the stopping boundaries. Without these, it is unclear whether the EUII correctly reflects the evidentiary value when early stopping reduces the realized sample size, which is central to the claim of balancing evidentiary value and sample size.

Authors: We acknowledge the need for greater explicitness. The revised version will include the standard expressions for the overall Type I error and power under the group-sequential boundaries (using the joint multivariate normal distribution of the sequential test statistics or the corresponding boundary-crossing probabilities). These adjusted quantities will then be inserted directly into the EUII formula, making clear how the index accounts for the random realized sample size induced by early stopping. revision: yes
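The boundary-crossing probabilities can also be estimated by simulation, which is a useful check on the closed-form expressions. The sketch below assumes a two-stage, efficacy-only design for a one-sided one-sample z-test (σ = 1, n1 = 25, n = 50, δ = 0.4 under the alternative) with a common boundary of about 2.178, the approximate Pocock constant for two looks at one-sided α = 0.025. How the paper normalises the DOR of an adaptive design is not stated here, so dividing by the expected sample size under the alternative is a hypothetical choice, flagged in the code; `simulate_two_stage` is a hypothetical helper:

```python
import math
import random


def simulate_two_stage(delta: float, n1: int, n: int, c1: float, c2: float,
                       n_sim: int = 20_000, seed: int = 2) -> tuple[float, float]:
    """Rejection rate and expected sample size of a two-stage efficacy-only design.

    One-sided one-sample z-test with sigma = 1: the experiment stops at the
    interim if Z1 > c1; otherwise it continues to n units and rejects if Z2 > c2.
    """
    rng = random.Random(seed)
    n2 = n - n1
    rejections = 0
    total_units = 0
    for _ in range(n_sim):
        s1 = rng.gauss(n1 * delta, math.sqrt(n1))       # sum of first-stage outcomes
        if s1 / math.sqrt(n1) > c1:                     # early stop for efficacy
            rejections += 1
            total_units += n1
            continue
        s = s1 + rng.gauss(n2 * delta, math.sqrt(n2))   # add second-stage outcomes
        if s / math.sqrt(n) > c2:
            rejections += 1
        total_units += n
    return rejections / n_sim, total_units / n_sim


C = 2.178  # approximate Pocock constant, two looks, one-sided alpha = 0.025 (assumption)
alpha_hat, en0 = simulate_two_stage(0.0, 25, 50, C, C)   # under H0
power_hat, en1 = simulate_two_stage(0.4, 25, 50, C, C)   # under H1

# HYPOTHETICAL plug-in: DOR from the design's overall error rates, normalised by
# the expected sample size under H1 (one of several possible choices).
dor = (power_hat / (1 - power_hat)) / (alpha_hat / (1 - alpha_hat))
print(alpha_hat, power_hat, en0, en1, dor ** (1 / en1))
```

Because early stopping is common under the alternative, the expected sample size under H1 comes out well below the maximum of 50, while under H0 it stays close to 50. The realized α and power, not their nominal values, enter the DOR.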
Referee: The reanalysis of 2738 experiments simulates post-hoc interim analyses but lacks sensitivity analysis to the choice of the assumed relative effect size or error bars on the estimated savings. This weakens the illustration of possible sample size reductions.

Authors: We agree that robustness checks would improve the illustration. In the revision we will add a sensitivity analysis that repeats the reanalysis over a grid of plausible relative effect sizes and will report the resulting range of estimated sample-size savings. We will also attach simulation-based standard errors (or bootstrap intervals) to the aggregate savings figures to quantify uncertainty across the 2738 experiments. revision: yes

Circularity Check

0 steps flagged

No significant circularity: EUII is explicitly constructed from power, alpha and n with derived asymptotic property


The paper defines the experimental unit information index directly from power, Type-I error rate and sample size, then derives that its asymptotic value depends only on the pre-specified relative effect size δ under the alternative. This dependence is a mathematical consequence of the definition rather than a reduction of an independent claim to its inputs. No load-bearing self-citation, uniqueness theorem, or fitted parameter renamed as prediction is present in the provided derivation chain. The extension to group-sequential adaptive designs applies the same explicit construction while preserving the error-rate interpretations, making the overall argument self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The central claim rests on standard hypothesis-testing assumptions plus the choice of relative effect size to anchor the asymptotic EUII. No new physical entities are postulated.

free parameters (1)

relative effect size under the alternative
Determines the asymptotic value of the EUII; must be assumed or specified by the user.

axioms (1)

standard math · Power and Type-I error rates are well-defined and can be computed for the chosen test and design.
Invoked when constructing the EUII from diagnostic likelihood ratios.

invented entities (1)

Experimental Unit Information Index (EUII) · no independent evidence
purpose: Quantify evidentiary value contributed by each experimental unit after adjusting for sample size.
Newly introduced composite measure; no independent falsifiable prediction supplied in the abstract.

pith-pipeline@v0.9.0 · 5513 in / 1387 out tokens · 32520 ms · 2026-05-17T20:20:11.586900+00:00 · methodology

