pith.

arxiv: 2508.01610 · v2 · submitted 2025-08-03 · 📊 stat.ME

Sample size calculations for multilevel factorial longitudinal cluster randomised trials

Pith reviewed 2026-05-19 01:44 UTC · model grok-4.3

classification 📊 stat.ME
keywords: sample size calculation · power analysis · factorial design · cluster randomised trial · longitudinal data · multilevel intervention · split-plot design · stepped wedge

The pith

Power to detect individual, cluster, and interaction effects in split-plot factorial longitudinal cluster trials follows from standard formulas for simpler trial designs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops sample size methods for trials that simultaneously test an individual-level intervention and a cluster-level intervention using longitudinal cluster randomisation schemes such as stepped-wedge or crossover designs. It demonstrates that the required number of participants or clusters to achieve specified power for the two main effects and their interaction can be obtained by combining existing power results from individually randomised trials and from longitudinal cluster randomised trials. This combined approach is applied to the SharES breast-cancer trial to illustrate planning for continuous outcomes under the multilevel factorial structure. A sympathetic reader would care because many real-world interventions act at both personal and group levels, yet prior methodology did not cover the longitudinal cluster setting.

Core claim

For continuous outcomes in split-plot factorial longitudinal cluster randomised trials, the power to detect a given effect of the individual-level intervention, the cluster-level intervention, and the interaction between them is determined directly from standard power formulas for individually randomised trials and for longitudinal cluster randomised trials.

What carries the argument

The split-plot factorial longitudinal cluster randomised trial, in which clusters are randomised to levels of one factor while individuals within clusters are randomised to levels of the second factor, with outcomes measured repeatedly over time.

If this is right

  • Trial designers can plan joint assessment of patient-level and clinic-level interventions in stepped-wedge or crossover settings using readily available power calculators.
  • The approach extends to any longitudinal cluster design whose power formula is already known, without requiring a completely new derivation.
  • Interaction effects between the two intervention levels become testable within the same study that also estimates the separate main effects.
  • Sample size can be expressed in terms of the variances and intra-class correlations already familiar from simpler trial types.
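A minimal sketch of how those familiar quantities could enter a calculation. The assumptions are ours, not the paper's: m participants in every cluster-period (m even), split 1:1 between the individual-level arms; no period-by-individual-intervention interaction; and a normal approximation to power. Under these assumptions the cluster and cluster-period random effects cancel from the within-cluster-period contrast, so the interaction and individual-level variances take individually randomised forms. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def interaction_variance(sigma_e2, m, n_treated_cp, n_control_cp):
    """Variance of the interaction estimate (hypothetical helper).

    Each cluster-period contributes a contrast D_ij = mean(Z=1) - mean(Z=0)
    with variance 4 * sigma_e2 / m; cluster and cluster-period random effects
    cancel. The interaction is the difference between the average D_ij in
    cluster-periods with and without the cluster-level intervention.
    """
    var_d = 4.0 * sigma_e2 / m
    return var_d * (1.0 / n_treated_cp + 1.0 / n_control_cp)


def individual_effect_variance(sigma_e2, m, n_cluster_periods):
    """Variance of the individual-level effect averaged over all cluster-periods
    (hypothetical helper): an individually randomised comparison of
    m * n_cluster_periods participants split 1:1, assuming no interaction."""
    return 4.0 * sigma_e2 / (m * n_cluster_periods)


def approx_power(effect, variance, alpha=0.05):
    """Two-sided normal-approximation power, ignoring the far rejection tail."""
    z = NormalDist()
    return z.cdf(abs(effect) / math.sqrt(variance) - z.inv_cdf(1 - alpha / 2))


# Example: 8-cluster, 5-period stepped wedge -> 20 treated, 20 control cluster-periods
v_int = interaction_variance(sigma_e2=1.0, m=20, n_treated_cp=20, n_control_cp=20)
print(v_int, approx_power(0.4, v_int))
```

In this example the interaction variance is 0.02 (a standard error of about 0.14 residual standard deviations), so an interaction of 0.4 standard deviations gives roughly 80% power under the stated assumptions.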

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The method may reduce the barrier to running multilevel intervention studies in fields such as oncology or primary care where both clinician behaviour and patient education matter.
  • Extension to binary or time-to-event outcomes would follow the same logic once the corresponding component formulas are substituted.
  • The framework invites direct comparison of efficiency between split-plot factorial longitudinal designs and separate single-factor trials.

Load-bearing premise

That the power for the full multilevel factorial longitudinal design can be assembled by direct combination of existing formulas from separate individual-randomised and longitudinal cluster-randomised trials without extra terms for time-dependent interactions or the specific factorial randomisation structure.

What would settle it

A Monte Carlo simulation of a split-plot factorial longitudinal cluster trial in which the sample size chosen from the combined formulas fails to deliver the nominal power for the interaction effect when the true data-generating process includes realistic cluster-level and temporal correlation.
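Such a check can be sketched as follows. This is an illustrative simulation, not the paper's: it generates individual-level data from a linear mixed model with random cluster and cluster-period effects, estimates the interaction from within-cluster-period contrasts with the residual variance treated as known, and compares the empirical rejection rate with the normal-approximation power. The main effects, variance components, and secular trend are hypothetical values; a fuller check would fit a linear mixed model to each simulated data set and could add a decaying correlation structure.

```python
import numpy as np
from statistics import NormalDist


def simulate_interaction_power(design, m, theta_ci, sigma_e=1.0, tau=0.3,
                               gamma=0.2, alpha=0.05, reps=2000, seed=1):
    """Monte Carlo power for the cluster-by-individual interaction (sketch).

    design : (K, T) 0/1 cluster-level allocation (e.g. a stepped wedge)
    m      : participants per cluster-period, split 1:1 between the
             individual-level arms (must be even)
    Returns (empirical power, normal-approximation power).
    """
    if m % 2:
        raise ValueError("m must be even for 1:1 allocation")
    rng = np.random.default_rng(seed)
    X = np.asarray(design, dtype=float)
    K, T = X.shape
    half = m // 2
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    n1 = X.sum()
    n0 = X.size - n1
    # Standard error of the interaction estimator with sigma_e treated as known
    se = float(np.sqrt(4 * sigma_e**2 / m * (1 / n1 + 1 / n0)))
    trend = 0.1 * np.arange(T)      # hypothetical secular trend
    theta_c, theta_i = 0.5, 0.3     # hypothetical main effects
    rejections = 0
    for _ in range(reps):
        cluster = rng.normal(0.0, tau, size=(K, 1))
        cluster_period = rng.normal(0.0, gamma, size=(K, T))
        # Terms shared by both individual-level arms in a cluster-period
        shared = (trend + theta_c * X + cluster + cluster_period)[..., None]
        y0 = shared + rng.normal(0.0, sigma_e, size=(K, T, half))
        y1 = (shared + theta_i + theta_ci * X[..., None]
              + rng.normal(0.0, sigma_e, size=(K, T, half)))
        d = y1.mean(axis=2) - y0.mean(axis=2)   # within-cluster-period contrasts
        est = d[X == 1].mean() - d[X == 0].mean()
        rejections += int(abs(est) / se > z_crit)
    analytic = NormalDist().cdf(abs(theta_ci) / se - z_crit)
    return rejections / reps, analytic
```

For an 8-cluster, 5-period stepped wedge with m = 20 and theta_ci = 0.4, the empirical and analytic power should both be near 0.81, up to Monte Carlo error; with theta_ci = 0 the empirical rejection rate should sit near alpha.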

Figures

Figures reproduced from arXiv: 2508.01610 by Andrew Copas, Andrew Forbes, Fan Li, James Wason, Jessica Kasza, Rebecca Walwyn, Rhys Bowden.

Figure 1. Variance of treatment effect estimators vs number of observations per cluster
Figure 2. Power vs number of observations per cluster period, for a standardised effect
Figure 3. Comparison of the variance of treatment effect estimators vs within-period
Original abstract

Typically, trials investigate the impact of either an individual-level intervention on participant outcomes, or the impact of a cluster-level intervention on participant outcomes. Factorial designs consider two (or more) treatments for each of two (or more) different factors. In factorial trial designs, trial units (individuals or clusters) are each randomised to a level of each of the treatments; these designs allow assessment of the interactions between different interventions. Recently, there has been growing interest in the design of trials that jointly assess the impact of individual- and cluster-level interventions (i.e. multi-level interventions), which requires methodology that accommodates randomisation at multiple levels. While recent work has developed sample size methodology for variants combining standard cluster randomisation and individual randomisation, that work does not apply to longitudinal cluster randomised trial designs such as the stepped wedge design or cluster randomised crossover design. Here we present dedicated sample size methodology for "split-plot factorial longitudinal cluster randomised trials" with continuous outcomes. These designs allow joint assessment of individual-level and cluster-level interventions, with the impact of the cluster-level intervention assessed using any longitudinal cluster randomised trial design. We show how the power to detect given effects of the individual-level intervention, the cluster-level intervention, and the interaction between the two depends on standard results for individually-randomised trials and longitudinal cluster randomised trials. We apply these results to the SharES trial, which considered the effects of patient- and clinician-level interventions for patients with breast cancer on patient knowledge about the risks and benefits of treatment.
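The cluster-level component can be sketched with a standard generalised least squares (GLS) variance calculation of the kind used for longitudinal cluster randomised trials. Assumptions, not taken from the paper: categorical period effects, random cluster and cluster-period effects (a block-exchangeable correlation), known variance components, equal cluster-period sizes, and 1:1 individual-level allocation, so that pooling both individual-level arms gives cluster-period means with residual variance sigma_e2 / m and the estimand is the cluster-level effect averaged over those arms. The function name and arguments are hypothetical.

```python
import numpy as np


def cluster_effect_variance(design, m, sigma_e2, tau2, gamma2=0.0):
    """GLS variance of the cluster-level effect (hypothetical helper).

    design : (K, T) 0/1 cluster-level allocation X_ij, one row per cluster
    m      : participants per cluster-period, both individual-level arms pooled
    sigma_e2, tau2, gamma2 : residual, cluster and cluster-period variances
    Assumes categorical period effects and known variance components.
    """
    X = np.asarray(design, dtype=float)
    K, T = X.shape
    # Covariance of one cluster's T cluster-period means (block-exchangeable)
    V = tau2 * np.ones((T, T)) + (gamma2 + sigma_e2 / m) * np.eye(T)
    V_inv = np.linalg.inv(V)
    info = np.zeros((T + 1, T + 1))
    for i in range(K):
        Z = np.column_stack([np.eye(T), X[i]])  # period dummies + treatment
        info += Z.T @ V_inv @ Z
    # Last diagonal element of the inverse information = Var(theta_hat)
    return np.linalg.inv(info)[T, T]


# Stepped wedge: 4 sequences x 2 clusters, 5 periods
sw = [[1 if j > s else 0 for j in range(5)] for s in range(4) for _ in range(2)]
print(cluster_effect_variance(sw, m=20, sigma_e2=1.0, tau2=0.05, gamma2=0.01))
```

As a sanity check, in a two-cluster crossover such as [[0, 1], [1, 0]] the cluster random effect cancels from each cluster's within-cluster contrast, so the function returns gamma2 + sigma_e2 / m. Any other longitudinal design is handled by changing the design array, and another correlation structure by changing V.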

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript presents dedicated sample size methodology for split-plot factorial longitudinal cluster randomised trials with continuous outcomes. It enables joint assessment of individual-level and cluster-level interventions (and their interaction) where the cluster-level intervention is evaluated via any longitudinal cluster randomised trial design such as stepped wedge or cluster crossover. The central claim is that power for the three effects follows directly from combining standard power results for individually randomised trials and longitudinal cluster randomised trials; the approach is illustrated via application to the SharES trial.

Significance. If the combination of existing formulas is shown to be statistically valid for the split-plot factorial structure, the work would supply a practical, accessible tool for sample-size planning in trials that evaluate multilevel interventions within longitudinal cluster designs. This addresses a methodological gap without requiring wholly new derivations, which is efficient. The real-trial application adds immediate relevance for trialists working on patient- and clinician-level interventions.

major comments (1)
  1. [Abstract and Methods] The claim that power expressions for the individual-level, cluster-level, and interaction effects are obtained directly by combining standard results rests on the unstated assumption that the multilevel factorial structure introduces no additional variance components or time-dependent interaction terms beyond those already present in the separate individual-randomised and longitudinal-cluster formulae. Without an explicit combined variance expression or design-matrix adjustment (e.g., for the interaction contrast in a longitudinal mixed model), it is not possible to verify that the combination preserves correct type-I error and power for the interaction test.
minor comments (1)
  1. [Application] The SharES trial example would benefit from a short table listing the assumed effect sizes, intra-cluster correlations, and autocorrelation values used to obtain the reported sample sizes, allowing readers to reproduce the calculations.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive comments on our manuscript. The observation regarding the need for explicit verification of the combined variance structure is well taken and has prompted us to strengthen the presentation of the underlying model.

Point-by-point responses
  1. Referee: Abstract and Methods: the claim that power expressions for the individual-level, cluster-level, and interaction effects are obtained directly by combining standard results rests on the unstated assumption that the multilevel factorial structure introduces no additional variance components or time-dependent interaction terms beyond those already present in the separate individual-randomised and longitudinal-cluster formulae. Without an explicit combined variance expression or design-matrix adjustment (e.g., for the interaction contrast in a longitudinal mixed model), it is not possible to verify that the combination preserves correct type-I error and power for the interaction test.

    Authors: We agree that an explicit derivation of the variance expressions would improve verifiability. The split-plot factorial structure nests individual-level randomisation within the cluster-level longitudinal design, so that the fixed-effects contrasts for the individual-level main effect, cluster-level main effect, and their interaction remain orthogonal under the assumed linear mixed model. The random-effects structure (cluster, time, and cluster-by-time) is inherited directly from the longitudinal cluster trial component and does not acquire extra time-dependent interaction terms from the factorial aspect. In the revised manuscript we will add a subsection that states the full model, supplies the design-matrix contrast vectors for each effect, and derives the variance of the interaction estimator as a linear combination of the variances already present in the separate individual-randomised and longitudinal-cluster formulae. This will confirm that type-I error and power for the interaction test are preserved under the standard assumptions.

    Revision: yes.
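A sketch of what such a derivation could look like, under assumptions that are ours rather than the manuscript's: m participants in every cluster-period (m even), split 1:1 between the individual-level arms; categorical period effects; no period-by-individual-intervention interaction; and random cluster and cluster-period effects. Here Z_{ijk} = 1 marks the individual-level intervention for participant k, X_{ij} = 1 the cluster-level intervention in cluster i and period j, and \bar{Y}_{ij1}, \bar{Y}_{ij0} are the two arm means, each over m/2 participants.

```latex
\begin{aligned}
Y_{ijk} &= \beta_j + \theta_C X_{ij} + \theta_I Z_{ijk} + \theta_{CI} X_{ij} Z_{ijk} + C_i + (CP)_{ij} + e_{ijk}, \\
&\qquad C_i \sim N(0,\tau^2),\quad (CP)_{ij} \sim N(0,\gamma^2),\quad e_{ijk} \sim N(0,\sigma_e^2) \\[4pt]
D_{ij} &= \bar{Y}_{ij1} - \bar{Y}_{ij0} = \theta_I + \theta_{CI} X_{ij} + (\bar{e}_{ij1} - \bar{e}_{ij0}),
  \qquad \operatorname{Var}(D_{ij}) = \frac{4\sigma_e^2}{m} \\[4pt]
S_{ij} &= \tfrac{1}{2}\left(\bar{Y}_{ij1} + \bar{Y}_{ij0}\right)
  = \beta_j + \tfrac{1}{2}\theta_I + \left(\theta_C + \tfrac{1}{2}\theta_{CI}\right) X_{ij}
  + C_i + (CP)_{ij} + \tfrac{1}{2}\left(\bar{e}_{ij1} + \bar{e}_{ij0}\right) \\[4pt]
\operatorname{Cov}(D_{ij}, S_{ij}) &= \tfrac{1}{2}\left[\operatorname{Var}(\bar{e}_{ij1}) - \operatorname{Var}(\bar{e}_{ij0})\right] = 0 \\[4pt]
\operatorname{Var}(\hat{\theta}_{CI}) &= \frac{4\sigma_e^2}{m}\left(\frac{1}{n_1} + \frac{1}{n_0}\right)
\end{aligned}
```

Because each D_{ij} involves only residual errors, the D_{ij} are independent across cluster-periods and of every S_{ij}. The interaction is then estimated by the difference between the mean D_{ij} over the n_1 cluster-periods with X_{ij} = 1 and the n_0 with X_{ij} = 0, an individually randomised form. The S_{ij} follow a standard longitudinal cluster randomised trial model with residual variance \sigma_e^2/m and carry the cluster-level effect averaged over the individual-level arms.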

Circularity Check

0 steps flagged

No significant circularity; derivation combines external standard results

full rationale

The paper's central claim is that power calculations for the split-plot factorial longitudinal cluster design follow from combining existing formulas for individually randomised trials and longitudinal cluster randomised trials. No load-bearing step reduces to a self-citation chain, fitted parameter renamed as prediction, or self-definitional loop within the manuscript. The derivation treats the multilevel factorial structure as separable into known components once the design is specified, with the abstract explicitly stating dependence on 'standard results' from prior literature rather than internal fits or author-specific uniqueness theorems. This is the most common honest non-finding for papers that extend methodology by composition of established results.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard statistical assumptions for continuous outcomes in randomised trials and the novel claim that power expressions combine without additional multilevel-specific terms.

axioms (2)
  • domain assumption: Continuous outcomes follow linear mixed models with random effects capturing clustering and longitudinal correlation.
    Standard assumption invoked for power calculations in cluster and longitudinal trial designs.
  • ad hoc to paper: The factorial multilevel structure does not introduce new variance components beyond those in the separate individual and cluster designs.
    This is the key modelling choice that allows direct combination of existing power results.

pith-pipeline@v0.9.0 · 5819 in / 1308 out tokens · 31789 ms · 2026-05-19T01:44:27.982848+00:00 · methodology


