A tutorial on conducting sample size and power calculations for detecting treatment effect heterogeneity in cluster randomized trials with linear mixed models

Fan Li; Guangyu Tong; Mary Ryan Baumann; Michael O. Harhay; Monica Taljaard; Patrick J. Heagerty; Rui Wang

arxiv: 2501.18383 · v3 · submitted 2025-01-30 · 📊 stat.ME · stat.AP

Mary Ryan Baumann, Monica Taljaard, Patrick J. Heagerty, Michael O. Harhay, Guangyu Tong, Rui Wang, Fan Li

Pith reviewed 2026-05-23 04:38 UTC · model grok-4.3

classification 📊 stat.ME stat.AP

keywords cluster randomized trials · treatment effect heterogeneity · sample size calculation · power analysis · linear mixed models · intracluster correlation · stepped wedge design · R Shiny calculator

The pith

This tutorial consolidates sample size and power formulas for testing treatment effect heterogeneity in cluster randomized trials via linear mixed models and supplies an R Shiny calculator.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper brings together recently derived formulas for power and sample size when the goal is to detect heterogeneity of treatment effects, rather than only the average treatment effect, in cluster randomized trials. These calculations apply to linear mixed effects models across single-period and multi-period parallel designs, crossover designs, and stepped-wedge designs, and cover both continuous and binary outcomes. Because the formulas require extra design parameters, especially intracluster correlation coefficients for both the outcome and the effect-modifying covariate, the tutorial also supplies an online calculator to reduce the barrier to use. A sympathetic reader would care because pre-specified heterogeneity analyses are increasingly common in community trials, yet without proper power planning those analyses risk being inconclusive.

Core claim

The authors consolidate separate power and sample size formulas for testing treatment-covariate interactions or differences in subpopulation-specific treatment effects in cluster randomized trials using linear mixed effects models, demonstrate their application through an R Shiny calculator, and highlight the sensitivity of results to accurate intracluster correlation estimates for both outcomes and covariates.

What carries the argument

The online R Shiny calculator that implements the design-specific sample size and power formulas for HTE testing in CRTs with LME models, taking as inputs the relevant ICCs, effect sizes, and cluster parameters.

If this is right

Trial designers can now calculate the number of clusters and cluster sizes needed to power pre-specified HTE analyses in the main CRT designs.
Power estimates become strongly dependent on the chosen ICC values for both the outcome and the covariate.
The same consolidated approach covers continuous and binary outcomes across parallel, crossover, and stepped-wedge structures.
The calculator lowers the practical barrier to performing these calculations before a trial begins.
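
How a variance estimate translates into power can be sketched with the standard normal approximation for a two-sided Wald test. This is a generic illustration rather than any of the paper's design-specific formulas: the function name and arguments are hypothetical, and `var_estimate` stands in for whatever variance of the interaction estimator the relevant formula returns.

```python
from statistics import NormalDist


def wald_power(effect, var_estimate, alpha=0.05):
    """Approximate power of a two-sided Wald z-test.

    effect       -- hypothesized interaction (HTE) effect size (assumed input)
    var_estimate -- variance of the effect estimator (assumed input)
    alpha        -- two-sided type I error rate
    """
    if var_estimate <= 0:
        raise ValueError("var_estimate must be positive")
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Standardized effect: how many standard errors the effect is from zero.
    z_effect = abs(effect) / var_estimate ** 0.5
    # Sum of both rejection tails under the alternative.
    return std_normal.cdf(z_effect - z_crit) + std_normal.cdf(-z_effect - z_crit)
```

Because the variance enters only through the standardized effect, anything that inflates the variance (for example, a less favorable ICC assumption) lowers the computed power for a fixed effect size.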

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Future methodological work could test whether the formulas remain accurate when the linear mixed model assumptions are mildly violated in real cluster data.
Routine collection of pilot estimates for covariate ICCs, in addition to outcome ICCs, could become a standard part of CRT planning.
The calculator framework could be extended to allow users to upload their own simulation-based power checks for non-standard designs.

Load-bearing premise

Users will be able to supply accurate estimates of the intracluster correlation coefficients for both the outcome and the effect-modifying covariate.

What would settle it

Running the published formulas by hand for a stepped-wedge design with a continuous outcome and finding that the calculator's output power differs from the hand calculation by more than rounding error for the same inputs.

Original abstract

Cluster-randomized trials (CRTs) are a well-established class of designs for evaluating community-based interventions. An essential task in planning these trials is determining the number of clusters and cluster sizes needed to achieve sufficient statistical power for detecting a clinically relevant effect size. While methods for evaluating the average treatment effect (ATE) for the entire study population are well-established, sample size methods for testing heterogeneity of treatment effects (HTEs), i.e., treatment-covariate interaction or difference in subpopulation-specific treatment effects, in CRTs have only recently been developed. For pre-specified analyses of HTEs in CRTs, effect-modifying covariates should, ideally, be accompanied by sample size or power calculations to ensure the trial has adequate power for the planned analyses. Power analysis for testing HTEs is more complex than for ATEs due to the additional design parameters that must be specified. Power and sample size formulas for testing HTEs via linear mixed effects (LME) models have been separately derived for different cluster-randomized designs, including single and multi-period parallel designs, crossover designs, and stepped-wedge designs, and for continuous and binary outcomes. This tutorial provides a consolidated reference guide for these methods and enhances their accessibility through an online R Shiny calculator. We further discuss key considerations for conducting sample size and power calculations to test pre-specified HTE hypotheses in CRTs, highlighting the importance of specifying advanced estimates of intracluster correlation coefficients for both outcomes and covariates, and their implications for power. The sample size methodology and calculator functionality are demonstrated through a real CRT example.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

A useful consolidation of existing power formulas for HTE in CRTs plus a Shiny calculator, but no new derivations.

This tutorial gathers power and sample size formulas for testing treatment effect heterogeneity via linear mixed models across several cluster randomized designs, including parallel, crossover, and stepped-wedge, for both continuous and binary outcomes. The core contribution is pulling those previously derived pieces into one place and pairing them with an R Shiny app that lets users run the calculations directly. It also walks through a real-trial example and flags the practical need to supply ICCs for both the outcome and the covariate. That last point is already foregrounded in the abstract, which is good because those inputs drive most of the uncertainty in practice. The paper does not claim new math; it explicitly points back to the separate derivations in earlier work. That keeps the scope honest. The main limitation is that any errors in the cited formulas or in the app's implementation would carry through, and the abstract-level description does not let me verify the code or the numerical examples in detail. The sensitivity discussion is proportionate and does not overstate what the methods can deliver. This is aimed at trial statisticians who already know they want to pre-specify an HTE analysis and need a single reference plus a working tool. It is not aimed at readers looking for fresh theoretical results. I would send it for peer review in a methods journal because the consolidation and the calculator lower the barrier for correct design work even if the underlying formulas are not original.

Referee Report

0 major / 3 minor

Summary. This tutorial consolidates power and sample size formulas for testing pre-specified treatment effect heterogeneity (HTE) via linear mixed models in cluster-randomized trials. It covers single- and multi-period parallel designs, crossover designs, and stepped-wedge designs, for both continuous and binary outcomes. The paper provides an R Shiny calculator to implement the methods, discusses practical issues including the need for accurate intracluster correlation coefficients (ICCs) for the outcome and the effect-modifying covariate, and demonstrates the approach with a real CRT example.

Significance. If the cited derivations are represented accurately and the calculator implements them correctly, the manuscript supplies a consolidated, accessible reference that fills a practical gap: while ATE power methods for CRTs are mature, HTE methods have appeared only recently and separately. The explicit foregrounding of ICC sensitivity for both outcome and covariate, together with the online tool, should improve the quality of sample-size planning for HTE analyses in future CRTs. The provision of reproducible code (Shiny app) is a clear strength.

minor comments (3)

[Introduction / §2] The abstract and introduction state that formulas were 'separately derived' for different designs and outcomes; a short table or appendix listing the original references for each formula (with equation numbers) would help readers trace the derivations without searching the cited papers.
[Shiny calculator section] In the description of the Shiny app inputs, the mapping between user-supplied ICC values and the variance components appearing in the power formulas is not shown explicitly; adding a small schematic or equation reference next to each input field would reduce the chance of mis-specification.
[Example section] The real-CRT example reports power curves but does not tabulate the exact ICC values used for the outcome and covariate; including these numerical values (and the source of the estimates) would allow readers to reproduce the displayed results directly from the formulas.
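
The mapping requested in the second comment can be illustrated for the simplest case. Under a random-intercept model with a single ICC, the ICC is the between-cluster share of the total variance, so a user-supplied ICC and total variance determine both variance components. The sketch below assumes exactly that setting; the function and argument names are hypothetical, and the multi-period designs covered by the paper involve further correlation parameters that are not represented here.

```python
def variance_components(icc, total_var):
    """Split a total variance into between- and within-cluster parts.

    Assumes a random-intercept model, where
        icc = between / (between + within).
    """
    if not 0 <= icc <= 1:
        raise ValueError("icc must lie in [0, 1]")
    if total_var <= 0:
        raise ValueError("total_var must be positive")
    between = icc * total_var        # cluster-level (random intercept) variance
    within = (1 - icc) * total_var   # individual-level residual variance
    return between, within
```

The same decomposition applies to the effect-modifying covariate when its clustering is summarized by a single covariate ICC.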

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive evaluation of the manuscript and for recommending acceptance. We are pleased that the consolidation of power and sample size methods for HTE testing in CRTs, along with the R Shiny calculator and emphasis on ICC sensitivity, is recognized as addressing a practical gap.

Circularity Check

0 steps flagged

No significant circularity; tutorial consolidates external derivations

full rationale

The paper is explicitly a tutorial that consolidates power and sample size formulas previously derived separately for HTE testing in CRTs across designs and outcome types. It makes no new first-principles derivations or predictions that reduce to its own fitted inputs or self-citations. The abstract states the formulas 'have been separately derived' and positions the contribution as a reference guide plus R Shiny calculator, with discussion of ICC sensitivity as a practical point. No load-bearing step equates outputs to inputs by construction, and the reader's assessment of score 0.0 aligns with the provided text.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The tutorial rests on standard linear mixed model assumptions for clustered data and on the requirement that users provide realistic intracluster correlation estimates; no new entities are introduced.

free parameters (1)

intracluster correlation coefficients for outcome and covariate
These must be specified by the user and directly determine the required sample size in the consolidated formulas.

axioms (1)

domain assumption: Linear mixed effects models appropriately capture the clustering structure in CRT data for both outcomes and covariates.
The tutorial focuses exclusively on LME-based power calculations.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean; IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction; washburn_uniqueness_aczel · unclear

Power and sample size formulas for testing HTEs via linear mixed effects (LME) models have been separately derived for different cluster-randomized designs... highlighting the importance of specifying advanced estimates of intracluster correlation coefficients for both outcomes and covariates

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean; IndisputableMonolith/Constants · absolute_floor_iff_bare_distinguishability; phi_golden_ratio · unclear

Table 2: Summary of HTE variance formulas... σ_HTE² = σ_ATE² × (1−α₁)/{1+(m−2)α₁−(m−1)ρ₁α₁}
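
The multiplicative factor in the quoted expression can be evaluated directly to see how it moves with the two correlation parameters. This is a minimal sketch, not the calculator's implementation. The page does not define the symbols, so the code assumes that m is the cluster size, α₁ the outcome ICC, and ρ₁ the covariate ICC; the function name is hypothetical.

```python
def hte_variance_ratio(m, alpha1, rho1):
    """Factor (1 - a1) / {1 + (m - 2) a1 - (m - 1) rho1 a1} from the quoted expression.

    m      -- cluster size (assumed meaning)
    alpha1 -- outcome ICC (assumed meaning)
    rho1   -- covariate ICC (assumed meaning)
    """
    denom = 1 + (m - 2) * alpha1 - (m - 1) * rho1 * alpha1
    if denom <= 0:
        raise ValueError("denominator must be positive for these inputs")
    return (1 - alpha1) / denom
```

When ρ₁ = 1 the denominator reduces to 1 − α₁, so the factor equals 1 for any α₁ below 1. For smaller ρ₁ and a positive α₁ the denominator grows and the factor shrinks, which is one way to see why the covariate ICC matters for the result.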

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

57 extracted references · 57 canonical work pages

[1]

Design and analysis of group-randomized trials

Murray DM. Design and analysis of group-randomized trials. Oxford Univrsity Press, USA; 1998

work page 1998
[2]

Review of Recent Methodological Developments in Group-Randomized Trials: Part 1—Design

Turner EL, Li F, Gallis JA, Prague M, Murray DM. Review of Recent Methodological Developments in Group-Randomized Trials: Part 1—Design. American Journal of Public Health. 2017;107(6):907–915

work page 2017
[3]

Methods for sample size determination in cluster randomized trials

Rutterford C, Copas A, Eldridge S. Methods for sample size determination in cluster randomized trials. International Journal of Epidemiology. 2015 June 1;44(3):1051–1067

work page 2015
[4]

Sample size calculators for planning stepped- wedge cluster randomized trials: a review and comparison

Ouyang Y, Li F, Preisser JS, Taljaard M. Sample size calculators for planning stepped- wedge cluster randomized trials: a review and comparison. International Journal of Epidemiology. 2022 Dec 1;51(6):2000–2013

work page 2022
[6]

Designing three-level cluster randomized trials to assess treatment effect heterogeneity

Li F, Chen X, Tian Z, Esserman D, Heagerty PJ, Wang R. Designing three-level cluster randomized trials to assess treatment effect heterogeneity. Biostatistics. 2022 July;24(4):833–849

work page 2022
[9]

Sample size requirements for testing treatment effect heterogeneity in cluster randomized trials with binary outcomes

Maleyeff L, Wang R, Haneuse S, Li F. Sample size requirements for testing treatment effect heterogeneity in cluster randomized trials with binary outcomes. Statistics in Medicine. 2023;42(27):5054–5083

work page 2023
[10]

Sample Size Requirements to Test Subgroup- Specific Treatment Effects in Cluster-Randomized Trials

Wang X, Goldfeld KS, Taljaard M, Li F. Sample Size Requirements to Test Subgroup- Specific Treatment Effects in Cluster-Randomized Trials. Prev Sci [Internet]. 2023 Oct 10 [cited 2024 Feb 29]; Available from: https://doi.org/10.1007/s11121-023-01590-6

work page doi:10.1007/s11121-023-01590-6 2023
[11]

Planning stepped wedge cluster randomized trials to detect treatment effect heterogeneity

Li F, Chen X, Tian Z, Wang R, Heagerty PJ. Planning stepped wedge cluster randomized trials to detect treatment effect heterogeneity. Statistics in Medicine. 2024;43(5):890–911. Page 21 of 36

work page 2024
[12]

Sample size and power calculation for testing treatment effect heterogeneity in cluster randomized crossover designs

Wang X, Chen X, Goldfeld KS, Taljaard M, Li F. Sample size and power calculation for testing treatment effect heterogeneity in cluster randomized crossover designs. Statistical Methods in Medical Research. 2024;33(7):1115–1136

work page 2024
[13]

Simple sample size calculation for cluster-randomized trials

Hayes RJ, Bennett S. Simple sample size calculation for cluster-randomized trials. International Journal of Epidemiology. 1999 Apr 1;28(2):319–326

work page 1999
[14]

Sample size calculation for cluster randomized cross- over trials

Giraudeau B, Ravaud P, Donner A. Sample size calculation for cluster randomized cross- over trials. Statistics in Medicine. 2008;27(27):5578–5585

work page 2008
[15]

Multi-period crossover trials

Matthews J. Multi-period crossover trials. Statistical Methods in Medical Research. 1994;3(4):383–405

work page 1994
[16]

Stepped-wedge cluster randomised controlled trials: a generic framework including parallel and multiple-level designs

Hemming K, Lilford R, Girling AJ. Stepped-wedge cluster randomised controlled trials: a generic framework including parallel and multiple-level designs. Statistics in Medicine. 2015;34(2):181–196

work page 2015
[17]

Design and analysis of stepped wedge cluster randomized trials

Hussey MA, Hughes JP. Design and analysis of stepped wedge cluster randomized trials. Contemporary Clinical Trials. 2007 Feb 1;28(2):182–191

work page 2007
[18]

Statistical Power and Sample Size Requirements for Three Level Hierarchical Cluster Randomized Trials

Heo M, Leon AC. Statistical Power and Sample Size Requirements for Three Level Hierarchical Cluster Randomized Trials. Biometrics. 2008;64(4):1256–1262

work page 2008
[19]

Individually Randomized Group Treatment Trials: A Critical Appraisal of Frequently Used Design and Analytic Approaches

Pals SL, Murray DM, Alfano CM, Shadish WR, Hannan PJ, Baker WL. Individually Randomized Group Treatment Trials: A Critical Appraisal of Frequently Used Design and Analytic Approaches. American Journal of Public Health. 2008;98(8):1418–1424

work page 2008
[20]

Cluster randomised trials with repeated cross sections: alternatives to parallel group designs

Hooper R, Bourke L. Cluster randomised trials with repeated cross sections: alternatives to parallel group designs. BMJ [Internet]. 2015;350. Available from: https://www.bmj.com/content/350/bmj.h2925

work page 2015
[21]

Designing a stepped wedge trial: three main designs, carry-over effects and randomisation approaches

Copas AJ, Lewis JJ, Thompson JA, Davey C, Baio G, Hargreaves JR. Designing a stepped wedge trial: three main designs, carry-over effects and randomisation approaches. Trials. 2015 Aug 17;16(1):352

work page 2015
[22]

Best Practices for Integrating Health Equity into Embedded Pragmatic Clinical Trials for Dementia Care

NIA IMPACT Collaboratory. Best Practices for Integrating Health Equity into Embedded Pragmatic Clinical Trials for Dementia Care. National Institutes of Health: Bethesda, Maryland. 2022

work page 2022
[24]

A tutorial on sample size calculation for multiple-period cluster randomized parallel, cross-over and stepped-wedge trials using the Shiny CRT Calculator

Hemming K, Kasza J, Hooper R, Forbes A, Taljaard M. A tutorial on sample size calculation for multiple-period cluster randomized parallel, cross-over and stepped-wedge trials using the Shiny CRT Calculator. International Journal of Epidemiology. 2020 June 1;49(3):979– 995. Page 22 of 36

work page 2020
[25]

Cohort versus cross-sectional design in large field trials: Precision, sample size, and a unifying model

Feldman HA, McKinlay SM. Cohort versus cross-sectional design in large field trials: Precision, sample size, and a unifying model. Statistics in Medicine. 1994;13(1):61–78

work page 1994
[26]

Statistical analysis and optimal design for cluster randomized trials

Raudenbush SW. Statistical analysis and optimal design for cluster randomized trials. Psychological Methods. US: American Psychological Association; 1997;2(2):173–185

work page 1997
[28]

Lumbar Imaging With Reporting Of Epidemiology (LIRE)—Protocol for a pragmatic cluster randomized trial

Jarvik JG, Comstock BA, James KT, et al. Lumbar Imaging With Reporting Of Epidemiology (LIRE)—Protocol for a pragmatic cluster randomized trial. Contemporary Clinical Trials. 2015;45:157–163

work page 2015
[30]

Association of intracluster correlation measures with outcome prevalence for binary outcomes in cluster randomised trials

Mbekwe Yepnang AM, Caille A, Eldridge SM, Giraudeau B. Association of intracluster correlation measures with outcome prevalence for binary outcomes in cluster randomised trials. Stat Methods Med Res. SAGE Publications Ltd STM; 2021 Aug 1;30(8):1988–2003

work page 2021
[31]

Intraclass correlation coefficient and outcome prevalence are associated in clustered binary data

Gulliford MC, Adams G, Ukoumunne OC, Latinovic R, Chinn S, Campbell MJ. Intraclass correlation coefficient and outcome prevalence are associated in clustered binary data. Journal of Clinical Epidemiology. 2005 Mar 1;58(3):246–251

work page 2005
[32]

Reflection on modern methods: when is a stepped-wedge cluster randomized trial a good study design choice? International Journal of Epidemiology

Hemming K, Taljaard M. Reflection on modern methods: when is a stepped-wedge cluster randomized trial a good study design choice? International Journal of Epidemiology. 2020 June 1;49(3):1043–1052

work page 2020
[33]

Information content of stepped wedge designs with unequal cluster-period sizes in linear mixed models: Informing incomplete designs

Kasza J, Bowden R, Forbes AB. Information content of stepped wedge designs with unequal cluster-period sizes in linear mixed models: Informing incomplete designs. Statistics in Medicine. 2021;40(7):1736–1751

work page 2021
[34]

Model misspecification in stepped wedge trials: Random effects for time or treatment

Voldal EC, Xia F, Kenny A, Heagerty PJ, Hughes JP. Model misspecification in stepped wedge trials: Random effects for time or treatment. Statistics in Medicine. 2022;41(10):1751–1766

work page 2022
[35]

A tutorial on conducting sample size and power calculations for detecting treatment effect heterogeneity in cluster randomized trials with linear mixed models

Kasza J, Hooper R, Copas A, Forbes AB. Sample size and power calculations for open cohort longitudinal cluster randomized trials. Statistics in Medicine. 2020;39(13):1871– 1883. Page 23 of 36 Supplementary material for “A tutorial on conducting sample size and power calculations for detecting treatment effect heterogeneity in cluster randomized trials wit...

work page 2020
[36]

Effects of a High-Intensity Functional Exercise Program on Dependence in Activities of Daily Living and Balance in Older Adults with Dementia

Toots A, Littbrand H, Lindelöf N, et al. Effects of a High-Intensity Functional Exercise Program on Dependence in Activities of Daily Living and Balance in Older Adults with Dementia. Journal of the American Geriatrics Society. 2016;64(1):55–64

work page 2016
[37]

Sample size requirements for detecting treatment effect heterogeneity in cluster randomized trials

Yang S, Li F, Starks MA, Hernandez AF, Mentz RJ, Choudhury KR. Sample size requirements for detecting treatment effect heterogeneity in cluster randomized trials. Statistics in Medicine. 2020;39(28):4218–4237

work page 2020
[38]

Planning stepped wedge cluster randomized trials to detect treatment effect heterogeneity

Li F, Chen X, Tian Z, Wang R, Heagerty PJ. Planning stepped wedge cluster randomized trials to detect treatment effect heterogeneity. Statistics in Medicine. 2024;43(5):890–911

work page 2024
[39]

Patterns of intra-cluster correlation from primary care research to inform study design and analysis

Adams G, Gulliford MC, Ukoumunne OC, Eldridge S, Chinn S, Campbell MJ. Patterns of intra-cluster correlation from primary care research to inform study design and analysis. Journal of Clinical Epidemiology. 2004 Aug 1;57(8):785–794. Page 35 of 36

work page 2004
[40]

Determinants of the intracluster correlation coefficient in cluster randomized trials: the case of implementation research

Campbell MK, Fayers PM, Grimshaw JM. Determinants of the intracluster correlation coefficient in cluster randomized trials: the case of implementation research. Clinical Trials. 2005;2(2):99–107

work page 2005
[41]

Clustering in surgical trials - database of intracluster correlations

Cook JA, Bruckner T, MacLennan GS, Seiler CM. Clustering in surgical trials - database of intracluster correlations. Trials. 2012 Jan 4;13(1):2

work page 2012
[42]

Intra-cluster correlations from the CLustered OUtcome Dataset bank to inform the design of longitudinal cluster trials

Korevaar E, Kasza J, Taljaard M, et al. Intra-cluster correlations from the CLustered OUtcome Dataset bank to inform the design of longitudinal cluster trials. Clinical Trials. 2021;18(5):529–540

work page 2021
[43]

Estimating intra-cluster correlation coefficients for planning longitudinal cluster randomized trials: a tutorial

Ouyang Y, Hemming K, Li F, Taljaard M. Estimating intra-cluster correlation coefficients for planning longitudinal cluster randomized trials: a tutorial. International Journal of Epidemiology. 2023 Oct 1;52(5):1634–1647

work page 2023
[44]

Adjusted intraclass correlation coefficients for binary data: methods and estimates from a cluster-randomized trial in primary care

Yelland LN, Salter AB, Ryan P, Laurence CO. Adjusted intraclass correlation coefficients for binary data: methods and estimates from a cluster-randomized trial in primary care. Clinical Trials. 2011;8(1):48–58

work page 2011
[45]

A tutorial on sample size calculation for multiple-period cluster randomized parallel, cross-over and stepped-wedge trials using the Shiny CRT Calculator

Hemming K, Kasza J, Hooper R, Forbes A, Taljaard M. A tutorial on sample size calculation for multiple-period cluster randomized parallel, cross-over and stepped-wedge trials using the Shiny CRT Calculator. International Journal of Epidemiology. 2020 Jun 1;49(3):979– 995

work page 2020
[46]

Estimates of intra-cluster correlation coefficients from 2018 USA Medicare data to inform the design of cluster randomized trials in Alzheimer’s and related dementias

Ouyang Y, Li F, Li X, Bynum J, Mor V, Taljaard M. Estimates of intra-cluster correlation coefficients from 2018 USA Medicare data to inform the design of cluster randomized trials in Alzheimer’s and related dementias. Trials. 2024 Oct 30;25(1):732

work page 2018
[47]

Does it decay? Obtaining decaying correlation parameter values from previously analysed cluster randomised trials

Kasza J, Bowden R, Ouyang Y, Taljaard M, Forbes AB. Does it decay? Obtaining decaying correlation parameter values from previously analysed cluster randomised trials. Statistical Methods in Medical Research. 2023;32(11):2123–2134

work page 2023
[48]

Barriers and facilitators to patient recruitment to a cluster randomized controlled trial in primary care: lessons for future trials

Foster JM, Sawyer SM, Smith L, Reddel HK, Usherwood T. Barriers and facilitators to patient recruitment to a cluster randomized controlled trial in primary care: lessons for future trials. BMC Med Res Methodol. 2015 Mar 12;15(1):18

work page 2015
[49]

Recruitment and implementation challenges were common in stepped-wedge cluster randomized trials: Results from a methodological review

Caille A, Taljaard M, Vilain—Abraham FL, et al. Recruitment and implementation challenges were common in stepped-wedge cluster randomized trials: Results from a methodological review. Journal of Clinical Epidemiology. 2022;148:93–103

work page 2022
[50]

How to design and analyse cluster randomized trials with a small number of clusters? Comment on Leyrat et al

Breukelen GJP van, Candel MJJM. How to design and analyse cluster randomized trials with a small number of clusters? Comment on Leyrat et al. International Journal of Epidemiology. 2018 Jun 1;47(3):998–1001

work page 2018
[51]

Maintaining the validity of inference in small-sample stepped wedge cluster randomized trials with binary outcomes when using generalized estimating equations

Ford WP, Westgate PM. Maintaining the validity of inference in small-sample stepped wedge cluster randomized trials with binary outcomes when using generalized estimating equations. Statistics in Medicine. 2020;39(21):2779–2792. Page 36 of 36

work page 2020
[52]

Design and analysis considerations for cohort stepped wedge cluster randomized trials with a decay correlation structure

Li F. Design and analysis considerations for cohort stepped wedge cluster randomized trials with a decay correlation structure. Statistics in Medicine. 2020;39(4):438–455

work page 2020
[53]

Sample size considerations for stepped wedge designs with subclusters

Davis-Plourde K, Taljaard M, Li F. Sample size considerations for stepped wedge designs with subclusters. Biometrics. 2023;79(1):98–112

work page 2023
[54]

Substantial risks associated with few clusters in cluster randomized and stepped wedge designs

Taljaard M, Teerenstra S, Ivers NM, Fergusson DA. Substantial risks associated with few clusters in cluster randomized and stepped wedge designs. Clinical Trials. SAGE Publications; 2016 Aug 1;13(4):459–463

work page 2016
[55]

Lessons for cluster randomized trials in the twenty-first century: a systematic review of trials in primary care

Eldridge SM, Ashby D, Feder GS, Rudnicka AR, Ukoumunne OC. Lessons for cluster randomized trials in the twenty-first century: a systematic review of trials in primary care. Clinical Trials. 2004;1(1):80–90

work page 2004
[56]

Sample size for cluster randomized trials: effect of coefficient of variation of cluster size and analysis method

Eldridge SM, Ashby D, Kerry S. Sample size for cluster randomized trials: effect of coefficient of variation of cluster size and analysis method. International Journal of Epidemiology. 2006 Oct 1;35(5):1292–1300

work page 2006
[57]

Relative efficiency of unequal versus equal cluster sizes in cluster randomized and multicentre trials

Breukelen GJP van, Candel MJJM, Berger MPF. Relative efficiency of unequal versus equal cluster sizes in cluster randomized and multicentre trials. Statistics in Medicine. 2007;26(13):2589–2603

work page 2007
[58]

Accounting for unequal cluster sizes in designing cluster randomized trials to detect treatment effect heterogeneity

Tong G, Esserman D, Li F. Accounting for unequal cluster sizes in designing cluster randomized trials to detect treatment effect heterogeneity. Statistics in Medicine. 2022;41(8):1376–1396

work page 2022
[59]

Sample size considerations for assessing treatment effect heterogeneity in randomized trials with heterogeneous intracluster correlations and variances

Tong G, Taljaard M, Li F. Sample size considerations for assessing treatment effect heterogeneity in randomized trials with heterogeneous intracluster correlations and variances. Statistics in Medicine. 2023;42(19):3392–3412

work page 2023
[60]

Sample size adjustments for varying cluster sizes in cluster randomized trials with binary outcomes analyzed with second-order PQL mixed logistic regression

Candel MJJM, Van Breukelen GJP. Sample size adjustments for varying cluster sizes in cluster randomized trials with binary outcomes analyzed with second-order PQL mixed logistic regression. Statistics in Medicine. 2010;29(14):1488–1501

work page 2010
[61]

Calculating sample sizes for cluster randomized trials: We can keep it simple and efficient! Journal of Clinical Epidemiology

Breukelen GJP van, Candel MJJM. Calculating sample sizes for cluster randomized trials: We can keep it simple and efficient! Journal of Clinical Epidemiology. 2012 Nov 1;65(11):1212–1218

[62]

Cluster randomised crossover trials with binary data and unbalanced cluster sizes: Application to studies of near-universal interventions in intensive care

Forbes AB, Akram M, Pilcher D, Cooper J, Bellomo R. Cluster randomised crossover trials with binary data and unbalanced cluster sizes: Application to studies of near-universal interventions in intensive care. Clinical Trials. 2015;12(1):34–44

[63]

Relative efficiency of unequal cluster sizes in stepped wedge and other trial designs under longitudinal or cross-sectional sampling

Girling AJ. Relative efficiency of unequal cluster sizes in stepped wedge and other trial designs under longitudinal or cross-sectional sampling. Statistics in Medicine. 2018;37(30):4652–4664

[1] [1]

Design and analysis of group-randomized trials

Murray DM. Design and analysis of group-randomized trials. Oxford University Press, USA; 1998

[2] [2]

Review of Recent Methodological Developments in Group-Randomized Trials: Part 1—Design

Turner EL, Li F, Gallis JA, Prague M, Murray DM. Review of Recent Methodological Developments in Group-Randomized Trials: Part 1—Design. American Journal of Public Health. 2017;107(6):907–915

[3] [3]

Methods for sample size determination in cluster randomized trials

Rutterford C, Copas A, Eldridge S. Methods for sample size determination in cluster randomized trials. International Journal of Epidemiology. 2015 June 1;44(3):1051–1067

[4] [4]

Sample size calculators for planning stepped-wedge cluster randomized trials: a review and comparison

Ouyang Y, Li F, Preisser JS, Taljaard M. Sample size calculators for planning stepped-wedge cluster randomized trials: a review and comparison. International Journal of Epidemiology. 2022 Dec 1;51(6):2000–2013

[5] [6]

Designing three-level cluster randomized trials to assess treatment effect heterogeneity

Li F, Chen X, Tian Z, Esserman D, Heagerty PJ, Wang R. Designing three-level cluster randomized trials to assess treatment effect heterogeneity. Biostatistics. 2022 July;24(4):833–849

[6] [9]

Sample size requirements for testing treatment effect heterogeneity in cluster randomized trials with binary outcomes

Maleyeff L, Wang R, Haneuse S, Li F. Sample size requirements for testing treatment effect heterogeneity in cluster randomized trials with binary outcomes. Statistics in Medicine. 2023;42(27):5054–5083

[7] [10]

Sample Size Requirements to Test Subgroup-Specific Treatment Effects in Cluster-Randomized Trials

Wang X, Goldfeld KS, Taljaard M, Li F. Sample Size Requirements to Test Subgroup-Specific Treatment Effects in Cluster-Randomized Trials. Prev Sci [Internet]. 2023 Oct 10 [cited 2024 Feb 29]; Available from: https://doi.org/10.1007/s11121-023-01590-6

[8] [11]

Planning stepped wedge cluster randomized trials to detect treatment effect heterogeneity

Li F, Chen X, Tian Z, Wang R, Heagerty PJ. Planning stepped wedge cluster randomized trials to detect treatment effect heterogeneity. Statistics in Medicine. 2024;43(5):890–911

[9] [12]

Sample size and power calculation for testing treatment effect heterogeneity in cluster randomized crossover designs

Wang X, Chen X, Goldfeld KS, Taljaard M, Li F. Sample size and power calculation for testing treatment effect heterogeneity in cluster randomized crossover designs. Statistical Methods in Medical Research. 2024;33(7):1115–1136

[10] [13]

Simple sample size calculation for cluster-randomized trials

Hayes RJ, Bennett S. Simple sample size calculation for cluster-randomized trials. International Journal of Epidemiology. 1999 Apr 1;28(2):319–326

[11] [14]

Sample size calculation for cluster randomized cross-over trials

Giraudeau B, Ravaud P, Donner A. Sample size calculation for cluster randomized cross-over trials. Statistics in Medicine. 2008;27(27):5578–5585

[12] [15]

Multi-period crossover trials

Matthews J. Multi-period crossover trials. Statistical Methods in Medical Research. 1994;3(4):383–405

[13] [16]

Stepped-wedge cluster randomised controlled trials: a generic framework including parallel and multiple-level designs

Hemming K, Lilford R, Girling AJ. Stepped-wedge cluster randomised controlled trials: a generic framework including parallel and multiple-level designs. Statistics in Medicine. 2015;34(2):181–196

[14] [17]

Design and analysis of stepped wedge cluster randomized trials

Hussey MA, Hughes JP. Design and analysis of stepped wedge cluster randomized trials. Contemporary Clinical Trials. 2007 Feb 1;28(2):182–191

[15] [18]

Statistical Power and Sample Size Requirements for Three Level Hierarchical Cluster Randomized Trials

Heo M, Leon AC. Statistical Power and Sample Size Requirements for Three Level Hierarchical Cluster Randomized Trials. Biometrics. 2008;64(4):1256–1262

[16] [19]

Individually Randomized Group Treatment Trials: A Critical Appraisal of Frequently Used Design and Analytic Approaches

Pals SL, Murray DM, Alfano CM, Shadish WR, Hannan PJ, Baker WL. Individually Randomized Group Treatment Trials: A Critical Appraisal of Frequently Used Design and Analytic Approaches. American Journal of Public Health. 2008;98(8):1418–1424

[17] [20]

Cluster randomised trials with repeated cross sections: alternatives to parallel group designs

Hooper R, Bourke L. Cluster randomised trials with repeated cross sections: alternatives to parallel group designs. BMJ [Internet]. 2015;350. Available from: https://www.bmj.com/content/350/bmj.h2925

[18] [21]

Designing a stepped wedge trial: three main designs, carry-over effects and randomisation approaches

Copas AJ, Lewis JJ, Thompson JA, Davey C, Baio G, Hargreaves JR. Designing a stepped wedge trial: three main designs, carry-over effects and randomisation approaches. Trials. 2015 Aug 17;16(1):352

[19] [22]

Best Practices for Integrating Health Equity into Embedded Pragmatic Clinical Trials for Dementia Care

NIA IMPACT Collaboratory. Best Practices for Integrating Health Equity into Embedded Pragmatic Clinical Trials for Dementia Care. National Institutes of Health: Bethesda, Maryland. 2022

[20] [24]

A tutorial on sample size calculation for multiple-period cluster randomized parallel, cross-over and stepped-wedge trials using the Shiny CRT Calculator

Hemming K, Kasza J, Hooper R, Forbes A, Taljaard M. A tutorial on sample size calculation for multiple-period cluster randomized parallel, cross-over and stepped-wedge trials using the Shiny CRT Calculator. International Journal of Epidemiology. 2020 June 1;49(3):979–995

[21] [25]

Cohort versus cross-sectional design in large field trials: Precision, sample size, and a unifying model

Feldman HA, McKinlay SM. Cohort versus cross-sectional design in large field trials: Precision, sample size, and a unifying model. Statistics in Medicine. 1994;13(1):61–78

[22] [26]

Statistical analysis and optimal design for cluster randomized trials

Raudenbush SW. Statistical analysis and optimal design for cluster randomized trials. Psychological Methods. US: American Psychological Association; 1997;2(2):173–185

[23] [28]

Lumbar Imaging With Reporting Of Epidemiology (LIRE)—Protocol for a pragmatic cluster randomized trial

Jarvik JG, Comstock BA, James KT, et al. Lumbar Imaging With Reporting Of Epidemiology (LIRE)—Protocol for a pragmatic cluster randomized trial. Contemporary Clinical Trials. 2015;45:157–163

[24] [30]

Association of intracluster correlation measures with outcome prevalence for binary outcomes in cluster randomised trials

Mbekwe Yepnang AM, Caille A, Eldridge SM, Giraudeau B. Association of intracluster correlation measures with outcome prevalence for binary outcomes in cluster randomised trials. Stat Methods Med Res. SAGE Publications Ltd STM; 2021 Aug 1;30(8):1988–2003

[25] [31]

Intraclass correlation coefficient and outcome prevalence are associated in clustered binary data

Gulliford MC, Adams G, Ukoumunne OC, Latinovic R, Chinn S, Campbell MJ. Intraclass correlation coefficient and outcome prevalence are associated in clustered binary data. Journal of Clinical Epidemiology. 2005 Mar 1;58(3):246–251

[26] [32]

Reflection on modern methods: when is a stepped-wedge cluster randomized trial a good study design choice?

Hemming K, Taljaard M. Reflection on modern methods: when is a stepped-wedge cluster randomized trial a good study design choice? International Journal of Epidemiology. 2020 June 1;49(3):1043–1052

[27] [33]

Information content of stepped wedge designs with unequal cluster-period sizes in linear mixed models: Informing incomplete designs

Kasza J, Bowden R, Forbes AB. Information content of stepped wedge designs with unequal cluster-period sizes in linear mixed models: Informing incomplete designs. Statistics in Medicine. 2021;40(7):1736–1751

[28] [34]

Model misspecification in stepped wedge trials: Random effects for time or treatment

Voldal EC, Xia F, Kenny A, Heagerty PJ, Hughes JP. Model misspecification in stepped wedge trials: Random effects for time or treatment. Statistics in Medicine. 2022;41(10):1751–1766

[29] [35]

Sample size and power calculations for open cohort longitudinal cluster randomized trials

Kasza J, Hooper R, Copas A, Forbes AB. Sample size and power calculations for open cohort longitudinal cluster randomized trials. Statistics in Medicine. 2020;39(13):1871–1883

[30] [36]

Effects of a High-Intensity Functional Exercise Program on Dependence in Activities of Daily Living and Balance in Older Adults with Dementia

Toots A, Littbrand H, Lindelöf N, et al. Effects of a High-Intensity Functional Exercise Program on Dependence in Activities of Daily Living and Balance in Older Adults with Dementia. Journal of the American Geriatrics Society. 2016;64(1):55–64

[31] [37]

Sample size requirements for detecting treatment effect heterogeneity in cluster randomized trials

Yang S, Li F, Starks MA, Hernandez AF, Mentz RJ, Choudhury KR. Sample size requirements for detecting treatment effect heterogeneity in cluster randomized trials. Statistics in Medicine. 2020;39(28):4218–4237

[32] [38]

Planning stepped wedge cluster randomized trials to detect treatment effect heterogeneity

Li F, Chen X, Tian Z, Wang R, Heagerty PJ. Planning stepped wedge cluster randomized trials to detect treatment effect heterogeneity. Statistics in Medicine. 2024;43(5):890–911

[33] [39]

Patterns of intra-cluster correlation from primary care research to inform study design and analysis

Adams G, Gulliford MC, Ukoumunne OC, Eldridge S, Chinn S, Campbell MJ. Patterns of intra-cluster correlation from primary care research to inform study design and analysis. Journal of Clinical Epidemiology. 2004 Aug 1;57(8):785–794

[34] [40]

Determinants of the intracluster correlation coefficient in cluster randomized trials: the case of implementation research

Campbell MK, Fayers PM, Grimshaw JM. Determinants of the intracluster correlation coefficient in cluster randomized trials: the case of implementation research. Clinical Trials. 2005;2(2):99–107

[35] [41]

Clustering in surgical trials - database of intracluster correlations

Cook JA, Bruckner T, MacLennan GS, Seiler CM. Clustering in surgical trials - database of intracluster correlations. Trials. 2012 Jan 4;13(1):2

[36] [42]

Intra-cluster correlations from the CLustered OUtcome Dataset bank to inform the design of longitudinal cluster trials

Korevaar E, Kasza J, Taljaard M, et al. Intra-cluster correlations from the CLustered OUtcome Dataset bank to inform the design of longitudinal cluster trials. Clinical Trials. 2021;18(5):529–540

[37] [43]

Estimating intra-cluster correlation coefficients for planning longitudinal cluster randomized trials: a tutorial

Ouyang Y, Hemming K, Li F, Taljaard M. Estimating intra-cluster correlation coefficients for planning longitudinal cluster randomized trials: a tutorial. International Journal of Epidemiology. 2023 Oct 1;52(5):1634–1647

[38] [44]

Adjusted intraclass correlation coefficients for binary data: methods and estimates from a cluster-randomized trial in primary care

Yelland LN, Salter AB, Ryan P, Laurence CO. Adjusted intraclass correlation coefficients for binary data: methods and estimates from a cluster-randomized trial in primary care. Clinical Trials. 2011;8(1):48–58

[39] [45]

A tutorial on sample size calculation for multiple-period cluster randomized parallel, cross-over and stepped-wedge trials using the Shiny CRT Calculator

Hemming K, Kasza J, Hooper R, Forbes A, Taljaard M. A tutorial on sample size calculation for multiple-period cluster randomized parallel, cross-over and stepped-wedge trials using the Shiny CRT Calculator. International Journal of Epidemiology. 2020 Jun 1;49(3):979–995

[40] [46]

Estimates of intra-cluster correlation coefficients from 2018 USA Medicare data to inform the design of cluster randomized trials in Alzheimer’s and related dementias

Ouyang Y, Li F, Li X, Bynum J, Mor V, Taljaard M. Estimates of intra-cluster correlation coefficients from 2018 USA Medicare data to inform the design of cluster randomized trials in Alzheimer’s and related dementias. Trials. 2024 Oct 30;25(1):732

[41] [47]

Does it decay? Obtaining decaying correlation parameter values from previously analysed cluster randomised trials

Kasza J, Bowden R, Ouyang Y, Taljaard M, Forbes AB. Does it decay? Obtaining decaying correlation parameter values from previously analysed cluster randomised trials. Statistical Methods in Medical Research. 2023;32(11):2123–2134

[42] [48]

Barriers and facilitators to patient recruitment to a cluster randomized controlled trial in primary care: lessons for future trials

Foster JM, Sawyer SM, Smith L, Reddel HK, Usherwood T. Barriers and facilitators to patient recruitment to a cluster randomized controlled trial in primary care: lessons for future trials. BMC Med Res Methodol. 2015 Mar 12;15(1):18

[43] [49]

Recruitment and implementation challenges were common in stepped-wedge cluster randomized trials: Results from a methodological review

Caille A, Taljaard M, Vilain-Abraham FL, et al. Recruitment and implementation challenges were common in stepped-wedge cluster randomized trials: Results from a methodological review. Journal of Clinical Epidemiology. 2022;148:93–103

[44] [50]

How to design and analyse cluster randomized trials with a small number of clusters? Comment on Leyrat et al

Breukelen GJP van, Candel MJJM. How to design and analyse cluster randomized trials with a small number of clusters? Comment on Leyrat et al. International Journal of Epidemiology. 2018 Jun 1;47(3):998–1001

[45] [51]

Maintaining the validity of inference in small-sample stepped wedge cluster randomized trials with binary outcomes when using generalized estimating equations

Ford WP, Westgate PM. Maintaining the validity of inference in small-sample stepped wedge cluster randomized trials with binary outcomes when using generalized estimating equations. Statistics in Medicine. 2020;39(21):2779–2792

[46] [52]

Design and analysis considerations for cohort stepped wedge cluster randomized trials with a decay correlation structure

Li F. Design and analysis considerations for cohort stepped wedge cluster randomized trials with a decay correlation structure. Statistics in Medicine. 2020;39(4):438–455

[47] [53]

Sample size considerations for stepped wedge designs with subclusters

Davis-Plourde K, Taljaard M, Li F. Sample size considerations for stepped wedge designs with subclusters. Biometrics. 2023;79(1):98–112

[48] [54]

Substantial risks associated with few clusters in cluster randomized and stepped wedge designs

Taljaard M, Teerenstra S, Ivers NM, Fergusson DA. Substantial risks associated with few clusters in cluster randomized and stepped wedge designs. Clinical Trials. SAGE Publications; 2016 Aug 1;13(4):459–463
