A tutorial on conducting sample size and power calculations for detecting treatment effect heterogeneity in cluster randomized trials with linear mixed models
Pith reviewed 2026-05-23 04:38 UTC · model grok-4.3
The pith
This tutorial consolidates sample size and power formulas for testing treatment effect heterogeneity in cluster randomized trials via linear mixed models and supplies an R Shiny calculator.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors consolidate separate power and sample size formulas for testing treatment-covariate interactions or differences in subpopulation-specific treatment effects in cluster randomized trials using linear mixed effects models, demonstrate their application through an R Shiny calculator, and highlight the sensitivity of results to accurate intracluster correlation estimates for both outcomes and covariates.
What carries the argument
The online R Shiny calculator that implements the design-specific sample size and power formulas for HTE testing in CRTs with LME models, taking as inputs the relevant ICCs, effect sizes, and cluster parameters.
If this is right
- Trial designers can now calculate the number of clusters and cluster sizes needed to power pre-specified HTE analyses in the main CRT designs.
- Power estimates depend strongly on the ICC values chosen for both the outcome and the covariate.
- The same consolidated approach covers continuous and binary outcomes across parallel, crossover, and stepped-wedge structures.
- The calculator lowers the practical barrier to performing these calculations before a trial begins.
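The bullets above can be made concrete for the simplest case the tutorial covers: a single-period parallel CRT with a continuous outcome and an individual-level effect modifier. The sketch below rests on a few assumptions.

- It uses the variance expression for the treatment-by-covariate interaction estimator as reported by Yang et al. (2020).
- It uses a normal (z) approximation for a two-sided test.
- The function names and illustrative inputs are hypothetical.
- The tutorial's calculator may apply t-based or design-specific refinements that this sketch omits.

```python
from math import ceil
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for z quantiles and the CDF


def hte_variance(n_clusters, m, icc_y, icc_x, sigma2_y=1.0, sigma2_x=1.0, pi=0.5):
    """Variance of the treatment-by-covariate interaction estimator.

    Assumed setting: single-period parallel CRT, continuous outcome, equal
    cluster size m, individual-level covariate. icc_y is the covariate-adjusted
    outcome ICC, icc_x the covariate ICC, sigma2_y the covariate-adjusted
    outcome variance, sigma2_x the covariate variance, and pi the proportion
    of clusters randomized to treatment.
    """
    numerator = sigma2_y * (1 - icc_y) * (1 + (m - 1) * icc_y)
    denominator = (n_clusters * m * pi * (1 - pi) * sigma2_x
                   * (1 + (m - 2) * icc_y - (m - 1) * icc_x * icc_y))
    return numerator / denominator


def hte_power(delta, n_clusters, m, icc_y, icc_x, alpha=0.05, **kwargs):
    """Approximate two-sided power (z-test) for an interaction of size delta."""
    se = hte_variance(n_clusters, m, icc_y, icc_x, **kwargs) ** 0.5
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Ignores the negligible chance of rejecting in the wrong direction.
    return _STD_NORMAL.cdf(abs(delta) / se - z_crit)


def clusters_needed(delta, m, icc_y, icc_x, alpha=0.05, power=0.8, **kwargs):
    """Smallest total number of clusters that reaches the target power."""
    z_total = _STD_NORMAL.inv_cdf(1 - alpha / 2) + _STD_NORMAL.inv_cdf(power)
    # The variance scales as 1 / n_clusters, so evaluate it at one cluster.
    var_one_cluster = hte_variance(1, m, icc_y, icc_x, **kwargs)
    return ceil(z_total ** 2 * var_one_cluster / delta ** 2)


if __name__ == "__main__":
    # Sensitivity of the required number of clusters to the covariate ICC
    # (illustrative inputs, not taken from the paper).
    for icc_x in (0.0, 0.1, 0.5, 1.0):
        n = clusters_needed(delta=0.2, m=20, icc_y=0.05, icc_x=icc_x)
        print(f"icc_x={icc_x:.1f}: {n} clusters, "
              f"power={hte_power(0.2, n, 20, 0.05, icc_x):.3f}")
```

Under these assumed inputs, the required number of clusters grows as the covariate ICC moves from 0 toward 1. At icc_x = 1 (a cluster-level covariate), the variance reduces to the usual ATE design effect divided by the covariate variance.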
Where Pith is reading between the lines
- Future methodological work could test whether the formulas remain accurate when the linear mixed model assumptions are mildly violated in real cluster data.
- Routine collection of pilot estimates for covariate ICCs, in addition to outcome ICCs, would become a standard part of CRT planning.
- The calculator framework could be extended to allow users to upload their own simulation-based power checks for non-standard designs.
Load-bearing premise
Users will be able to supply accurate estimates of the intracluster correlation coefficients for both the outcome and the effect-modifying covariate.
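One common way to obtain such pilot values is the one-way ANOVA estimator of the ICC. It applies to the outcome and equally to the candidate effect modifier, for example a 0/1 subgroup indicator. The sketch below assumes equal cluster sizes and plain Python lists, and the function name is hypothetical. Unequal cluster sizes, binary-scale ICCs, and covariate-adjusted outcome ICCs need more careful methods than this.

```python
def anova_icc(clusters):
    """One-way ANOVA estimate of the ICC for equal-sized clusters.

    clusters: list of k lists, each holding the m observations of one cluster.
    The estimate can be negative; planners often truncate it at 0.
    """
    k = len(clusters)
    if k < 2:
        raise ValueError("need at least two clusters")
    m = len(clusters[0])
    if m < 2 or any(len(c) != m for c in clusters):
        raise ValueError("this sketch assumes equal cluster sizes of at least 2")

    means = [sum(c) / m for c in clusters]
    grand_mean = sum(means) / k
    # Between- and within-cluster mean squares from the one-way ANOVA table.
    msb = m * sum((mu - grand_mean) ** 2 for mu in means) / (k - 1)
    msw = sum((y - mu) ** 2 for c, mu in zip(clusters, means) for y in c) / (k * (m - 1))
    return (msb - msw) / (msb + (m - 1) * msw)


if __name__ == "__main__":
    # Hypothetical pilot data: a 0/1 covariate measured in three clusters of four.
    pilot_covariate = [[1, 1, 0, 1], [0, 0, 0, 1], [1, 0, 1, 1]]
    print(round(anova_icc(pilot_covariate), 3))
```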
What would settle it
Running the published formulas by hand for a stepped-wedge design with a continuous outcome and finding that the calculator's output power differs from the hand calculation for the same inputs by more than numerical rounding error.
Original abstract
Cluster-randomized trials (CRTs) are a well-established class of designs for evaluating community-based interventions. An essential task in planning these trials is determining the number of clusters and cluster sizes needed to achieve sufficient statistical power for detecting a clinically relevant effect size. While methods for evaluating the average treatment effect (ATE) for the entire study population are well-established, sample size methods for testing heterogeneity of treatment effects (HTEs), i.e., treatment-covariate interaction or difference in subpopulation-specific treatment effects, in CRTs have only recently been developed. For pre-specified analyses of HTEs in CRTs, effect-modifying covariates should, ideally, be accompanied by sample size or power calculations to ensure the trial has adequate power for the planned analyses. Power analysis for testing HTEs is more complex than for ATEs due to the additional design parameters that must be specified. Power and sample size formulas for testing HTEs via linear mixed effects (LME) models have been separately derived for different cluster-randomized designs, including single and multi-period parallel designs, crossover designs, and stepped-wedge designs, and for continuous and binary outcomes. This tutorial provides a consolidated reference guide for these methods and enhances their accessibility through an online R Shiny calculator. We further discuss key considerations for conducting sample size and power calculations to test pre-specified HTE hypotheses in CRTs, highlighting the importance of specifying advanced estimates of intracluster correlation coefficients for both outcomes and covariates, and their implications for power. The sample size methodology and calculator functionality are demonstrated through a real CRT example.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. This tutorial consolidates power and sample size formulas for testing pre-specified treatment effect heterogeneity (HTE) via linear mixed models in cluster-randomized trials. It covers single- and multi-period parallel designs, crossover designs, and stepped-wedge designs, for both continuous and binary outcomes. The paper provides an R Shiny calculator to implement the methods, discusses practical issues including the need for accurate intracluster correlation coefficients (ICCs) for the outcome and the effect-modifying covariate, and demonstrates the approach with a real CRT example.
Significance. If the cited derivations are represented accurately and the calculator implements them correctly, the manuscript supplies a consolidated, accessible reference that fills a practical gap: while ATE power methods for CRTs are mature, HTE methods have appeared only recently and separately. The explicit foregrounding of ICC sensitivity for both outcome and covariate, together with the online tool, should improve the quality of sample-size planning for HTE analyses in future CRTs. The provision of reproducible code (Shiny app) is a clear strength.
minor comments (3)
- [Introduction / §2] The abstract and introduction state that formulas were 'separately derived' for different designs and outcomes; a short table or appendix listing the original references for each formula (with equation numbers) would help readers trace the derivations without searching the cited papers.
- [Shiny calculator section] In the description of the Shiny app inputs, the mapping between user-supplied ICC values and the variance components appearing in the power formulas is not shown explicitly; adding a small schematic or equation reference next to each input field would reduce the chance of mis-specification.
- [Example section] The real-CRT example reports power curves but does not tabulate the exact ICC values used for the outcome and covariate; including these numerical values (and the source of the estimates) would allow readers to reproduce the displayed results directly from the formulas.
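The second minor comment asks for an explicit mapping between ICC inputs and variance components. For a random-intercept linear mixed model, that mapping is a simple split of the total variance. The sketch below assumes that model, with exchangeable within-cluster correlation and a single period. Multi-period designs add further correlation parameters that it does not cover, and the function names are hypothetical.

```python
def variance_components(total_var, icc):
    """Split a total variance into between- and within-cluster parts for a given ICC."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie in [0, 1]")
    between = icc * total_var          # variance of the cluster random intercept
    within = (1.0 - icc) * total_var   # residual (individual-level) variance
    return between, within


def exchangeable_covariance(m, total_var, icc):
    """Covariance matrix of the m responses in one cluster under a random intercept."""
    between, within = variance_components(total_var, icc)
    return [[between + (within if i == j else 0.0) for j in range(m)] for i in range(m)]


if __name__ == "__main__":
    cov = exchangeable_covariance(m=3, total_var=2.0, icc=0.1)
    # Off-diagonal covariance divided by the variance recovers the ICC.
    print(cov[0][1] / cov[0][0])
```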
Simulated Author's Rebuttal
We thank the referee for their positive evaluation of the manuscript and for recommending acceptance. We are pleased that the consolidation of power and sample size methods for HTE testing in CRTs, along with the R Shiny calculator and emphasis on ICC sensitivity, is recognized as addressing a practical gap.
Circularity Check
No significant circularity; tutorial consolidates external derivations
full rationale
The paper is explicitly a tutorial that consolidates power and sample size formulas previously derived separately for HTE testing in CRTs across designs and outcome types. It makes no new first-principles derivations or predictions that reduce to its own fitted inputs or self-citations. The abstract states the formulas 'have been separately derived' and positions the contribution as a reference guide plus an R Shiny calculator, with the discussion of ICC sensitivity as a practical point. No load-bearing step equates outputs with inputs by construction, and the reader's circularity score of 0.0 is consistent with the provided text.
Axiom & Free-Parameter Ledger
free parameters (1)
- intracluster correlation coefficients for outcome and covariate
axioms (1)
- domain assumption: Linear mixed effects models appropriately capture the clustering structure in CRT data for both outcomes and covariates.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean; IndisputableMonolith/Foundation/RealityFromDistinction.lean (theorems: reality_from_one_distinction, washburn_uniqueness_aczel). Tag: unclear (relation between the paper passage and the cited Recognition theorem).
  Passage: Power and sample size formulas for testing HTEs via linear mixed effects (LME) models have been separately derived for different cluster-randomized designs... highlighting the importance of specifying advanced estimates of intracluster correlation coefficients for both outcomes and covariates
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean; IndisputableMonolith/Constants (theorems: absolute_floor_iff_bare_distinguishability, phi_golden_ratio). Tag: unclear (relation between the paper passage and the cited Recognition theorem).
  Passage: Table 2: Summary of HTE variance formulas... σ_HTE² = σ_ATE² × (1−α₁)/{1+(m−2)α₁−(m−1)ρ₁α₁}
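The Table 2 excerpt can be checked numerically at its limiting cases. The sketch below rests on two assumptions.

- α₁ denotes the outcome ICC, ρ₁ the covariate ICC, and m the cluster size. This reading of the excerpt's notation is an assumption.
- Any scaling factors hidden by the "..." in the excerpt, such as the covariate variance, are set aside.

The function name is hypothetical.

```python
def hte_to_ate_variance_ratio(m, alpha1, rho1):
    """sigma_HTE^2 / sigma_ATE^2 from the quoted Table 2 expression.

    m: cluster size; alpha1: outcome ICC; rho1: covariate ICC (assumed notation).
    """
    return (1 - alpha1) / (1 + (m - 2) * alpha1 - (m - 1) * rho1 * alpha1)


if __name__ == "__main__":
    m, alpha1 = 20, 0.05
    for rho1 in (0.0, 0.5, 1.0):
        print(rho1, round(hte_to_ate_variance_ratio(m, alpha1, rho1), 4))
```

With ρ₁ = 1 the ratio is exactly 1, so a fully cluster-level effect modifier inherits the full ATE design effect. With ρ₁ = 0 and α₁ > 0 the ratio falls below 1. This is why the covariate ICC matters as much as the outcome ICC.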
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.