pith. machine review for the scientific record.

arxiv: 2604.18323 · v1 · submitted 2026-04-20 · 📊 stat.ME

Which Small-Sample Correction Should Be Used When Analyzing Stepped-Wedge Designs with Time-Varying Treatment Effects?

Pith reviewed 2026-05-10 03:53 UTC · model grok-4.3

classification 📊 stat.ME
keywords stepped-wedge design · robust variance estimator · small-sample correction · time-varying treatment effect · cluster randomized trial · exposure-time indicator model · Mancl-DeRouen estimator · Morel-Bokossa-Neerchal estimator

The pith

When random effects are misspecified in stepped-wedge trials with time-varying effects, the Mancl-DeRouen estimator restores coverage for continuous outcomes while the Morel-Bokossa-Neerchal estimator does so for binary outcomes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Stepped-wedge cluster randomized trials often use models that assume treatment effects begin right at crossover and stay constant. Exposure-time indicator models relax this by letting effects differ according to time since exposure, which allows separate estimation of the time-averaged treatment effect and the long-term effect. These models still depend on random-effects assumptions that are commonly wrong in practice, and model-based standard errors then produce undercoverage. Simulations compare four robust variance estimators across continuous and binary outcomes and show that certain small-sample corrections recover proper coverage while others remain unstable, especially for the long-term effect.
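
To make the contrast between the two codings concrete, here is a minimal sketch of how exposure time arises in a stepped-wedge layout. It assumes a standard design in which sequence q crosses over at the start of period q + 1 and there is one more period than there are sequences; the paper's own designs may differ. The function names `exposure_time_design`, `eti_indicators`, and `it_indicator` are hypothetical and introduced only for illustration.

```python
def exposure_time_design(n_sequences):
    """Return exposure times for an assumed standard stepped-wedge layout.

    Assumption (illustrative only): sequence q (1-based) crosses over at
    the start of period q + 1, and there are n_sequences + 1 periods, so
    every sequence is unexposed in period 1.
    """
    n_periods = n_sequences + 1
    design = []
    for q in range(1, n_sequences + 1):
        # Exposure time is 0 before crossover, then counts periods on treatment.
        row = [max(0, j - q) for j in range(1, n_periods + 1)]
        design.append(row)
    return design


def eti_indicators(exposure_time, max_exposure):
    """ETI coding: one indicator per exposure time 1..max_exposure."""
    return [1 if exposure_time == s else 0 for s in range(1, max_exposure + 1)]


def it_indicator(exposure_time):
    """Immediate-treatment coding: a single on/off treatment indicator."""
    return 1 if exposure_time > 0 else 0


design = exposure_time_design(3)
for q, row in enumerate(design, start=1):
    print(f"sequence {q}: exposure times {row}")
```

Under this layout only the first sequence ever reaches the longest exposure time, which is one plausible reason the effect at long exposure is estimated from little data.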

Core claim

Exposure-time indicator models target the time-averaged treatment effect and long-term effect in stepped-wedge designs when effects vary with exposure duration. Under misspecified random-effects structures, model-based standard errors undercover, but robust variance estimators improve performance. For continuous outcomes the Mancl-DeRouen estimator paired with a t-distribution whose degrees of freedom equal the number of clusters minus two yields the most consistent coverage; for binary outcomes the Morel-Bokossa-Neerchal estimator is the only consistently reliable choice. Both model-based and robust approaches remain unstable when targeting the long-term effect.
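
For readers unfamiliar with ETI models, the two estimands can be written out as follows. This is a sketch in commonly used notation, not the paper's own display: the symbols $Y_{ijk}$, $\mu$, $\beta_j$, $\delta_s$, $E_{ij}$, $\alpha_i$, $\varepsilon_{ijk}$ and the maximum exposure time $S$ are assumptions introduced here, the single term $\alpha_i$ stands in for whichever random-effects structure the analyst specifies, and the LTE is taken to be the effect at the longest observed exposure time.

```latex
\begin{align*}
Y_{ijk} &= \mu + \beta_j + \sum_{s=1}^{S} \delta_s \, \mathbb{1}\{E_{ij} = s\} + \alpha_i + \varepsilon_{ijk}, \\
\text{TATE} &= \frac{1}{S} \sum_{s=1}^{S} \delta_s, \qquad \text{LTE} = \delta_S .
\end{align*}
```

Here $i$ indexes clusters, $j$ periods, and $k$ individuals, and $E_{ij}$ is the number of periods cluster $i$ has been exposed by period $j$. The immediate-treatment model is the special case $\delta_1 = \dots = \delta_S$, in which TATE and LTE coincide. For binary outcomes the same linear predictor would sit inside a link function.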

What carries the argument

Exposure-time indicator (ETI) models combined with robust variance estimators (classic sandwich, Kauermann-Carroll, Mancl-DeRouen, Morel-Bokossa-Neerchal) that adjust standard errors for small numbers of clusters and possible random-effects misspecification.

If this is right

  • Model-based standard errors produce undercoverage for both the time-averaged and long-term effects when random effects are misspecified.
  • For continuous outcomes the Mancl-DeRouen estimator with t-distribution and degrees of freedom equal to clusters minus two gives consistent coverage across scenarios.
  • For binary outcomes the Morel-Bokossa-Neerchal estimator is the only small-sample correction that remains reliable.
  • The Mancl-DeRouen estimator can become unstable in one-cluster-per-sequence designs because of data sparsity.
  • Inference on the long-term effect stays unstable whether model-based or robust standard errors are used.
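
The corrections named above differ mainly in how they rescale cluster residuals before forming the sandwich "meat". The sketch below shows the classic sandwich, Kauermann-Carroll, and Mancl-DeRouen forms for an ordinary least-squares fit with a working-independence assumption. That is a deliberate simplification: the paper applies these estimators to mixed models, the Morel-Bokossa-Neerchal correction is omitted here, and the t reference distribution with clusters minus two degrees of freedom is not computed. The function names are hypothetical.

```python
import numpy as np


def _sym_power(sym, power):
    """Raise a symmetric positive-definite matrix to a real power
    through its eigendecomposition."""
    vals, vecs = np.linalg.eigh(sym)
    return (vecs * vals ** power) @ vecs.T


def cluster_robust_vcov(X, y, cluster, kind="CR0"):
    """Cluster-robust covariance of OLS coefficients (illustrative sketch).

    kind: "CR0" (classic sandwich), "KC" (Kauermann-Carroll) or
    "MD" (Mancl-DeRouen). Working independence is assumed throughout.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    cluster = np.asarray(cluster)
    bread = np.linalg.inv(X.T @ X)
    resid = y - X @ (bread @ X.T @ y)
    power = {"CR0": 0.0, "KC": -0.5, "MD": -1.0}[kind]

    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(cluster):
        rows = cluster == g
        Xg, eg = X[rows], resid[rows]
        if power != 0.0:
            # Cluster leverage block H_gg = X_g (X'X)^{-1} X_g'
            H = Xg @ bread @ Xg.T
            # KC uses (I - H_gg)^(-1/2); MD uses (I - H_gg)^(-1)
            eg = _sym_power(np.eye(len(eg)) - H, power) @ eg
        score = Xg.T @ eg
        meat += np.outer(score, score)
    return bread @ meat @ bread


# Toy check: intercept-only model, 6 equal-sized clusters of 4 (simulated data).
rng = np.random.default_rng(0)
n_clusters, m = 6, 4
cluster = np.repeat(np.arange(n_clusters), m)
X = np.ones((n_clusters * m, 1))
y = rng.normal(size=n_clusters)[cluster] + rng.normal(size=n_clusters * m)
v = {k: cluster_robust_vcov(X, y, cluster, k)[0, 0] for k in ("CR0", "KC", "MD")}
print({k: round(val, 4) for k, val in v.items()})
```

In this balanced intercept-only case with G clusters, the KC and MD variances equal the classic sandwich multiplied by G/(G-1) and (G/(G-1))^2 respectively, which shows why MD is the most conservative of the three when clusters are few.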

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Trialists who expect effects to change with exposure duration should fit exposure-time indicator models with the recommended robust corrections instead of immediate-treatment models.
  • Designs with few clusters or one cluster per sequence may still require additional safeguards beyond these estimators when targeting long-term effects.
  • Similar robust-variance recommendations could be tested in other longitudinal cluster designs such as crossover trials that also feature time-dependent exposures.

Load-bearing premise

The specific simulation scenarios, including the chosen forms of random-effects misspecification and the data-generating processes for time-varying effects, adequately represent conditions in real stepped-wedge cluster randomized trials.

What would settle it

Empirical coverage of nominal 95 percent intervals computed from a real stepped-wedge trial dataset whose true time-varying effects are known independently, checked separately for the Mancl-DeRouen and Morel-Bokossa-Neerchal estimators against the model-based standard errors.

Original abstract

Stepped-wedge cluster randomized trials (SW-CRTs) evaluate interventions rolled out across clusters over time. Standard analyses typically use immediate-treatment (IT) models, which assume effects begin at crossover and remain constant thereafter. When effects vary with exposure duration, IT models may misrepresent target effects. Exposure-time indicator (ETI) models address this by allowing treatment effects to differ by time since exposure and by targeting the time-averaged treatment effect (TATE) and long-term effect (LTE). Like IT models, ETI models require specification of a random-effects structure, which is often misspecified, and the performance of robust variance estimators (RVEs) in this setting is not well understood. We review RVEs for ETI models and evaluate them in simulation studies with continuous and binary outcomes under correctly specified (binary only) and misspecified random-effects structures. We compare the classic sandwich, Kauermann-Carroll (KC), Mancl-DeRouen (MD), and Morel-Bokossa-Neerchal (MBN) estimators for inference on the TATE and LTE. Our simulations show that under misspecified random-effects structures, model-based standard errors (SE) produced undercoverage, whereas RVEs improved performance. For continuous outcomes, MD with a t-distribution and degrees of freedom equal to the number of clusters minus two gave the most consistent coverage probabilities. For binary outcomes, MBN was the only consistently reliable option. MD, however, could be unstable in one-cluster-per-sequence designs because of data sparsity. Across scenarios, both model-based SE and RVE for LTE were unstable, indicating that greater caution is needed when targeting LTE under ETI models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper evaluates four robust variance estimators (RVEs; classic sandwich, KC, MD, MBN) for exposure-time indicator (ETI) models in stepped-wedge cluster randomized trials (SW-CRTs) with time-varying treatment effects. Using Monte Carlo simulations with continuous and binary outcomes, it assesses inference on the time-averaged treatment effect (TATE) and long-term effect (LTE), chiefly under misspecified random-effects structures. Key results: model-based SEs undercover, whereas RVEs improve coverage; MD with a t-distribution (df = number of clusters minus two) gives the most consistent coverage for continuous outcomes; MBN is the only consistently reliable option for binary outcomes; inference on the LTE is unstable across scenarios; and MD can be unstable in sparse one-cluster-per-sequence designs.

Significance. If the simulation findings hold, the work offers timely practical guidance for choosing small-sample corrections in SW-CRT analyses when random-effects assumptions are violated and time-varying effects are present. The Monte Carlo design directly compares finite-sample performance of multiple RVEs on both TATE and LTE, addressing a gap where theoretical results are limited; this empirical evidence can help analysts avoid undercoverage in real trials.

major comments (2)
  1. [Methods / Simulation design] Simulation design (Methods section): the data-generating processes specify particular random-effects misspecifications and exposure-time effect patterns, but no sensitivity analyses are reported for stronger cluster-size heterogeneity, higher ICC variability, or non-monotonic time-varying effects outside the simulated envelope. Because the headline recommendations (MD+t best for continuous; MBN only reliable for binary) rest entirely on these grids, limited coverage of realistic SW-CRT conditions weakens generalizability of the performance claims.
  2. [Results] Results on LTE (Results section): both model-based SE and all RVEs for the long-term effect are reported as unstable across scenarios, yet the paper provides no quantitative thresholds (e.g., minimum number of clusters or periods) or alternative estimators that would make LTE inference reliable. This instability directly affects the central claim that ETI models can target LTE, so explicit guidance or caveats are needed to avoid over-interpretation.
minor comments (2)
  1. [Introduction] The abstract and introduction use TATE and LTE without an early formal definition or equation; adding a brief display equation in the Introduction would improve readability for readers unfamiliar with ETI models.
  2. [Results] Table captions (Results) should explicitly state the number of Monte Carlo replications and the exact coverage target (e.g., 95%) so that reported probabilities can be interpreted without returning to the Methods text.
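
The second minor comment matters because an empirical coverage figure is only interpretable alongside its simulation noise. A minimal sketch of that calculation follows, assuming each replication yields one interval for a known true effect; the function name `coverage_summary` and its arguments are hypothetical, and the normal-approximation tolerance band is a common convention rather than something taken from the paper.

```python
import math


def coverage_summary(lower, upper, truth, nominal=0.95):
    """Empirical coverage of interval estimates and its Monte Carlo SE."""
    if len(lower) != len(upper) or not lower:
        raise ValueError("need equal-length, non-empty interval bounds")
    n_rep = len(lower)
    hits = sum(1 for lo, hi in zip(lower, upper) if lo <= truth <= hi)
    coverage = hits / n_rep
    # Binomial Monte Carlo standard error of the coverage estimate.
    mcse = math.sqrt(coverage * (1 - coverage) / n_rep)
    # Half-width of the band expected from simulation noise alone
    # if the true coverage equals the nominal level.
    tolerance = 1.96 * math.sqrt(nominal * (1 - nominal) / n_rep)
    return {"coverage": coverage, "mcse": mcse, "tolerance": tolerance}


# Four made-up intervals for a true effect of 0.5; three of them cover it.
summary = coverage_summary([0, 0, 0, 2], [1, 1, 1, 3], truth=0.5)
print(summary)
```

As a hypothetical illustration, with 1000 replications and a 95% target the tolerance works out to roughly 1.4 percentage points, so coverage differences smaller than that between estimators would be hard to distinguish from noise.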

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on our manuscript evaluating robust variance estimators for exposure-time indicator models in stepped-wedge cluster randomized trials. The feedback on simulation scope and long-term effect stability is helpful, and we have revised the paper accordingly. We address each major comment point by point below.

Point-by-point responses
  1. Referee: Simulation design (Methods section): the data-generating processes specify particular random-effects misspecifications and exposure-time effect patterns, but no sensitivity analyses are reported for stronger cluster-size heterogeneity, higher ICC variability, or non-monotonic time-varying effects outside the simulated envelope. Because the headline recommendations (MD+t best for continuous; MBN only reliable for binary) rest entirely on these grids, limited coverage of realistic SW-CRT conditions weakens generalizability of the performance claims.

    Authors: We agree that the simulation grid does not cover every conceivable SW-CRT variation, including more extreme cluster-size heterogeneity, wider ICC ranges, or non-monotonic exposure-time patterns. Our design prioritized representative misspecifications and monotonic patterns to generate practical recommendations under conditions where random-effects assumptions are violated. To improve transparency, we will expand the Discussion section with an explicit description of the simulated envelope and clear statements limiting the applicability of the MD+t and MBN recommendations to settings similar to those examined. revision: partial

  2. Referee: Results on LTE (Results section): both model-based SE and all RVEs for the long-term effect are reported as unstable across scenarios, yet the paper provides no quantitative thresholds (e.g., minimum number of clusters or periods) or alternative estimators that would make LTE inference reliable. This instability directly affects the central claim that ETI models can target LTE, so explicit guidance or caveats are needed to avoid over-interpretation.

    Authors: We acknowledge that the observed instability of LTE estimates across scenarios requires more explicit guidance to prevent over-interpretation. The manuscript already states that greater caution is warranted when targeting the LTE, but we will add quantitative summaries drawn from the existing simulation results (e.g., scenarios in which coverage for the LTE approached nominal levels only when the number of clusters exceeded a certain threshold) and will recommend considering immediate-treatment models when long-term effects are the primary target and time-varying patterns are plausible. These additions will be placed in the Results and Discussion sections. revision: yes

Circularity Check

0 steps flagged

No circularity: claims rest on independent Monte Carlo simulations

The paper evaluates robust variance estimators for exposure-time indicator models in stepped-wedge trials exclusively through simulation studies that generate data under controlled random-effects misspecifications and compare coverage and stability of model-based SEs versus RVEs (KC, MD, MBN). These simulations constitute external benchmarks independent of the target performance metrics; no equations reduce TATE or LTE inference to fitted parameters by construction, no self-definitional loops appear in the estimator definitions, and no load-bearing self-citations or imported uniqueness theorems are invoked to force the reported conclusions. The derivation chain is therefore self-contained as an empirical comparison rather than a tautological renaming or fit.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claims rest on simulation-based evaluation rather than analytical proofs. The data-generating processes assume an exposure-time indicator structure with added misspecification only in the random-effects component.

free parameters (1)
  • Simulation design parameters (number of clusters, periods, effect sizes, intraclass correlations)
    Chosen by the authors to represent typical stepped-wedge trial settings and varied across scenarios to test estimator performance.
axioms (1)
  • [domain assumption] The exposure-time indicator model correctly captures the time-varying treatment effects in the data-generating process
    Invoked when generating simulated data to isolate the effect of random-effects misspecification on the estimators.


