Statistical inference with win statistics in cluster-randomized trials with composite outcomes
Pith reviewed 2026-05-10 03:47 UTC · model grok-4.3
The pith
Several testing procedures for win statistics in cluster-randomized trials with composite outcomes control type I error while retaining power in simulations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper establishes that, for win statistics in cluster-randomized trials (CRTs), permutation tests and tests using clustered jackknife variance estimates achieve nominal type I error rates across practical settings, whereas Wald tests based on rank sums may require adjustments. This allows reliable inference on treatment effects from pairwise comparisons that prioritize more important outcomes.
What carries the argument
Win statistics (win ratio, win odds, net benefit, DOOR) computed from pairwise win/loss comparisons in hierarchical composites, with inference via cluster-adjusted variance estimators and permutation methods.
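The pairwise machinery described above can be sketched in a few lines. This is a minimal illustration, not the WinsCRT implementation: the function names are hypothetical, each outcome is assumed to be a tuple ordered from most to least important with larger values better, and censoring is ignored (censored time-to-event components need extra rules for deciding a pair). Clustering does not enter this pooled point estimate; it matters for the variance and the test.

```python
from itertools import product


def compare(treated, control):
    """Return +1 (win), -1 (loss), or 0 (tie) for the treated subject.

    Assumption: outcomes are tuples ordered from most to least important
    and larger values are better. The first component that differs
    decides the pair; if none differs, the pair is a tie.
    """
    for t, c in zip(treated, control):
        if t > c:
            return 1
        if t < c:
            return -1
    return 0


def win_statistics(treated_arm, control_arm):
    """Pool all treated-vs-control pairs and summarize wins, losses, ties."""
    results = [compare(t, c) for t, c in product(treated_arm, control_arm)]
    n_pairs = len(results)
    wins = results.count(1)
    losses = results.count(-1)
    ties = results.count(0)
    half_ties = 0.5 * ties
    return {
        "wins": wins,
        "losses": losses,
        "ties": ties,
        # Win ratio ignores ties; undefined (here: inf) when there are no losses.
        "win_ratio": wins / losses if losses else float("inf"),
        # Win odds splits ties evenly between the arms.
        "win_odds": (wins + half_ties) / (losses + half_ties)
        if (losses + half_ties) else float("inf"),
        # Net benefit is the difference in win and loss proportions.
        "net_benefit": (wins - losses) / n_pairs,
        # Win probability with ties split, the quantity usually reported for DOOR.
        "win_probability": (wins + half_ties) / n_pairs,
    }


# Hypothetical toy data: (alive indicator, functional score).
treated = [(1, 5), (1, 3), (0, 4)]
control = [(1, 4), (0, 6), (0, 4)]
stats = win_statistics(treated, control)
```

On this toy data the nine pairs give 5 wins, 3 losses, and 1 tie, so the win ratio is 5/3, the win odds 5.5/3.5, the net benefit 2/9, and the win probability 5.5/9. All four summaries come from the same three counts, which is why they can be reported side by side.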
If this is right
- Researchers analyzing CRTs can select from the surveyed procedures based on simulation performance for their specific cluster number and correlation.
- The four win statistics complement one another in describing the strength of treatment benefit beyond a single p-value.
- Implementation in the WinsCRT R package facilitates application to new trials.
- Reanalysis of existing data like STRIDE becomes straightforward with these tools.
Where Pith is reading between the lines
- The simulation results imply that for trials with few clusters, permutation procedures may be safer choices.
- Extending these methods to time-to-event composites with heavy censoring could be a next step for broader applicability.
- If the procedures perform well, win statistics may become standard for composite outcomes in clustered settings where traditional models struggle with hierarchy.
Load-bearing premise
The finite-sample behaviors observed in the chosen simulation scenarios with specific cluster sizes, correlations, and censoring patterns will hold for actual cluster-randomized trial datasets.
What would settle it
A simulation study with cluster sizes or intracluster correlations outside the tested range showing inflation of type I error above 0.05 for the recommended procedures would falsify the generalizability of the performance claims.
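A check of this kind can be sketched as a small Monte Carlo experiment. The code below is an illustration under stated assumptions, not the paper's simulation design or its score permutation test: it uses an exact cluster-level randomization test of the net benefit (all reassignments of cluster labels are enumerated), a hypothetical two-component outcome with a shared normal cluster effect, and no censoring. All function names and default parameter values are made up for the example.

```python
import random
from itertools import combinations


def compare(a, b):
    """+1 if a wins, -1 if a loses, 0 if tied; first differing component decides."""
    for x, y in zip(a, b):
        if x > y:
            return 1
        if x < y:
            return -1
    return 0


def cluster_scores(clusters):
    """scores[a][b] = summed pairwise results of cluster a's subjects vs cluster b's."""
    k = len(clusters)
    return [[sum(compare(x, y) for x in clusters[a] for y in clusters[b])
             for b in range(k)] for a in range(k)]


def net_benefit(scores, sizes, treated):
    """Net benefit for a given set of treated cluster indices."""
    treated_set = set(treated)
    control = [b for b in range(len(sizes)) if b not in treated_set]
    total = sum(scores[a][b] for a in treated for b in control)
    n_pairs = sum(sizes[a] for a in treated) * sum(sizes[b] for b in control)
    return total / n_pairs


def permutation_p_value(clusters, treated):
    """Two-sided exact p-value: re-randomize whole clusters, never individuals."""
    scores = cluster_scores(clusters)
    sizes = [len(c) for c in clusters]
    observed = net_benefit(scores, sizes, treated)
    stats = [net_benefit(scores, sizes, alt)
             for alt in combinations(range(len(clusters)), len(treated))]
    extreme = sum(abs(s) >= abs(observed) - 1e-12 for s in stats)
    return extreme / len(stats)


def simulate_null_trial(rng, n_clusters, cluster_size, cluster_sd):
    """Assumed null model: both components share a normal cluster effect."""
    clusters = []
    for _ in range(n_clusters):
        u = rng.gauss(0.0, cluster_sd)
        clusters.append([(int(rng.gauss(u, 1.0) > 0), round(rng.gauss(u, 1.0), 1))
                         for _ in range(cluster_size)])
    return clusters


def type_one_error(n_reps=200, n_clusters=8, cluster_size=5,
                   cluster_sd=0.5, alpha=0.05, seed=1):
    """Fraction of null trials rejected at level alpha."""
    rng = random.Random(seed)
    treated = tuple(range(n_clusters // 2))  # labels carry no effect under the null
    rejections = 0
    for _ in range(n_reps):
        clusters = simulate_null_trial(rng, n_clusters, cluster_size, cluster_sd)
        rejections += permutation_p_value(clusters, treated) <= alpha
    return rejections / n_reps
```

Calling `type_one_error()` with a larger `cluster_sd` or a different `n_clusters` moves the experiment outside a given grid; a rejection rate clearly above `alpha` would be the kind of evidence described above. Precomputing the cluster-by-cluster score matrix keeps each reassignment cheap, and full enumeration is only practical for small numbers of clusters (70 assignments for 8 clusters split 4 and 4).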
Original abstract
Win statistics have become increasingly popular for analyzing hierarchical composite endpoints in clinical trials, because they summarize treatment benefit through pairwise comparisons that respect the clinical importance order among outcome components. The win ratio, win odds, net benefit, and desirability of outcome ranking (DOOR) are all based on the same underlying pairwise comparison methodology and can complement one another to show the strength of the treatment effect. Despite recent progress on win statistics, statistical inference for win statistics in cluster randomized trials (CRTs) remains underdeveloped. In this paper, we provide a comprehensive survey of testing procedures for the win ratio, win odds, net benefit, and DOOR in parallel-arm CRTs with hierarchical composite outcomes. Then, for each win statistic, we compare different testing procedures, including Wald tests based on cluster rank sum statistics and bivariate clustered U-statistics, tests that use a cluster jackknife variance, a score permutation test, a permutation-based procedure with analytical variance estimation, and a likelihood ratio test derived from clustered jackknife estimates. Through simulation studies that consider varying scenarios such as different cluster sizes, intracluster correlations, and censoring-induced ties, we characterize the finite-sample type I error and power of each procedure across a range of practical settings with small and large numbers of clusters. We illustrate our methods by reanalyzing the Strategies to Reduce Injuries and Develop Confidence in Elders (STRIDE) pragmatic CRT, and implement all win statistics methods in the WinsCRT R package.
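To make the "cluster jackknife variance" idea in the abstract concrete, here is a minimal sketch of a delete-one-cluster jackknife for the net benefit. It is an assumption-laden illustration rather than the paper's estimator: it uses the plain (G-1)/G jackknife factor over all G clusters from both arms, applies no transformation or small-sample correction, and all function names are hypothetical. Each arm needs at least two clusters so that every leave-one-out estimate is defined.

```python
import math


def compare(a, b):
    """+1 if a wins, -1 if a loses, 0 if tied; first differing component decides."""
    for x, y in zip(a, b):
        if x > y:
            return 1
        if x < y:
            return -1
    return 0


def net_benefit(treated_clusters, control_clusters):
    """Pooled net benefit over all treated-vs-control subject pairs."""
    treated = [s for cluster in treated_clusters for s in cluster]
    control = [s for cluster in control_clusters for s in cluster]
    total = sum(compare(t, c) for t in treated for c in control)
    return total / (len(treated) * len(control))


def cluster_jackknife(treated_clusters, control_clusters):
    """Return (estimate, standard error) using a delete-one-cluster jackknife."""
    estimate = net_benefit(treated_clusters, control_clusters)
    leave_one_out = []
    # Drop each treated cluster in turn, keeping the control arm intact.
    for i in range(len(treated_clusters)):
        reduced = treated_clusters[:i] + treated_clusters[i + 1:]
        leave_one_out.append(net_benefit(reduced, control_clusters))
    # Drop each control cluster in turn, keeping the treated arm intact.
    for j in range(len(control_clusters)):
        reduced = control_clusters[:j] + control_clusters[j + 1:]
        leave_one_out.append(net_benefit(treated_clusters, reduced))
    g = len(leave_one_out)
    mean = sum(leave_one_out) / g
    variance = (g - 1) / g * sum((v - mean) ** 2 for v in leave_one_out)
    return estimate, math.sqrt(variance)


# Hypothetical toy data: two clusters per arm, one-component outcomes.
treated_clusters = [[(3,), (4,)], [(1,), (5,)]]
control_clusters = [[(2,), (2,)], [(0,), (6,)]]
estimate, se = cluster_jackknife(treated_clusters, control_clusters)
z = estimate / se  # Wald-type statistic for H0: net benefit = 0
```

For the toy data the estimate is 0.25 and the four leave-one-out values are 0, 0.5, 0, and 0.5, giving a jackknife variance of 0.1875. Deleting whole clusters rather than individuals is what lets the variance reflect intracluster correlation; with this few clusters a normal reference distribution for `z` would not be trustworthy.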
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript surveys testing procedures for win statistics (win ratio, win odds, net benefit, and DOOR) in parallel-arm cluster-randomized trials with hierarchical composite outcomes. It compares Wald tests (based on cluster rank sums and bivariate clustered U-statistics), cluster jackknife variance tests, score permutation tests, permutation procedures with analytical variance, and likelihood ratio tests derived from jackknife estimates. Finite-sample type I error and power are characterized via simulations that vary cluster size, intracluster correlation, and censoring-induced ties, with an application to reanalysis of the STRIDE pragmatic CRT and an accompanying WinsCRT R package.
Significance. If the simulation results hold under the conditions examined, the work fills a clear gap by providing practical guidance on inference for win statistics in CRTs, where clustered dependence and composite hierarchical outcomes complicate standard methods. The inclusion of multiple procedures, coverage of relevant simulation factors (cluster size, ICC, ties), a real-data example, and open-source software are strengths that support adoption by practitioners.
major comments (1)
- [Simulation studies] Simulation studies section: the finite-sample type I error and power characterizations for the Wald, jackknife, permutation, and LRT procedures rest solely on simulations without stated asymptotic normality/consistency results for the clustered U-statistics or jackknife estimators under the CRT design, nor explicit validity conditions (e.g., minimum number of clusters, upper bound on ICC, or handling of censoring ties). This limits the ability to determine when the reported error rates remain reliable for real CRT data whose dependence structures may deviate from the simulated grid.
minor comments (1)
- Consider adding a dedicated software section or vignette-style example in the WinsCRT R package documentation to illustrate implementation of each testing procedure on the STRIDE data or a simulated example, improving reproducibility.
Simulated Author's Rebuttal
We thank the referee for their constructive comments and positive evaluation of our manuscript. We respond to the major comment below and indicate the revisions made.
Referee: [Simulation studies] Simulation studies section: the finite-sample type I error and power characterizations for the Wald, jackknife, permutation, and LRT procedures rest solely on simulations without stated asymptotic normality/consistency results for the clustered U-statistics or jackknife estimators under the CRT design, nor explicit validity conditions (e.g., minimum number of clusters, upper bound on ICC, or handling of censoring ties). This limits the ability to determine when the reported error rates remain reliable for real CRT data whose dependence structures may deviate from the simulated grid.
Authors: We thank the referee for highlighting this important aspect. Our manuscript focuses primarily on providing practical guidance through extensive finite-sample simulations, because deriving new asymptotic normality and consistency results for these win statistics under cluster-randomized trial designs with composite outcomes is technically involved and would extend the paper considerably. However, we agree that stating validity conditions would be beneficial. In the revised manuscript, we have added a new subsection in the simulation studies section that references existing asymptotic results for clustered U-statistics (citing relevant works on rank-based statistics in dependent data) and explicitly lists the ranges of cluster sizes, ICC values, and tie proportions covered in our simulations as the conditions under which the procedures are recommended. We have also expanded the discussion to note potential limitations when data deviate from these conditions, such as very small numbers of clusters or extreme ICC values. This provides practitioners with clearer guidance without requiring full new theoretical derivations.
Revision: partial
Circularity Check
No circularity; procedures derived from standard U-statistics and permutation theory with simulation-based characterization
full rationale
The paper surveys and compares testing procedures (Wald, jackknife, permutation, LRT) for win statistics in CRTs. These are constructed from established clustered U-statistics, jackknife variance, and permutation methods without any self-definitional loops or fitted inputs renamed as predictions. Finite-sample type I error and power are assessed via simulations across cluster sizes, ICC, and censoring; this is an independent empirical check, not a reduction to the paper's own equations. No load-bearing self-citations or uniqueness theorems from the authors' prior work are invoked to force the results. The derivation chain remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: standard regularity conditions for asymptotic normality of clustered U-statistics and validity of permutation tests under the null.