On Cluster Randomized Trials with the Desirability of Outcome Ranking (DOOR) Endpoints

Guoqing Diao; Scott Evans; Toshimitsu Hamasaki; Wanying Shao

arxiv: 2604.24032 · v2 · submitted 2026-04-27 · 📊 stat.ME

Wanying Shao, Toshimitsu Hamasaki, Scott Evans, Guoqing Diao

Pith reviewed 2026-05-08 02:12 UTC · model grok-4.3

classification 📊 stat.ME

keywords: cluster randomized trials · DOOR · U-statistics · influence functions · treatment effects · ordinal outcomes

The pith

New methods extend the Desirability of Outcome Ranking framework to cluster randomized trials using U-statistics and influence functions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops methods for applying DOOR endpoint analysis to cluster randomized trials. It uses properties of U-statistics and influence functions to estimate within-cluster and between-cluster treatment effects. The extension covers mixtures of clusters that contain both treatment groups and clusters that contain only one, as well as varying numbers and sizes of clusters. Simulations indicate that the methods perform well, and an example from a newborn care trial illustrates the approach.

Core claim

We propose a suite of new methods to extend DOOR to cluster trials based on properties of U-statistics and influence functions to estimate within-cluster and between-cluster treatment effects. These approaches can be applied in different scenarios, including mixtures of clusters with two treatment groups and clusters with only one group, and both small and large numbers of clusters.
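As a minimal sketch of the two estimands, assume long-format data with a cluster label, an arm indicator (1 = treatment, 0 = control), and an ordinal DOOR rank where larger means more desirable. The function name `within_between_door` and these pooled definitions are illustrative assumptions, not the authors' estimators: here D_w averages the kernel over treatment-control pairs that share a cluster, and D_b averages it over pairs drawn from different clusters, with ties scored 1/2.

```python
import numpy as np


def door_kernel(x, y):
    """Pairwise DOOR kernel: 1 if x is more desirable than y, 1/2 if tied, 0 otherwise.

    Assumes larger DOOR ranks are more desirable.
    """
    return (x > y) + 0.5 * (x == y)


def within_between_door(cluster, arm, outcome):
    """Hypothetical pooled estimates of (D_w, D_b) from long-format data."""
    cluster = np.asarray(cluster)
    arm = np.asarray(arm)
    outcome = np.asarray(outcome)
    trt, ctl = arm == 1, arm == 0
    # Kernel values for every treated-by-control pair.
    psi = door_kernel(outcome[trt][:, None], outcome[ctl][None, :])
    # True where the treated and control subjects belong to the same cluster.
    same = cluster[trt][:, None] == cluster[ctl][None, :]
    d_w = psi[same].mean() if same.any() else np.nan  # requires mixed clusters
    d_b = psi[~same].mean() if (~same).any() else np.nan
    return float(d_w), float(d_b)


# Two mixed clusters, each with one treated and one control subject.
d_w, d_b = within_between_door(cluster=[1, 1, 2, 2], arm=[1, 0, 1, 0], outcome=[3, 1, 2, 2])
print(d_w, d_b)  # 0.75 1.0
```

Under one-group randomization the `same` mask is all False, so only D_b is defined; this is one reason the within-cluster and between-cluster effects call for separate treatment.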

What carries the argument

U-statistics and influence functions adapted for clustered data to compute DOOR-based treatment comparisons.
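The cluster-level adaptation can be illustrated for one-group randomization, where every cluster is entirely treatment or entirely control. The sketch below is not taken from the paper; it swaps in an Obuchowski-style clustered Mann-Whitney variance. Each observation's placement (its average kernel value against the other arm) is summed within its cluster, and the variance is built from these cluster-level sums rather than from individual observations. The function name `between_cluster_door` is hypothetical.

```python
import numpy as np


def between_cluster_door(trt_clusters, ctl_clusters):
    """DOOR probability and a cluster-robust standard error (one-group randomization).

    trt_clusters / ctl_clusters: lists of 1-D sequences of DOOR ranks, one per
    cluster; larger ranks are assumed more desirable. Needs >= 2 clusters per arm.
    """
    x = np.concatenate([np.asarray(c) for c in trt_clusters])
    y = np.concatenate([np.asarray(c) for c in ctl_clusters])
    M, N = x.size, y.size
    psi = (x[:, None] > y[None, :]) + 0.5 * (x[:, None] == y[None, :])
    d = psi.mean()

    # Placements: each observation's mean kernel value against the other arm.
    v10 = psi.mean(axis=1)
    v01 = psi.mean(axis=0)

    # Sum placements within each cluster (the cluster-level projection).
    m = np.array([len(c) for c in trt_clusters])
    n = np.array([len(c) for c in ctl_clusters])
    v10_c = np.array([s.sum() for s in np.split(v10, np.cumsum(m)[:-1])])
    v01_c = np.array([s.sum() for s in np.split(v01, np.cumsum(n)[:-1])])

    I, J = m.size, n.size
    s10 = I / ((I - 1) * M) * np.sum((v10_c - m * d) ** 2)
    s01 = J / ((J - 1) * N) * np.sum((v01_c - n * d) ** 2)
    return float(d), float(np.sqrt(s10 / M + s01 / N))


d, se = between_cluster_door([[3, 4], [2, 4]], [[1, 2], [3, 1]])
print(d, round(se, 4))  # 0.875 0.0884
```

When every cluster holds a single observation, the formula reduces to the familiar DeLong-type variance for independent data, so any extra width in the clustered case comes from within-cluster correlation of the placements.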

If this is right

The methods enable DOOR analysis in cluster trials where individual randomization is not feasible.
Performance is validated through simulations across different numbers of clusters and cluster sizes.
The methods apply to real-world data, such as the cluster randomized crossover trial comparing delayed cord clamping and umbilical cord milking in newborns.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

These estimators could support patient-centric analyses in public health trials that must use cluster designs.
Extensions might examine performance under varying strengths of intra-cluster correlation.

Load-bearing premise

That the properties of U-statistics and influence functions extend directly to clustered data without bias from intra-cluster correlation or unbalanced cluster sizes.

What would settle it

A simulation or real dataset in which the new estimators show substantial bias under high intra-cluster correlation or unequal cluster sizes would undermine the claimed reliability of the methods.
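This kind of falsification check can be prototyped with a small simulation. The sketch below assumes a latent-normal data-generating model (a cluster random intercept with latent-scale ICC `rho`, cut into four ordinal DOOR categories), one-group randomization, and no true treatment effect, so every rejection of H0: D = 0.5 is a type I error. It compares a naive standard error that treats observations as independent with a cluster-robust one from an Obuchowski-style variance. All names and settings are illustrative assumptions, not the paper's simulation design.

```python
import numpy as np


def door_and_se(trt_clusters, ctl_clusters):
    """DOOR probability with an Obuchowski-style cluster-robust SE."""
    x, y = np.concatenate(trt_clusters), np.concatenate(ctl_clusters)
    psi = (x[:, None] > y[None, :]) + 0.5 * (x[:, None] == y[None, :])
    d = psi.mean()
    m = np.array([len(c) for c in trt_clusters])
    n = np.array([len(c) for c in ctl_clusters])
    # Cluster sums of placements.
    v10 = np.array([s.sum() for s in np.split(psi.mean(axis=1), np.cumsum(m)[:-1])])
    v01 = np.array([s.sum() for s in np.split(psi.mean(axis=0), np.cumsum(n)[:-1])])
    s10 = m.size / ((m.size - 1) * x.size) * np.sum((v10 - m * d) ** 2)
    s01 = n.size / ((n.size - 1) * y.size) * np.sum((v01 - n * d) ** 2)
    return d, np.sqrt(s10 / x.size + s01 / y.size)


def simulate_arm(rng, n_clusters, m, rho, cuts=(-1.0, 0.0, 1.0)):
    """Ordinal outcomes from a latent normal model with latent-scale ICC rho."""
    b = rng.normal(0.0, np.sqrt(rho), size=(n_clusters, 1))
    z = b + rng.normal(0.0, np.sqrt(1.0 - rho), size=(n_clusters, m))
    return list(np.digitize(z, cuts))  # one array of ranks 0..3 per cluster


def type1_error(rho, n_clusters=20, m=10, reps=500, seed=2026):
    """Empirical rejection rates of H0: D = 0.5 for naive and cluster-robust SEs."""
    rng = np.random.default_rng(seed)
    naive = robust = 0
    for _ in range(reps):
        trt = simulate_arm(rng, n_clusters, m, rho)
        ctl = simulate_arm(rng, n_clusters, m, rho)
        d, se_robust = door_and_se(trt, ctl)
        # Naive SE: pretend every observation is its own cluster.
        _, se_naive = door_and_se(list(np.concatenate(trt)[:, None]),
                                  list(np.concatenate(ctl)[:, None]))
        naive += abs(d - 0.5) / se_naive > 1.96
        robust += abs(d - 0.5) / se_robust > 1.96
    return naive / reps, robust / reps


print(type1_error(rho=0.3))
```

With a non-trivial ICC, the naive rate should sit well above the nominal 0.05 while the cluster-robust rate stays near it; varying `m` across clusters extends the same check to unbalanced sizes.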

Figures

Figures reproduced from arXiv: 2604.24032 by Guoqing Diao, Scott Evans, Toshimitsu Hamasaki, Wanying Shao.

**Figure 1.** Examples of between-cluster variability. Each dot represents one observation.

**Figure 2.** Type I error rates for the test based on …

**Figure 3.** Type I error rates for the tests based on …

**Figure 4.** Type I error rates for the tests based on …

**Figure 5.** Power of the test based on $W_{cb}$ for testing $H_0: D_b = 0.5$ under the large-sample scenarios (1. $n=50$, $m=16$; 2. $n=100$, $m=8$; 3. $n=200$, $m=4$) for one-group randomization, based on 10,000 replicates. Panels (A)-(F) correspond to $\rho_c = 0.001, 0.02, 0.06, 0.1, 0.3$, and $0.5$, respectively.

**Figure 6.** Power of the tests based on $W_{fw}$, $W_{cb}$, $W_{fmax}$, and $W_{fwt}$ for testing $H_0: D_w = 0.5$, $H_0: D_b = 0.5$, $H_0: D_b = D_w = 0.5$, and $H_0: D_b = D_w = 0.5$, respectively, under the large-sample scenarios (1. $n=50$, $m=16$; 2. $n=100$, $m=8$; 3. $n=200$, $m=4$) for two-group randomization, based on 10,000 replicates. Panels (A)-(F) correspond to $\rho_c = 0.001, 0.02, 0.06, 0.1, 0.3$, and $0.5$, respectively.

**Figure 7.** Power of the tests based on $W_{fw}$, $W_{cb}$, $W_{fmax}$, and $W_{fwt}$ for testing $H_0: D_w = 0.5$, $H_0: D_b = 0.5$, $H_0: D_b = D_w = 0.5$, and $H_0: D_b = D_w = 0.5$, respectively, under the large-sample scenarios (1. $n=50$, $m=16$; 2. $n=100$, $m=8$; 3. $n=200$, $m=4$) for mixture randomization, based on 10,000 replicates. Panels (A)-(F) correspond to $\rho_c = 0.001, 0.02, 0.06, 0.1, 0.3$, and $0.5$, respectively.

**Figure 8.** Results of the MINVI study DOOR analysis.

Original abstract

Cluster randomized trials are widely used when individual randomization is logistically infeasible or when correlations between observations cannot be ignored, especially in fields such as ophthalmology, infectious disease, vaccine research, and sociology. The desirability of outcome ranking (DOOR) framework evaluates patient-centric benefit-risk using an ordinal outcome and a Wilcoxon-Mann-Whitney statistic-based approach to compare outcome distributions between interventions. We propose a suite of new methods to extend DOOR to cluster trials based on properties of U-statistics and influence functions to estimate within-cluster and between-cluster treatment effects. These approaches can be applied in different scenarios, including mixtures of clusters with two treatment groups and clusters with only one group, and both small and large numbers of clusters. Simulations demonstrate that the proposed methods perform well under various scenarios regarding the number of clusters and cluster sizes. As an illustration, we apply the proposed methods to a cluster randomized crossover trial comparing delayed cord clamping and umbilical cord milking for newborns.
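For the individually randomized case that DOOR was built on, the DOOR probability is the chance that a randomly chosen treated patient has a more desirable ranked outcome than a randomly chosen control patient, with ties counted as 1/2. It equals the Mann-Whitney U statistic divided by the number of treated-control pairs. The sketch below (function names are illustrative) computes it both by direct pairwise comparison and from the Wilcoxon rank sum with midranks, assuming larger ranks are more desirable.

```python
import numpy as np


def door_pairwise(trt, ctl):
    """P(treated more desirable) + 1/2 P(tie), by comparing every pair."""
    trt, ctl = np.asarray(trt), np.asarray(ctl)
    psi = (trt[:, None] > ctl[None, :]) + 0.5 * (trt[:, None] == ctl[None, :])
    return float(psi.mean())


def midranks(v):
    """Ranks 1..n of v, with tied values given the average of their ranks."""
    _, inv, counts = np.unique(v, return_inverse=True, return_counts=True)
    last = np.cumsum(counts)  # highest rank occupied by each distinct value
    return (last - (counts - 1) / 2.0)[np.ravel(inv)]


def door_rank_sum(trt, ctl):
    """Same quantity via the Wilcoxon rank sum: (R1 - n1(n1+1)/2) / (n1 * n0)."""
    n1, n0 = len(trt), len(ctl)
    ranks = midranks(np.concatenate([np.asarray(trt), np.asarray(ctl)]))
    r1 = ranks[:n1].sum()
    return float((r1 - n1 * (n1 + 1) / 2) / (n1 * n0))


trt, ctl = [4, 3, 3, 2], [3, 2, 1, 1]
print(door_pairwise(trt, ctl), door_rank_sum(trt, ctl))  # 0.84375 0.84375
```

A value of 0.5 means neither arm tends to have more desirable outcomes, which is why the null hypotheses in the figures are written as D = 0.5.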

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper adapts DOOR to cluster trials using U-statistics, but the variance estimator may not hold up when cluster sizes are unbalanced.

The one thing to take away is that the authors have developed estimators for DOOR endpoints in cluster randomized trials using U-statistics and influence functions, with coverage of mixed cluster designs. They do well in setting up the within-cluster and between-cluster comparisons and in running simulations across different numbers and sizes of clusters. The application to the cluster randomized crossover trial comparing delayed cord clamping and umbilical cord milking illustrates the methods in a crossover setting, which helps ground the work.

Where it gets softer is the handling of dependence. Standard U-statistic theory assumes independent observations, so clustered data require projecting the kernel to the cluster level and adjusting the influence functions for intra-cluster correlation. If the paper does not include explicit weighting by cluster size or a cluster-robust variance estimator, the variance estimates could be inconsistent when cluster sizes are unbalanced or the ICC is not small. The abstract reports good simulation performance, but that does not replace checking the regularity conditions in the proofs.

This paper targets biostatistical methodologists who work on cluster trials with ordinal outcomes, and a reader who needs to implement patient-centric analyses in such trials would find practical guidance here. It engages honestly with the literature on U-statistics and DOOR, so it merits serious peer review. I would recommend sending it out rather than desk rejecting it.
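The letter's point about weighting by cluster size can be made concrete. When within-cluster DOOR estimates from mixed clusters are combined, giving each cluster equal weight and weighting each cluster by its number of treatment-control pairs target different averages whenever cluster sizes are unbalanced and the effect varies across clusters. The sketch below is a hypothetical illustration (the function and its `weighting` options are not from the paper); single-arm clusters are skipped because they contain no within-cluster contrast.

```python
import numpy as np


def within_cluster_door(clusters, weighting="clusters"):
    """Combine per-cluster DOOR estimates across mixed clusters.

    clusters: list of (treated_ranks, control_ranks) pairs; larger = more desirable.
    weighting: "clusters" (equal weight per cluster) or "pairs" (weight n_trt * n_ctl).
    """
    estimates, weights = [], []
    for trt, ctl in clusters:
        trt, ctl = np.asarray(trt), np.asarray(ctl)
        if trt.size == 0 or ctl.size == 0:
            continue  # single-arm cluster: no within-cluster comparison possible
        psi = (trt[:, None] > ctl[None, :]) + 0.5 * (trt[:, None] == ctl[None, :])
        estimates.append(psi.mean())
        weights.append(1.0 if weighting == "clusters" else psi.size)
    return float(np.average(estimates, weights=weights))


# A small cluster with a clear benefit, a large cluster with all ties, and a one-arm cluster.
data = [([2], [1]), ([1, 1, 1], [1, 1, 1]), ([3], [])]
print(within_cluster_door(data, "clusters"), within_cluster_door(data, "pairs"))  # 0.75 0.55
```

Neither choice is wrong in itself; they answer different questions, which is why the weighting behind an estimator should be stated explicitly.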

Referee Report

2 major / 0 minor

Summary. The manuscript proposes a suite of methods to extend the Desirability of Outcome Ranking (DOOR) framework to cluster randomized trials. It develops estimators for within-cluster and between-cluster treatment effects based on U-statistics and influence functions, applicable to mixtures of two-arm and single-arm clusters as well as small and large numbers of clusters. Performance is assessed via simulations across varying numbers of clusters and cluster sizes, with an illustration using data from a cluster randomized crossover trial on delayed cord clamping versus umbilical cord milking.

Significance. If the estimators prove consistent and the variance estimates robust, this would fill a practical gap by enabling patient-centric DOOR analyses in clustered designs common to infectious disease, ophthalmology, and social research. The grounding in U-statistic theory and the provision of both simulation results and a real-data example are strengths that could support broader adoption if the clustering adjustments are rigorously derived.

major comments (2)

[Abstract] The claim that standard U-statistic and influence-function machinery extends directly to produce valid estimators across mixed cluster types and unbalanced sizes lacks any statement of the required cluster-level regularity conditions or the explicit form of the cluster-robust projection of the kernel. Standard U-statistic theory assumes i.i.d. observations; without this projection the variance estimators are at risk of inconsistency precisely in the regimes (unbalanced sizes, non-negligible ICC, small numbers of clusters) highlighted as target scenarios.
[Abstract] The assertion that 'simulations demonstrate that the proposed methods perform well under various scenarios' supplies no details on simulation design (range of ICC values, cluster-size distributions, performance metrics such as bias or coverage), error-bar reporting, or explicit checks for bias induced by intra-cluster correlation. This information is load-bearing for evaluating whether the central methodological claim holds.
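To make the first comment concrete, here is one standard form such a cluster-level projection can take; it is a textbook-style sketch, not the authors' derivation. Let cluster $i = 1, \dots, K$ contain $m_i$ treated outcomes $X_{ik}$ and $n_i$ control outcomes $Y_{il}$, either of which may be zero (so one-arm and two-arm clusters are both covered), with $M = \sum_i m_i$ and $N = \sum_i n_i$. For the simple estimator that pools all treated-control pairs (the between-cluster estimator under one-group randomization), the Hájek projection groups naturally by cluster, and because clusters are independent the variance is a sum of cluster-level terms, with no assumption about correlation inside a cluster. Here $\psi_{10}(x) = E\,\psi(x, Y)$ and $\psi_{01}(y) = E\,\psi(X, y)$; the estimate $\hat\phi_i$ replaces them by placements (average kernel values against all observations in the other arm) and $D$ by $\hat D$.

```latex
\begin{align*}
\hat D &= \frac{1}{MN}\sum_{i=1}^{K}\sum_{k=1}^{m_i}\sum_{j=1}^{K}\sum_{l=1}^{n_j}\psi(X_{ik},Y_{jl}),
\qquad \psi(x,y)=\mathbb{1}\{x\succ y\}+\tfrac{1}{2}\,\mathbb{1}\{x=y\},\\
\hat D - D &\approx \sum_{i=1}^{K}\phi_i,\qquad
\phi_i=\frac{1}{M}\sum_{k=1}^{m_i}\bigl[\psi_{10}(X_{ik})-D\bigr]
      +\frac{1}{N}\sum_{l=1}^{n_i}\bigl[\psi_{01}(Y_{il})-D\bigr],\\
\widehat{\operatorname{Var}}(\hat D) &= \frac{K}{K-1}\sum_{i=1}^{K}\hat\phi_i^{\,2}.
\end{align*}
```

The projection remainder is typically negligible only as $K$ grows with no single cluster dominating, which is the kind of cluster-level regularity condition the comment asks to see stated. When every cluster holds a single observation, the estimator is essentially the familiar DeLong-type variance for independent data.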

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful and constructive review of our manuscript. The comments highlight opportunities to strengthen the abstract's clarity on theoretical foundations and empirical support. We address each point below and will revise the manuscript accordingly.

Referee: [Abstract] The claim that standard U-statistic and influence-function machinery extends directly to produce valid estimators across mixed cluster types and unbalanced sizes lacks any statement of the required cluster-level regularity conditions or the explicit form of the cluster-robust projection of the kernel. Standard U-statistic theory assumes i.i.d. observations; without this projection the variance estimators are at risk of inconsistency precisely in the regimes (unbalanced sizes, non-negligible ICC, small numbers of clusters) highlighted as target scenarios.

Authors: We appreciate the referee drawing attention to this. The full manuscript defines the estimators via cluster-level U-statistics whose kernels are projected onto the cluster to induce the appropriate dependence structure, yielding cluster-robust influence functions that remain consistent under intra-cluster correlation, unbalanced cluster sizes, and mixtures of two-arm and single-arm clusters. Regularity conditions (finite moments of the outcome and suitable rates for the number of clusters) are stated in the theoretical development. We agree the abstract would be improved by a brief reference to the cluster-robust projection and the cluster-level regularity conditions; we will revise it accordingly. revision: yes
Referee: [Abstract] The assertion that 'simulations demonstrate that the proposed methods perform well under various scenarios' supplies no details on simulation design (range of ICC values, cluster-size distributions, performance metrics such as bias or coverage), error-bar reporting, or explicit checks for bias induced by intra-cluster correlation. This information is load-bearing for evaluating whether the central methodological claim holds.

Authors: We agree that the abstract is concise and omits key simulation details. The manuscript contains a dedicated simulation section that systematically varies the number of clusters, cluster sizes (including unbalanced distributions), and intra-cluster correlation, evaluating bias, empirical standard errors, coverage of confidence intervals, and type-I error rates. We will revise the abstract to include a short summary of the simulation design and the performance metrics examined, thereby better supporting the reported findings. revision: yes
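The performance metrics listed in this response can be computed from simulation replicates as follows. Monte Carlo standard errors are included because they are the natural error bars for such summaries. This is a generic sketch with a hypothetical function name, not the paper's code, and it assumes Wald intervals of the form estimate ± 1.96 × SE.

```python
import numpy as np


def summarize_replicates(est, se, truth, null=0.5, z=1.959963984540054):
    """Bias, empirical SE, mean model SE, coverage, and rejection rate across replicates.

    est, se: per-replicate point estimates and standard errors of a DOOR probability.
    truth: the value used to generate the data; null: the value tested in H0.
    """
    est, se = np.asarray(est, float), np.asarray(se, float)
    reps = est.size
    covered = (est - z * se <= truth) & (truth <= est + z * se)
    rejected = np.abs(est - null) / se > z
    cover, reject = covered.mean(), rejected.mean()
    return {
        "bias": est.mean() - truth,
        "mc_se_bias": est.std(ddof=1) / np.sqrt(reps),
        "empirical_se": est.std(ddof=1),
        "mean_model_se": se.mean(),
        "coverage": cover,
        "mc_se_coverage": np.sqrt(cover * (1 - cover) / reps),
        # Type I error when truth == null, power otherwise.
        "rejection_rate": reject,
        "mc_se_rejection": np.sqrt(reject * (1 - reject) / reps),
    }


summary = summarize_replicates(est=[0.5, 0.6, 0.4, 0.5], se=[0.05] * 4, truth=0.5)
print(summary["coverage"], summary["rejection_rate"])  # 0.5 0.5
```

Comparing `empirical_se` with `mean_model_se` is the most direct check of whether a variance estimator keeps up with intra-cluster correlation.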

Circularity Check

0 steps flagged

No significant circularity; derivation extends established U-statistic theory without self-referential reduction

full rationale

The paper derives new within-cluster and between-cluster DOOR estimators for cluster randomized trials by extending standard U-statistic and influence-function properties to clustered data structures. The central claims rest on applying these established tools to handle mixtures of two-arm and single-arm clusters, with no evidence that any key quantity is defined in terms of itself, that a fitted parameter is relabeled as a prediction, or that load-bearing steps reduce to self-citations whose validity depends on the present work. Simulations and the real-data illustration supply independent checks. The derivation chain therefore remains self-contained against external statistical theory.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on adapting standard U-statistic and influence function theory to clustered data; no free parameters, invented entities, or non-standard axioms are mentioned in the abstract.

axioms (1)

domain assumption: Properties of U-statistics and influence functions extend to within-cluster and between-cluster treatment effect estimation in DOOR endpoints
The proposed methods are explicitly based on these properties as stated in the abstract.

pith-pipeline@v0.9.0 · 5472 in / 1219 out tokens · 20995 ms · 2026-05-08T02:12:31.048138+00:00 · methodology
