Multiple testing when many $p$-values are uniformly conservative, with application to testing qualitative interaction in educational interventions

Dylan S. Small; Qingyuan Zhao; Weijie Su


In the evaluation of treatment effects, it is of major policy interest to know whether the treatment is beneficial for some and harmful for others, a phenomenon known as qualitative interaction. We formulate this question as a multiple testing problem with many conservative null $p$-values, in which classical multiple testing methods can lose substantial power. We propose a simple technique, conditioning, to improve power. A crucial assumption is uniform conservativeness: for any conservative $p$-value $p$ and any threshold $\tau$, the conditional distribution of $p/\tau$ given $p \le \tau$ is stochastically larger than the uniform distribution on $(0,1)$. We show that this property holds for one-sided tests in a one-dimensional exponential family (e.g., testing for qualitative interaction) as well as for testing $|\mu|\le\eta$ using a statistic $X \sim \mathrm{N}(\mu,1)$ (e.g., testing for practical importance with threshold $\eta$). We also propose an adaptive method to select the threshold $\tau$. Our theoretical and simulation results suggest that the proposed tests gain substantial power when many $p$-values are uniformly conservative and lose little power when no $p$-value is uniformly conservative. We apply our method to two educational intervention datasets.
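To make the conditioning idea concrete, here is a minimal sketch, not the authors' implementation. It assumes independent $p$-values, a fixed threshold $\tau$ (the adaptive choice of $\tau$ described above is not reproduced), and Bonferroni as the base procedure, chosen only for simplicity. All function names (`one_sided_pvalue`, `conditional_bonferroni`, `conditional_cdf`) are hypothetical. Only $p$-values with $p \le \tau$ are kept, they are rescaled to $p/\tau$, and Bonferroni is applied over the retained set only. A Monte Carlo check also illustrates uniform conservativeness for the one-sided test of $\mu \le 0$ with $X \sim \mathrm{N}(\mu,1)$.

```python
import math
import random


def one_sided_pvalue(x):
    # p-value for H0: mu <= 0 from X ~ N(mu, 1), i.e. P(N(0, 1) >= x).
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def conditional_bonferroni(pvalues, tau, alpha=0.05):
    # Hypothetical sketch: condition on p <= tau, rescale to p / tau, then
    # apply Bonferroni over the retained hypotheses only. Assumes independent
    # p-values and a fixed tau (not an adaptively selected one).
    selected = [i for i, p in enumerate(pvalues) if p <= tau]
    if not selected:
        return []
    cutoff = alpha / len(selected)
    return [i for i in selected if pvalues[i] / tau <= cutoff]


def conditional_cdf(mu, tau, u, n=200_000, seed=0):
    # Monte Carlo estimate of P(p / tau <= u | p <= tau) when X ~ N(mu, 1).
    # Uniform conservativeness means this is at most u for every mu <= 0.
    rng = random.Random(seed)
    kept = hits = 0
    for _ in range(n):
        p = one_sided_pvalue(rng.gauss(mu, 1.0))
        if p <= tau:
            kept += 1
            hits += p / tau <= u
    return hits / kept


if __name__ == "__main__":
    # 5 small p-values hidden among 995 very conservative ones.
    pvals = [0.0004] * 5 + [0.9] * 995
    plain = [i for i, p in enumerate(pvals) if p <= 0.05 / len(pvals)]
    print(plain)                                    # []
    print(conditional_bonferroni(pvals, tau=0.5))   # [0, 1, 2, 3, 4]
    print(conditional_cdf(0.0, tau=0.2, u=0.5))     # boundary null: close to 0.5
    print(conditional_cdf(-1.0, tau=0.2, u=0.5))    # conservative null: below 0.5
```

In the toy example, plain Bonferroni uses the cutoff $0.05/1000$ and rejects nothing. After conditioning on $p \le 0.5$, only five hypotheses remain, and each rescaled value $0.0008$ falls below $0.05/5$. For the conservative null $\mu = -1$, a normal-tail calculation gives $P(p \le 0.1 \mid p \le 0.2) \approx 0.34$, so the Monte Carlo estimate should land near that value, well below the uniform value $0.5$.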

