Rank-adaptive covariance testing with applications to genomics and neuroimaging
Pith reviewed 2026-05-24 06:51 UTC · model grok-4.3
The pith
A test based on the sum of the top singular values of the covariance difference detects low-rank group differences with higher power.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We propose Rank-Adaptive Covariance Testing (RACT), which uses the Ky-Fan(k) norm of the difference in sample covariances as the test statistic. By adapting k to the data, RACT leverages low-rank differences to achieve higher power while maintaining exact Type I error control through permutation.
What carries the argument
The Ky-Fan(k) norm (sum of the k largest singular values) applied to the matrix difference between two sample covariance estimates, used as a test statistic that is sensitive to low-rank perturbations.
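A minimal sketch of this statistic in Python with NumPy is given below. The function names `ky_fan_norm` and `covariance_difference` are illustrative and do not come from the paper; the code assumes data matrices with observations in rows and features in columns.

```python
import numpy as np


def ky_fan_norm(matrix: np.ndarray, k: int) -> float:
    """Return the Ky-Fan(k) norm: the sum of the k largest singular values."""
    # np.linalg.svd returns singular values sorted in descending order.
    singular_values = np.linalg.svd(matrix, compute_uv=False)
    if not 1 <= k <= singular_values.size:
        raise ValueError("k must lie between 1 and the number of singular values")
    return float(singular_values[:k].sum())


def covariance_difference(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Difference of the two sample covariance matrices (rows are observations)."""
    return np.cov(x, rowvar=False) - np.cov(y, rowvar=False)


# Toy matrix whose singular values are 3, 2 and 1.
toy = np.diag([3.0, -2.0, 1.0])
print(ky_fan_norm(toy, 1))  # k = 1 is the spectral norm
print(ky_fan_norm(toy, 3))  # k = full rank is the nuclear norm

# Ky-Fan(2) norm of a covariance difference for two simulated groups.
rng = np.random.default_rng(0)
group_a = rng.standard_normal((50, 8))
group_b = rng.standard_normal((60, 8))
print(ky_fan_norm(covariance_difference(group_a, group_b), 2))
```

The two endpoints show why k acts as a tuning knob: k = 1 looks only at the single strongest direction of difference, while the largest k sums over all of them.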
If this is right
- RACT identifies gene-network differences between lung-cancer subtypes more effectively than non-adaptive tests.
- The procedure detects scanner-induced covariance heterogeneity in diffusion-tensor imaging data.
- Simulation studies confirm elevated power precisely when the alternative covariance difference is low-rank.
- The permutation guarantee removes the need for large-sample approximations in high-dimensional regimes.
Where Pith is reading between the lines
- The same norm construction could be applied to tests for equality of correlation matrices or precision matrices.
- Data-driven selection of k might be combined with other matrix norms to gain robustness against model misspecification.
- Similar rank-adaptive ideas could extend two-sample testing to multiple groups or to time-series covariance changes.
Load-bearing premise
Covariance differences between groups are driven primarily by changes in low-rank structures that are only weakly dispersed across many dimensions.
What would settle it
In data simulated from a low-rank covariance alternative, the Ky-Fan-based test shows no power gain over a Frobenius-norm test, or the permutation procedure yields rejection rates above the nominal level under the null.
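The check described above could be sketched roughly as follows. This is an illustrative simulation, not the paper's design: the rank-one signal direction, the sample sizes, the fixed k = 1, and every function name are assumptions chosen to keep the example small. Both groups are simulated with mean zero, so no extra handling of group means is included.

```python
import numpy as np


def cov_diff(x, y):
    """Difference of sample covariances; rows are observations."""
    return np.cov(x, rowvar=False) - np.cov(y, rowvar=False)


def top_singular_value(d):
    """Ky-Fan(1) norm of the difference matrix."""
    return np.linalg.svd(d, compute_uv=False)[0]


def frobenius_norm(d):
    return np.linalg.norm(d, "fro")


def permutation_pvalue(x, y, statistic, n_perm, rng):
    """Monte Carlo permutation p-value with the usual +1 correction."""
    pooled = np.vstack([x, y])
    n_x = x.shape[0]
    observed = statistic(cov_diff(x, y))
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(pooled.shape[0])
        permuted = statistic(cov_diff(pooled[idx[:n_x]], pooled[idx[n_x:]]))
        exceed += int(permuted >= observed)
    return (1 + exceed) / (1 + n_perm)


def rejection_rate(statistic, signal, n=30, p=10, n_rep=20, n_perm=99,
                   alpha=0.05, seed=0):
    """Fraction of replications rejected at level alpha.

    Group x has identity covariance; group y adds `signal` times a rank-one
    matrix along a dense direction (a hypothetical low-rank alternative).
    signal = 0 gives the null.
    """
    rng = np.random.default_rng(seed)
    direction = np.ones(p) / np.sqrt(p)
    cov_y = np.eye(p) + signal * np.outer(direction, direction)
    rejections = 0
    for _ in range(n_rep):
        x = rng.standard_normal((n, p))
        y = rng.multivariate_normal(np.zeros(p), cov_y, size=n)
        rejections += int(permutation_pvalue(x, y, statistic, n_perm, rng) <= alpha)
    return rejections / n_rep


for label, signal in [("null", 0.0), ("low-rank alternative", 3.0)]:
    print(label,
          "Ky-Fan(1):", rejection_rate(top_singular_value, signal),
          "Frobenius:", rejection_rate(frobenius_norm, signal))
```

With so few replications the printed rates are noisy; a real check would use many more replications and report their Monte Carlo uncertainty before drawing any conclusion about relative power or size.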
Original abstract
In biomedical studies, testing for differences in covariance offers scientific insights beyond mean differences, especially when differences are driven by complex joint behavior between features. However, when differences in joint behavior are weakly dispersed across many dimensions and arise from differences in low-rank structures within the data, as is often the case in genomics and neuroimaging, existing two-sample covariance testing methods may suffer from power loss. The Ky-Fan(k) norm, defined by the sum of the top k singular values, is a simple and intuitive matrix norm able to capture signals caused by differences in low-rank structures between matrices, but its statistical properties in hypothesis testing have not been studied well. In this paper, we investigate the behavior of the Ky-Fan(k) norm in two-sample covariance testing. Ultimately, we propose a novel methodology, Rank-Adaptive Covariance Testing (RACT), which is able to leverage differences in low-rank structures found in the covariance matrices of two groups in order to maximize power. RACT uses permutation for statistical inference, ensuring an exact Type I error control. We validate RACT in simulation studies and evaluate its performance when testing for differences in gene expression networks between two types of lung cancer, as well as testing for covariance heterogeneity in diffusion tensor imaging (DTI) data taken on two different scanner types.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes Rank-Adaptive Covariance Testing (RACT), a two-sample procedure for covariance matrices that replaces the usual Frobenius or max-norm with the Ky-Fan(k) norm of the difference of sample covariances. The rank k is chosen adaptively to concentrate power on low-rank signal differences, and inference is performed by label permutation on the pooled sample to guarantee exact finite-sample Type I error control. The method is motivated by genomics and neuroimaging settings and is illustrated on lung-cancer gene-expression networks and DTI scanner-comparison data, with supporting simulation results.
Significance. If the power advantage and exact control hold, RACT supplies a practical, non-parametric tool that exploits the low-rank structure commonly present in high-dimensional biomedical covariance matrices. The permutation guarantee is a clear methodological strength, and the two real-data applications demonstrate relevance to the target domains.
major comments (2)
- §3 (Method): the precise rule for selecting the data-dependent rank k is not stated; without an explicit algorithm it is impossible to verify that the permutation distribution remains exact once k is allowed to depend on the observed matrices.
- §4 (Simulations): power comparisons are reported against Frobenius and max-norm baselines, but the number of Monte Carlo replications, the precise low-rank signal construction, and standard-error information are omitted, preventing assessment of whether the reported gains are statistically reliable.
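As a point of reference for the standard-error request, an estimated power from independent Monte Carlo replications is a binomial proportion, so its uncertainty can be reported with a few lines of code. The sketch below is a generic illustration; the function name and the example counts are hypothetical and are not taken from the manuscript.

```python
import math


def power_with_standard_error(n_rejections: int, n_replications: int, z: float = 1.96):
    """Estimated power, its binomial standard error, and a Wald interval.

    Assumes the replications are independent, so the rejection count is
    binomial. The interval is clipped to [0, 1].
    """
    if n_replications <= 0 or not 0 <= n_rejections <= n_replications:
        raise ValueError("need 0 <= n_rejections <= n_replications and n_replications > 0")
    power = n_rejections / n_replications
    standard_error = math.sqrt(power * (1.0 - power) / n_replications)
    lower = max(0.0, power - z * standard_error)
    upper = min(1.0, power + z * standard_error)
    return power, standard_error, (lower, upper)


# Hypothetical counts: 80 rejections out of 100 replications.
print(power_with_standard_error(80, 100))
```

With the hypothetical counts above, the standard error is about 0.04, which shows why power differences of a few percentage points cannot be judged from 100 replications alone.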
minor comments (2)
- Abstract: the phrase 'exact Type I error control' should be qualified by noting that it holds conditionally on the chosen k, even when k is data-dependent.
- §5 (Applications): the preprocessing steps used to form the sample covariance matrices (centering, scaling, missing-value handling) are not described; these details are needed for reproducibility.
Simulated Author's Rebuttal
We thank the referee for the positive assessment and the helpful comments, which will improve the clarity and completeness of the manuscript. We address each major comment below.
- Referee: §3 (Method): the precise rule for selecting the data-dependent rank k is not stated; without an explicit algorithm it is impossible to verify that the permutation distribution remains exact once k is allowed to depend on the observed matrices.
  Authors: We agree that an explicit statement of the adaptive rank-selection rule is required. In the revised manuscript we will add a precise algorithmic description of how k is chosen from the observed sample covariance matrices. Because the identical data-dependent rule is applied to every label-permuted replicate, the full test statistic (including the choice of k) remains a symmetric function of the pooled observations. Consequently the permutation distribution continues to be exact under the null of exchangeability, preserving finite-sample Type I error control. We will also insert a short remark making this symmetry explicit.
  Revision: yes
- Referee: §4 (Simulations): power comparisons are reported against Frobenius and max-norm baselines, but the number of Monte Carlo replications, the precise low-rank signal construction, and standard-error information are omitted, preventing assessment of whether the reported gains are statistically reliable.
  Authors: We thank the referee for noting these omissions. In the revision we will report the exact number of Monte Carlo replications, give a detailed description of the low-rank signal construction used to generate the alternatives, and include standard-error estimates (or confidence intervals) for all reported power values so that the statistical reliability of the observed gains can be assessed directly.
  Revision: yes
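The symmetry argument in the first response can be illustrated with a generic construction. Since the paper's rank-selection rule is not stated, the sketch below substitutes a standard min-p combination over a candidate set of ranks; this rule, the function names, and the default settings are all assumptions and should not be read as RACT's actual algorithm. The key point is that the observed labelling and every relabelling pass through exactly the same data-dependent steps.

```python
import numpy as np


def ky_fan_profile(x, y, ranks):
    """Ky-Fan(k) norms of the sample covariance difference for each k in ranks."""
    diff = np.cov(x, rowvar=False) - np.cov(y, rowvar=False)
    cumulative = np.cumsum(np.linalg.svd(diff, compute_uv=False))
    return cumulative[np.asarray(ranks) - 1]


def adaptive_permutation_test(x, y, ranks, n_perm=199, seed=0):
    """Permutation test whose rank is chosen by a min-p rule (assumed, not RACT's)."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([x, y])
    n_x = x.shape[0]

    # Row 0 holds the observed labelling; rows 1..n_perm hold random relabellings.
    stats = np.empty((n_perm + 1, len(ranks)))
    stats[0] = ky_fan_profile(x, y, ranks)
    for b in range(1, n_perm + 1):
        idx = rng.permutation(pooled.shape[0])
        stats[b] = ky_fan_profile(pooled[idx[:n_x]], pooled[idx[n_x:]], ranks)

    # Per-rank p-value of every row against all rows, computed identically
    # for the observed row and the permuted rows.
    per_rank_p = (stats[None, :, :] >= stats[:, None, :]).mean(axis=1)

    # Adaptive step: each row keeps its smallest per-rank p-value.
    min_p = per_rank_p.min(axis=1)
    selected_rank = ranks[int(per_rank_p[0].argmin())]

    # Calibrate the adaptive statistic against its own permutation distribution.
    p_value = float((min_p <= min_p[0]).mean())
    return p_value, selected_rank


rng = np.random.default_rng(42)
x = rng.standard_normal((40, 10))
y = rng.standard_normal((40, 10))  # same distribution, so the null holds
print(adaptive_permutation_test(x, y, ranks=[1, 2, 3, 5]))
```

The final p-value compares the observed min-p against the min-p of each relabelling rather than against a nominal threshold, which is what absorbs the selection of k into the permutation calibration.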
Circularity Check
No significant circularity detected
The derivation relies on the Ky-Fan(k) norm applied to differences in sample covariance matrices, with k chosen in a rank-adaptive manner and inference performed via label permutation on the pooled sample. Permutation testing yields an exact level-α test under exchangeability without requiring any fitted parameters or self-referential equations; the low-rank modeling premise is an external domain assumption rather than an internal reduction of the test statistic to its inputs. No self-citation chains, ansatzes smuggled via prior work, or renamings of known results appear as load-bearing steps in the provided description. The central claims remain self-contained against external benchmarks.
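For reference, the standard Monte Carlo permutation p-value behind the exactness claim can be written as follows. The notation is assumed here and is not taken from the paper: Z is the pooled sample with its observed labels, T is the full test statistic (including whatever rule selects k), and π_1, ..., π_B are uniformly random label permutations.

```latex
\hat{p} \;=\; \frac{1 + \sum_{b=1}^{B} \mathbf{1}\{\, T(\pi_b Z) \ge T(Z) \,\}}{B + 1},
\qquad
\Pr_{H_0}\!\left( \hat{p} \le \alpha \right) \;\le\; \alpha .
```

The inequality holds for any sample sizes and any B whenever the pooled observations are exchangeable under the null, which is why no fitted parameters or large-sample approximations enter the argument.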
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: differences in joint behavior are weakly dispersed across many dimensions and arise from differences in low-rank structures.