
arXiv: 2309.10284 · v4 · submitted 2023-09-19 · 📊 stat.ME · math.ST · stat.AP · stat.TH

Rank-adaptive covariance testing with applications to genomics and neuroimaging

Pith reviewed 2026-05-24 06:51 UTC · model grok-4.3

classification 📊 stat.ME · math.ST · stat.AP · stat.TH
keywords two-sample covariance test · Ky-Fan norm · rank-adaptive testing · permutation inference · high-dimensional data · genomics · neuroimaging · diffusion tensor imaging

The pith

A test based on the sum of the top singular values of the covariance difference detects low-rank group differences with higher power.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops Rank-Adaptive Covariance Testing to compare covariance matrices between two groups when differences arise from low-rank structure changes rather than uniform spread across all entries. Standard covariance tests lose power in the high-dimensional settings typical of gene networks or brain imaging because the signal is concentrated in a few singular directions. RACT forms its statistic from the Ky-Fan(k) norm of the sample covariance difference and selects k to capture that structure. Permutation of group labels supplies an exact finite-sample Type I error guarantee without asymptotic approximations. The method is checked on simulated low-rank alternatives and on real gene-expression networks from lung-cancer subtypes plus diffusion-tensor imaging scans from different scanners.

Core claim

We propose Rank-Adaptive Covariance Testing (RACT) that employs the Ky-Fan(k) norm of the difference in sample covariances as the test statistic. By adapting k to the data, RACT leverages low-rank differences to achieve higher power while maintaining exact Type I error control through permutation.

What carries the argument

The Ky-Fan(k) norm (sum of the k largest singular values) applied to the matrix difference between two sample covariance estimates, used as a test statistic that is sensitive to low-rank perturbations.
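
As a concrete illustration of the statistic described above, the following minimal Python sketch computes the Ky-Fan(k) norm of a difference of sample covariances. It assumes NumPy and data arrays with one observation per row; the function names ky_fan_norm and ky_fan_statistic are illustrative and are not taken from the paper or from any released code.

```python
import numpy as np


def ky_fan_norm(matrix, k):
    """Sum of the k largest singular values of `matrix`."""
    # np.linalg.svd returns singular values sorted in descending order.
    singular_values = np.linalg.svd(matrix, compute_uv=False)
    return float(singular_values[:k].sum())


def ky_fan_statistic(x, y, k):
    """Ky-Fan(k) norm of the difference between two sample covariances.

    x, y: arrays of shape (n_samples, n_features), one array per group.
    """
    cov_x = np.cov(x, rowvar=False)  # columns are features
    cov_y = np.cov(y, rowvar=False)
    return ky_fan_norm(cov_x - cov_y, k)
```

With k = 1 the norm reduces to the spectral norm of the difference, and with k equal to the full dimension it becomes the nuclear norm, so k interpolates between the two.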

If this is right

  • RACT identifies gene-network differences between lung-cancer subtypes more effectively than non-adaptive tests.
  • The procedure detects scanner-induced covariance heterogeneity in diffusion-tensor imaging data.
  • Simulation studies confirm elevated power precisely when the alternative covariance difference is low-rank.
  • The permutation guarantee removes the need for large-sample approximations in high-dimensional regimes.
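
The permutation guarantee in the last point can be sketched for a single fixed k as follows. This is a generic Monte Carlo permutation test, not the authors' implementation; it assumes that the pooled rows are exchangeable under the null, and the function names and the default of 999 permutations are illustrative choices.

```python
import numpy as np


def ky_fan_statistic(x, y, k):
    """Ky-Fan(k) norm of the difference of the two sample covariances."""
    diff = np.cov(x, rowvar=False) - np.cov(y, rowvar=False)
    return float(np.linalg.svd(diff, compute_uv=False)[:k].sum())


def permutation_pvalue(x, y, k, n_permutations=999, seed=0):
    """Monte Carlo permutation p-value for a fixed k (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([x, y])
    n_x = x.shape[0]
    observed = ky_fan_statistic(x, y, k)
    exceed = 0
    for _ in range(n_permutations):
        # Reassign group labels at random, keeping the group sizes fixed.
        idx = rng.permutation(pooled.shape[0])
        stat = ky_fan_statistic(pooled[idx[:n_x]], pooled[idx[n_x:]], k)
        exceed += int(stat >= observed)
    # Counting the observed labelling itself (the "+1") keeps the p-value
    # valid in finite samples under exchangeability.
    return (1 + exceed) / (1 + n_permutations)
```

Because the observed labelling is counted among the relabellings, the smallest attainable p-value is 1 / (n_permutations + 1), which bounds how small a significance level the test can resolve.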

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same norm construction could be applied to tests for equality of correlation matrices or precision matrices.
  • Data-driven selection of k might be combined with other matrix norms to gain robustness against model misspecification.
  • Similar rank-adaptive ideas could extend two-sample testing to multiple groups or to time-series covariance changes.

Load-bearing premise

Covariance differences between groups are driven primarily by changes in low-rank structures that are only weakly dispersed across many dimensions.

What would settle it

In data simulated from a low-rank covariance alternative, the Ky-Fan-based test shows no power gain over a Frobenius-norm test, or the permutation procedure yields rejection rates above the nominal level under the null.
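
A small simulation of the kind this check calls for might look like the sketch below. Everything about the design is an assumption made for illustration (Gaussian data, identity baseline covariance, a rank-one shift along an evenly spread direction, the sample sizes, and 99 permutations); it is not the simulation design used in the paper.

```python
import numpy as np


def cov_diff(x, y):
    return np.cov(x, rowvar=False) - np.cov(y, rowvar=False)


def ky_fan_stat(x, y, k=1):
    return np.linalg.svd(cov_diff(x, y), compute_uv=False)[:k].sum()


def frobenius_stat(x, y):
    return np.linalg.norm(cov_diff(x, y), ord="fro")


def perm_pvalue(stat_fn, x, y, rng, n_permutations=99):
    """Permutation p-value for an arbitrary two-sample statistic."""
    pooled, n_x = np.vstack([x, y]), x.shape[0]
    observed = stat_fn(x, y)
    exceed = 0
    for _ in range(n_permutations):
        idx = rng.permutation(pooled.shape[0])
        exceed += int(stat_fn(pooled[idx[:n_x]], pooled[idx[n_x:]]) >= observed)
    return (1 + exceed) / (1 + n_permutations)


def rejection_rate(stat_fn, signal, n=30, p=10, reps=100, alpha=0.05, seed=0):
    """Fraction of simulated datasets on which the test rejects.

    Group x has covariance I; group y has covariance I + signal * d d^T,
    a rank-one difference (signal = 0 reproduces the null).
    """
    rng = np.random.default_rng(seed)
    direction = np.ones(p) / np.sqrt(p)  # unit vector spread over all features
    rejections = 0
    for _ in range(reps):
        x = rng.normal(size=(n, p))
        factor = rng.normal(size=(n, 1))
        y = rng.normal(size=(n, p)) + np.sqrt(signal) * factor * direction
        rejections += int(perm_pvalue(stat_fn, x, y, rng) <= alpha)
    return rejections / reps
```

Calling rejection_rate(ky_fan_stat, signal=0.0) estimates the Type I error, and comparing rejection_rate(ky_fan_stat, signal=s) with rejection_rate(frobenius_stat, signal=s) over a grid of s values gives the power comparison described above.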

Figures

Figures reproduced from arXiv: 2309.10284 by David Veitch, Jun Young Park, Yinqiu He.

Figure 1: Visualizations of the covariance matrices of both groups across the four simulation sce…
Figure 2: Empirical power curves when using select single norms, as well as when using RACT…
Figure 3: Empirical power curves for RACT and permutation-based versions of competing methods…
Figure 4: Difference in covariance matrices between groups using all samples, and their low-rank…
Figure 5: First row: empirical power of individual Ky-Fan(…
Figure 6: Empirical distribution of standardized T_1 under H_0 for various covariance structures in the n = 1000, p = 250 setting.
Figure 7: Empirical distribution of standardized T_5 under H_0 for various covariance structures in the n = 1000, p = 250 setting.
Figure 8: Empirical distribution of standardized T_10 under H_0 for various covariance structures in the n = 1000, p = 250 setting.
Figure 9: Empirical distribution of standardized T_50 under H_0 for various covariance structures in the n = 1000, p = 250 setting.
  C.2 Sensitivity analysis of cutoff used to calculate K. A key hyperparameter in RACT is the choice of K, which sets the maximum Ky-Fan(k) norm which enters into T_RACT. We select K to be the smallest K ≤ min(n, p) such that the variation of Σ explained by its top K̂ singular values exceeds…
Figure 10: First row: empirical power of test statistics based on individual superdiagonals relative…
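
The cutoff rule quoted in the C.2 excerpt under Figure 9 (take the smallest K whose top singular values explain more than some threshold share of the variation) can be sketched as below. The threshold value is cut off in the excerpt, so it is left as a required argument here rather than guessed. Measuring "variation explained" as the cumulative share of the singular values, and the function name select_max_rank, are assumptions made for illustration.

```python
import numpy as np


def select_max_rank(matrix, threshold):
    """Smallest K whose top-K singular values explain more than `threshold`.

    `threshold` is a share strictly between 0 and 1 (hypothetical parameter;
    the value used in the paper is not visible in the excerpt).
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    singular_values = np.linalg.svd(matrix, compute_uv=False)
    cumulative = np.cumsum(singular_values)
    total = cumulative[-1]
    if total == 0.0:
        raise ValueError("matrix has no variation to explain")
    explained = cumulative / total  # last entry is exactly 1.0
    # Index of the first entry strictly above the threshold, converted to a rank.
    return int(np.argmax(explained > threshold)) + 1
```

Since a covariance estimate built from n observations on p features has at most min(n, p) nonzero singular values, the returned K respects the bound K ≤ min(n, p) stated in the excerpt.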
Original abstract

In biomedical studies, testing for differences in covariance offers scientific insights beyond mean differences, especially when differences are driven by complex joint behavior between features. However, when differences in joint behavior are weakly dispersed across many dimensions and arise from differences in low-rank structures within the data, as is often the case in genomics and neuroimaging, existing two-sample covariance testing methods may suffer from power loss. The Ky-Fan(k) norm, defined by the sum of the top k singular values, is a simple and intuitive matrix norm able to capture signals caused by differences in low-rank structures between matrices, but its statistical properties in hypothesis testing have not been well studied. In this paper, we investigate the behavior of the Ky-Fan(k) norm in two-sample covariance testing. Ultimately, we propose a novel methodology, Rank-Adaptive Covariance Testing (RACT), which is able to leverage differences in low-rank structures found in the covariance matrices of two groups in order to maximize power. RACT uses permutation for statistical inference, ensuring exact Type I error control. We validate RACT in simulation studies and evaluate its performance when testing for differences in gene expression networks between two types of lung cancer, as well as testing for covariance heterogeneity in diffusion tensor imaging (DTI) data taken on two different scanner types.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes Rank-Adaptive Covariance Testing (RACT), a two-sample procedure for covariance matrices that replaces the usual Frobenius norm or max-norm with the Ky-Fan(k) norm of the difference of sample covariances. The rank k is chosen adaptively to concentrate power on low-rank signal differences, and inference is performed by label permutation on the pooled sample to guarantee exact finite-sample Type I error control. The method is motivated by genomics and neuroimaging settings and is illustrated on lung-cancer gene-expression networks and DTI scanner-comparison data, with supporting simulation results.

Significance. If the power advantage and exact control hold, RACT supplies a practical, non-parametric tool that exploits the low-rank structure commonly present in high-dimensional biomedical covariance matrices. The permutation guarantee is a clear methodological strength, and the two real-data applications demonstrate relevance to the target domains.

major comments (2)
  1. [§3] §3 (Method): the precise rule for selecting the data-dependent rank k is not stated; without an explicit algorithm it is impossible to verify that the permutation distribution remains exact once k is allowed to depend on the observed matrices.
  2. [§4] §4 (Simulations): power comparisons are reported against Frobenius and max-norm baselines, but the number of Monte Carlo replications, the precise low-rank signal construction, and standard-error information are omitted, preventing assessment of whether the reported gains are statistically reliable.
minor comments (2)
  1. [Abstract] Abstract: the phrase 'exact Type I error control' should be qualified by noting that it holds conditionally on the chosen k, even when k is data-dependent.
  2. [§5] §5 (Applications): the preprocessing steps used to form the sample covariance matrices (centering, scaling, missing-value handling) are not described; these details are needed for reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive assessment and the helpful comments, which will improve the clarity and completeness of the manuscript. We address each major comment below.

Point-by-point responses
  1. Referee: [§3] §3 (Method): the precise rule for selecting the data-dependent rank k is not stated; without an explicit algorithm it is impossible to verify that the permutation distribution remains exact once k is allowed to depend on the observed matrices.

    Authors: We agree that an explicit statement of the adaptive rank-selection rule is required. In the revised manuscript we will add a precise algorithmic description of how k is chosen from the observed sample covariance matrices. Because the identical data-dependent rule is applied to every label-permuted replicate, the full test statistic (including the choice of k) remains a symmetric function of the pooled observations. Consequently the permutation distribution continues to be exact under the null of exchangeability, preserving finite-sample Type I error control. We will also insert a short remark making this symmetry explicit. revision: yes

  2. Referee: [§4] §4 (Simulations): power comparisons are reported against Frobenius and max-norm baselines, but the number of Monte Carlo replications, the precise low-rank signal construction, and standard-error information are omitted, preventing assessment of whether the reported gains are statistically reliable.

    Authors: We thank the referee for noting these omissions. In the revision we will report the exact number of Monte Carlo replications, give a detailed description of the low-rank signal construction used to generate the alternatives, and include standard-error estimates (or confidence intervals) for all reported power values so that the statistical reliability of the observed gains can be assessed directly. revision: yes
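
The argument in the first response, that applying one identical data-dependent rule to every relabelling keeps the permutation test exact, can be illustrated with the sketch below. The combination rule used here (standardize each Ky-Fan(k) statistic across all relabellings, then take the maximum over k) is a generic adaptive construction chosen for illustration; it is an assumption and should not be read as the rule the authors use. The function names and max_k are likewise hypothetical.

```python
import numpy as np


def ky_fan_profile(x, y, max_k):
    """Ky-Fan(k) norms of the covariance difference for k = 1, ..., max_k."""
    diff = np.cov(x, rowvar=False) - np.cov(y, rowvar=False)
    singular_values = np.linalg.svd(diff, compute_uv=False)
    return np.cumsum(singular_values[:max_k])  # entry k-1 is Ky-Fan(k)


def adaptive_permutation_pvalue(x, y, max_k, n_permutations=999, seed=0):
    """Permutation p-value for a statistic that adapts over k."""
    rng = np.random.default_rng(seed)
    pooled, n_x = np.vstack([x, y]), x.shape[0]
    profiles = [ky_fan_profile(x, y, max_k)]  # row 0 holds the observed labels
    for _ in range(n_permutations):
        idx = rng.permutation(pooled.shape[0])
        profiles.append(ky_fan_profile(pooled[idx[:n_x]], pooled[idx[n_x:]], max_k))
    profiles = np.asarray(profiles)
    # Standardize each k using every row, observed included, so the rule
    # treats the observed labelling exactly like any other relabelling.
    scale = profiles.std(axis=0)
    scale[scale == 0] = 1.0
    standardized = (profiles - profiles.mean(axis=0)) / scale
    combined = standardized.max(axis=1)  # adaptive choice of k, per row
    # Share of rows (observed included) at least as extreme as the observed one.
    return float(np.mean(combined >= combined[0]))
```

The key design point is that the selection over k happens inside the statistic computed for each relabelling, rather than once on the observed data followed by a fixed-k permutation test.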

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The derivation relies on the Ky-Fan(k) norm applied to differences in sample covariance matrices, with k chosen in a rank-adaptive manner and inference performed via label permutation on the pooled sample. Permutation testing yields an exact level-α test under exchangeability without requiring any fitted parameters or self-referential equations; the low-rank modeling premise is an external domain assumption rather than an internal reduction of the test statistic to its inputs. No self-citation chains, ansatzes smuggled via prior work, or renamings of known results appear as load-bearing steps in the provided description. The central claims remain self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests primarily on the domain assumption that covariance differences in the target applications are low-rank; no free parameters or invented entities are described in the abstract.

axioms (1)
  • domain assumption: Differences in joint behavior are weakly dispersed across many dimensions and arise from differences in low-rank structures.
    Explicitly stated in the abstract as the motivating scenario for genomics and neuroimaging data.


