A Bayes-Motivated Quadratic-Form Test for High-Dimensional Mean Testing

Daojiang He; Jing Zhou; Suren Xu

arxiv: 2512.10537 · v2 · submitted 2025-12-11 · 📊 stat.ME · stat.CO

A Bayes-Motivated Quadratic-Form Test for High-Dimensional Mean Testing

Daojiang He , Suren Xu , Jing Zhou This is my paper

Pith reviewed 2026-05-16 23:30 UTC · model grok-4.3

classification 📊 stat.ME stat.CO

keywords high-dimensional mean testingBayes factorquadratic form statistictwo-sample testasymptotic normalityheterogeneous variancesrobustness to misspecification

0 comments

The pith

A Bayes factor quadratic-form test detects mean differences in high dimensions when dimension grows linearly with sample size.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a two-sample test for equality of high-dimensional mean vectors that starts from the Bayes factor computed with non-informative priors. The construction is intended for the regime in which the number of dimensions p and the sample size n satisfy p/n approaching a positive constant. Asymptotic normality of the resulting statistic is derived, together with the limiting power under local alternatives. Simulations indicate that the test competes with existing procedures when marginal variances differ across features and when sample sizes are modest, while preserving type I error control for both sparse and dense signals and remaining stable under distributional misspecification.

Core claim

We propose a two-sample mean test based on the Bayes factor with non-informative priors, specifically designed for scenarios where the dimension p grows with the sample size n with a linear rate p/n to a constant in (0, infinity). We establish the asymptotic normality of the test statistic and the asymptotic power. Through extensive simulations, we demonstrate that the proposed test performs competitively against several existing methods, particularly when the marginal variances of the individual features are heterogeneous and when the sample size is small. Furthermore, our test remains robust under distribution misspecification and maintains a well-controlled type I error rate even in small

What carries the argument

The quadratic-form test statistic obtained as the Bayes factor under flat priors on the mean difference vector.

Load-bearing premise

The ratio of dimension to sample size converges to a fixed positive finite constant.

What would settle it

A simulation with increasing n and p held at fixed ratio in which the properly standardized test statistic fails to approach a standard normal distribution under the null would falsify the asymptotic normality claim.

read the original abstract

We propose a two-sample mean test based on the Bayes factor with non-informative priors, specifically designed for scenarios where the dimension $p$ grows with the sample size $n$ with a linear rate $p/n \to c_1 \in (0, \infty)$. We establish the asymptotic normality of the test statistic and the asymptotic power. Through extensive simulations, we demonstrate that the proposed test performs competitively against several existing methods, particularly when the marginal variances of the individual features are heterogeneous and when the sample size is small. Furthermore, our test remains robust under distribution misspecification. The proposed method not only effectively detects both sparse and non-sparse differences in mean vectors but also maintains a well-controlled type I error rate, even in small-sample scenarios. We also demonstrate the performance of our proposed test using the small round blue cell tumors (SRBCT) dataset.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives a Bayes-factor quadratic test for high-dimensional two-sample means that works well in simulations for heterogeneous variances and small n, but the small-n robustness rests on empirics since the theory is purely asymptotic.

read the letter

The main takeaway is that this paper develops a quadratic-form test statistic for two-sample high-dimensional mean testing, derived from a Bayes factor with non-informative priors, and establishes its asymptotic normality and power when p/n converges to a positive constant. The approach is new in how it motivates the quadratic form through the Bayes factor. Most high-dimensional mean tests use frequentist constructions like Hotelling's T-squared variants or max-type statistics, so this gives a different angle. The derivations appear to follow standard high-dimensional techniques for the asymptotics, and the simulations indicate competitive performance against several existing methods. It stands out particularly in cases with heterogeneous marginal variances and smaller sample sizes, while also showing robustness to distribution misspecification. The type I error stays controlled, and it detects both sparse and dense mean differences. The real-data application to the SRBCT dataset adds some practical flavor. One soft spot is the gap between the asymptotic theory and the small-n claims. All the formal results assume p/n approaches a constant in (0, infinity), with no Berry-Esseen type bounds or exact finite-sample distributions provided. The good performance for small n therefore depends entirely on the simulation studies. If those studies include a broad range of small n values and realistic heterogeneity patterns, the evidence is reasonable, but it leaves open whether the advantages are general or tied to the chosen simulation setups. The abstract mentions extensive simulations, but without seeing the exact designs, error bar reporting, or data exclusion rules, it's tough to assess how convincing they are. This work targets statisticians and data analysts dealing with high-dimensional data where sample sizes are limited and variances vary across features, such as in genomics or neuroimaging. Readers interested in mean vector testing procedures would find value in comparing this Bayes-motivated option to more traditional ones. The paper shows honest engagement with the literature and provides both theoretical and empirical support. It deserves a serious referee to verify the derivations and dig into the simulation details. I would recommend sending it for peer review rather than desk rejecting it.

Referee Report

2 major / 2 minor

Summary. The paper proposes a two-sample mean test derived from the Bayes factor with non-informative priors, specifically for the high-dimensional regime where p/n → c1 ∈ (0, ∞). It establishes asymptotic normality of the resulting quadratic-form statistic under the null and derives the asymptotic power. Simulations indicate competitive performance against existing methods (especially under heterogeneous marginal variances and small n), robustness to distributional misspecification, type-I error control, and detection of both sparse and dense mean differences; the method is also illustrated on the SRBCT dataset.

Significance. If the asymptotic derivations hold, the work supplies a theoretically justified Bayes-motivated quadratic statistic whose limiting behavior is explicit under linear dimension growth. The simulation evidence for small-n and heterogeneous-variance regimes, if reproducible, would address a practical gap where many existing high-dimensional tests degrade.

major comments (2)

[Abstract] Abstract and simulation results: the claims of 'well-controlled type I error rate, even in small-sample scenarios' and superior performance for small n rest entirely on Monte Carlo experiments, yet all theoretical guarantees (asymptotic normality and power) are derived exclusively under p/n → c1 ∈ (0, ∞). No Berry–Esseen bounds, Edgeworth expansions, or finite-sample error analysis are supplied, so the small-n advantages may be artifacts of the chosen simulation designs rather than general properties of the procedure.
[Theory] Asymptotic theory section: the derivation of the limiting null distribution of the Bayes-factor quadratic form must explicitly accommodate heterogeneous variances (the setting highlighted as advantageous in simulations). If the variance matrix is assumed diagonal or the normalization absorbs heterogeneity only under additional conditions, the stated asymptotic normality may not hold uniformly over the heterogeneous case emphasized in the abstract.

minor comments (2)

[Simulations] Simulations: report the number of Monte Carlo replications, standard errors or confidence intervals for empirical type-I error and power, and any sensitivity checks to the choice of simulation parameters (e.g., variance heterogeneity levels).
[Data analysis] Data analysis: clarify the preprocessing steps, feature selection, and any exclusion rules applied to the SRBCT dataset before testing.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. We address each major comment below, indicating revisions where appropriate. Our responses focus on clarifying the scope of the theoretical results and simulation evidence without overstating finite-sample guarantees.

read point-by-point responses

Referee: [Abstract] Abstract and simulation results: the claims of 'well-controlled type I error rate, even in small-sample scenarios' and superior performance for small n rest entirely on Monte Carlo experiments, yet all theoretical guarantees (asymptotic normality and power) are derived exclusively under p/n → c1 ∈ (0, ∞). No Berry–Esseen bounds, Edgeworth expansions, or finite-sample error analysis are supplied, so the small-n advantages may be artifacts of the chosen simulation designs rather than general properties of the procedure.

Authors: We agree that the small-sample claims rely on simulation evidence rather than finite-sample theory. The asymptotic results are derived under p/n → c1, and we do not claim uniform finite-sample guarantees. The simulations (covering n as small as 20–50 with p up to several hundred) show consistent type-I error control and competitive power, particularly under heterogeneity. To address the concern, we will revise the abstract to state that small-n performance is observed in simulations and add a brief discussion of simulation design robustness in the main text. revision: partial
Referee: [Theory] Asymptotic theory section: the derivation of the limiting null distribution of the Bayes-factor quadratic form must explicitly accommodate heterogeneous variances (the setting highlighted as advantageous in simulations). If the variance matrix is assumed diagonal or the normalization absorbs heterogeneity only under additional conditions, the stated asymptotic normality may not hold uniformly over the heterogeneous case emphasized in the abstract.

Authors: The derivation in Section 3 assumes a general positive-definite covariance matrix Σ with eigenvalues bounded away from 0 and ∞ (allowing heterogeneity across features). The quadratic-form statistic is normalized by consistent estimators of the diagonal elements of Σ, and the central limit theorem is applied to the resulting standardized sum under the linear growth regime. This covers the heterogeneous case without requiring diagonality. We will add an explicit remark in the theory section stating the eigenvalue bounds and confirming that the limiting normality holds uniformly under these conditions. revision: yes

Circularity Check

0 steps flagged

No significant circularity; asymptotics and simulations remain independent

full rationale

The derivation establishes asymptotic normality and power of the Bayes-factor quadratic statistic under the standard high-dimensional regime p/n → c1 ∈ (0, ∞) using conventional central-limit techniques for quadratic forms. Non-informative priors are invoked in the standard manner without being fitted to the target statistic. Simulation results for small-n behavior, heterogeneous variances, and misspecification robustness are presented separately and do not feed back into the asymptotic claims. No self-citations, fitted-input renamings, or self-definitional steps appear in the load-bearing chain. The paper is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the linear high-dimensional regime and standard non-informative prior choices for the Bayes factor; no free parameters or invented entities are introduced in the abstract.

axioms (2)

domain assumption p/n → c1 ∈ (0, ∞)
Invoked to establish asymptotic normality and power of the test statistic.
standard math Non-informative priors for the Bayes factor
Used to derive the quadratic-form test statistic.

pith-pipeline@v0.9.0 · 5449 in / 1202 out tokens · 63296 ms · 2026-05-16T23:30:38.405249+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

21 extracted references · 21 canonical work pages

[1]

Hotelling, H.: The generalization of student’s ratio. Annals of Math ematical Statistics 2, 360–378 (1931) 23 Alternative 1 Alternative 2 p0 0.5 0.6 0.7 0.8 0.9 0.5 0.6 0.7 0.8 0.9 BF 0.6012 0.6448 0.5976 0.5776 0.5652 0.6966 0.7046 0.6794 0.6876 0.7292 PB 0.4290 0.4094 0.3894 0.3994 0.3680 0.5102 0.5126 0.5192 0.5020 0.5064 SD 0.5882 0.6030 0.6458 0.5358...

work page 1931
[2]

Statistica Sinica 6, 311–329 (1996)

Bai, Z., Saranadasa, H.: Eﬀect of high dimension: by an example of a two sample problem. Statistica Sinica 6, 311–329 (1996)

work page 1996
[3]

The Annals of Statistics 38(2), 808–835 (2010)

Chen, S.X., Qin, Y.L.: A two-sample test for high-dimensional data w ith applications to gene-set testing. The Annals of Statistics 38(2), 808–835 (2010)

work page 2010
[4]

Journal of Multivariate Analysis 99(3), 386–402 (2008)

Srivastava, M.S., Du, M.: A test for the mean vector with fewer ob servations than the dimension. Journal of Multivariate Analysis 99(3), 386–402 (2008)

work page 2008
[5]

Yang, S., Zheng, S., Li, R.: A new test for high-dimensional two-sa mple mean problems with consideration of correlation structure. The Annals o f Statistics 24 Size Power n1 = n2 20 30 40 50 60 20 30 40 50 60 BF 0.0502 0.0522 0.0518 0.0554 0.0502 0.1964 0.2910 0.3994 0.5354 0.6960 PB 0.0494 0.0564 0.0516 0.0524 0.0516 0.1410 0.2238 0.3148 0.4386 0.5866 S...

work page 1964
[6]

Journal of the Royal Statistical Society Ser ies B: Statistical Methodology 76(2), 349–372 (2014)

Cai, T.T., Liu, W.D., Xia, Y.: Two-sample test of high dimensional means under dependence. Journal of the Royal Statistical Society Ser ies B: Statistical Methodology 76(2), 349–372 (2014)

work page 2014
[7]

The Annals of Statistics 48(3), 1304–1328 (2020)

Xue, K., Yao, F.: Distribution and correlation-free two-sample te st of high- dimensional means. The Annals of Statistics 48(3), 1304–1328 (2020)

work page 2020
[8]

Advances in Neural Informat ion Processing Systems 24, 1206–1214 (2011)

Lopes, M., Jacob, L., Wainwright, M.J.: A more powerful two-samp le test in high dimensions using random projection. Advances in Neural Informat ion Processing Systems 24, 1206–1214 (2011)

work page 2011
[9]

Journal of Computational a nd Graphical Statistics 25(3), 954–970 (2016)

Srivastava, R., Li, P., Ruppert, D.: Raptt: An exact two-sample t est in high dimensions using random projections. Journal of Computational a nd Graphical Statistics 25(3), 954–970 (2016)

work page 2016
[10]

Journal of the American Statistical Association 110(512), 1500–1514 (2015)

Guhaniyogi, R., Dunson, D.B.: Bayesian compressed regression. Journal of the American Statistical Association 110(512), 1500–1514 (2015)

work page 2015
[11]

Journal of Multivariate Analysis 188, 104813 (2022)

Huang, Y., Li, C., Li, R., Yang, S.: An overview of tests on high-dime nsional means. Journal of Multivariate Analysis 188, 104813 (2022)

work page 2022
[12]

Journal of the American Sta tistical Association 113(524), 1733–1741 (2018)

Zoh, R.S., Sarkar, A., Carroll, R.J., Mallick, B.K.: A powerful bayesia n test for equality of means in high dimensions. Journal of the American Sta tistical Association 113(524), 1733–1741 (2018)

work page 2018
[13]

Computational Statistics & Data Analysis 74, 26–38 (2014)

Thulin, M.: A high-dimensional two-sample test for the mean using random subspaces. Computational Statistics & Data Analysis 74, 26–38 (2014)

work page 2014
[14]

Computational Statis tics 39(3), 1301–1320 (2024)

Chen, F., Hai, Q., Wang, M.: Bayesian hypothesis testing for equa lity of high- dimensional means using cluster subspaces. Computational Statis tics 39(3), 1301–1320 (2024)

work page 2024
[15]

Mathematics 10(10), 1741 (2022) 25

Jiang, Y.Y., Xu, X.Z.: A two-sample test of high dimensional means b ased on posterior bayes factor. Mathematics 10(10), 1741 (2022) 25

work page 2022
[16]

Journal of the Royal Sta tistical Society Series B: Statistical Methodology 53(1), 111–128 (1991)

Aitkin, M.: Posterior bayes factors. Journal of the Royal Sta tistical Society Series B: Statistical Methodology 53(1), 111–128 (1991)

work page 1991
[17]

Bayesian Analysis 19(3), 869–893 (2024)

Lee, K., You, K., Lin, L.: Bayesian optimal two-sample tests for h igh-dimensional gaussian populations. Bayesian Analysis 19(3), 869–893 (2024)

work page 2024
[18]

Journal of Multivariate Analysis 114, 349–358 (2013)

Srivastava, M.S., Katayama, S., Kano, Y.: A two sample test in high dimensional data. Journal of Multivariate Analysis 114, 349–358 (2013)

work page 2013
[19]

Graduate Texts in Mathemat ics, vol

Shiryaev, A.: Probability, 2nd edn. Graduate Texts in Mathemat ics, vol. 95. Springer, New York (2016)

work page 2016
[20]

Wiley, New York (1958)

Anderson, T.W.: An Introduction to Multivariate Statistical Ana lysis. Wiley, New York (1958)

work page 1958
[21]

Journal of Multivariate Analysis 100(3), 518–532 (2009) 26

Srivastava, M.S.: A test for the mean vector with fewer observ ations than the dimension under non-normality. Journal of Multivariate Analysis 100(3), 518–532 (2009) 26

work page 2009

[1] [1]

Hotelling, H.: The generalization of student’s ratio. Annals of Math ematical Statistics 2, 360–378 (1931) 23 Alternative 1 Alternative 2 p0 0.5 0.6 0.7 0.8 0.9 0.5 0.6 0.7 0.8 0.9 BF 0.6012 0.6448 0.5976 0.5776 0.5652 0.6966 0.7046 0.6794 0.6876 0.7292 PB 0.4290 0.4094 0.3894 0.3994 0.3680 0.5102 0.5126 0.5192 0.5020 0.5064 SD 0.5882 0.6030 0.6458 0.5358...

work page 1931

[2] [2]

Statistica Sinica 6, 311–329 (1996)

Bai, Z., Saranadasa, H.: Eﬀect of high dimension: by an example of a two sample problem. Statistica Sinica 6, 311–329 (1996)

work page 1996

[3] [3]

The Annals of Statistics 38(2), 808–835 (2010)

Chen, S.X., Qin, Y.L.: A two-sample test for high-dimensional data w ith applications to gene-set testing. The Annals of Statistics 38(2), 808–835 (2010)

work page 2010

[4] [4]

Journal of Multivariate Analysis 99(3), 386–402 (2008)

Srivastava, M.S., Du, M.: A test for the mean vector with fewer ob servations than the dimension. Journal of Multivariate Analysis 99(3), 386–402 (2008)

work page 2008

[5] [5]

Yang, S., Zheng, S., Li, R.: A new test for high-dimensional two-sa mple mean problems with consideration of correlation structure. The Annals o f Statistics 24 Size Power n1 = n2 20 30 40 50 60 20 30 40 50 60 BF 0.0502 0.0522 0.0518 0.0554 0.0502 0.1964 0.2910 0.3994 0.5354 0.6960 PB 0.0494 0.0564 0.0516 0.0524 0.0516 0.1410 0.2238 0.3148 0.4386 0.5866 S...

work page 1964

[6] [6]

Journal of the Royal Statistical Society Ser ies B: Statistical Methodology 76(2), 349–372 (2014)

Cai, T.T., Liu, W.D., Xia, Y.: Two-sample test of high dimensional means under dependence. Journal of the Royal Statistical Society Ser ies B: Statistical Methodology 76(2), 349–372 (2014)

work page 2014

[7] [7]

The Annals of Statistics 48(3), 1304–1328 (2020)

Xue, K., Yao, F.: Distribution and correlation-free two-sample te st of high- dimensional means. The Annals of Statistics 48(3), 1304–1328 (2020)

work page 2020

[8] [8]

Advances in Neural Informat ion Processing Systems 24, 1206–1214 (2011)

Lopes, M., Jacob, L., Wainwright, M.J.: A more powerful two-samp le test in high dimensions using random projection. Advances in Neural Informat ion Processing Systems 24, 1206–1214 (2011)

work page 2011

[9] [9]

Journal of Computational a nd Graphical Statistics 25(3), 954–970 (2016)

Srivastava, R., Li, P., Ruppert, D.: Raptt: An exact two-sample t est in high dimensions using random projections. Journal of Computational a nd Graphical Statistics 25(3), 954–970 (2016)

work page 2016

[10] [10]

Journal of the American Statistical Association 110(512), 1500–1514 (2015)

Guhaniyogi, R., Dunson, D.B.: Bayesian compressed regression. Journal of the American Statistical Association 110(512), 1500–1514 (2015)

work page 2015

[11] [11]

Journal of Multivariate Analysis 188, 104813 (2022)

Huang, Y., Li, C., Li, R., Yang, S.: An overview of tests on high-dime nsional means. Journal of Multivariate Analysis 188, 104813 (2022)

work page 2022

[12] [12]

Journal of the American Sta tistical Association 113(524), 1733–1741 (2018)

Zoh, R.S., Sarkar, A., Carroll, R.J., Mallick, B.K.: A powerful bayesia n test for equality of means in high dimensions. Journal of the American Sta tistical Association 113(524), 1733–1741 (2018)

work page 2018

[13] [13]

Computational Statistics & Data Analysis 74, 26–38 (2014)

Thulin, M.: A high-dimensional two-sample test for the mean using random subspaces. Computational Statistics & Data Analysis 74, 26–38 (2014)

work page 2014

[14] [14]

Computational Statis tics 39(3), 1301–1320 (2024)

Chen, F., Hai, Q., Wang, M.: Bayesian hypothesis testing for equa lity of high- dimensional means using cluster subspaces. Computational Statis tics 39(3), 1301–1320 (2024)

work page 2024

[15] [15]

Mathematics 10(10), 1741 (2022) 25

Jiang, Y.Y., Xu, X.Z.: A two-sample test of high dimensional means b ased on posterior bayes factor. Mathematics 10(10), 1741 (2022) 25

work page 2022

[16] [16]

Journal of the Royal Sta tistical Society Series B: Statistical Methodology 53(1), 111–128 (1991)

Aitkin, M.: Posterior bayes factors. Journal of the Royal Sta tistical Society Series B: Statistical Methodology 53(1), 111–128 (1991)

work page 1991

[17] [17]

Bayesian Analysis 19(3), 869–893 (2024)

Lee, K., You, K., Lin, L.: Bayesian optimal two-sample tests for h igh-dimensional gaussian populations. Bayesian Analysis 19(3), 869–893 (2024)

work page 2024

[18] [18]

Journal of Multivariate Analysis 114, 349–358 (2013)

Srivastava, M.S., Katayama, S., Kano, Y.: A two sample test in high dimensional data. Journal of Multivariate Analysis 114, 349–358 (2013)

work page 2013

[19] [19]

Graduate Texts in Mathemat ics, vol

Shiryaev, A.: Probability, 2nd edn. Graduate Texts in Mathemat ics, vol. 95. Springer, New York (2016)

work page 2016

[20] [20]

Wiley, New York (1958)

Anderson, T.W.: An Introduction to Multivariate Statistical Ana lysis. Wiley, New York (1958)

work page 1958

[21] [21]

Journal of Multivariate Analysis 100(3), 518–532 (2009) 26

Srivastava, M.S.: A test for the mean vector with fewer observ ations than the dimension under non-normality. Journal of Multivariate Analysis 100(3), 518–532 (2009) 26

work page 2009