
arxiv: 2606.25942 · v1 · pith:4BYFAZP6new · submitted 2026-06-24 · 📊 stat.ME

Elliptical Regularized Hotelling Testing for High Dimensional Data

Pith reviewed 2026-06-25 19:21 UTC · model grok-4.3

classification 📊 stat.ME
keywords high-dimensional testing · spatial median · elliptical symmetry · Hotelling test · Cauchy combination · regularized statistics · heavy tails · pervasive dependence

The pith

A regularized Hotelling test centered at the spatial median and combined across ridge parameters via the Cauchy rule controls type I error asymptotically and gives explicit local power for high-dimensional location testing under elliptical symmetry.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a test for a high-dimensional location parameter when observations follow elliptical distributions that may have heavy tails and strong cross-variable dependence. It uses the sample spatial median to center a spatial-sign covariance matrix, applies ridge regularization to form Hotelling-type statistics, and aggregates p-values from a fixed grid of ridge values with the Cauchy combination rule. The paper derives asymptotic normality under the null, consistent estimators of the statistic's centering and variance terms, and an explicit local power formula, and it justifies computing the combined p-value without estimating cross-ridge correlations. The construction is intended to remain valid where classical Hotelling procedures fail because of dimension, tails, or dependence.
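The first two ingredients named above, the sample spatial median and the spatial signs centered at it, can be sketched in a few lines. This is a minimal illustration using the classical Weiszfeld iteration, which is an assumption made here; the source does not say how the paper computes the median. The function names `spatial_median` and `spatial_signs` are hypothetical.

```python
import math

def spatial_median(rows, tol=1e-8, max_iter=500):
    """Weiszfeld iteration for the point minimising the sum of Euclidean distances.

    Assumption: Weiszfeld is used for illustration only; it is not taken from the paper.
    """
    n, p = len(rows), len(rows[0])
    # Start from the coordinate-wise mean.
    theta = [sum(r[j] for r in rows) / n for j in range(p)]
    for _ in range(max_iter):
        num = [0.0] * p
        den = 0.0
        for r in rows:
            d = math.sqrt(sum((r[j] - theta[j]) ** 2 for j in range(p)))
            if d < 1e-12:
                continue  # skip a point that coincides with the current iterate
            w = 1.0 / d
            den += w
            for j in range(p):
                num[j] += w * r[j]
        if den == 0.0:
            break
        new = [v / den for v in num]
        shift = math.sqrt(sum((new[j] - theta[j]) ** 2 for j in range(p)))
        theta = new
        if shift < tol:
            break
    return theta

def spatial_signs(rows, center):
    """Unit vectors (x - center) / ||x - center||; the zero vector if x equals center."""
    out = []
    for r in rows:
        diff = [a - b for a, b in zip(r, center)]
        norm = math.sqrt(sum(v * v for v in diff))
        out.append([v / norm for v in diff] if norm > 0 else [0.0] * len(diff))
    return out
```

Because each sign has unit length, a single observation with a very large norm contributes no more than any other, which is the usual reason sign-based statistics tolerate heavy tails.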

Core claim

The ERHT-CC procedure, built from the spatial median and the spatial-sign covariance centered there, yields a statistic that is asymptotically standard normal under the null after suitable centering and scaling; its local power is given explicitly, and the Cauchy aggregation of fixed-ridge p-values admits an analytic combined p-value whose limiting distribution is also characterized.
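If the standardized statistic is asymptotically standard normal under the null, a per-ridge p-value follows from a normal tail probability. The sketch below assumes a one-sided test that rejects for large values, which is typical for quadratic-form statistics but is not confirmed by the source. The names `standardize` and `upper_tail_pvalue` are hypothetical, and the centering and variance estimates are treated as inputs because their formulas are not given here.

```python
import math

def standardize(raw_stat, center_hat, var_hat):
    """Center and scale a raw statistic using supplied estimates (inputs, not derived here)."""
    return (raw_stat - center_hat) / math.sqrt(var_hat)

def upper_tail_pvalue(z):
    """P(Z > z) for a standard normal Z, via the complementary error function.

    Assumption: the test rejects for large values of the standardized statistic.
    """
    return 0.5 * math.erfc(z / math.sqrt(2.0))
```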

What carries the argument

The sample spatial median together with the spatial-sign covariance matrix centered at that median, regularized at multiple ridge values and aggregated by the Cauchy combination rule.
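For reference, the standard textbook definitions of the objects named in this sentence are written out below. The symbols $\hat{\theta}$, $U$, $\hat{S}$, $\lambda$ and $R(\lambda)$ are notation chosen here rather than taken from the paper, and the paper's exact test statistic, centering and scaling are not reproduced.

```latex
\[
\begin{aligned}
\hat{\theta} &= \arg\min_{\theta \in \mathbb{R}^p} \sum_{i=1}^{n} \lVert X_i - \theta \rVert
  && \text{(sample spatial median)} \\
U(x) &= \frac{x}{\lVert x \rVert}, \quad x \neq 0
  && \text{(spatial sign)} \\
\hat{S} &= \frac{1}{n} \sum_{i=1}^{n} U(X_i - \hat{\theta})\, U(X_i - \hat{\theta})^{\top}
  && \text{(spatial-sign covariance centered at } \hat{\theta}\text{)} \\
R(\lambda) &= \bigl(\hat{S} + \lambda I_p\bigr)^{-1}, \quad \lambda > 0
  && \text{(ridge-regularized inverse)}
\end{aligned}
\]
```

Since every spatial sign has unit length, $\hat{S}$ has trace one whenever no observation coincides with $\hat{\theta}$, and $R(\lambda)$ exists for every $\lambda > 0$ even when $p > n$.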

If this is right

  • The test admits consistent estimators of its centering term and variance.
  • An explicit expression for local power as a function of signal strength is available.
  • The combined p-value from the finite ridge grid can be computed analytically without estimating correlations among the individual ridge p-values.
  • The procedure maintains its asymptotic properties under heavy tails and pervasive cross-sectional dependence.
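The analytic combined p-value mentioned in the third point can be illustrated with the standard Cauchy combination formula: each p-value is mapped to a Cauchy variate, the variates are averaged with weights, and the result is mapped back through the Cauchy tail. The sketch below assumes equal weights by default, since the paper's weights are not stated in the source, and the function name `cauchy_combination` is hypothetical.

```python
import math

def cauchy_combination(pvalues, weights=None):
    """Combine p-values with the Cauchy rule.

    T = sum_k w_k * tan((0.5 - p_k) * pi), combined p = 0.5 - atan(T) / pi.
    Assumption: equal weights summing to one unless weights are supplied.
    """
    m = len(pvalues)
    if weights is None:
        weights = [1.0 / m] * m
    # Clip away from 0 and 1 so tan() stays finite.
    eps = 1e-15
    clipped = [min(max(p, eps), 1.0 - eps) for p in pvalues]
    t = sum(w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, clipped))
    return 0.5 - math.atan(t) / math.pi
```

Only the individual p-values enter the formula, so no correlation matrix among the ridge statistics has to be estimated; whether that shortcut is valid for a given set of statistics is what the paper's finite-grid joint Gaussian limit is meant to justify.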

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same spatial-median centering might be applied to other quadratic forms or to two-sample problems under elliptical symmetry.
  • The deterministic ridge grid could be replaced by an adaptive choice if a pilot estimate of the alternative becomes available.
  • Because the method avoids moment assumptions beyond elliptical symmetry, it may serve as a template for other high-dimensional procedures that currently require finite fourth moments.

Load-bearing premise

The observations follow an elliptically symmetric distribution, possibly with heavy tails.

What would settle it

Generate data from an elliptically symmetric distribution with heavy tails and pervasive dependence under the null of zero location and check whether the ERHT-CC statistic is close to standard normal after the paper's centering and scaling; the same check under a non-elliptical distribution would show departure if the assumption is necessary.
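A harness for the null half of this check might look like the sketch below. It draws multivariate t data, an elliptical family with heavy tails, with AR(1) scatter and zero location, then records how often a user-supplied test rejects. The AR(1) design with correlation 0.5 mirrors the Figure 1 caption; the degrees of freedom, the function names and the `pvalue_fn` callback are assumptions made here, and the ERHT-CC statistic itself is not implemented.

```python
import math
import random

def ar1_multivariate_t(n, p, rho=0.5, df=3.0, rng=None):
    """Draw n rows from a p-variate t distribution with scatter rho**|j-k| and location 0."""
    rng = rng or random.Random()
    scale = math.sqrt(1.0 - rho * rho)
    rows = []
    for _ in range(n):
        # Stationary Gaussian AR(1) path has covariance rho**|j-k|.
        z = [rng.gauss(0.0, 1.0)]
        for _ in range(1, p):
            z.append(rho * z[-1] + scale * rng.gauss(0.0, 1.0))
        # chi-square(df) drawn as Gamma(shape=df/2, scale=2); shared by the whole row.
        mix = math.sqrt(rng.gammavariate(df / 2.0, 2.0) / df)
        rows.append([v / mix for v in z])
    return rows

def empirical_size(pvalue_fn, n, p, reps=200, alpha=0.05, seed=0, **gen_kwargs):
    """Fraction of null replications in which pvalue_fn(data) falls below alpha.

    pvalue_fn is a hypothetical callback: it takes a list of rows and returns a p-value.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        data = ar1_multivariate_t(n, p, rng=rng, **gen_kwargs)
        if pvalue_fn(data) < alpha:
            rejections += 1
    return rejections / reps
```

An empirical size near `alpha` would be consistent with the null calibration claimed in the paper. Swapping the generator for a non-elliptical one, for example independent skewed coordinates, gives the second half of the check.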

Figures

Figures reproduced from arXiv: 2606.25942 by Le Zhou, Long Feng, Xiaoyi Wang.

Figure 1: Empirical power under AR(1) dependence, Ω_{p,jk} = 0.5^{|j−k|}, with (n, p) = (100, 200).
Figure 2: Empirical power under compound-symmetry dependence, ...
Figure 3: Histogram of the paired probe-level Student-t statistics for the 54,675 probes in GSE19804.
Original abstract

We consider one-sample testing of a high-dimensional location parameter under elliptically symmetric distributions with heavy tails and pervasive cross-sectional dependence. We propose an elliptical regularized Hotelling test with Cauchy combination (ERHT--CC), based on the sample spatial median and the spatial-sign covariance matrix centered at that median. We derive its null asymptotic normality, consistent estimators of the centering and variance, and an explicit local power function. Since the power-optimal ridge parameter depends on the unknown alternative, we aggregate fixed-ridge $p$-values over a deterministic grid using the Cauchy rule. We establish a finite-grid joint Gaussian limit, justify the analytic combined $p$-value without estimating cross-ridge correlations, and characterize its local power. Simulation studies and an empirical analysis demonstrate the favorable finite-sample performance of ERHT--CC under heavy tails and pervasive dependence.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript proposes the elliptical regularized Hotelling test with Cauchy combination (ERHT-CC) for one-sample high-dimensional location testing under elliptically symmetric distributions that allow heavy tails and pervasive cross-sectional dependence. The procedure is based on the sample spatial median and the spatial-sign covariance matrix centered at that median; the authors derive null asymptotic normality of the regularized statistics, consistent estimators of centering and variance terms, an explicit local power function, and a finite-grid joint Gaussian limit that justifies analytic Cauchy combination of p-values over a deterministic grid of ridge parameters without estimating cross-ridge correlations.

Significance. If the derivations hold, the work supplies a theoretically grounded robust procedure for high-dimensional mean testing that remains valid under heavy tails and dependence without requiring finite fourth moments. The explicit local power expression and the analytic Cauchy aggregation (avoiding correlation estimation) are concrete strengths that distinguish the contribution from purely simulation-based regularized Hotelling variants. These features could make the method useful in applications such as financial returns or genomic data where elliptical symmetry is a plausible modeling assumption.

minor comments (3)
  1. [Abstract] The phrase "finite-grid joint Gaussian limit" is introduced without a short gloss; adding a parenthetical or one sentence of explanation would improve accessibility for readers outside the immediate literature on p-value combination.
  2. [Section 3] The construction of the deterministic ridge grid (spacing, range, number of points) is described only as "fixed" in the main text; moving the explicit sequence to the main body or adding a short remark on its insensitivity would strengthen reproducibility.
  3. [Simulation studies] While type I error and power are reported, a compact table summarizing empirical sizes across dimension p, tail index, and dependence strength would make the finite-sample claims easier to compare with competing procedures.

Simulated Authors' Rebuttal

0 responses · 0 unresolved

We thank the referee for the detailed and positive summary of our manuscript on the ERHT-CC procedure, as well as for highlighting its theoretical contributions under elliptical symmetry with heavy tails. We appreciate the recommendation of minor revision.

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained under stated assumptions

full rationale

The paper explicitly assumes elliptically symmetric distributions and derives the null asymptotic normality, consistent estimators, joint Gaussian limit for the finite grid, and local power function from the spatial median and spatial-sign covariance properties under that model class. The ridge aggregation uses a deterministic external grid and analytic Cauchy combination without fitting parameters to the target data or estimating cross-ridge correlations from the same sample. No load-bearing step reduces by construction to a fitted input, self-citation chain, or renamed ansatz; all central claims rest on explicit derivations rather than circular re-use of the target result.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claims rest on the domain assumption of elliptical symmetry and standard regularity conditions for high-dimensional asymptotics; no free parameters or invented entities are introduced in the abstract.

axioms (1)
  • domain assumption: Observations are elliptically symmetric with possibly heavy tails and pervasive cross-sectional dependence.
    Invoked to justify the spatial median, spatial-sign covariance, and the derived asymptotic normality and power function.

pith-pipeline@v0.9.1-grok · 5667 in / 1263 out tokens · 18419 ms · 2026-06-25T19:21:40.648104+00:00 · methodology

