pith. sign in

arxiv: 2605.21360 · v1 · pith:4WR7D6TBnew · submitted 2026-05-20 · 🧮 math.ST · cs.CC· stat.TH

Linear Functional Testing with General Loadings in Sparse Regression: Separation Rates and Computational Barriers

Pith reviewed 2026-05-21 03:06 UTC · model grok-4.3

classification 🧮 math.ST cs.CCstat.TH
keywords sparse linear regressionlinear functional testingadaptive separation ratescomputational barriershigh-dimensional inferenceGaussian random designlow-degree polynomial bounds
0
0 comments X

The pith

A mixed test characterizes the adaptive separation rate for testing linear functionals in sparse regression up to logarithmic factors in the ultra-sparse regime.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper studies testing whether a linear functional of the sparse coefficient vector equals a fixed value in high-dimensional linear regression. The loading vector is arbitrary, the design covariance is unknown, and the sparsity level is unknown but bounded above by a known constant. A computationally efficient mixed test is built to control Type I error uniformly over the larger null class while delivering power against sparser alternatives. Information-theoretic lower bounds are derived that match the test's performance up to logs when the sparsity bound is ultra-small, for any loading profile. In denser sparsity regimes the same test is complemented by low-degree lower bounds and a reduction from sparse CCA to indicate that better statistical rates may require super-polynomial time.

Core claim

We construct a computationally efficient mixed test that gives an upper bound on the adaptive separation distance and establish an information-theoretic lower bound calibrated to the magnitude profile of ξ. In the ultra-sparse regime these bounds characterize the adaptive separation rate up to logarithmic factors for arbitrary ξ. In the moderately sparse regime these bounds match for several classes of loading vectors but may differ in general. In this regime we further prove a low-degree lower bound that matches the upper bound up to logarithmic factors. For flat sparse loadings we complement this with a polynomial-time reduction from sparse CCA.

What carries the argument

The mixed test that adapts to unknown sparsity bounded by ku while controlling uniform Type I error over the ku-sparse null and achieving power against k-sparse alternatives, with separation calibrated to the magnitude profile of the arbitrary loading ξ.

If this is right

  • In the ultra-sparse regime ku ≲ sqrt(n)/log p the mixed test attains the information-theoretic adaptive separation rate up to logarithmic factors for arbitrary ξ.
  • In the moderately sparse regime a low-degree lower bound matches the mixed-test upper bound up to logarithmic factors, indicating computational hardness for improvement.
  • For flat sparse loadings a polynomial-time reduction from sparse CCA shows that beating the rate is at least as hard as solving sparse CCA.
  • When the design covariance is known and diagonal the adaptive separation rate takes the same form as in the ultra-sparse regime.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • In practice the mixed test offers a reliable polynomial-time procedure whose rate is provably tight for general loadings when sparsity is low.
  • Similar statistical-computational gaps may appear in related high-dimensional testing problems that involve arbitrary linear functionals and unknown nuisance parameters.
  • Numerical checks that vary the magnitude profile of ξ could directly test whether the lower bounds are sharp beyond the regimes already analyzed.

Load-bearing premise

The design matrix is Gaussian with unknown covariance, and Type I error must be controlled uniformly over all possible ku-sparse null hypotheses while power is assessed only against k-sparse alternatives.

What would settle it

A simulation or explicit construction in the ultra-sparse regime where some test detects a k-sparse signal whose magnitude lies below the information-theoretic lower bound calibrated to ξ would contradict the claimed separation rate.

Figures

Figures reproduced from arXiv: 2605.21360 by Dongming Huang, Jie Xie.

Figure 1
Figure 1. Figure 1: Phase diagram for the loading vector in ( [PITH_FULL_IMAGE:figures/full_fig_p012_1.png] view at source ↗
read the original abstract

We study the problem of testing $H_0: \xi^\top\beta=t_0$ in high-dimensional sparse linear regression with Gaussian random design and unknown design covariance. The loading vector $\xi$ is arbitrary, and the exact sparsity level $k$ is unknown but bounded by a known value $k_u$. Tests are required to control Type I error uniformly over the $k_u$-sparse null, while power is evaluated against $k$-sparse alternatives. We construct a computationally efficient mixed test that gives an upper bound on the adaptive separation distance and establish an information-theoretic lower bound calibrated to the magnitude profile of $\xi$. In the ultra-sparse regime $k_u\lesssim \sqrt n/\log p$, these bounds characterize the adaptive separation rate up to logarithmic factors for arbitrary $\xi$. In the moderately sparse regime $\sqrt n/\log p\ll k_u\lesssim n/\log p$, these bounds match for several classes of loading vectors but may differ in general. In this regime, we further prove a low-degree lower bound that matches the upper bound up to logarithmic factors. This provides evidence that improving on the rate of the mixed test, if statistically possible, may be computationally hard. For flat sparse loadings, we complement this evidence with a polynomial-time reduction from sparse CCA. Finally, we examine how information about the design covariance affects the adaptive separation rate in two settings. Under a sparse signed-spiked covariance model, the information-theoretic lower bound is attainable up to logarithmic factors by a computationally inefficient procedure, while the low-degree lower bound and sparse-CCA reduction continue to apply, providing evidence for a statistical-computational gap. When the design covariance is known and diagonal, the adaptive separation rate takes the same form as in the ultra-sparse regime.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 3 minor

Summary. The manuscript studies testing the linear functional hypothesis H_0: ξ^T β = t_0 in high-dimensional sparse linear regression with Gaussian random design and unknown covariance matrix. The loading vector ξ is arbitrary while the sparsity level k is unknown but upper-bounded by a known k_u. Type I error must be controlled uniformly over all k_u-sparse nulls, and power is assessed against k-sparse alternatives. The authors construct a computationally efficient mixed test that supplies an upper bound on the adaptive separation distance and derive an information-theoretic lower bound calibrated to the magnitude profile of ξ. These bounds match up to logarithmic factors in the ultra-sparse regime k_u ≲ √n / log p for arbitrary ξ; in the moderate-sparse regime they match for several loading classes, with a low-degree lower bound and a sparse-CCA reduction supplying evidence of computational hardness. Additional results examine the effect of partial knowledge of the design covariance under sparse signed-spiked and known-diagonal models.

Significance. If the central claims hold, the work advances the theory of adaptive inference for linear functionals in sparse regression by delivering sharp separation rates that incorporate general loadings, unknown covariance, and computational constraints. The explicit matching of upper and lower bounds in the ultra-sparse regime, the low-degree polynomial evidence for hardness, and the polynomial-time reduction from sparse CCA for flat loadings constitute concrete strengths. The uniform Type I control over the k_u-sparse null and the treatment of two covariance-information regimes further strengthen the contribution.

major comments (1)
  1. [§4.3, Theorem 4.1] §4.3, Theorem 4.1: the low-degree lower bound is stated to match the mixed-test upper bound up to logarithmic factors in the moderate-sparse regime for several classes of ξ; however, the precise dependence of the degree on the magnitude profile of ξ is not made fully explicit, which is load-bearing for the claimed computational barrier.
minor comments (3)
  1. [§3.2] The definition of the mixed test statistic in §3.2 would benefit from an explicit display of the two component statistics and the data-splitting rule used to achieve uniform Type I control.
  2. [§2.2] Notation for the adaptive separation distance d_n(ξ, k_u) is introduced in the abstract and §1 but first appears with full dependence on the covariance in §2.2; a single consolidated display would improve readability.
  3. [§5.1] The sparse-CCA reduction in §5.1 is sketched at a high level; adding a short paragraph clarifying the parameter mapping between the regression and CCA instances would help readers verify the polynomial-time claim.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their careful reading, positive assessment of the contribution, and constructive feedback. We address the major comment below and will revise the manuscript accordingly to improve clarity.

read point-by-point responses
  1. Referee: [§4.3, Theorem 4.1] §4.3, Theorem 4.1: the low-degree lower bound is stated to match the mixed-test upper bound up to logarithmic factors in the moderate-sparse regime for several classes of ξ; however, the precise dependence of the degree on the magnitude profile of ξ is not made fully explicit, which is load-bearing for the claimed computational barrier.

    Authors: We agree that the dependence of the polynomial degree on the magnitude profile of ξ should be stated more explicitly to strengthen the computational barrier claim. In the proof of the low-degree lower bound, the degree d is chosen as the smallest integer such that the sum of the d largest squared entries of the sorted magnitude profile of ξ exceeds a threshold determined by the separation rate; this choice ensures the bound matches the mixed-test upper bound up to logarithmic factors for the classes of ξ considered (e.g., those with power-law or flat decay). We will add a remark immediately following Theorem 4.1 that makes this dependence fully explicit, including the precise formula relating d to the ordered magnitudes |ξ|_{(1)} ≥ ⋯ ≥ |ξ|_{(p)}. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper derives adaptive separation rates for testing linear functionals in sparse regression using information-theoretic lower bounds calibrated directly to the magnitude profile of the arbitrary loading vector ξ, together with an explicit construction of a computationally efficient mixed test for the upper bound. These rates are obtained from first-principles arguments (including low-degree polynomial methods and a polynomial-time reduction from sparse CCA that is external to the present work) without any equation reducing the claimed rate to a fitted parameter or quantity defined from the same data or by self-referential definition. The Gaussian random-design model, unknown covariance, and uniform Type-I control over the k_u-sparse null are handled explicitly in the constructions and lower-bound arguments, rendering the central claims self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on standard high-dimensional linear model assumptions plus the specific sparsity bound k_u and Gaussian design; no new entities are postulated and no free parameters are fitted inside the rate derivations.

axioms (2)
  • domain assumption The design matrix has i.i.d. Gaussian rows with unknown covariance matrix.
    Stated in the model setup; used to derive both the mixed test and the information-theoretic lower bound.
  • domain assumption The regression vector β is exactly k-sparse for alternatives and at most k_u-sparse for the null, with k_u known.
    Central to the adaptive testing formulation and uniformity requirement.

pith-pipeline@v0.9.0 · 5859 in / 1425 out tokens · 26732 ms · 2026-05-21T03:06:24.964768+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

83 extracted references · 83 canonical work pages · 1 internal anchor

  1. [1]

    2, 477–505

    Arnab Auddy and Ming Yuan,Large-dimensional independent component analysis: Statis- tical optimality and computational tractability, The Annals of Statistics53(2025), no. 2, 477–505

  2. [2]

    6, 3448–3450

    Afonso S Bandeira, Edgar Dobriban, Dustin G Mixon, and William F Sawin,Certifying the restricted isometry property is hard, IEEE transactions on information theory59(2013), no. 6, 3448–3450

  3. [3]

    35, 2022, pp

    Afonso S Bandeira, Ahmed El Alaoui, Samuel Hopkins, Tselil Schramm, Alexander S Wein, and Ilias Zadik,The franz-parisi criterion and computational trade-offs in high dimensional statistics, Advances in Neural Information Processing Systems, vol. 35, 2022, pp. 33831– 33844. 23

  4. [4]

    1-3, 29–55

    Cyril Banderier, Mireille Bousquet-Mélou, Alain Denise, Philippe Flajolet, Daniele Gardy, and Dominique Gouyou-Beauchamps,Generating functions for generating trees, Discrete mathematics246(2002), no. 1-3, 29–55

  5. [5]

    1046–1066

    QuentinBerthetandPhilippeRigollet,Complexity theoretic lower bounds for sparse principal component detection, Conference on learning theory, PMLR, 2013, pp. 1046–1066

  6. [6]

    Bickel, Ya’acov Ritov, and Alexandre B

    Peter J. Bickel, Ya’acov Ritov, and Alexandre B. Tsybakov,Simultaneous analysis of lasso and dantzig selector, The Annals of Statistics37(2009), no. 4, 1705–1732

  7. [7]

    2, 615–639

    Jelena Bradic, Jianqing Fan, and Yinchu Zhu,Testability of high-dimensional linear models with nonsparse structures, The Annals of Statistics50(2022), no. 2, 615–639

  8. [8]

    Matthew Brennan, Guy Bresler, and Wasim Huleihel,Reducibility and computational lower bounds for problems with planted sparse structure, Conference On Learning Theory, PMLR, 2018, pp. 48–166

  9. [9]

    Guy Bresler, Sung Min Park, and Madalina Persu,Sparse pca from sparse linear regression, Advances in Neural Information Processing Systems, 2018

  10. [10]

    Kothari,The quasi- polynomial low-degree conjecture is false, 2025IEEE66thAnnualSymposiumonFoundations of Computer Science (FOCS), 2025, pp

    Rares-Darius Buhai, Jun-Ting Hsieh, Aayush Jain, and Pravesh K. Kothari,The quasi- polynomial low-degree conjecture is false, 2025IEEE66thAnnualSymposiumonFoundations of Computer Science (FOCS), 2025, pp. 2577–2590

  11. [11]

    Ingster,Detection of a sparse submatrix of a high-dimensional noisy matrix, Bernoulli19(2013), no

    Cristina Butucea and Yuri I. Ingster,Detection of a sparse submatrix of a high-dimensional noisy matrix, Bernoulli19(2013), no. 5B, 2652 – 2688

  12. [12]

    Tony Cai and Zijian Guo,Confidence intervals for high-dimensional linear regression: Minimax rates and adaptivity, The Annals of Statistics45(2017), no

    T. Tony Cai and Zijian Guo,Confidence intervals for high-dimensional linear regression: Minimax rates and adaptivity, The Annals of Statistics45(2017), no. 2, 615–646

  13. [13]

    4, 1807–1836

    ,Accuracy assessment for high-dimensional linear regression, The Annals of Statistics 46(2018), no. 4, 1807–1836

  14. [14]

    2, 391–419

    ,Semi-supervised inference for explained variance in high-dimensional linear regres- sion, Journal of the Royal Statistical Society: Series B (Statistical Methodology)82(2020), no. 2, 391–419

  15. [15]

    T. Tony Cai, Zijian Guo, and Rong Ma,Statistical inference for high-dimensional generalized linear models with binary outcomes, Journal of the American Statistical Association118 (2023), no. 542, 1319–1332

  16. [16]

    Tony Cai and Mark G

    T. Tony Cai and Mark G. Low,Minimax estimation of linear functionals over nonconvex parameter spaces, The Annals of Statistics32(2004), no. 2, 552 – 576

  17. [17]

    5, 2311 – 2343

    ,On adaptive estimation of linear functionals, The Annals of Statistics33(2005), no. 5, 2311 – 2343

  18. [18]

    Tony Cai, Zongming Ma, and Yihong Wu,Optimal estimation and rank detection for sparse spiked covariance matrices, Probability theory and related fields161(2015), no

    T. Tony Cai, Zongming Ma, and Yihong Wu,Optimal estimation and rank detection for sparse spiked covariance matrices, Probability theory and related fields161(2015), no. 3, 781–815

  19. [19]

    Tony Cai, Anru R

    T. Tony Cai, Anru R. Zhang, and Yuchen Zhou,Sparse group lasso: Optimal sample com- plexity, convergence rate, and statistical inference, IEEE Transactions on Information Theory 68(2022), no. 9, 5975–6002

  20. [20]

    Tony Cai and Harrison H

    T. Tony Cai and Harrison H. Zhou,Optimal rates of convergence for sparse covariance matrix estimation, The Annals of Statistics40(2012), no. 5, 2389 – 2420. 24

  21. [21]

    Tianxi Cai, T. Tony Cai, and Zijian Guo,Optimal statistical inference for individualized treatment effects in high-dimensional models, Journal of the Royal Statistical Society: Series B (Statistical Methodology)83(2021), no. 4, 669–719

  22. [22]

    3, 2127–2153

    Julien Chhor, Rajarshi Mukherjee, and Subhabrata Sen,Sparse signal detection in het- eroscedastic gaussian sequence models: sharp minimax rates, Bernoulli30(2024), no. 3, 2127–2153

  23. [23]

    Tsybakov,Minimax estimation of linear and quadratic functionals on sparsity classes, The Annals of Statistics45(2017), no

    Olivier Collier, Laëtitia Comminges, and Alexandre B. Tsybakov,Minimax estimation of linear and quadratic functionals on sparsity classes, The Annals of Statistics45(2017), no. 3, 923 – 958

  24. [24]

    1, Elsevier, 2001, pp

    Kenneth R Davidson and Stanislaw J Szarek,Local operator theory, random matrices and banach spaces, Handbook of the geometry of Banach spaces, vol. 1, Elsevier, 2001, pp. 317– 366

  25. [25]

    Yash Deshpande and Andrea Montanari,Improved sum-of-squares lower bounds for hidden clique and hidden submatrix problems, ConferenceonLearningTheory, PMLR,2015, pp.523– 562

  26. [26]

    VitalyFeldman, ElenaGrigorescu, LevReyzin, SantoshSVempala, andYingXiao,Statistical algorithms and a lower bound for detecting planted cliques, Journal of the ACM (JACM)64 (2017), no. 2, 1–37

  27. [27]

    Zhou,Minimax estimation in sparse canonical correlation analysis, The Annals of Statistics43(2015), no

    Chao Gao, Zongming Ma, Zhao Ren, and Harrison H. Zhou,Minimax estimation in sparse canonical correlation analysis, The Annals of Statistics43(2015), no. 5, 2168 – 2197

  28. [28]

    Zhou,Sparse CCA: Adaptive estimation and computational barriers, The Annals of Statistics45(2017), no

    Chao Gao, Zongming Ma, and Harrison H. Zhou,Sparse CCA: Adaptive estimation and computational barriers, The Annals of Statistics45(2017), no. 5, 2074 – 2101

  29. [29]

    4, 1697–1727

    Alden Green and Elad Romanov,The high-dimensional asymptotics of principal component regression, The Annals of Statistics53(2025), no. 4, 1697–1727

  30. [30]

    thesis, 2018, p

    Samuel Hopkins,Statistical inference and the sum of squares method, Ph.D. thesis, 2018, p. 430

  31. [31]

    1, 18–32

    Ildar Abdullovich Ibragimov and Rafail Zalmanovich Khas’ minskii,On nonparametric esti- mation of the value of a linear functional in gaussian white noise, Theory of Probability & Its Applications29(1985), no. 1, 18–32

  32. [32]

    3, 685–718

    Adel Javanmard and Jason D Lee,A flexible framework for hypothesis testing in high dimen- sions, Journal of the Royal Statistical Society Series B: Statistical Methodology82(2020), no. 3, 685–718

  33. [33]

    1, 2869–2909

    Adel Javanmard and Andrea Montanari,Confidence intervals and hypothesis testing for high- dimensional regression, Journal of Machine Learning Research15(2014), no. 1, 2869–2909

  34. [34]

    6A, 2593 – 2622

    Adel Javanmard and Andrea Montanari,Debiasing the lasso: Optimal sample size for Gaus- sian designs, The Annals of Statistics46(2018), no. 6A, 2593 – 2622

  35. [35]

    Johnstone and Arthur Yu Lu,On consistency and sparsity for principal components analysis in high dimensions, Journal of the American Statistical Association104(2009), no

    Iain M. Johnstone and Arthur Yu Lu,On consistency and sparsity for principal components analysis in high dimensions, Journal of the American Statistical Association104(2009), no. 486, 682–693. 25

  36. [36]

    Dmitriy Kunisky, Alexander S Wein, and Afonso S Bandeira,Notes on computational hard- ness of hypothesis testing: Predictions using the low-degree likelihood ratio, ISAAC Congress (InternationalSocietyforAnalysis, itsApplicationsandComputation), Springer, 2019, pp.1– 50

  37. [37]

    3, 1695–1738

    Nilanjana Laha and Rajarshi Mukherjee,On support recovery with sparse CCA: Information theoretic and computational limits, IEEE transactions on information theory69(2023), no. 3, 1695–1738

  38. [38]

    Beatrice Laurent and Pascal Massart,Adaptive estimation of a quadratic functional by model selection, The Annals of statistics (2000), 1302–1338

  39. [39]

    Lehmann and Joseph P

    Erich L. Lehmann and Joseph P. Romano,Testing statistical hypotheses, 3rd ed., Springer Texts in Statistics, Springer, New York, 2005

  40. [40]

    Journal of In- equalities in Pure & Applied Mathematics8(2007), no

    Xin Li and Chao-Ping Chen,Inequalities for the gamma function., JIPAM. Journal of In- equalities in Pure & Applied Mathematics8(2007), no. 1, 554–563

  41. [41]

    5, 2318–2348

    Yuetian Luo and Chao Gao,Computational lower bounds for graphon estimation via low- degree polynomials, The Annals of Statistics52(2024), no. 5, 2318–2348

  42. [42]

    28, 2015

    Tengyu Ma and Avi Wigderson,Sum-of-squares lower bounds for sparse pca, Advances in Neural Information Processing Systems, vol. 28, 2015

  43. [43]

    2, 772 – 801

    Zongming Ma,Sparse principal component analysis and iterative thresholding, The Annals of Statistics41(2013), no. 2, 772 – 801

  44. [44]

    61, 1089–1116

    Zongming Ma and Yihong Wu,Computational barriers in minimax submatrix detection, The Annals of Statistics43(2015), no. 61, 1089–1116

  45. [45]

    35, 2022, pp

    Aaron Potechin and Goutham Rajendran,Sub-exponential time sum-of-squares lower bounds for principal components analysis, Advances in Neural Information Processing Systems, vol. 35, 2022, pp. 35724–35740

  46. [46]

    Garvesh Raskutti, Martin J Wainwright, and Bin Yu,Restricted eigenvalue properties for correlated gaussian designs, The Journal of Machine Learning Research11(2010), 2241– 2259

  47. [47]

    10, 6976–6994

    ,Minimax rates of estimation for high-dimensional linear regression over lq-balls, IEEE transactions on information theory57(2011), no. 10, 6976–6994

  48. [48]

    2652–2663

    Galen Reeves, Jiaming Xu, and Ilias Zadik,The all-or-nothing phenomenon in sparse linear regression, Conference on Learning Theory, PMLR, 2019, pp. 2652–2663

  49. [49]

    BenjaminRossman,Average-case complexity of detecting cliques, Ph.D.thesis, Massachusetts Institute of Technology, 2010

  50. [50]

    3, 1833–1858

    Tselil Schramm and Alexander S Wein,Computational barriers to estimation from low-degree polynomials, The Annals of Statistics50(2022), no. 3, 1833–1858

  51. [51]

    Sun and C.-H

    T. Sun and C.-H. Zhang,Scaled sparse linear regression, Biometrika99(2012), no. 4, 879– 898

  52. [52]

    23, American Mathematical Soc., 1939

    Gabor Szeg,Orthogonal polynomials, vol. 23, American Mathematical Soc., 1939

  53. [53]

    1, 267–288

    Robert Tibshirani,Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society: Series B (Statistical Methodology)58(1996), no. 1, 267–288. 26

  54. [54]

    2, 1248–1259

    Andreas M Tillmann and Marc E Pfetsch,The computational complexity of the restricted isometry property, the nullspace property, and related concepts in compressed sensing, IEEE Transactions on Information Theory60(2013), no. 2, 1248–1259

  55. [55]

    Tsybakov,Introduction to nonparametric estimation, Springer Series in Statis- tics, Springer, 2009

    Alexandre B. Tsybakov,Introduction to nonparametric estimation, Springer Series in Statis- tics, Springer, 2009

  56. [56]

    3, 1166–1202

    Sara van de Geer, Peter Bühlmann, Ya’acov Ritov, and Ruben Dezeure,On asymptotically optimal confidence regions and tests for high-dimensional models, The Annals of Statistics 42(2014), no. 3, 1166–1202

  57. [57]

    Roman Vershynin,Introduction to the non-asymptotic analysis of random matrices, Com- pressed sensing, Cambridge University Press, 2012, pp. 210–268

  58. [58]

    47, Cambridge university press, 2018

    ,High-dimensional probability: An introduction with applications in data science, vol. 47, Cambridge university press, 2018

  59. [59]

    Alexander S Wein,Computational complexity of statistics: New insights from low-degree polynomials, arXiv preprint arXiv:2506.10748 (2025)

  60. [60]

    Yixuan Wu, Yilun Zhu, Lei Cao, and Naichen Shi,Calibrated principal component regression, The 29th International Conference on Artificial Intelligence and Statistics, 2026

  61. [61]

    Jie Xie and Dongming Huang,Minimax and adaptive estimation of general linear functionals under sparsity, arXiv preprint arXiv:2509.25595 (2025)

  62. [62]

    Cun-Hui Zhang and Stephanie S. Zhang,Confidence intervals for low dimensional param- eters in high dimensional linear models, Journal of the Royal Statistical Society: Series B (Statistical Methodology)76(2014), no. 1, 217–242

  63. [63]

    546, 1579–1591

    Junlong Zhao, Yang Zhou, and Yufeng Liu,Estimation of linear functionals in high- dimensional linear models: From sparsity to nonsparsity, Journal of the American Statistical Association119(2024), no. 546, 1579–1591

  64. [64]

    Linear Functional Testing with General Loadings in Sparse Regression: Separation Rates and Computational Barriers

    Yinchu Zhu and Jelena Bradic,Linear hypothesis testing in dense high-dimensional linear models, Journal of the American Statistical Association113(2018), no. 524, 1583–1600. 27 Supplement to “Linear Functional Testing with General Loadings in Sparse Regression: Separation Rates and Computational Barriers” This supplement collects proofs and auxiliary resu...

  65. [65]

    Thus Lemma A.5 gives, for every fixed deterministicm, τ spike adap (ku, k;ξ)≲ P j≤m ξ2 j 1/2 √n +|ξ m+1|ku r logp n +ν 2 ku logp n , where we have usedσ≤M 2 forθ∈Θ(k)

    Moreover, Pu ∩Θ(k u;ξ, t 0) = Θspike(ku;ξ, t 0),Θ spike ±τ (k;ξ, t 0)⊆ P u for1≤k≤k u. Thus Lemma A.5 gives, for every fixed deterministicm, τ spike adap (ku, k;ξ)≲ P j≤m ξ2 j 1/2 √n +|ξ m+1|ku r logp n +ν 2 ku logp n , where we have usedσ≤M 2 forθ∈Θ(k). Optimizing overmand using Proposition 2 gives inf 0≤m≤p    P j≤m ξ2 j 1/2 √n +|ξ m+1|ku r logp n ...

  66. [66]

    Applying this withD=Mgives ∥ˆΣ(1) M M −I M M∥2 =∥ ˆΣ(1) M −I p∥2 ≲ r ku logp n1 .(S.31) 53

    SinceM⊆ bB c and|M| ≤k u, the defining property ofbB∈B ku yields, for every such subset D⊆ bB c with|D| ≤k u, ∥ˆΣ(1) D −I p∥2 ≤2   s |D| n1 + s γ∗|D|logp n1   +   s |D| n1 + s γ∗|D|logp n1   2 , where, since we haven1 ≥ck u logpfor some sufficiently large constantc, the leading rate in the right-hand side is p (γ∗|D|logp)/n 1. Applying this withD=...

  67. [67]

    For any fixedD⊆B c ∗ with|D| ≤k u, the submatrixX(1) ·D has i.i.d.N(0,I |D|)columns, hence ˆΣ(1) DD = 1 n1 (X(1) ·D )⊤X(1) ·D = 1 n1 Z ⊤Z, forZ∈R n1×|D| with i.i.d.N(0,1)entries. Therefore, by similar calculation in Section C.3, with probability at least1−4(ep) 1−γ∗/2, we have ∥ˆΣ(1) DD −I DD∥2 ≤2   s |D| n1 + s γ∗|D|logp n1   +   s |D| n1 + s γ∗|D|...

  68. [68]

    By monotonicity of the operator norm under taking submatrices,∥ˆΣ(1) GM ∥2 ≤ ∥ ˆΣ(1) bBM ∥2

    SinceM⊆ bB c andG⊆ bB, we have ˆΣ(1) GM as a submatrix of ˆΣ(1) bBM. By monotonicity of the operator norm under taking submatrices,∥ˆΣ(1) GM ∥2 ≤ ∥ ˆΣ(1) bBM ∥2. The defining property of bB∈B ku states that for everyD⊆ bB c with|D| ≤k u, ∥ˆΣ(1) D bB∥2 ≤ q ∥Γ bB(ˆΣ(1))∥2 s |D| n1 + s |bB| n1 + s γ∗|D|logp n1 . Applying this withD=Mgives ∥ˆΣ(1) GM ∥2 ≤ ∥ ˆΣ...

  69. [69]

    aY ℓ=1 exp{sℓUℓ −s 2 ℓ /2} bY j=1 exp{tjVj −t 2 j /2} # . Since(U, V)is centered Gaussian, G(s, t) =E

    We next control∥ˆΣ(1) GO∥2. OnE 1, since G⊆B ∗, O⊆B c ∗,|O| ≤ | bB| ≤k u, 54 the defining property ofB∗ ∈B ku in Equation (S.27), applied withD=O, gives ∥ˆΣ(1) OB∗∥2 ≤ q ∥ΓB∗(ˆΣ(1))∥2 s |O| n1 + s |B∗| n1 + s γ∗|O|logp n1 . Moreover, ∥ˆΣ(1) GO∥2 =∥ ˆΣ(1) OG∥2 ≤ ∥ ˆΣ(1) OB∗∥2. Using the same argument with Lemma C.1 in the bound on∥ˆΣ(1) GM ∥2, it holds wit...

  70. [70]

    IfLD(D) = 1 +o(1)asn→ ∞, then no degree-Dpolynomial weakly separatesQ 1 andQ 2

  71. [71]

    IfLD(D) =O(1)asn→ ∞, then no degree-Dpolynomial strongly separatesQ 1 andQ 2. Remark.Statement (i) asserts that, whenLD(D) = 1 +o(1), every degree-Dpolynomial test is asymptotically no better than the trivial constant statistic in the sense of weak separation. Statement (ii) is stronger in that it rules out vanishing total error: boundedness ofLD(D)permit...

  72. [72]

    IfK≲k 2 u, then τadap(ku, k;ξ)≍ a √ K√n

    Supposek u ≲ √n/logp. IfK≲k 2 u, then τadap(ku, k;ξ)≍ a √ K√n . IfK≳k 2 u, then τadap(ku, k;ξ)≍ log aku r logp n . Moreover, ifK/k 2 u ≥p c for some constantc >0, then the logarithmic equivalence can be strengthened to τadap(ku, k;ξ)≍ak u r logp n

  73. [73]

    IfK≲k u, then τadap(ku, k;ξ)≍a √ K ku logp n

    Supposek u ≫ √n/logp. IfK≲k u, then τadap(ku, k;ξ)≍a √ K ku logp n . IfK≳k 2 u, then τadap(ku, k;ξ)≍ log aku r logp n . Moreover, ifK/k 2 u ≥p c for some constantc >0, then τadap(ku, k;ξ)≍ak u r logp n . In the intermediate regimeku ≪K≪k 2 u, the general bounds in Theorem 1 and proposition 1 give a ( √ K√n + k3/2 u logp n ) ≲τ adap(ku, k;ξ)≲a r K∧ n logp ...

  74. [74]

    By Proposition 1, τadap(ku, k;ξ)≲ 1√n H(k 2 u logp;ξ)

    Ultra-sparse regimek u ≲ √n/logp. By Proposition 1, τadap(ku, k;ξ)≲ 1√n H(k 2 u logp;ξ). IfK≲k 2 u, thenH(k 2 u logp;ξ)≍a √ K, and the lower boundν1/√nsatisfiesν 1/√n≍a √ K/√n by Lemma F.1. Hence τadap(ku, k;ξ)≍ a √ K√n . IfK≳k 2 u, Lemma F.1 gives ν1 ≳ak u. Therefore, Corollary 1 gives τadap(ku, k;ξ)≍ log aku r logp n . 80 IfK/k 2 u ≥p c forc >0, thenH(k...

  75. [75]

    Moderately sparse regimeku ≫ √n/logp. In this case, Proposition 1 yields the upper bound τadap(ku, k;ξ)≲ ku logp n H(n/logp;ξ)≍a r K∧ n logp ku logp n .(S.87) The lower bound in Theorem 1 is τadap(ku, k;ξ)≳ ν1√n ∨ν 2 ku logp n . IfK≲k u, then H(n/logp;ξ)≍H(k u ;ξ)≍a √ K. Recall thatν 2 =H(k u;ξ). We see that both the upper bound and the lower boundν2ku lo...

  76. [76]

    Therefore, with probability tending to 1, there is at least one nonzero coordinate with magnitude at mostM

    Furthermore, it also implies that there exists a fixedMsuch that the interval[t 0, M] receives positive probability mass. Therefore, with probability tending to 1, there is at least one nonzero coordinate with magnitude at mostM. When both events happen, we have maxj∈supp(ξ) |ξj| minj∈supp(ξ) |ξj| ≳(logp) 1/q, which is unbounded. •It is also not the exact...

  77. [77]

    There exist constants0< c − < C+ <∞such that Dξ(C+s)≤ k2 u 8 ,(S.111) and Dξ(c−s)≥ k2 u 2 .(S.112) Consequently, the solutionλin(16)is positive and satisfies c−s≤λ≤C +s.(S.113)

  78. [78]

    By Lemma F.4, we have λ≍ log ep k2 u 1/2+1/q , and S2(λ)1/2 ≲k u log ep k2 u 1/q

    It holds that   kξX j=1 ξ2 j exp(−λ2/ξ2 j )   1/2 ≲k uL1/q.(S.114) Let S2(t) = kξX j=1 ξ2 j exp(−t2/ξ2 j ). By Lemma F.4, we have λ≍ log ep k2 u 1/2+1/q , and S2(λ)1/2 ≲k u log ep k2 u 1/q . Therefore, by the definition ofν1 in (17), ν1 =k uλ+S 2(λ)1/2 ≍k u log ep k2 u 1/2+1/q . This proves (S.104). In the moderately sparse regime, Proposition 1 andN=...

  79. [79]

    For every grid intervalIm = [Bm, Bm+1)intersecting[t 0, CM(logp) 1/q], #{j:|ξ j| ∈I m} ≤(logp) Apexp(−cB mq).(S.119)

  80. [80]

    We havet 2 =x q+2

    Wheneverpexp(−CB mq)≥(logp) A, #{j:|ξ j| ∈I m} ≥(logp) −Apexp(−CB mq).(S.120) Proof of Lemma F.5.Setx=x(t). We havet 2 =x q+2. We first prove the upper bound (S.115). Choosem 0 ∈Zsuch that Bm0 ≤x < B m0+1. Form=m 0 +k, we have Bm =B m0Bk =xθB k, θ:= Bm0 x ∈[B −1,1]. It follows that t2 B2m = xq+2 x2θ2B2k =x qθ−2B−2k, Bmq =x qθqBqk. Sinceθ∈[B −1,1], there a...

Showing first 80 references.