Linear Functional Testing with General Loadings in Sparse Regression: Separation Rates and Computational Barriers

Dongming Huang; Jie Xie

arxiv: 2605.21360 · v1 · pith:4WR7D6TBnew · submitted 2026-05-20 · 🧮 math.ST · cs.CC· stat.TH

Linear Functional Testing with General Loadings in Sparse Regression: Separation Rates and Computational Barriers

Jie Xie , Dongming Huang This is my paper

Pith reviewed 2026-05-21 03:06 UTC · model grok-4.3

classification 🧮 math.ST cs.CCstat.TH

keywords sparse linear regressionlinear functional testingadaptive separation ratescomputational barriershigh-dimensional inferenceGaussian random designlow-degree polynomial bounds

0 comments

The pith

A mixed test characterizes the adaptive separation rate for testing linear functionals in sparse regression up to logarithmic factors in the ultra-sparse regime.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper studies testing whether a linear functional of the sparse coefficient vector equals a fixed value in high-dimensional linear regression. The loading vector is arbitrary, the design covariance is unknown, and the sparsity level is unknown but bounded above by a known constant. A computationally efficient mixed test is built to control Type I error uniformly over the larger null class while delivering power against sparser alternatives. Information-theoretic lower bounds are derived that match the test's performance up to logs when the sparsity bound is ultra-small, for any loading profile. In denser sparsity regimes the same test is complemented by low-degree lower bounds and a reduction from sparse CCA to indicate that better statistical rates may require super-polynomial time.

Core claim

We construct a computationally efficient mixed test that gives an upper bound on the adaptive separation distance and establish an information-theoretic lower bound calibrated to the magnitude profile of ξ. In the ultra-sparse regime these bounds characterize the adaptive separation rate up to logarithmic factors for arbitrary ξ. In the moderately sparse regime these bounds match for several classes of loading vectors but may differ in general. In this regime we further prove a low-degree lower bound that matches the upper bound up to logarithmic factors. For flat sparse loadings we complement this with a polynomial-time reduction from sparse CCA.

What carries the argument

The mixed test that adapts to unknown sparsity bounded by ku while controlling uniform Type I error over the ku-sparse null and achieving power against k-sparse alternatives, with separation calibrated to the magnitude profile of the arbitrary loading ξ.

If this is right

In the ultra-sparse regime ku ≲ sqrt(n)/log p the mixed test attains the information-theoretic adaptive separation rate up to logarithmic factors for arbitrary ξ.
In the moderately sparse regime a low-degree lower bound matches the mixed-test upper bound up to logarithmic factors, indicating computational hardness for improvement.
For flat sparse loadings a polynomial-time reduction from sparse CCA shows that beating the rate is at least as hard as solving sparse CCA.
When the design covariance is known and diagonal the adaptive separation rate takes the same form as in the ultra-sparse regime.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

In practice the mixed test offers a reliable polynomial-time procedure whose rate is provably tight for general loadings when sparsity is low.
Similar statistical-computational gaps may appear in related high-dimensional testing problems that involve arbitrary linear functionals and unknown nuisance parameters.
Numerical checks that vary the magnitude profile of ξ could directly test whether the lower bounds are sharp beyond the regimes already analyzed.

Load-bearing premise

The design matrix is Gaussian with unknown covariance, and Type I error must be controlled uniformly over all possible ku-sparse null hypotheses while power is assessed only against k-sparse alternatives.

What would settle it

A simulation or explicit construction in the ultra-sparse regime where some test detects a k-sparse signal whose magnitude lies below the information-theoretic lower bound calibrated to ξ would contradict the claimed separation rate.

Figures

Figures reproduced from arXiv: 2605.21360 by Dongming Huang, Jie Xie.

read the original abstract

We study the problem of testing $H_0: \xi^\top\beta=t_0$ in high-dimensional sparse linear regression with Gaussian random design and unknown design covariance. The loading vector $\xi$ is arbitrary, and the exact sparsity level $k$ is unknown but bounded by a known value $k_u$. Tests are required to control Type I error uniformly over the $k_u$-sparse null, while power is evaluated against $k$-sparse alternatives. We construct a computationally efficient mixed test that gives an upper bound on the adaptive separation distance and establish an information-theoretic lower bound calibrated to the magnitude profile of $\xi$. In the ultra-sparse regime $k_u\lesssim \sqrt n/\log p$, these bounds characterize the adaptive separation rate up to logarithmic factors for arbitrary $\xi$. In the moderately sparse regime $\sqrt n/\log p\ll k_u\lesssim n/\log p$, these bounds match for several classes of loading vectors but may differ in general. In this regime, we further prove a low-degree lower bound that matches the upper bound up to logarithmic factors. This provides evidence that improving on the rate of the mixed test, if statistically possible, may be computationally hard. For flat sparse loadings, we complement this evidence with a polynomial-time reduction from sparse CCA. Finally, we examine how information about the design covariance affects the adaptive separation rate in two settings. Under a sparse signed-spiked covariance model, the information-theoretic lower bound is attainable up to logarithmic factors by a computationally inefficient procedure, while the low-degree lower bound and sparse-CCA reduction continue to apply, providing evidence for a statistical-computational gap. When the design covariance is known and diagonal, the adaptive separation rate takes the same form as in the ultra-sparse regime.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper pins down adaptive separation rates for testing a linear functional in sparse regression with arbitrary loadings, matched by lower bounds in key regimes plus computational hardness via low-degree and CCA reduction.

read the letter

The key point is that this paper works out adaptive separation rates for testing ξ^T β = t0 in high-dimensional sparse linear regression when the loading ξ is arbitrary, not just fixed or flat. They build a computationally efficient mixed test for the upper bound and pair it with an information-theoretic lower bound that tracks the magnitude profile of ξ. These match up to logs in the ultra-sparse regime where ku is roughly up to sqrt(n)/log p. In the moderate-sparse regime they add a low-degree lower bound that aligns with the upper bound up to logs, and for flat loadings they reduce from sparse CCA to show the rate is hard to beat in polynomial time. They also compare the known-covariance and sparse-spiked cases to highlight possible statistical-computational gaps.

Referee Report

1 major / 3 minor

Summary. The manuscript studies testing the linear functional hypothesis H_0: ξ^T β = t_0 in high-dimensional sparse linear regression with Gaussian random design and unknown covariance matrix. The loading vector ξ is arbitrary while the sparsity level k is unknown but upper-bounded by a known k_u. Type I error must be controlled uniformly over all k_u-sparse nulls, and power is assessed against k-sparse alternatives. The authors construct a computationally efficient mixed test that supplies an upper bound on the adaptive separation distance and derive an information-theoretic lower bound calibrated to the magnitude profile of ξ. These bounds match up to logarithmic factors in the ultra-sparse regime k_u ≲ √n / log p for arbitrary ξ; in the moderate-sparse regime they match for several loading classes, with a low-degree lower bound and a sparse-CCA reduction supplying evidence of computational hardness. Additional results examine the effect of partial knowledge of the design covariance under sparse signed-spiked and known-diagonal models.

Significance. If the central claims hold, the work advances the theory of adaptive inference for linear functionals in sparse regression by delivering sharp separation rates that incorporate general loadings, unknown covariance, and computational constraints. The explicit matching of upper and lower bounds in the ultra-sparse regime, the low-degree polynomial evidence for hardness, and the polynomial-time reduction from sparse CCA for flat loadings constitute concrete strengths. The uniform Type I control over the k_u-sparse null and the treatment of two covariance-information regimes further strengthen the contribution.

major comments (1)

[§4.3, Theorem 4.1] §4.3, Theorem 4.1: the low-degree lower bound is stated to match the mixed-test upper bound up to logarithmic factors in the moderate-sparse regime for several classes of ξ; however, the precise dependence of the degree on the magnitude profile of ξ is not made fully explicit, which is load-bearing for the claimed computational barrier.

minor comments (3)

[§3.2] The definition of the mixed test statistic in §3.2 would benefit from an explicit display of the two component statistics and the data-splitting rule used to achieve uniform Type I control.
[§2.2] Notation for the adaptive separation distance d_n(ξ, k_u) is introduced in the abstract and §1 but first appears with full dependence on the covariance in §2.2; a single consolidated display would improve readability.
[§5.1] The sparse-CCA reduction in §5.1 is sketched at a high level; adding a short paragraph clarifying the parameter mapping between the regression and CCA instances would help readers verify the polynomial-time claim.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their careful reading, positive assessment of the contribution, and constructive feedback. We address the major comment below and will revise the manuscript accordingly to improve clarity.

read point-by-point responses

Referee: [§4.3, Theorem 4.1] §4.3, Theorem 4.1: the low-degree lower bound is stated to match the mixed-test upper bound up to logarithmic factors in the moderate-sparse regime for several classes of ξ; however, the precise dependence of the degree on the magnitude profile of ξ is not made fully explicit, which is load-bearing for the claimed computational barrier.

Authors: We agree that the dependence of the polynomial degree on the magnitude profile of ξ should be stated more explicitly to strengthen the computational barrier claim. In the proof of the low-degree lower bound, the degree d is chosen as the smallest integer such that the sum of the d largest squared entries of the sorted magnitude profile of ξ exceeds a threshold determined by the separation rate; this choice ensures the bound matches the mixed-test upper bound up to logarithmic factors for the classes of ξ considered (e.g., those with power-law or flat decay). We will add a remark immediately following Theorem 4.1 that makes this dependence fully explicit, including the precise formula relating d to the ordered magnitudes |ξ|_{(1)} ≥ ⋯ ≥ |ξ|_{(p)}. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper derives adaptive separation rates for testing linear functionals in sparse regression using information-theoretic lower bounds calibrated directly to the magnitude profile of the arbitrary loading vector ξ, together with an explicit construction of a computationally efficient mixed test for the upper bound. These rates are obtained from first-principles arguments (including low-degree polynomial methods and a polynomial-time reduction from sparse CCA that is external to the present work) without any equation reducing the claimed rate to a fitted parameter or quantity defined from the same data or by self-referential definition. The Gaussian random-design model, unknown covariance, and uniform Type-I control over the k_u-sparse null are handled explicitly in the constructions and lower-bound arguments, rendering the central claims self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on standard high-dimensional linear model assumptions plus the specific sparsity bound k_u and Gaussian design; no new entities are postulated and no free parameters are fitted inside the rate derivations.

axioms (2)

domain assumption The design matrix has i.i.d. Gaussian rows with unknown covariance matrix.
Stated in the model setup; used to derive both the mixed test and the information-theoretic lower bound.
domain assumption The regression vector β is exactly k-sparse for alternatives and at most k_u-sparse for the null, with k_u known.
Central to the adaptive testing formulation and uniformity requirement.

pith-pipeline@v0.9.0 · 5859 in / 1425 out tokens · 26732 ms · 2026-05-21T03:06:24.964768+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

83 extracted references · 83 canonical work pages · 1 internal anchor

[1]

2, 477–505

Arnab Auddy and Ming Yuan,Large-dimensional independent component analysis: Statis- tical optimality and computational tractability, The Annals of Statistics53(2025), no. 2, 477–505

work page 2025
[2]

6, 3448–3450

Afonso S Bandeira, Edgar Dobriban, Dustin G Mixon, and William F Sawin,Certifying the restricted isometry property is hard, IEEE transactions on information theory59(2013), no. 6, 3448–3450

work page 2013
[3]

35, 2022, pp

Afonso S Bandeira, Ahmed El Alaoui, Samuel Hopkins, Tselil Schramm, Alexander S Wein, and Ilias Zadik,The franz-parisi criterion and computational trade-offs in high dimensional statistics, Advances in Neural Information Processing Systems, vol. 35, 2022, pp. 33831– 33844. 23

work page 2022
[4]

1-3, 29–55

Cyril Banderier, Mireille Bousquet-Mélou, Alain Denise, Philippe Flajolet, Daniele Gardy, and Dominique Gouyou-Beauchamps,Generating functions for generating trees, Discrete mathematics246(2002), no. 1-3, 29–55

work page 2002
[5]

1046–1066

QuentinBerthetandPhilippeRigollet,Complexity theoretic lower bounds for sparse principal component detection, Conference on learning theory, PMLR, 2013, pp. 1046–1066

work page 2013
[6]

Bickel, Ya’acov Ritov, and Alexandre B

Peter J. Bickel, Ya’acov Ritov, and Alexandre B. Tsybakov,Simultaneous analysis of lasso and dantzig selector, The Annals of Statistics37(2009), no. 4, 1705–1732

work page 2009
[7]

2, 615–639

Jelena Bradic, Jianqing Fan, and Yinchu Zhu,Testability of high-dimensional linear models with nonsparse structures, The Annals of Statistics50(2022), no. 2, 615–639

work page 2022
[8]

Matthew Brennan, Guy Bresler, and Wasim Huleihel,Reducibility and computational lower bounds for problems with planted sparse structure, Conference On Learning Theory, PMLR, 2018, pp. 48–166

work page 2018
[9]

Guy Bresler, Sung Min Park, and Madalina Persu,Sparse pca from sparse linear regression, Advances in Neural Information Processing Systems, 2018

work page 2018
[10]

Kothari,The quasi- polynomial low-degree conjecture is false, 2025IEEE66thAnnualSymposiumonFoundations of Computer Science (FOCS), 2025, pp

Rares-Darius Buhai, Jun-Ting Hsieh, Aayush Jain, and Pravesh K. Kothari,The quasi- polynomial low-degree conjecture is false, 2025IEEE66thAnnualSymposiumonFoundations of Computer Science (FOCS), 2025, pp. 2577–2590

work page 2025
[11]

Ingster,Detection of a sparse submatrix of a high-dimensional noisy matrix, Bernoulli19(2013), no

Cristina Butucea and Yuri I. Ingster,Detection of a sparse submatrix of a high-dimensional noisy matrix, Bernoulli19(2013), no. 5B, 2652 – 2688

work page 2013
[12]

Tony Cai and Zijian Guo,Confidence intervals for high-dimensional linear regression: Minimax rates and adaptivity, The Annals of Statistics45(2017), no

T. Tony Cai and Zijian Guo,Confidence intervals for high-dimensional linear regression: Minimax rates and adaptivity, The Annals of Statistics45(2017), no. 2, 615–646

work page 2017
[13]

4, 1807–1836

,Accuracy assessment for high-dimensional linear regression, The Annals of Statistics 46(2018), no. 4, 1807–1836

work page 2018
[14]

2, 391–419

,Semi-supervised inference for explained variance in high-dimensional linear regres- sion, Journal of the Royal Statistical Society: Series B (Statistical Methodology)82(2020), no. 2, 391–419

work page 2020
[15]

T. Tony Cai, Zijian Guo, and Rong Ma,Statistical inference for high-dimensional generalized linear models with binary outcomes, Journal of the American Statistical Association118 (2023), no. 542, 1319–1332

work page 2023
[16]

Tony Cai and Mark G

T. Tony Cai and Mark G. Low,Minimax estimation of linear functionals over nonconvex parameter spaces, The Annals of Statistics32(2004), no. 2, 552 – 576

work page 2004
[17]

5, 2311 – 2343

,On adaptive estimation of linear functionals, The Annals of Statistics33(2005), no. 5, 2311 – 2343

work page 2005
[18]

Tony Cai, Zongming Ma, and Yihong Wu,Optimal estimation and rank detection for sparse spiked covariance matrices, Probability theory and related fields161(2015), no

T. Tony Cai, Zongming Ma, and Yihong Wu,Optimal estimation and rank detection for sparse spiked covariance matrices, Probability theory and related fields161(2015), no. 3, 781–815

work page 2015
[19]

Tony Cai, Anru R

T. Tony Cai, Anru R. Zhang, and Yuchen Zhou,Sparse group lasso: Optimal sample com- plexity, convergence rate, and statistical inference, IEEE Transactions on Information Theory 68(2022), no. 9, 5975–6002

work page 2022
[20]

Tony Cai and Harrison H

T. Tony Cai and Harrison H. Zhou,Optimal rates of convergence for sparse covariance matrix estimation, The Annals of Statistics40(2012), no. 5, 2389 – 2420. 24

work page 2012
[21]

Tianxi Cai, T. Tony Cai, and Zijian Guo,Optimal statistical inference for individualized treatment effects in high-dimensional models, Journal of the Royal Statistical Society: Series B (Statistical Methodology)83(2021), no. 4, 669–719

work page 2021
[22]

3, 2127–2153

Julien Chhor, Rajarshi Mukherjee, and Subhabrata Sen,Sparse signal detection in het- eroscedastic gaussian sequence models: sharp minimax rates, Bernoulli30(2024), no. 3, 2127–2153

work page 2024
[23]

Tsybakov,Minimax estimation of linear and quadratic functionals on sparsity classes, The Annals of Statistics45(2017), no

Olivier Collier, Laëtitia Comminges, and Alexandre B. Tsybakov,Minimax estimation of linear and quadratic functionals on sparsity classes, The Annals of Statistics45(2017), no. 3, 923 – 958

work page 2017
[24]

1, Elsevier, 2001, pp

Kenneth R Davidson and Stanislaw J Szarek,Local operator theory, random matrices and banach spaces, Handbook of the geometry of Banach spaces, vol. 1, Elsevier, 2001, pp. 317– 366

work page 2001
[25]

Yash Deshpande and Andrea Montanari,Improved sum-of-squares lower bounds for hidden clique and hidden submatrix problems, ConferenceonLearningTheory, PMLR,2015, pp.523– 562

work page 2015
[26]

VitalyFeldman, ElenaGrigorescu, LevReyzin, SantoshSVempala, andYingXiao,Statistical algorithms and a lower bound for detecting planted cliques, Journal of the ACM (JACM)64 (2017), no. 2, 1–37

work page 2017
[27]

Zhou,Minimax estimation in sparse canonical correlation analysis, The Annals of Statistics43(2015), no

Chao Gao, Zongming Ma, Zhao Ren, and Harrison H. Zhou,Minimax estimation in sparse canonical correlation analysis, The Annals of Statistics43(2015), no. 5, 2168 – 2197

work page 2015
[28]

Zhou,Sparse CCA: Adaptive estimation and computational barriers, The Annals of Statistics45(2017), no

Chao Gao, Zongming Ma, and Harrison H. Zhou,Sparse CCA: Adaptive estimation and computational barriers, The Annals of Statistics45(2017), no. 5, 2074 – 2101

work page 2017
[29]

4, 1697–1727

Alden Green and Elad Romanov,The high-dimensional asymptotics of principal component regression, The Annals of Statistics53(2025), no. 4, 1697–1727

work page 2025
[30]

thesis, 2018, p

Samuel Hopkins,Statistical inference and the sum of squares method, Ph.D. thesis, 2018, p. 430

work page 2018
[31]

1, 18–32

Ildar Abdullovich Ibragimov and Rafail Zalmanovich Khas’ minskii,On nonparametric esti- mation of the value of a linear functional in gaussian white noise, Theory of Probability & Its Applications29(1985), no. 1, 18–32

work page 1985
[32]

3, 685–718

Adel Javanmard and Jason D Lee,A flexible framework for hypothesis testing in high dimen- sions, Journal of the Royal Statistical Society Series B: Statistical Methodology82(2020), no. 3, 685–718

work page 2020
[33]

1, 2869–2909

Adel Javanmard and Andrea Montanari,Confidence intervals and hypothesis testing for high- dimensional regression, Journal of Machine Learning Research15(2014), no. 1, 2869–2909

work page 2014
[34]

6A, 2593 – 2622

Adel Javanmard and Andrea Montanari,Debiasing the lasso: Optimal sample size for Gaus- sian designs, The Annals of Statistics46(2018), no. 6A, 2593 – 2622

work page 2018
[35]

Johnstone and Arthur Yu Lu,On consistency and sparsity for principal components analysis in high dimensions, Journal of the American Statistical Association104(2009), no

Iain M. Johnstone and Arthur Yu Lu,On consistency and sparsity for principal components analysis in high dimensions, Journal of the American Statistical Association104(2009), no. 486, 682–693. 25

work page 2009
[36]

Dmitriy Kunisky, Alexander S Wein, and Afonso S Bandeira,Notes on computational hard- ness of hypothesis testing: Predictions using the low-degree likelihood ratio, ISAAC Congress (InternationalSocietyforAnalysis, itsApplicationsandComputation), Springer, 2019, pp.1– 50

work page 2019
[37]

3, 1695–1738

Nilanjana Laha and Rajarshi Mukherjee,On support recovery with sparse CCA: Information theoretic and computational limits, IEEE transactions on information theory69(2023), no. 3, 1695–1738

work page 2023
[38]

Beatrice Laurent and Pascal Massart,Adaptive estimation of a quadratic functional by model selection, The Annals of statistics (2000), 1302–1338

work page 2000
[39]

Lehmann and Joseph P

Erich L. Lehmann and Joseph P. Romano,Testing statistical hypotheses, 3rd ed., Springer Texts in Statistics, Springer, New York, 2005

work page 2005
[40]

Journal of In- equalities in Pure & Applied Mathematics8(2007), no

Xin Li and Chao-Ping Chen,Inequalities for the gamma function., JIPAM. Journal of In- equalities in Pure & Applied Mathematics8(2007), no. 1, 554–563

work page 2007
[41]

5, 2318–2348

Yuetian Luo and Chao Gao,Computational lower bounds for graphon estimation via low- degree polynomials, The Annals of Statistics52(2024), no. 5, 2318–2348

work page 2024
[42]

28, 2015

Tengyu Ma and Avi Wigderson,Sum-of-squares lower bounds for sparse pca, Advances in Neural Information Processing Systems, vol. 28, 2015

work page 2015
[43]

2, 772 – 801

Zongming Ma,Sparse principal component analysis and iterative thresholding, The Annals of Statistics41(2013), no. 2, 772 – 801

work page 2013
[44]

61, 1089–1116

Zongming Ma and Yihong Wu,Computational barriers in minimax submatrix detection, The Annals of Statistics43(2015), no. 61, 1089–1116

work page 2015
[45]

35, 2022, pp

Aaron Potechin and Goutham Rajendran,Sub-exponential time sum-of-squares lower bounds for principal components analysis, Advances in Neural Information Processing Systems, vol. 35, 2022, pp. 35724–35740

work page 2022
[46]

Garvesh Raskutti, Martin J Wainwright, and Bin Yu,Restricted eigenvalue properties for correlated gaussian designs, The Journal of Machine Learning Research11(2010), 2241– 2259

work page 2010
[47]

10, 6976–6994

,Minimax rates of estimation for high-dimensional linear regression over lq-balls, IEEE transactions on information theory57(2011), no. 10, 6976–6994

work page 2011
[48]

2652–2663

Galen Reeves, Jiaming Xu, and Ilias Zadik,The all-or-nothing phenomenon in sparse linear regression, Conference on Learning Theory, PMLR, 2019, pp. 2652–2663

work page 2019
[49]

BenjaminRossman,Average-case complexity of detecting cliques, Ph.D.thesis, Massachusetts Institute of Technology, 2010

work page 2010
[50]

3, 1833–1858

Tselil Schramm and Alexander S Wein,Computational barriers to estimation from low-degree polynomials, The Annals of Statistics50(2022), no. 3, 1833–1858

work page 2022
[51]

Sun and C.-H

T. Sun and C.-H. Zhang,Scaled sparse linear regression, Biometrika99(2012), no. 4, 879– 898

work page 2012
[52]

23, American Mathematical Soc., 1939

Gabor Szeg,Orthogonal polynomials, vol. 23, American Mathematical Soc., 1939

work page 1939
[53]

1, 267–288

Robert Tibshirani,Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society: Series B (Statistical Methodology)58(1996), no. 1, 267–288. 26

work page 1996
[54]

2, 1248–1259

Andreas M Tillmann and Marc E Pfetsch,The computational complexity of the restricted isometry property, the nullspace property, and related concepts in compressed sensing, IEEE Transactions on Information Theory60(2013), no. 2, 1248–1259

work page 2013
[55]

Tsybakov,Introduction to nonparametric estimation, Springer Series in Statis- tics, Springer, 2009

Alexandre B. Tsybakov,Introduction to nonparametric estimation, Springer Series in Statis- tics, Springer, 2009

work page 2009
[56]

3, 1166–1202

Sara van de Geer, Peter Bühlmann, Ya’acov Ritov, and Ruben Dezeure,On asymptotically optimal confidence regions and tests for high-dimensional models, The Annals of Statistics 42(2014), no. 3, 1166–1202

work page 2014
[57]

Roman Vershynin,Introduction to the non-asymptotic analysis of random matrices, Com- pressed sensing, Cambridge University Press, 2012, pp. 210–268

work page 2012
[58]

47, Cambridge university press, 2018

,High-dimensional probability: An introduction with applications in data science, vol. 47, Cambridge university press, 2018

work page 2018
[59]

Alexander S Wein,Computational complexity of statistics: New insights from low-degree polynomials, arXiv preprint arXiv:2506.10748 (2025)

work page arXiv 2025
[60]

Yixuan Wu, Yilun Zhu, Lei Cao, and Naichen Shi,Calibrated principal component regression, The 29th International Conference on Artificial Intelligence and Statistics, 2026

work page 2026
[61]

Jie Xie and Dongming Huang,Minimax and adaptive estimation of general linear functionals under sparsity, arXiv preprint arXiv:2509.25595 (2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025
[62]

Cun-Hui Zhang and Stephanie S. Zhang,Confidence intervals for low dimensional param- eters in high dimensional linear models, Journal of the Royal Statistical Society: Series B (Statistical Methodology)76(2014), no. 1, 217–242

work page 2014
[63]

546, 1579–1591

Junlong Zhao, Yang Zhou, and Yufeng Liu,Estimation of linear functionals in high- dimensional linear models: From sparsity to nonsparsity, Journal of the American Statistical Association119(2024), no. 546, 1579–1591

work page 2024
[64]

Linear Functional Testing with General Loadings in Sparse Regression: Separation Rates and Computational Barriers

Yinchu Zhu and Jelena Bradic,Linear hypothesis testing in dense high-dimensional linear models, Journal of the American Statistical Association113(2018), no. 524, 1583–1600. 27 Supplement to “Linear Functional Testing with General Loadings in Sparse Regression: Separation Rates and Computational Barriers” This supplement collects proofs and auxiliary resu...

work page 2018
[65]

Thus Lemma A.5 gives, for every fixed deterministicm, τ spike adap (ku, k;ξ)≲ P j≤m ξ2 j 1/2 √n +|ξ m+1|ku r logp n +ν 2 ku logp n , where we have usedσ≤M 2 forθ∈Θ(k)

Moreover, Pu ∩Θ(k u;ξ, t 0) = Θspike(ku;ξ, t 0),Θ spike ±τ (k;ξ, t 0)⊆ P u for1≤k≤k u. Thus Lemma A.5 gives, for every fixed deterministicm, τ spike adap (ku, k;ξ)≲ P j≤m ξ2 j 1/2 √n +|ξ m+1|ku r logp n +ν 2 ku logp n , where we have usedσ≤M 2 forθ∈Θ(k). Optimizing overmand using Proposition 2 gives inf 0≤m≤p    P j≤m ξ2 j 1/2 √n +|ξ m+1|ku r logp n ...

work page
[66]

Applying this withD=Mgives ∥ˆΣ(1) M M −I M M∥2 =∥ ˆΣ(1) M −I p∥2 ≲ r ku logp n1 .(S.31) 53

SinceM⊆ bB c and|M| ≤k u, the defining property ofbB∈B ku yields, for every such subset D⊆ bB c with|D| ≤k u, ∥ˆΣ(1) D −I p∥2 ≤2   s |D| n1 + s γ∗|D|logp n1   +   s |D| n1 + s γ∗|D|logp n1   2 , where, since we haven1 ≥ck u logpfor some sufficiently large constantc, the leading rate in the right-hand side is p (γ∗|D|logp)/n 1. Applying this withD=...

work page
[67]

For any fixedD⊆B c ∗ with|D| ≤k u, the submatrixX(1) ·D has i.i.d.N(0,I |D|)columns, hence ˆΣ(1) DD = 1 n1 (X(1) ·D )⊤X(1) ·D = 1 n1 Z ⊤Z, forZ∈R n1×|D| with i.i.d.N(0,1)entries. Therefore, by similar calculation in Section C.3, with probability at least1−4(ep) 1−γ∗/2, we have ∥ˆΣ(1) DD −I DD∥2 ≤2   s |D| n1 + s γ∗|D|logp n1   +   s |D| n1 + s γ∗|D|...

work page
[68]

By monotonicity of the operator norm under taking submatrices,∥ˆΣ(1) GM ∥2 ≤ ∥ ˆΣ(1) bBM ∥2

SinceM⊆ bB c andG⊆ bB, we have ˆΣ(1) GM as a submatrix of ˆΣ(1) bBM. By monotonicity of the operator norm under taking submatrices,∥ˆΣ(1) GM ∥2 ≤ ∥ ˆΣ(1) bBM ∥2. The defining property of bB∈B ku states that for everyD⊆ bB c with|D| ≤k u, ∥ˆΣ(1) D bB∥2 ≤ q ∥Γ bB(ˆΣ(1))∥2 s |D| n1 + s |bB| n1 + s γ∗|D|logp n1 . Applying this withD=Mgives ∥ˆΣ(1) GM ∥2 ≤ ∥ ˆΣ...

work page
[69]

aY ℓ=1 exp{sℓUℓ −s 2 ℓ /2} bY j=1 exp{tjVj −t 2 j /2} # . Since(U, V)is centered Gaussian, G(s, t) =E

We next control∥ˆΣ(1) GO∥2. OnE 1, since G⊆B ∗, O⊆B c ∗,|O| ≤ | bB| ≤k u, 54 the defining property ofB∗ ∈B ku in Equation (S.27), applied withD=O, gives ∥ˆΣ(1) OB∗∥2 ≤ q ∥ΓB∗(ˆΣ(1))∥2 s |O| n1 + s |B∗| n1 + s γ∗|O|logp n1 . Moreover, ∥ˆΣ(1) GO∥2 =∥ ˆΣ(1) OG∥2 ≤ ∥ ˆΣ(1) OB∗∥2. Using the same argument with Lemma C.1 in the bound on∥ˆΣ(1) GM ∥2, it holds wit...

work page
[70]

IfLD(D) = 1 +o(1)asn→ ∞, then no degree-Dpolynomial weakly separatesQ 1 andQ 2

work page
[71]

IfLD(D) =O(1)asn→ ∞, then no degree-Dpolynomial strongly separatesQ 1 andQ 2. Remark.Statement (i) asserts that, whenLD(D) = 1 +o(1), every degree-Dpolynomial test is asymptotically no better than the trivial constant statistic in the sense of weak separation. Statement (ii) is stronger in that it rules out vanishing total error: boundedness ofLD(D)permit...

work page
[72]

IfK≲k 2 u, then τadap(ku, k;ξ)≍ a √ K√n

Supposek u ≲ √n/logp. IfK≲k 2 u, then τadap(ku, k;ξ)≍ a √ K√n . IfK≳k 2 u, then τadap(ku, k;ξ)≍ log aku r logp n . Moreover, ifK/k 2 u ≥p c for some constantc >0, then the logarithmic equivalence can be strengthened to τadap(ku, k;ξ)≍ak u r logp n

work page
[73]

IfK≲k u, then τadap(ku, k;ξ)≍a √ K ku logp n

Supposek u ≫ √n/logp. IfK≲k u, then τadap(ku, k;ξ)≍a √ K ku logp n . IfK≳k 2 u, then τadap(ku, k;ξ)≍ log aku r logp n . Moreover, ifK/k 2 u ≥p c for some constantc >0, then τadap(ku, k;ξ)≍ak u r logp n . In the intermediate regimeku ≪K≪k 2 u, the general bounds in Theorem 1 and proposition 1 give a ( √ K√n + k3/2 u logp n ) ≲τ adap(ku, k;ξ)≲a r K∧ n logp ...

work page
[74]

By Proposition 1, τadap(ku, k;ξ)≲ 1√n H(k 2 u logp;ξ)

Ultra-sparse regimek u ≲ √n/logp. By Proposition 1, τadap(ku, k;ξ)≲ 1√n H(k 2 u logp;ξ). IfK≲k 2 u, thenH(k 2 u logp;ξ)≍a √ K, and the lower boundν1/√nsatisfiesν 1/√n≍a √ K/√n by Lemma F.1. Hence τadap(ku, k;ξ)≍ a √ K√n . IfK≳k 2 u, Lemma F.1 gives ν1 ≳ak u. Therefore, Corollary 1 gives τadap(ku, k;ξ)≍ log aku r logp n . 80 IfK/k 2 u ≥p c forc >0, thenH(k...

work page
[75]

Moderately sparse regimeku ≫ √n/logp. In this case, Proposition 1 yields the upper bound τadap(ku, k;ξ)≲ ku logp n H(n/logp;ξ)≍a r K∧ n logp ku logp n .(S.87) The lower bound in Theorem 1 is τadap(ku, k;ξ)≳ ν1√n ∨ν 2 ku logp n . IfK≲k u, then H(n/logp;ξ)≍H(k u ;ξ)≍a √ K. Recall thatν 2 =H(k u;ξ). We see that both the upper bound and the lower boundν2ku lo...

work page
[76]

Therefore, with probability tending to 1, there is at least one nonzero coordinate with magnitude at mostM

Furthermore, it also implies that there exists a fixedMsuch that the interval[t 0, M] receives positive probability mass. Therefore, with probability tending to 1, there is at least one nonzero coordinate with magnitude at mostM. When both events happen, we have maxj∈supp(ξ) |ξj| minj∈supp(ξ) |ξj| ≳(logp) 1/q, which is unbounded. •It is also not the exact...

work page
[77]

There exist constants0< c − < C+ <∞such that Dξ(C+s)≤ k2 u 8 ,(S.111) and Dξ(c−s)≥ k2 u 2 .(S.112) Consequently, the solutionλin(16)is positive and satisfies c−s≤λ≤C +s.(S.113)

work page
[78]

By Lemma F.4, we have λ≍ log ep k2 u 1/2+1/q , and S2(λ)1/2 ≲k u log ep k2 u 1/q

It holds that   kξX j=1 ξ2 j exp(−λ2/ξ2 j )   1/2 ≲k uL1/q.(S.114) Let S2(t) = kξX j=1 ξ2 j exp(−t2/ξ2 j ). By Lemma F.4, we have λ≍ log ep k2 u 1/2+1/q , and S2(λ)1/2 ≲k u log ep k2 u 1/q . Therefore, by the definition ofν1 in (17), ν1 =k uλ+S 2(λ)1/2 ≍k u log ep k2 u 1/2+1/q . This proves (S.104). In the moderately sparse regime, Proposition 1 andN=...

work page
[79]

For every grid intervalIm = [Bm, Bm+1)intersecting[t 0, CM(logp) 1/q], #{j:|ξ j| ∈I m} ≤(logp) Apexp(−cB mq).(S.119)

work page
[80]

We havet 2 =x q+2

Wheneverpexp(−CB mq)≥(logp) A, #{j:|ξ j| ∈I m} ≥(logp) −Apexp(−CB mq).(S.120) Proof of Lemma F.5.Setx=x(t). We havet 2 =x q+2. We first prove the upper bound (S.115). Choosem 0 ∈Zsuch that Bm0 ≤x < B m0+1. Form=m 0 +k, we have Bm =B m0Bk =xθB k, θ:= Bm0 x ∈[B −1,1]. It follows that t2 B2m = xq+2 x2θ2B2k =x qθ−2B−2k, Bmq =x qθqBqk. Sinceθ∈[B −1,1], there a...

work page

Showing first 80 references.

[1] [1]

2, 477–505

Arnab Auddy and Ming Yuan,Large-dimensional independent component analysis: Statis- tical optimality and computational tractability, The Annals of Statistics53(2025), no. 2, 477–505

work page 2025

[2] [2]

6, 3448–3450

Afonso S Bandeira, Edgar Dobriban, Dustin G Mixon, and William F Sawin,Certifying the restricted isometry property is hard, IEEE transactions on information theory59(2013), no. 6, 3448–3450

work page 2013

[3] [3]

35, 2022, pp

Afonso S Bandeira, Ahmed El Alaoui, Samuel Hopkins, Tselil Schramm, Alexander S Wein, and Ilias Zadik,The franz-parisi criterion and computational trade-offs in high dimensional statistics, Advances in Neural Information Processing Systems, vol. 35, 2022, pp. 33831– 33844. 23

work page 2022

[4] [4]

1-3, 29–55

Cyril Banderier, Mireille Bousquet-Mélou, Alain Denise, Philippe Flajolet, Daniele Gardy, and Dominique Gouyou-Beauchamps,Generating functions for generating trees, Discrete mathematics246(2002), no. 1-3, 29–55

work page 2002

[5] [5]

1046–1066

QuentinBerthetandPhilippeRigollet,Complexity theoretic lower bounds for sparse principal component detection, Conference on learning theory, PMLR, 2013, pp. 1046–1066

work page 2013

[6] [6]

Bickel, Ya’acov Ritov, and Alexandre B

Peter J. Bickel, Ya’acov Ritov, and Alexandre B. Tsybakov,Simultaneous analysis of lasso and dantzig selector, The Annals of Statistics37(2009), no. 4, 1705–1732

work page 2009

[7] [7]

2, 615–639

Jelena Bradic, Jianqing Fan, and Yinchu Zhu,Testability of high-dimensional linear models with nonsparse structures, The Annals of Statistics50(2022), no. 2, 615–639

work page 2022

[8] [8]

Matthew Brennan, Guy Bresler, and Wasim Huleihel,Reducibility and computational lower bounds for problems with planted sparse structure, Conference On Learning Theory, PMLR, 2018, pp. 48–166

work page 2018

[9] [9]

Guy Bresler, Sung Min Park, and Madalina Persu,Sparse pca from sparse linear regression, Advances in Neural Information Processing Systems, 2018

work page 2018

[10] [10]

Kothari,The quasi- polynomial low-degree conjecture is false, 2025IEEE66thAnnualSymposiumonFoundations of Computer Science (FOCS), 2025, pp

Rares-Darius Buhai, Jun-Ting Hsieh, Aayush Jain, and Pravesh K. Kothari,The quasi- polynomial low-degree conjecture is false, 2025IEEE66thAnnualSymposiumonFoundations of Computer Science (FOCS), 2025, pp. 2577–2590

work page 2025

[11] [11]

Ingster,Detection of a sparse submatrix of a high-dimensional noisy matrix, Bernoulli19(2013), no

Cristina Butucea and Yuri I. Ingster,Detection of a sparse submatrix of a high-dimensional noisy matrix, Bernoulli19(2013), no. 5B, 2652 – 2688

work page 2013

[12] [12]

Tony Cai and Zijian Guo,Confidence intervals for high-dimensional linear regression: Minimax rates and adaptivity, The Annals of Statistics45(2017), no

T. Tony Cai and Zijian Guo,Confidence intervals for high-dimensional linear regression: Minimax rates and adaptivity, The Annals of Statistics45(2017), no. 2, 615–646

work page 2017

[13] [13]

4, 1807–1836

,Accuracy assessment for high-dimensional linear regression, The Annals of Statistics 46(2018), no. 4, 1807–1836

work page 2018

[14] [14]

2, 391–419

,Semi-supervised inference for explained variance in high-dimensional linear regres- sion, Journal of the Royal Statistical Society: Series B (Statistical Methodology)82(2020), no. 2, 391–419

work page 2020

[15] [15]

T. Tony Cai, Zijian Guo, and Rong Ma,Statistical inference for high-dimensional generalized linear models with binary outcomes, Journal of the American Statistical Association118 (2023), no. 542, 1319–1332

work page 2023

[16] [16]

Tony Cai and Mark G

T. Tony Cai and Mark G. Low,Minimax estimation of linear functionals over nonconvex parameter spaces, The Annals of Statistics32(2004), no. 2, 552 – 576

work page 2004

[17] [17]

5, 2311 – 2343

,On adaptive estimation of linear functionals, The Annals of Statistics33(2005), no. 5, 2311 – 2343

work page 2005

[18] [18]

Tony Cai, Zongming Ma, and Yihong Wu,Optimal estimation and rank detection for sparse spiked covariance matrices, Probability theory and related fields161(2015), no

T. Tony Cai, Zongming Ma, and Yihong Wu,Optimal estimation and rank detection for sparse spiked covariance matrices, Probability theory and related fields161(2015), no. 3, 781–815

work page 2015

[19] [19]

Tony Cai, Anru R

T. Tony Cai, Anru R. Zhang, and Yuchen Zhou,Sparse group lasso: Optimal sample com- plexity, convergence rate, and statistical inference, IEEE Transactions on Information Theory 68(2022), no. 9, 5975–6002

work page 2022

[20] [20]

Tony Cai and Harrison H

T. Tony Cai and Harrison H. Zhou,Optimal rates of convergence for sparse covariance matrix estimation, The Annals of Statistics40(2012), no. 5, 2389 – 2420. 24

work page 2012

[21] [21]

Tianxi Cai, T. Tony Cai, and Zijian Guo,Optimal statistical inference for individualized treatment effects in high-dimensional models, Journal of the Royal Statistical Society: Series B (Statistical Methodology)83(2021), no. 4, 669–719

work page 2021

[22] [22]

3, 2127–2153

Julien Chhor, Rajarshi Mukherjee, and Subhabrata Sen,Sparse signal detection in het- eroscedastic gaussian sequence models: sharp minimax rates, Bernoulli30(2024), no. 3, 2127–2153

work page 2024

[23] [23]

Tsybakov,Minimax estimation of linear and quadratic functionals on sparsity classes, The Annals of Statistics45(2017), no

Olivier Collier, Laëtitia Comminges, and Alexandre B. Tsybakov,Minimax estimation of linear and quadratic functionals on sparsity classes, The Annals of Statistics45(2017), no. 3, 923 – 958

work page 2017

[24] [24]

1, Elsevier, 2001, pp

Kenneth R Davidson and Stanislaw J Szarek,Local operator theory, random matrices and banach spaces, Handbook of the geometry of Banach spaces, vol. 1, Elsevier, 2001, pp. 317– 366

work page 2001

[25] [25]

Yash Deshpande and Andrea Montanari,Improved sum-of-squares lower bounds for hidden clique and hidden submatrix problems, ConferenceonLearningTheory, PMLR,2015, pp.523– 562

work page 2015

[26] [26]

VitalyFeldman, ElenaGrigorescu, LevReyzin, SantoshSVempala, andYingXiao,Statistical algorithms and a lower bound for detecting planted cliques, Journal of the ACM (JACM)64 (2017), no. 2, 1–37

work page 2017

[27] [27]

Zhou,Minimax estimation in sparse canonical correlation analysis, The Annals of Statistics43(2015), no

Chao Gao, Zongming Ma, Zhao Ren, and Harrison H. Zhou,Minimax estimation in sparse canonical correlation analysis, The Annals of Statistics43(2015), no. 5, 2168 – 2197

work page 2015

[28] [28]

Zhou,Sparse CCA: Adaptive estimation and computational barriers, The Annals of Statistics45(2017), no

Chao Gao, Zongming Ma, and Harrison H. Zhou,Sparse CCA: Adaptive estimation and computational barriers, The Annals of Statistics45(2017), no. 5, 2074 – 2101

work page 2017

[29] [29]

4, 1697–1727

Alden Green and Elad Romanov,The high-dimensional asymptotics of principal component regression, The Annals of Statistics53(2025), no. 4, 1697–1727

work page 2025

[30] [30]

thesis, 2018, p

Samuel Hopkins,Statistical inference and the sum of squares method, Ph.D. thesis, 2018, p. 430

work page 2018

[31] [31]

1, 18–32

Ildar Abdullovich Ibragimov and Rafail Zalmanovich Khas’ minskii,On nonparametric esti- mation of the value of a linear functional in gaussian white noise, Theory of Probability & Its Applications29(1985), no. 1, 18–32

work page 1985

[32] [32]

3, 685–718

Adel Javanmard and Jason D Lee,A flexible framework for hypothesis testing in high dimen- sions, Journal of the Royal Statistical Society Series B: Statistical Methodology82(2020), no. 3, 685–718

work page 2020

[33] [33]

1, 2869–2909

Adel Javanmard and Andrea Montanari,Confidence intervals and hypothesis testing for high- dimensional regression, Journal of Machine Learning Research15(2014), no. 1, 2869–2909

work page 2014

[34] [34]

6A, 2593 – 2622

Adel Javanmard and Andrea Montanari,Debiasing the lasso: Optimal sample size for Gaus- sian designs, The Annals of Statistics46(2018), no. 6A, 2593 – 2622

work page 2018

[35] [35]

Johnstone and Arthur Yu Lu,On consistency and sparsity for principal components analysis in high dimensions, Journal of the American Statistical Association104(2009), no

Iain M. Johnstone and Arthur Yu Lu,On consistency and sparsity for principal components analysis in high dimensions, Journal of the American Statistical Association104(2009), no. 486, 682–693. 25

work page 2009

[36] [36]

Dmitriy Kunisky, Alexander S Wein, and Afonso S Bandeira,Notes on computational hard- ness of hypothesis testing: Predictions using the low-degree likelihood ratio, ISAAC Congress (InternationalSocietyforAnalysis, itsApplicationsandComputation), Springer, 2019, pp.1– 50

work page 2019

[37] [37]

3, 1695–1738

Nilanjana Laha and Rajarshi Mukherjee,On support recovery with sparse CCA: Information theoretic and computational limits, IEEE transactions on information theory69(2023), no. 3, 1695–1738

work page 2023

[38] [38]

Beatrice Laurent and Pascal Massart,Adaptive estimation of a quadratic functional by model selection, The Annals of statistics (2000), 1302–1338

work page 2000

[39] [39]

Lehmann and Joseph P

Erich L. Lehmann and Joseph P. Romano,Testing statistical hypotheses, 3rd ed., Springer Texts in Statistics, Springer, New York, 2005

work page 2005

[40] [40]

Journal of In- equalities in Pure & Applied Mathematics8(2007), no

Xin Li and Chao-Ping Chen,Inequalities for the gamma function., JIPAM. Journal of In- equalities in Pure & Applied Mathematics8(2007), no. 1, 554–563

work page 2007

[41] [41]

5, 2318–2348

Yuetian Luo and Chao Gao,Computational lower bounds for graphon estimation via low- degree polynomials, The Annals of Statistics52(2024), no. 5, 2318–2348

work page 2024

[42] [42]

28, 2015

Tengyu Ma and Avi Wigderson,Sum-of-squares lower bounds for sparse pca, Advances in Neural Information Processing Systems, vol. 28, 2015

work page 2015

[43] [43]

2, 772 – 801

Zongming Ma,Sparse principal component analysis and iterative thresholding, The Annals of Statistics41(2013), no. 2, 772 – 801

work page 2013

[44] [44]

61, 1089–1116

Zongming Ma and Yihong Wu,Computational barriers in minimax submatrix detection, The Annals of Statistics43(2015), no. 61, 1089–1116

work page 2015

[45] [45]

35, 2022, pp

Aaron Potechin and Goutham Rajendran,Sub-exponential time sum-of-squares lower bounds for principal components analysis, Advances in Neural Information Processing Systems, vol. 35, 2022, pp. 35724–35740

work page 2022

[46] [46]

Garvesh Raskutti, Martin J Wainwright, and Bin Yu,Restricted eigenvalue properties for correlated gaussian designs, The Journal of Machine Learning Research11(2010), 2241– 2259

work page 2010

[47] [47]

10, 6976–6994

,Minimax rates of estimation for high-dimensional linear regression over lq-balls, IEEE transactions on information theory57(2011), no. 10, 6976–6994

work page 2011

[48] [48]

2652–2663

Galen Reeves, Jiaming Xu, and Ilias Zadik,The all-or-nothing phenomenon in sparse linear regression, Conference on Learning Theory, PMLR, 2019, pp. 2652–2663

work page 2019

[49] [49]

BenjaminRossman,Average-case complexity of detecting cliques, Ph.D.thesis, Massachusetts Institute of Technology, 2010

work page 2010

[50] [50]

3, 1833–1858

Tselil Schramm and Alexander S Wein,Computational barriers to estimation from low-degree polynomials, The Annals of Statistics50(2022), no. 3, 1833–1858

work page 2022

[51] [51]

Sun and C.-H

T. Sun and C.-H. Zhang,Scaled sparse linear regression, Biometrika99(2012), no. 4, 879– 898

work page 2012

[52] [52]

23, American Mathematical Soc., 1939

Gabor Szeg,Orthogonal polynomials, vol. 23, American Mathematical Soc., 1939

work page 1939

[53] [53]

1, 267–288

Robert Tibshirani,Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society: Series B (Statistical Methodology)58(1996), no. 1, 267–288. 26

work page 1996

[54] [54]

2, 1248–1259

Andreas M Tillmann and Marc E Pfetsch,The computational complexity of the restricted isometry property, the nullspace property, and related concepts in compressed sensing, IEEE Transactions on Information Theory60(2013), no. 2, 1248–1259

work page 2013

[55] [55]

Tsybakov,Introduction to nonparametric estimation, Springer Series in Statis- tics, Springer, 2009

Alexandre B. Tsybakov,Introduction to nonparametric estimation, Springer Series in Statis- tics, Springer, 2009

work page 2009

[56] [56]

3, 1166–1202

Sara van de Geer, Peter Bühlmann, Ya’acov Ritov, and Ruben Dezeure,On asymptotically optimal confidence regions and tests for high-dimensional models, The Annals of Statistics 42(2014), no. 3, 1166–1202

work page 2014

[57] [57]

Roman Vershynin,Introduction to the non-asymptotic analysis of random matrices, Com- pressed sensing, Cambridge University Press, 2012, pp. 210–268

work page 2012

[58] [58]

47, Cambridge university press, 2018

,High-dimensional probability: An introduction with applications in data science, vol. 47, Cambridge university press, 2018

work page 2018

[59] [59]

Alexander S Wein,Computational complexity of statistics: New insights from low-degree polynomials, arXiv preprint arXiv:2506.10748 (2025)

work page arXiv 2025

[60] [60]

Yixuan Wu, Yilun Zhu, Lei Cao, and Naichen Shi,Calibrated principal component regression, The 29th International Conference on Artificial Intelligence and Statistics, 2026

work page 2026

[61] [61]

Jie Xie and Dongming Huang,Minimax and adaptive estimation of general linear functionals under sparsity, arXiv preprint arXiv:2509.25595 (2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025

[62] [62]

Cun-Hui Zhang and Stephanie S. Zhang,Confidence intervals for low dimensional param- eters in high dimensional linear models, Journal of the Royal Statistical Society: Series B (Statistical Methodology)76(2014), no. 1, 217–242

work page 2014

[63] [63]

546, 1579–1591

Junlong Zhao, Yang Zhou, and Yufeng Liu,Estimation of linear functionals in high- dimensional linear models: From sparsity to nonsparsity, Journal of the American Statistical Association119(2024), no. 546, 1579–1591

work page 2024

[64] [64]

Linear Functional Testing with General Loadings in Sparse Regression: Separation Rates and Computational Barriers

Yinchu Zhu and Jelena Bradic,Linear hypothesis testing in dense high-dimensional linear models, Journal of the American Statistical Association113(2018), no. 524, 1583–1600. 27 Supplement to “Linear Functional Testing with General Loadings in Sparse Regression: Separation Rates and Computational Barriers” This supplement collects proofs and auxiliary resu...

work page 2018

[65] [65]

Thus Lemma A.5 gives, for every fixed deterministicm, τ spike adap (ku, k;ξ)≲ P j≤m ξ2 j 1/2 √n +|ξ m+1|ku r logp n +ν 2 ku logp n , where we have usedσ≤M 2 forθ∈Θ(k)

Moreover, Pu ∩Θ(k u;ξ, t 0) = Θspike(ku;ξ, t 0),Θ spike ±τ (k;ξ, t 0)⊆ P u for1≤k≤k u. Thus Lemma A.5 gives, for every fixed deterministicm, τ spike adap (ku, k;ξ)≲ P j≤m ξ2 j 1/2 √n +|ξ m+1|ku r logp n +ν 2 ku logp n , where we have usedσ≤M 2 forθ∈Θ(k). Optimizing overmand using Proposition 2 gives inf 0≤m≤p    P j≤m ξ2 j 1/2 √n +|ξ m+1|ku r logp n ...

work page

[66] [66]

Applying this withD=Mgives ∥ˆΣ(1) M M −I M M∥2 =∥ ˆΣ(1) M −I p∥2 ≲ r ku logp n1 .(S.31) 53

SinceM⊆ bB c and|M| ≤k u, the defining property ofbB∈B ku yields, for every such subset D⊆ bB c with|D| ≤k u, ∥ˆΣ(1) D −I p∥2 ≤2   s |D| n1 + s γ∗|D|logp n1   +   s |D| n1 + s γ∗|D|logp n1   2 , where, since we haven1 ≥ck u logpfor some sufficiently large constantc, the leading rate in the right-hand side is p (γ∗|D|logp)/n 1. Applying this withD=...

work page

[67] [67]

For any fixedD⊆B c ∗ with|D| ≤k u, the submatrixX(1) ·D has i.i.d.N(0,I |D|)columns, hence ˆΣ(1) DD = 1 n1 (X(1) ·D )⊤X(1) ·D = 1 n1 Z ⊤Z, forZ∈R n1×|D| with i.i.d.N(0,1)entries. Therefore, by similar calculation in Section C.3, with probability at least1−4(ep) 1−γ∗/2, we have ∥ˆΣ(1) DD −I DD∥2 ≤2   s |D| n1 + s γ∗|D|logp n1   +   s |D| n1 + s γ∗|D|...

work page

[68] [68]

By monotonicity of the operator norm under taking submatrices,∥ˆΣ(1) GM ∥2 ≤ ∥ ˆΣ(1) bBM ∥2

SinceM⊆ bB c andG⊆ bB, we have ˆΣ(1) GM as a submatrix of ˆΣ(1) bBM. By monotonicity of the operator norm under taking submatrices,∥ˆΣ(1) GM ∥2 ≤ ∥ ˆΣ(1) bBM ∥2. The defining property of bB∈B ku states that for everyD⊆ bB c with|D| ≤k u, ∥ˆΣ(1) D bB∥2 ≤ q ∥Γ bB(ˆΣ(1))∥2 s |D| n1 + s |bB| n1 + s γ∗|D|logp n1 . Applying this withD=Mgives ∥ˆΣ(1) GM ∥2 ≤ ∥ ˆΣ...

work page

[69] [69]

aY ℓ=1 exp{sℓUℓ −s 2 ℓ /2} bY j=1 exp{tjVj −t 2 j /2} # . Since(U, V)is centered Gaussian, G(s, t) =E

We next control∥ˆΣ(1) GO∥2. OnE 1, since G⊆B ∗, O⊆B c ∗,|O| ≤ | bB| ≤k u, 54 the defining property ofB∗ ∈B ku in Equation (S.27), applied withD=O, gives ∥ˆΣ(1) OB∗∥2 ≤ q ∥ΓB∗(ˆΣ(1))∥2 s |O| n1 + s |B∗| n1 + s γ∗|O|logp n1 . Moreover, ∥ˆΣ(1) GO∥2 =∥ ˆΣ(1) OG∥2 ≤ ∥ ˆΣ(1) OB∗∥2. Using the same argument with Lemma C.1 in the bound on∥ˆΣ(1) GM ∥2, it holds wit...

work page

[70] [70]

IfLD(D) = 1 +o(1)asn→ ∞, then no degree-Dpolynomial weakly separatesQ 1 andQ 2

work page

[71] [71]

IfLD(D) =O(1)asn→ ∞, then no degree-Dpolynomial strongly separatesQ 1 andQ 2. Remark.Statement (i) asserts that, whenLD(D) = 1 +o(1), every degree-Dpolynomial test is asymptotically no better than the trivial constant statistic in the sense of weak separation. Statement (ii) is stronger in that it rules out vanishing total error: boundedness ofLD(D)permit...

work page

[72] [72]

IfK≲k 2 u, then τadap(ku, k;ξ)≍ a √ K√n

Supposek u ≲ √n/logp. IfK≲k 2 u, then τadap(ku, k;ξ)≍ a √ K√n . IfK≳k 2 u, then τadap(ku, k;ξ)≍ log aku r logp n . Moreover, ifK/k 2 u ≥p c for some constantc >0, then the logarithmic equivalence can be strengthened to τadap(ku, k;ξ)≍ak u r logp n

work page

[73] [73]

IfK≲k u, then τadap(ku, k;ξ)≍a √ K ku logp n

Supposek u ≫ √n/logp. IfK≲k u, then τadap(ku, k;ξ)≍a √ K ku logp n . IfK≳k 2 u, then τadap(ku, k;ξ)≍ log aku r logp n . Moreover, ifK/k 2 u ≥p c for some constantc >0, then τadap(ku, k;ξ)≍ak u r logp n . In the intermediate regimeku ≪K≪k 2 u, the general bounds in Theorem 1 and proposition 1 give a ( √ K√n + k3/2 u logp n ) ≲τ adap(ku, k;ξ)≲a r K∧ n logp ...

work page

[74] [74]

By Proposition 1, τadap(ku, k;ξ)≲ 1√n H(k 2 u logp;ξ)

Ultra-sparse regimek u ≲ √n/logp. By Proposition 1, τadap(ku, k;ξ)≲ 1√n H(k 2 u logp;ξ). IfK≲k 2 u, thenH(k 2 u logp;ξ)≍a √ K, and the lower boundν1/√nsatisfiesν 1/√n≍a √ K/√n by Lemma F.1. Hence τadap(ku, k;ξ)≍ a √ K√n . IfK≳k 2 u, Lemma F.1 gives ν1 ≳ak u. Therefore, Corollary 1 gives τadap(ku, k;ξ)≍ log aku r logp n . 80 IfK/k 2 u ≥p c forc >0, thenH(k...

work page

[75] [75]

Moderately sparse regimeku ≫ √n/logp. In this case, Proposition 1 yields the upper bound τadap(ku, k;ξ)≲ ku logp n H(n/logp;ξ)≍a r K∧ n logp ku logp n .(S.87) The lower bound in Theorem 1 is τadap(ku, k;ξ)≳ ν1√n ∨ν 2 ku logp n . IfK≲k u, then H(n/logp;ξ)≍H(k u ;ξ)≍a √ K. Recall thatν 2 =H(k u;ξ). We see that both the upper bound and the lower boundν2ku lo...

work page

[76] [76]

Therefore, with probability tending to 1, there is at least one nonzero coordinate with magnitude at mostM

Furthermore, it also implies that there exists a fixedMsuch that the interval[t 0, M] receives positive probability mass. Therefore, with probability tending to 1, there is at least one nonzero coordinate with magnitude at mostM. When both events happen, we have maxj∈supp(ξ) |ξj| minj∈supp(ξ) |ξj| ≳(logp) 1/q, which is unbounded. •It is also not the exact...

work page

[77] [77]

There exist constants0< c − < C+ <∞such that Dξ(C+s)≤ k2 u 8 ,(S.111) and Dξ(c−s)≥ k2 u 2 .(S.112) Consequently, the solutionλin(16)is positive and satisfies c−s≤λ≤C +s.(S.113)

work page

[78] [78]

By Lemma F.4, we have λ≍ log ep k2 u 1/2+1/q , and S2(λ)1/2 ≲k u log ep k2 u 1/q

It holds that   kξX j=1 ξ2 j exp(−λ2/ξ2 j )   1/2 ≲k uL1/q.(S.114) Let S2(t) = kξX j=1 ξ2 j exp(−t2/ξ2 j ). By Lemma F.4, we have λ≍ log ep k2 u 1/2+1/q , and S2(λ)1/2 ≲k u log ep k2 u 1/q . Therefore, by the definition ofν1 in (17), ν1 =k uλ+S 2(λ)1/2 ≍k u log ep k2 u 1/2+1/q . This proves (S.104). In the moderately sparse regime, Proposition 1 andN=...

work page

[79] [79]

For every grid intervalIm = [Bm, Bm+1)intersecting[t 0, CM(logp) 1/q], #{j:|ξ j| ∈I m} ≤(logp) Apexp(−cB mq).(S.119)

work page

[80] [80]

We havet 2 =x q+2

Wheneverpexp(−CB mq)≥(logp) A, #{j:|ξ j| ∈I m} ≥(logp) −Apexp(−CB mq).(S.120) Proof of Lemma F.5.Setx=x(t). We havet 2 =x q+2. We first prove the upper bound (S.115). Choosem 0 ∈Zsuch that Bm0 ≤x < B m0+1. Form=m 0 +k, we have Bm =B m0Bk =xθB k, θ:= Bm0 x ∈[B −1,1]. It follows that t2 B2m = xq+2 x2θ2B2k =x qθ−2B−2k, Bmq =x qθqBqk. Sinceθ∈[B −1,1], there a...

work page