Linear Functional Testing with General Loadings in Sparse Regression: Separation Rates and Computational Barriers
Pith reviewed 2026-05-21 03:06 UTC · model grok-4.3
The pith
A mixed test characterizes the adaptive separation rate for testing linear functionals in sparse regression up to logarithmic factors in the ultra-sparse regime.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We construct a computationally efficient mixed test that gives an upper bound on the adaptive separation distance and establish an information-theoretic lower bound calibrated to the magnitude profile of ξ. In the ultra-sparse regime these bounds characterize the adaptive separation rate up to logarithmic factors for arbitrary ξ. In the moderately sparse regime these bounds match for several classes of loading vectors but may differ in general. In this regime we further prove a low-degree lower bound that matches the upper bound up to logarithmic factors. For flat sparse loadings we complement this with a polynomial-time reduction from sparse CCA.
What carries the argument
The mixed test that adapts to unknown sparsity bounded by ku while controlling uniform Type I error over the ku-sparse null and achieving power against k-sparse alternatives, with separation calibrated to the magnitude profile of the arbitrary loading ξ.
If this is right
- In the ultra-sparse regime ku ≲ sqrt(n)/log p the mixed test attains the information-theoretic adaptive separation rate up to logarithmic factors for arbitrary ξ.
- In the moderately sparse regime a low-degree lower bound matches the mixed-test upper bound up to logarithmic factors, indicating computational hardness for improvement.
- For flat sparse loadings a polynomial-time reduction from sparse CCA shows that beating the rate is at least as hard as solving sparse CCA.
- When the design covariance is known and diagonal the adaptive separation rate takes the same form as in the ultra-sparse regime.
Where Pith is reading between the lines
- In practice the mixed test offers a reliable polynomial-time procedure whose rate is provably tight for general loadings when sparsity is low.
- Similar statistical-computational gaps may appear in related high-dimensional testing problems that involve arbitrary linear functionals and unknown nuisance parameters.
- Numerical checks that vary the magnitude profile of ξ could directly test whether the lower bounds are sharp beyond the regimes already analyzed.
Load-bearing premise
The design matrix is Gaussian with unknown covariance, and Type I error must be controlled uniformly over all possible ku-sparse null hypotheses while power is assessed only against k-sparse alternatives.
What would settle it
A simulation or explicit construction in the ultra-sparse regime where some test detects a k-sparse signal whose magnitude lies below the information-theoretic lower bound calibrated to ξ would contradict the claimed separation rate.
Figures
read the original abstract
We study the problem of testing $H_0: \xi^\top\beta=t_0$ in high-dimensional sparse linear regression with Gaussian random design and unknown design covariance. The loading vector $\xi$ is arbitrary, and the exact sparsity level $k$ is unknown but bounded by a known value $k_u$. Tests are required to control Type I error uniformly over the $k_u$-sparse null, while power is evaluated against $k$-sparse alternatives. We construct a computationally efficient mixed test that gives an upper bound on the adaptive separation distance and establish an information-theoretic lower bound calibrated to the magnitude profile of $\xi$. In the ultra-sparse regime $k_u\lesssim \sqrt n/\log p$, these bounds characterize the adaptive separation rate up to logarithmic factors for arbitrary $\xi$. In the moderately sparse regime $\sqrt n/\log p\ll k_u\lesssim n/\log p$, these bounds match for several classes of loading vectors but may differ in general. In this regime, we further prove a low-degree lower bound that matches the upper bound up to logarithmic factors. This provides evidence that improving on the rate of the mixed test, if statistically possible, may be computationally hard. For flat sparse loadings, we complement this evidence with a polynomial-time reduction from sparse CCA. Finally, we examine how information about the design covariance affects the adaptive separation rate in two settings. Under a sparse signed-spiked covariance model, the information-theoretic lower bound is attainable up to logarithmic factors by a computationally inefficient procedure, while the low-degree lower bound and sparse-CCA reduction continue to apply, providing evidence for a statistical-computational gap. When the design covariance is known and diagonal, the adaptive separation rate takes the same form as in the ultra-sparse regime.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript studies testing the linear functional hypothesis H_0: ξ^T β = t_0 in high-dimensional sparse linear regression with Gaussian random design and unknown covariance matrix. The loading vector ξ is arbitrary while the sparsity level k is unknown but upper-bounded by a known k_u. Type I error must be controlled uniformly over all k_u-sparse nulls, and power is assessed against k-sparse alternatives. The authors construct a computationally efficient mixed test that supplies an upper bound on the adaptive separation distance and derive an information-theoretic lower bound calibrated to the magnitude profile of ξ. These bounds match up to logarithmic factors in the ultra-sparse regime k_u ≲ √n / log p for arbitrary ξ; in the moderate-sparse regime they match for several loading classes, with a low-degree lower bound and a sparse-CCA reduction supplying evidence of computational hardness. Additional results examine the effect of partial knowledge of the design covariance under sparse signed-spiked and known-diagonal models.
Significance. If the central claims hold, the work advances the theory of adaptive inference for linear functionals in sparse regression by delivering sharp separation rates that incorporate general loadings, unknown covariance, and computational constraints. The explicit matching of upper and lower bounds in the ultra-sparse regime, the low-degree polynomial evidence for hardness, and the polynomial-time reduction from sparse CCA for flat loadings constitute concrete strengths. The uniform Type I control over the k_u-sparse null and the treatment of two covariance-information regimes further strengthen the contribution.
major comments (1)
- [§4.3, Theorem 4.1] §4.3, Theorem 4.1: the low-degree lower bound is stated to match the mixed-test upper bound up to logarithmic factors in the moderate-sparse regime for several classes of ξ; however, the precise dependence of the degree on the magnitude profile of ξ is not made fully explicit, which is load-bearing for the claimed computational barrier.
minor comments (3)
- [§3.2] The definition of the mixed test statistic in §3.2 would benefit from an explicit display of the two component statistics and the data-splitting rule used to achieve uniform Type I control.
- [§2.2] Notation for the adaptive separation distance d_n(ξ, k_u) is introduced in the abstract and §1 but first appears with full dependence on the covariance in §2.2; a single consolidated display would improve readability.
- [§5.1] The sparse-CCA reduction in §5.1 is sketched at a high level; adding a short paragraph clarifying the parameter mapping between the regression and CCA instances would help readers verify the polynomial-time claim.
Simulated Author's Rebuttal
We thank the referee for their careful reading, positive assessment of the contribution, and constructive feedback. We address the major comment below and will revise the manuscript accordingly to improve clarity.
read point-by-point responses
-
Referee: [§4.3, Theorem 4.1] §4.3, Theorem 4.1: the low-degree lower bound is stated to match the mixed-test upper bound up to logarithmic factors in the moderate-sparse regime for several classes of ξ; however, the precise dependence of the degree on the magnitude profile of ξ is not made fully explicit, which is load-bearing for the claimed computational barrier.
Authors: We agree that the dependence of the polynomial degree on the magnitude profile of ξ should be stated more explicitly to strengthen the computational barrier claim. In the proof of the low-degree lower bound, the degree d is chosen as the smallest integer such that the sum of the d largest squared entries of the sorted magnitude profile of ξ exceeds a threshold determined by the separation rate; this choice ensures the bound matches the mixed-test upper bound up to logarithmic factors for the classes of ξ considered (e.g., those with power-law or flat decay). We will add a remark immediately following Theorem 4.1 that makes this dependence fully explicit, including the precise formula relating d to the ordered magnitudes |ξ|_{(1)} ≥ ⋯ ≥ |ξ|_{(p)}. revision: yes
Circularity Check
No significant circularity in derivation chain
full rationale
The paper derives adaptive separation rates for testing linear functionals in sparse regression using information-theoretic lower bounds calibrated directly to the magnitude profile of the arbitrary loading vector ξ, together with an explicit construction of a computationally efficient mixed test for the upper bound. These rates are obtained from first-principles arguments (including low-degree polynomial methods and a polynomial-time reduction from sparse CCA that is external to the present work) without any equation reducing the claimed rate to a fitted parameter or quantity defined from the same data or by self-referential definition. The Gaussian random-design model, unknown covariance, and uniform Type-I control over the k_u-sparse null are handled explicitly in the constructions and lower-bound arguments, rendering the central claims self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption The design matrix has i.i.d. Gaussian rows with unknown covariance matrix.
- domain assumption The regression vector β is exactly k-sparse for alternatives and at most k_u-sparse for the null, with k_u known.
Reference graph
Works this paper leans on
-
[1]
Arnab Auddy and Ming Yuan,Large-dimensional independent component analysis: Statis- tical optimality and computational tractability, The Annals of Statistics53(2025), no. 2, 477–505
work page 2025
-
[2]
Afonso S Bandeira, Edgar Dobriban, Dustin G Mixon, and William F Sawin,Certifying the restricted isometry property is hard, IEEE transactions on information theory59(2013), no. 6, 3448–3450
work page 2013
-
[3]
Afonso S Bandeira, Ahmed El Alaoui, Samuel Hopkins, Tselil Schramm, Alexander S Wein, and Ilias Zadik,The franz-parisi criterion and computational trade-offs in high dimensional statistics, Advances in Neural Information Processing Systems, vol. 35, 2022, pp. 33831– 33844. 23
work page 2022
-
[4]
Cyril Banderier, Mireille Bousquet-Mélou, Alain Denise, Philippe Flajolet, Daniele Gardy, and Dominique Gouyou-Beauchamps,Generating functions for generating trees, Discrete mathematics246(2002), no. 1-3, 29–55
work page 2002
- [5]
-
[6]
Bickel, Ya’acov Ritov, and Alexandre B
Peter J. Bickel, Ya’acov Ritov, and Alexandre B. Tsybakov,Simultaneous analysis of lasso and dantzig selector, The Annals of Statistics37(2009), no. 4, 1705–1732
work page 2009
-
[7]
Jelena Bradic, Jianqing Fan, and Yinchu Zhu,Testability of high-dimensional linear models with nonsparse structures, The Annals of Statistics50(2022), no. 2, 615–639
work page 2022
-
[8]
Matthew Brennan, Guy Bresler, and Wasim Huleihel,Reducibility and computational lower bounds for problems with planted sparse structure, Conference On Learning Theory, PMLR, 2018, pp. 48–166
work page 2018
-
[9]
Guy Bresler, Sung Min Park, and Madalina Persu,Sparse pca from sparse linear regression, Advances in Neural Information Processing Systems, 2018
work page 2018
-
[10]
Rares-Darius Buhai, Jun-Ting Hsieh, Aayush Jain, and Pravesh K. Kothari,The quasi- polynomial low-degree conjecture is false, 2025IEEE66thAnnualSymposiumonFoundations of Computer Science (FOCS), 2025, pp. 2577–2590
work page 2025
-
[11]
Ingster,Detection of a sparse submatrix of a high-dimensional noisy matrix, Bernoulli19(2013), no
Cristina Butucea and Yuri I. Ingster,Detection of a sparse submatrix of a high-dimensional noisy matrix, Bernoulli19(2013), no. 5B, 2652 – 2688
work page 2013
-
[12]
T. Tony Cai and Zijian Guo,Confidence intervals for high-dimensional linear regression: Minimax rates and adaptivity, The Annals of Statistics45(2017), no. 2, 615–646
work page 2017
-
[13]
,Accuracy assessment for high-dimensional linear regression, The Annals of Statistics 46(2018), no. 4, 1807–1836
work page 2018
-
[14]
,Semi-supervised inference for explained variance in high-dimensional linear regres- sion, Journal of the Royal Statistical Society: Series B (Statistical Methodology)82(2020), no. 2, 391–419
work page 2020
-
[15]
T. Tony Cai, Zijian Guo, and Rong Ma,Statistical inference for high-dimensional generalized linear models with binary outcomes, Journal of the American Statistical Association118 (2023), no. 542, 1319–1332
work page 2023
-
[16]
T. Tony Cai and Mark G. Low,Minimax estimation of linear functionals over nonconvex parameter spaces, The Annals of Statistics32(2004), no. 2, 552 – 576
work page 2004
-
[17]
,On adaptive estimation of linear functionals, The Annals of Statistics33(2005), no. 5, 2311 – 2343
work page 2005
-
[18]
T. Tony Cai, Zongming Ma, and Yihong Wu,Optimal estimation and rank detection for sparse spiked covariance matrices, Probability theory and related fields161(2015), no. 3, 781–815
work page 2015
-
[19]
T. Tony Cai, Anru R. Zhang, and Yuchen Zhou,Sparse group lasso: Optimal sample com- plexity, convergence rate, and statistical inference, IEEE Transactions on Information Theory 68(2022), no. 9, 5975–6002
work page 2022
-
[20]
T. Tony Cai and Harrison H. Zhou,Optimal rates of convergence for sparse covariance matrix estimation, The Annals of Statistics40(2012), no. 5, 2389 – 2420. 24
work page 2012
-
[21]
Tianxi Cai, T. Tony Cai, and Zijian Guo,Optimal statistical inference for individualized treatment effects in high-dimensional models, Journal of the Royal Statistical Society: Series B (Statistical Methodology)83(2021), no. 4, 669–719
work page 2021
-
[22]
Julien Chhor, Rajarshi Mukherjee, and Subhabrata Sen,Sparse signal detection in het- eroscedastic gaussian sequence models: sharp minimax rates, Bernoulli30(2024), no. 3, 2127–2153
work page 2024
-
[23]
Olivier Collier, Laëtitia Comminges, and Alexandre B. Tsybakov,Minimax estimation of linear and quadratic functionals on sparsity classes, The Annals of Statistics45(2017), no. 3, 923 – 958
work page 2017
-
[24]
Kenneth R Davidson and Stanislaw J Szarek,Local operator theory, random matrices and banach spaces, Handbook of the geometry of Banach spaces, vol. 1, Elsevier, 2001, pp. 317– 366
work page 2001
-
[25]
Yash Deshpande and Andrea Montanari,Improved sum-of-squares lower bounds for hidden clique and hidden submatrix problems, ConferenceonLearningTheory, PMLR,2015, pp.523– 562
work page 2015
-
[26]
VitalyFeldman, ElenaGrigorescu, LevReyzin, SantoshSVempala, andYingXiao,Statistical algorithms and a lower bound for detecting planted cliques, Journal of the ACM (JACM)64 (2017), no. 2, 1–37
work page 2017
-
[27]
Chao Gao, Zongming Ma, Zhao Ren, and Harrison H. Zhou,Minimax estimation in sparse canonical correlation analysis, The Annals of Statistics43(2015), no. 5, 2168 – 2197
work page 2015
-
[28]
Chao Gao, Zongming Ma, and Harrison H. Zhou,Sparse CCA: Adaptive estimation and computational barriers, The Annals of Statistics45(2017), no. 5, 2074 – 2101
work page 2017
-
[29]
Alden Green and Elad Romanov,The high-dimensional asymptotics of principal component regression, The Annals of Statistics53(2025), no. 4, 1697–1727
work page 2025
-
[30]
Samuel Hopkins,Statistical inference and the sum of squares method, Ph.D. thesis, 2018, p. 430
work page 2018
- [31]
-
[32]
Adel Javanmard and Jason D Lee,A flexible framework for hypothesis testing in high dimen- sions, Journal of the Royal Statistical Society Series B: Statistical Methodology82(2020), no. 3, 685–718
work page 2020
-
[33]
Adel Javanmard and Andrea Montanari,Confidence intervals and hypothesis testing for high- dimensional regression, Journal of Machine Learning Research15(2014), no. 1, 2869–2909
work page 2014
-
[34]
Adel Javanmard and Andrea Montanari,Debiasing the lasso: Optimal sample size for Gaus- sian designs, The Annals of Statistics46(2018), no. 6A, 2593 – 2622
work page 2018
-
[35]
Iain M. Johnstone and Arthur Yu Lu,On consistency and sparsity for principal components analysis in high dimensions, Journal of the American Statistical Association104(2009), no. 486, 682–693. 25
work page 2009
-
[36]
Dmitriy Kunisky, Alexander S Wein, and Afonso S Bandeira,Notes on computational hard- ness of hypothesis testing: Predictions using the low-degree likelihood ratio, ISAAC Congress (InternationalSocietyforAnalysis, itsApplicationsandComputation), Springer, 2019, pp.1– 50
work page 2019
-
[37]
Nilanjana Laha and Rajarshi Mukherjee,On support recovery with sparse CCA: Information theoretic and computational limits, IEEE transactions on information theory69(2023), no. 3, 1695–1738
work page 2023
-
[38]
Beatrice Laurent and Pascal Massart,Adaptive estimation of a quadratic functional by model selection, The Annals of statistics (2000), 1302–1338
work page 2000
-
[39]
Erich L. Lehmann and Joseph P. Romano,Testing statistical hypotheses, 3rd ed., Springer Texts in Statistics, Springer, New York, 2005
work page 2005
-
[40]
Journal of In- equalities in Pure & Applied Mathematics8(2007), no
Xin Li and Chao-Ping Chen,Inequalities for the gamma function., JIPAM. Journal of In- equalities in Pure & Applied Mathematics8(2007), no. 1, 554–563
work page 2007
-
[41]
Yuetian Luo and Chao Gao,Computational lower bounds for graphon estimation via low- degree polynomials, The Annals of Statistics52(2024), no. 5, 2318–2348
work page 2024
- [42]
-
[43]
Zongming Ma,Sparse principal component analysis and iterative thresholding, The Annals of Statistics41(2013), no. 2, 772 – 801
work page 2013
-
[44]
Zongming Ma and Yihong Wu,Computational barriers in minimax submatrix detection, The Annals of Statistics43(2015), no. 61, 1089–1116
work page 2015
-
[45]
Aaron Potechin and Goutham Rajendran,Sub-exponential time sum-of-squares lower bounds for principal components analysis, Advances in Neural Information Processing Systems, vol. 35, 2022, pp. 35724–35740
work page 2022
-
[46]
Garvesh Raskutti, Martin J Wainwright, and Bin Yu,Restricted eigenvalue properties for correlated gaussian designs, The Journal of Machine Learning Research11(2010), 2241– 2259
work page 2010
-
[47]
,Minimax rates of estimation for high-dimensional linear regression over lq-balls, IEEE transactions on information theory57(2011), no. 10, 6976–6994
work page 2011
- [48]
-
[49]
BenjaminRossman,Average-case complexity of detecting cliques, Ph.D.thesis, Massachusetts Institute of Technology, 2010
work page 2010
-
[50]
Tselil Schramm and Alexander S Wein,Computational barriers to estimation from low-degree polynomials, The Annals of Statistics50(2022), no. 3, 1833–1858
work page 2022
-
[51]
T. Sun and C.-H. Zhang,Scaled sparse linear regression, Biometrika99(2012), no. 4, 879– 898
work page 2012
-
[52]
23, American Mathematical Soc., 1939
Gabor Szeg,Orthogonal polynomials, vol. 23, American Mathematical Soc., 1939
work page 1939
-
[53]
Robert Tibshirani,Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society: Series B (Statistical Methodology)58(1996), no. 1, 267–288. 26
work page 1996
-
[54]
Andreas M Tillmann and Marc E Pfetsch,The computational complexity of the restricted isometry property, the nullspace property, and related concepts in compressed sensing, IEEE Transactions on Information Theory60(2013), no. 2, 1248–1259
work page 2013
-
[55]
Tsybakov,Introduction to nonparametric estimation, Springer Series in Statis- tics, Springer, 2009
Alexandre B. Tsybakov,Introduction to nonparametric estimation, Springer Series in Statis- tics, Springer, 2009
work page 2009
-
[56]
Sara van de Geer, Peter Bühlmann, Ya’acov Ritov, and Ruben Dezeure,On asymptotically optimal confidence regions and tests for high-dimensional models, The Annals of Statistics 42(2014), no. 3, 1166–1202
work page 2014
-
[57]
Roman Vershynin,Introduction to the non-asymptotic analysis of random matrices, Com- pressed sensing, Cambridge University Press, 2012, pp. 210–268
work page 2012
-
[58]
47, Cambridge university press, 2018
,High-dimensional probability: An introduction with applications in data science, vol. 47, Cambridge university press, 2018
work page 2018
- [59]
-
[60]
Yixuan Wu, Yilun Zhu, Lei Cao, and Naichen Shi,Calibrated principal component regression, The 29th International Conference on Artificial Intelligence and Statistics, 2026
work page 2026
-
[61]
Jie Xie and Dongming Huang,Minimax and adaptive estimation of general linear functionals under sparsity, arXiv preprint arXiv:2509.25595 (2025)
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[62]
Cun-Hui Zhang and Stephanie S. Zhang,Confidence intervals for low dimensional param- eters in high dimensional linear models, Journal of the Royal Statistical Society: Series B (Statistical Methodology)76(2014), no. 1, 217–242
work page 2014
-
[63]
Junlong Zhao, Yang Zhou, and Yufeng Liu,Estimation of linear functionals in high- dimensional linear models: From sparsity to nonsparsity, Journal of the American Statistical Association119(2024), no. 546, 1579–1591
work page 2024
-
[64]
Yinchu Zhu and Jelena Bradic,Linear hypothesis testing in dense high-dimensional linear models, Journal of the American Statistical Association113(2018), no. 524, 1583–1600. 27 Supplement to “Linear Functional Testing with General Loadings in Sparse Regression: Separation Rates and Computational Barriers” This supplement collects proofs and auxiliary resu...
work page 2018
-
[65]
Moreover, Pu ∩Θ(k u;ξ, t 0) = Θspike(ku;ξ, t 0),Θ spike ±τ (k;ξ, t 0)⊆ P u for1≤k≤k u. Thus Lemma A.5 gives, for every fixed deterministicm, τ spike adap (ku, k;ξ)≲ P j≤m ξ2 j 1/2 √n +|ξ m+1|ku r logp n +ν 2 ku logp n , where we have usedσ≤M 2 forθ∈Θ(k). Optimizing overmand using Proposition 2 gives inf 0≤m≤p P j≤m ξ2 j 1/2 √n +|ξ m+1|ku r logp n ...
-
[66]
Applying this withD=Mgives ∥ˆΣ(1) M M −I M M∥2 =∥ ˆΣ(1) M −I p∥2 ≲ r ku logp n1 .(S.31) 53
SinceM⊆ bB c and|M| ≤k u, the defining property ofbB∈B ku yields, for every such subset D⊆ bB c with|D| ≤k u, ∥ˆΣ(1) D −I p∥2 ≤2 s |D| n1 + s γ∗|D|logp n1 + s |D| n1 + s γ∗|D|logp n1 2 , where, since we haven1 ≥ck u logpfor some sufficiently large constantc, the leading rate in the right-hand side is p (γ∗|D|logp)/n 1. Applying this withD=...
-
[67]
For any fixedD⊆B c ∗ with|D| ≤k u, the submatrixX(1) ·D has i.i.d.N(0,I |D|)columns, hence ˆΣ(1) DD = 1 n1 (X(1) ·D )⊤X(1) ·D = 1 n1 Z ⊤Z, forZ∈R n1×|D| with i.i.d.N(0,1)entries. Therefore, by similar calculation in Section C.3, with probability at least1−4(ep) 1−γ∗/2, we have ∥ˆΣ(1) DD −I DD∥2 ≤2 s |D| n1 + s γ∗|D|logp n1 + s |D| n1 + s γ∗|D|...
-
[68]
By monotonicity of the operator norm under taking submatrices,∥ˆΣ(1) GM ∥2 ≤ ∥ ˆΣ(1) bBM ∥2
SinceM⊆ bB c andG⊆ bB, we have ˆΣ(1) GM as a submatrix of ˆΣ(1) bBM. By monotonicity of the operator norm under taking submatrices,∥ˆΣ(1) GM ∥2 ≤ ∥ ˆΣ(1) bBM ∥2. The defining property of bB∈B ku states that for everyD⊆ bB c with|D| ≤k u, ∥ˆΣ(1) D bB∥2 ≤ q ∥Γ bB(ˆΣ(1))∥2 s |D| n1 + s |bB| n1 + s γ∗|D|logp n1 . Applying this withD=Mgives ∥ˆΣ(1) GM ∥2 ≤ ∥ ˆΣ...
-
[69]
We next control∥ˆΣ(1) GO∥2. OnE 1, since G⊆B ∗, O⊆B c ∗,|O| ≤ | bB| ≤k u, 54 the defining property ofB∗ ∈B ku in Equation (S.27), applied withD=O, gives ∥ˆΣ(1) OB∗∥2 ≤ q ∥ΓB∗(ˆΣ(1))∥2 s |O| n1 + s |B∗| n1 + s γ∗|O|logp n1 . Moreover, ∥ˆΣ(1) GO∥2 =∥ ˆΣ(1) OG∥2 ≤ ∥ ˆΣ(1) OB∗∥2. Using the same argument with Lemma C.1 in the bound on∥ˆΣ(1) GM ∥2, it holds wit...
-
[70]
IfLD(D) = 1 +o(1)asn→ ∞, then no degree-Dpolynomial weakly separatesQ 1 andQ 2
-
[71]
IfLD(D) =O(1)asn→ ∞, then no degree-Dpolynomial strongly separatesQ 1 andQ 2. Remark.Statement (i) asserts that, whenLD(D) = 1 +o(1), every degree-Dpolynomial test is asymptotically no better than the trivial constant statistic in the sense of weak separation. Statement (ii) is stronger in that it rules out vanishing total error: boundedness ofLD(D)permit...
-
[72]
IfK≲k 2 u, then τadap(ku, k;ξ)≍ a √ K√n
Supposek u ≲ √n/logp. IfK≲k 2 u, then τadap(ku, k;ξ)≍ a √ K√n . IfK≳k 2 u, then τadap(ku, k;ξ)≍ log aku r logp n . Moreover, ifK/k 2 u ≥p c for some constantc >0, then the logarithmic equivalence can be strengthened to τadap(ku, k;ξ)≍ak u r logp n
-
[73]
IfK≲k u, then τadap(ku, k;ξ)≍a √ K ku logp n
Supposek u ≫ √n/logp. IfK≲k u, then τadap(ku, k;ξ)≍a √ K ku logp n . IfK≳k 2 u, then τadap(ku, k;ξ)≍ log aku r logp n . Moreover, ifK/k 2 u ≥p c for some constantc >0, then τadap(ku, k;ξ)≍ak u r logp n . In the intermediate regimeku ≪K≪k 2 u, the general bounds in Theorem 1 and proposition 1 give a ( √ K√n + k3/2 u logp n ) ≲τ adap(ku, k;ξ)≲a r K∧ n logp ...
-
[74]
By Proposition 1, τadap(ku, k;ξ)≲ 1√n H(k 2 u logp;ξ)
Ultra-sparse regimek u ≲ √n/logp. By Proposition 1, τadap(ku, k;ξ)≲ 1√n H(k 2 u logp;ξ). IfK≲k 2 u, thenH(k 2 u logp;ξ)≍a √ K, and the lower boundν1/√nsatisfiesν 1/√n≍a √ K/√n by Lemma F.1. Hence τadap(ku, k;ξ)≍ a √ K√n . IfK≳k 2 u, Lemma F.1 gives ν1 ≳ak u. Therefore, Corollary 1 gives τadap(ku, k;ξ)≍ log aku r logp n . 80 IfK/k 2 u ≥p c forc >0, thenH(k...
-
[75]
Moderately sparse regimeku ≫ √n/logp. In this case, Proposition 1 yields the upper bound τadap(ku, k;ξ)≲ ku logp n H(n/logp;ξ)≍a r K∧ n logp ku logp n .(S.87) The lower bound in Theorem 1 is τadap(ku, k;ξ)≳ ν1√n ∨ν 2 ku logp n . IfK≲k u, then H(n/logp;ξ)≍H(k u ;ξ)≍a √ K. Recall thatν 2 =H(k u;ξ). We see that both the upper bound and the lower boundν2ku lo...
-
[76]
Furthermore, it also implies that there exists a fixedMsuch that the interval[t 0, M] receives positive probability mass. Therefore, with probability tending to 1, there is at least one nonzero coordinate with magnitude at mostM. When both events happen, we have maxj∈supp(ξ) |ξj| minj∈supp(ξ) |ξj| ≳(logp) 1/q, which is unbounded. •It is also not the exact...
-
[77]
There exist constants0< c − < C+ <∞such that Dξ(C+s)≤ k2 u 8 ,(S.111) and Dξ(c−s)≥ k2 u 2 .(S.112) Consequently, the solutionλin(16)is positive and satisfies c−s≤λ≤C +s.(S.113)
-
[78]
By Lemma F.4, we have λ≍ log ep k2 u 1/2+1/q , and S2(λ)1/2 ≲k u log ep k2 u 1/q
It holds that kξX j=1 ξ2 j exp(−λ2/ξ2 j ) 1/2 ≲k uL1/q.(S.114) Let S2(t) = kξX j=1 ξ2 j exp(−t2/ξ2 j ). By Lemma F.4, we have λ≍ log ep k2 u 1/2+1/q , and S2(λ)1/2 ≲k u log ep k2 u 1/q . Therefore, by the definition ofν1 in (17), ν1 =k uλ+S 2(λ)1/2 ≍k u log ep k2 u 1/2+1/q . This proves (S.104). In the moderately sparse regime, Proposition 1 andN=...
-
[79]
For every grid intervalIm = [Bm, Bm+1)intersecting[t 0, CM(logp) 1/q], #{j:|ξ j| ∈I m} ≤(logp) Apexp(−cB mq).(S.119)
-
[80]
Wheneverpexp(−CB mq)≥(logp) A, #{j:|ξ j| ∈I m} ≥(logp) −Apexp(−CB mq).(S.120) Proof of Lemma F.5.Setx=x(t). We havet 2 =x q+2. We first prove the upper bound (S.115). Choosem 0 ∈Zsuch that Bm0 ≤x < B m0+1. Form=m 0 +k, we have Bm =B m0Bk =xθB k, θ:= Bm0 x ∈[B −1,1]. It follows that t2 B2m = xq+2 x2θ2B2k =x qθ−2B−2k, Bmq =x qθqBqk. Sinceθ∈[B −1,1], there a...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.