Minimax and adaptive estimation of general linear functionals under sparsity
Pith reviewed 2026-05-18 11:55 UTC · model grok-4.3
The pith
For symmetric noise with exponentially decaying tails, the sharp minimax rate for estimating a general linear functional of an s-sparse vector is explicit in sparsity, loadings, tail decay, and noise level.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
For symmetric noise with exponentially decaying tails, the sharp minimax rate for estimating L(θ) = η^T θ with s-sparse θ is explicit in s, η, the tail parameter, and the noise level. The rate is attained by a hybrid estimator that performs plug-in estimation for coordinates with large loadings and thresholding for coordinates with small loadings. The matching lower bound follows from a loading-dependent sparse prior construction. For unknown sparsity, an η-dependent Lepski-type adaptive estimator achieves the oracle rate up to the optimal logarithmic factor over a broad verifiable class of loading vectors.
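To fix notation for this claim, the minimax risk can be written as below. This is a sketch under assumptions not stated on this page: squared-error loss, ambient dimension $d$, and the symbols $\mathcal{R}^*$ and $\Theta_s$, which are introduced here for illustration only. The paper's own normalization may differ.

```latex
\[
  \mathcal{R}^*(s,\eta)
  \;=\; \inf_{\hat L}\ \sup_{\theta \in \Theta_s}\
  \mathbb{E}_\theta\bigl(\hat L - \eta^\top \theta\bigr)^2,
  \qquad
  \Theta_s = \bigl\{\theta \in \mathbb{R}^d : \|\theta\|_0 \le s\bigr\}.
\]
```

In this language, a rate is usually called sharp when it bounds the minimax risk from above and from below up to constant factors.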
What carries the argument
Hybrid estimator that applies plug-in estimation to coordinates with large |η_i| and thresholding to coordinates with small |η_i|, together with a loading-dependent sparse prior used to prove the lower bound.
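The split between plug-in and thresholded coordinates can be sketched as follows. This is a minimal illustration, not the paper's estimator: the function name `hybrid_estimate`, the single cutoff `loading_cutoff`, the single level `threshold`, and the use of hard thresholding are all assumptions made here, and the paper's thresholds may depend on each η_j.

```python
def hybrid_estimate(y, eta, loading_cutoff, threshold):
    """Sketch of a hybrid plug-in / thresholding estimate of sum_j eta_j * theta_j.

    y: observations, one per coordinate (assumed model: y_j = theta_j + noise_j).
    eta: loading vector, same length as y.
    loading_cutoff: hypothetical cutoff separating large from small |eta_j|.
    threshold: hypothetical hard-threshold level applied to |y_j|.
    """
    if len(y) != len(eta):
        raise ValueError("y and eta must have the same length")
    total = 0.0
    for y_j, eta_j in zip(y, eta):
        if abs(eta_j) >= loading_cutoff:
            # Large loading: plug in the raw observation.
            total += eta_j * y_j
        elif abs(y_j) > threshold:
            # Small loading: keep the observation only if it clears the threshold.
            total += eta_j * y_j
    return total


y = [3.0, 0.1, -2.5, 0.2]
eta = [1.0, 0.2, 0.1, 0.05]
estimate = hybrid_estimate(y, eta, loading_cutoff=0.5, threshold=1.0)
print(estimate)
```

In this toy call, coordinate 0 is handled by plug-in, coordinate 2 survives the threshold, and coordinates 1 and 3 contribute nothing.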
If this is right
- Heterogeneity across the entries of the loading vector η produces different minimax and adaptive rates, as illustrated by explicit examples.
- The adaptive procedure matches the oracle minimax rate up to a logarithmic factor for a broad class of loading vectors when sparsity is unknown.
- Asymmetry of the noise distribution can strictly increase the minimax rate for some choices of η.
- The new χ² bound extends sharp lower-bound constructions to generalized Gaussian distributions beyond the Gaussian case.
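For the adaptive claim in the list above, a generic Lepski-type selection rule can be sketched as follows. This is the textbook pattern only, under stated assumptions: the names `lepski_select`, `estimates`, and `tolerances` are hypothetical, and the η-dependent tolerances the paper actually uses are not given on this page.

```python
def lepski_select(estimates, tolerances):
    """Generic Lepski-type choice among estimators indexed by increasing sparsity level.

    estimates[k]: estimate of the functional computed for the k-th candidate level.
    tolerances[k]: hypothetical comparison tolerance tied to the k-th level.
    Returns the smallest index k whose estimate agrees with every estimate at a
    larger level m, in the sense |estimates[k] - estimates[m]| <= tolerances[m].
    """
    if not estimates or len(estimates) != len(tolerances):
        raise ValueError("need non-empty lists of equal length")
    n = len(estimates)
    for k in range(n):
        agrees = all(
            abs(estimates[k] - estimates[m]) <= tolerances[m]
            for m in range(k + 1, n)
        )
        if agrees:
            return k
    return n - 1  # not reached: the last index always agrees vacuously


estimates = [5.0, 2.1, 2.0, 1.9]
tolerances = [0.1, 0.3, 0.5, 0.8]
chosen = lepski_select(estimates, tolerances)
print(chosen)
```

The rule prefers the smallest candidate level whose estimate is not contradicted by any of the more conservative ones, which is why no knowledge of the true sparsity is needed to run it.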
Where Pith is reading between the lines
- The explicit dependence on η implies that in applications the choice or design of the loading vector can be used to reduce estimation risk.
- The same hybrid construction may extend to estimating other sparse functionals such as quadratic forms or norms.
- When noise is observed to be asymmetric in practice, the worst-case asymmetric lower-bound construction supplies a conservative risk benchmark.
- The techniques could be tested on sparse linear regression problems where the parameter of interest is a linear functional of the regression vector.
Load-bearing premise
The additive noise is symmetric and has exponentially decaying tails.
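One family that satisfies this premise, and that the abstract names explicitly, is the generalized Gaussian family. The block below is a sketch in a unit-scale parametrization. The additive sequence model, the symbols $\xi_j$, $\sigma$, and $\alpha$, and the normalization are assumptions made here; the paper may scale the noise differently.

```latex
\[
  Y_j = \theta_j + \sigma\,\xi_j, \qquad
  f_\alpha(x) = \frac{\alpha}{2\,\Gamma(1/\alpha)}\,\exp\bigl(-|x|^{\alpha}\bigr),
  \quad \alpha > 0 .
\]
```

The density $f_\alpha$ is symmetric about zero. In this normalization $\alpha = 2$ gives a Gaussian with variance $1/2$ and $\alpha = 1$ gives the standard Laplace distribution, so $\alpha$ plays the role of a tail parameter.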
What would settle it
Simulate the hybrid estimator on data generated from symmetric exponential-tailed noise, vary the sparsity s and the loading vector η across several patterns, and check whether the observed risk tracks the explicit rate formula given in the paper up to constant factors.
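A Monte Carlo harness for such a check might look like the sketch below. Everything specific in it is an assumption for illustration: Laplace noise stands in for the symmetric exponential-tailed class, the estimator is a simplified hard-thresholding version with one hypothetical `loading_cutoff` and one hypothetical `threshold`, and the paper's rate formula is not reproduced on this page, so the comparison against it is left to the reader.

```python
import random


def laplace_noise(rng, scale):
    """Symmetric noise with exponential tails: a Laplace draw with the given scale."""
    magnitude = rng.expovariate(1.0 / scale)
    return magnitude if rng.random() < 0.5 else -magnitude


def hybrid_estimate(y, eta, loading_cutoff, threshold):
    """Simplified hybrid estimate: plug-in for large |eta_j|, hard threshold otherwise."""
    total = 0.0
    for y_j, eta_j in zip(y, eta):
        if abs(eta_j) >= loading_cutoff or abs(y_j) > threshold:
            total += eta_j * y_j
    return total


def monte_carlo_risk(theta, eta, scale, loading_cutoff, threshold, n_rep, seed):
    """Average squared error of the hybrid estimate of sum_j eta_j * theta_j."""
    rng = random.Random(seed)
    target = sum(e * t for e, t in zip(eta, theta))
    total_sq_err = 0.0
    for _ in range(n_rep):
        y = [t + laplace_noise(rng, scale) for t in theta]
        err = hybrid_estimate(y, eta, loading_cutoff, threshold) - target
        total_sq_err += err * err
    return total_sq_err / n_rep


d = 200
eta = [1.0 / (j + 1) for j in range(d)]  # one heterogeneous loading pattern
for s in (2, 5, 10):
    theta = [4.0] * s + [0.0] * (d - s)  # s-sparse mean vector
    risk = monte_carlo_risk(theta, eta, scale=1.0, loading_cutoff=0.5,
                            threshold=3.0, n_rep=200, seed=0)
    print(s, risk)
```

Repeating the loop over several loading patterns and plotting the simulated risk against the paper's formula on a log scale would show whether the two move together.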
Original abstract
We study nonasymptotic minimax estimation of the linear functional $L(\theta)=\eta^\top \theta$ for a high-dimensional $s$-sparse mean vector with an arbitrary loading vector $\eta$. For symmetric noise with exponentially decaying tails, we derive the sharp minimax rate, explicit in $s$, $\eta$, the tail parameter, and the noise level. The proposed estimator combines plug-in estimation for coordinates with large loadings and thresholding for coordinates with small loadings, and the matching lower bound is obtained via a loading-dependent sparse prior. For unknown sparsity, we construct an $\eta$-dependent Lepski-type procedure and show that, for a broad verifiable class of loading vectors, its risk matches the oracle rate up to the optimal logarithmic factor. Explicit examples illustrate how heterogeneity in $\eta$ changes both the minimax and adaptive rates. We also extend the analysis to non-symmetric noise, hypothesis testing, and estimation with unknown noise variance, where we show that asymmetry can increase the minimax rate in certain examples of $\eta$. Among these results, the two main technical novelties are the following. First, we extend the sharp lower-bound theory beyond the Gaussian setting via a new $\chi^2$ bound for generalized Gaussian distributions. Second, for possibly non-symmetric noise, we derive new lower bounds through a worst-case asymmetric construction.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript derives sharp non-asymptotic minimax rates for estimating the linear functional L(θ) = ηᵀθ where θ ∈ ℝᵖ is s-sparse and η is an arbitrary loading vector. For symmetric noise with exponentially decaying tails, an explicit upper bound is obtained by plug-in estimation on coordinates with large |ηᵢ| combined with thresholding on the remainder; this is matched by a lower bound constructed from a loading-dependent sparse prior, supported by a new χ²-divergence bound for generalized Gaussian distributions. For unknown sparsity, an η-dependent Lepski-type procedure is shown to attain the oracle rate up to logarithmic factors on a verifiable class of loadings. The analysis is extended to non-symmetric noise (where asymmetry can strictly increase the rate for some η), hypothesis testing, and unknown noise variance, with explicit examples illustrating the effect of heterogeneity in η.
Significance. If the derivations hold, the work supplies the first sharp, fully explicit minimax rates for general linear functionals under sparsity that are non-asymptotic and valid beyond the Gaussian case. The new χ² bound for generalized Gaussians and the loading-dependent prior construction are technically substantive contributions that may be reusable. The adaptive result and the demonstration that asymmetry can worsen the rate in concrete examples add practical and conceptual value. The explicit dependence on s, η, tail index, and noise level, together with the verifiable class for adaptivity, strengthens the contribution.
major comments (2)
- [§4] §4 (new χ² bound for generalized Gaussians): the bound is invoked to close the lower-bound argument for the symmetric exponential-tail case; it is load-bearing for the claimed sharpness, yet the manuscript does not appear to supply a self-contained proof or explicit constant tracking that would allow immediate verification of the exponential-tail regime.
- [§5] §5 (loading-dependent sparse prior): the prior is constructed to depend on the specific η and s; while this yields the matching lower bound, the construction appears to rely on symmetry in an essential way, and it is unclear whether the same technique directly yields the stated non-symmetric lower bounds or whether a separate worst-case construction is required.
minor comments (2)
- [§6] The definition of the “verifiable class” of η for which the adaptive procedure attains the oracle rate up to log factors should be stated more explicitly (e.g., as a concrete condition on the ordered loadings) so that readers can check membership without additional derivation.
- [§2] Notation for the tail parameter and the generalized-Gaussian family is introduced in §2 but used without repeated reminder in later sections; a short table or boxed definition would improve readability.
Simulated Author's Rebuttal
We thank the referee for the careful reading, positive assessment, and recommendation for minor revision. We address the two major comments point by point below.
- Referee: [§4] §4 (new χ² bound for generalized Gaussians): the bound is invoked to close the lower-bound argument for the symmetric exponential-tail case; it is load-bearing for the claimed sharpness, yet the manuscript does not appear to supply a self-contained proof or explicit constant tracking that would allow immediate verification of the exponential-tail regime.
  Authors: We thank the referee for this observation. The χ² bound for generalized Gaussian distributions is established in Appendix A.3. To improve accessibility and enable immediate verification of the exponential-tail regime, we will insert a concise proof sketch together with explicit constant tracking into the main text of Section 4 in the revised manuscript.
  Revision: yes
- Referee: [§5] §5 (loading-dependent sparse prior): the prior is constructed to depend on the specific η and s; while this yields the matching lower bound, the construction appears to rely on symmetry in an essential way, and it is unclear whether the same technique directly yields the stated non-symmetric lower bounds or whether a separate worst-case construction is required.
  Authors: The loading-dependent sparse prior is constructed specifically for the symmetric-noise setting. For non-symmetric noise we employ an independent worst-case asymmetric construction, as already indicated in the abstract and developed in Section 6. We will add a short clarifying sentence at the end of Section 5 that explicitly distinguishes the two constructions and cross-references the asymmetric argument.
  Revision: yes
Circularity Check
No significant circularity in derivation chain
The paper's central derivations for the sharp minimax rate of L(θ) = ηᵀθ under s-sparsity rely on explicit upper bounds via plug-in and thresholding estimators, matched to lower bounds from a loading-dependent sparse prior and a new χ² divergence bound for generalized Gaussian distributions. These constructions are presented as technical novelties in Sections 3–5 with explicit dependence on s, η, tail parameters, and noise level, without reducing to fitted inputs from the same data, self-definitional equations, or load-bearing self-citations. The adaptive Lepski-type procedure in Section 6 attains the oracle rate up to log factors on a verifiable class of η, and extensions to non-symmetric noise use worst-case asymmetric constructions. No steps match the enumerated circularity patterns; the argument is self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Noise is symmetric with exponentially decaying tails.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Passage: We study nonasymptotic minimax estimation of the linear functional L(θ)=η^⊤θ for a high-dimensional s-sparse mean vector... sharp minimax rate, explicit in s, η, the tail parameter, and the noise level. The proposed estimator combines plug-in estimation for coordinates with large loadings and thresholding for coordinates with small loadings...
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.