Minimax and adaptive estimation of general linear functionals under sparsity

Dongming Huang; Jie Xie

arxiv: 2509.25595 · v3 · submitted 2025-09-29 · 🧮 math.ST · stat.TH

Pith reviewed 2026-05-18 11:55 UTC · model grok-4.3

keywords: minimax estimation, linear functional, sparsity, adaptive estimation, high-dimensional statistics, sub-exponential noise, Lepski procedure, chi-squared bound

The pith

For symmetric noise with exponentially decaying tails, the sharp minimax rate for estimating a general linear functional of an s-sparse vector is explicit in sparsity, loadings, tail decay, and noise level.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies nonasymptotic minimax estimation of the linear functional L(θ) = η^T θ where θ is an s-sparse vector in high dimensions and η is an arbitrary loading vector. For symmetric noise with exponentially decaying tails, it derives the sharp minimax rate expressed directly in terms of s, the components of η, the tail parameter, and the noise level. An estimator that switches between plug-in estimation on coordinates with large |η_i| and thresholding on the remaining coordinates attains the upper bound. A matching lower bound is constructed from a sparse prior that depends on the specific values in η. When sparsity is unknown, an η-dependent Lepski-type procedure recovers the oracle rate up to a logarithmic factor for a wide class of loading vectors.
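The plug-in/thresholding split described above can be sketched in a few lines. This is a minimal illustration, not the paper's estimator: the function name `hybrid_estimate` and the tuning parameters `loading_cutoff` and `threshold` are hypothetical, and the paper's actual cutoff and threshold (which depend on s, η, the tail parameter, and the noise level) are not reproduced here.

```python
import numpy as np

def hybrid_estimate(y, eta, loading_cutoff, threshold):
    """Hypothetical plug-in / thresholding hybrid for L(theta) = eta^T theta.

    loading_cutoff and threshold are illustrative tuning parameters (assumptions),
    not the calibrated choices from the paper.
    """
    y = np.asarray(y, dtype=float)
    eta = np.asarray(eta, dtype=float)
    # Coordinates with large loadings: use the raw observations (plug-in).
    large = np.abs(eta) >= loading_cutoff
    plug_in = np.sum(eta[large] * y[large])
    # Coordinates with small loadings: keep an observation only if it
    # exceeds the threshold in absolute value (hard thresholding).
    kept = (~large) & (np.abs(y) > threshold)
    thresholded = np.sum(eta[kept] * y[kept])
    return plug_in + thresholded

# Usage: two large-loading coordinates, two small-loading coordinates.
value = hybrid_estimate(
    y=[5.0, 0.1, -3.0, 0.2],
    eta=[2.0, 2.0, 0.5, 0.5],
    loading_cutoff=1.0,
    threshold=1.0,
)
```

Hard thresholding is used here only because it is the simplest rule to read; the source does not say which thresholding rule the paper adopts.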

Core claim

For symmetric noise with exponentially decaying tails, the sharp minimax rate for estimating L(θ) = η^T θ with s-sparse θ is explicit in s, η, the tail parameter, and the noise level. The rate is attained by a hybrid estimator that performs plug-in estimation for coordinates with large loadings and thresholding for coordinates with small loadings. The matching lower bound follows from a loading-dependent sparse prior construction. For unknown sparsity an η-dependent Lepski-type adaptive estimator achieves the oracle rate up to the optimal logarithmic factor over a broad verifiable class of loading vectors.
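For readers unfamiliar with Lepski-type selection, the generic idea can be sketched as follows. This is the textbook form of the method under stated assumptions, not the paper's η-dependent procedure: `lepski_select`, the `widths` error scales, and `constant` are hypothetical names, and the paper's actual comparison thresholds are not given in this summary.

```python
def lepski_select(estimates, widths, constant=1.0):
    """Generic Lepski-type selection over an increasing grid of sparsity levels.

    estimates[k]: estimate computed under the k-th candidate sparsity level.
    widths[k]:    assumed error scale of the k-th estimate (hypothetical input).
    Returns the index of the smallest candidate whose estimate agrees with
    every larger candidate j up to constant * widths[j].
    """
    m = len(estimates)
    if m == 0:
        raise ValueError("need at least one candidate")
    for k in range(m):
        agrees = all(
            abs(estimates[k] - estimates[j]) <= constant * widths[j]
            for j in range(k + 1, m)
        )
        if agrees:
            return k  # the last index always agrees (empty comparison set)

# Usage: the first candidate disagrees with the rest, the second one agrees.
chosen = lepski_select([0.0, 5.0, 5.2, 5.1], [0.5, 0.5, 0.5, 0.5])
```

The design choice is the usual one: prefer the smallest sparsity level that is statistically indistinguishable from all larger, safer ones.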

What carries the argument

Hybrid estimator that applies plug-in estimation to coordinates with large |η_i| and thresholding to coordinates with small |η_i|, together with a loading-dependent sparse prior used to prove the lower bound.

If this is right

Heterogeneity across the entries of the loading vector η produces different minimax and adaptive rates, as illustrated by explicit examples.
The adaptive procedure matches the oracle minimax rate up to a logarithmic factor for a broad class of loading vectors when sparsity is unknown.
Asymmetry of the noise distribution can strictly increase the minimax rate for some choices of η.
The new χ² bound extends sharp lower-bound constructions to generalized Gaussian distributions beyond the Gaussian case.
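For orientation, the objects named in the last point can be written out. The density and divergence below are standard definitions, and the Gaussian identity is the classical baseline; the paper's new bound itself is not stated in this summary and is not reproduced here. The unit-scale parameterization is an assumption made for illustration.

```latex
% Generalized Gaussian density with shape \alpha > 0 (unit scale, assumed parameterization);
% \alpha = 1 gives the Laplace density, \alpha = 2 a centered Gaussian with variance 1/2.
f_\alpha(x) = \frac{\alpha}{2\,\Gamma(1/\alpha)} \exp\bigl(-|x|^\alpha\bigr), \qquad x \in \mathbb{R}.

% Chi-squared divergence between distributions P and Q with densities p and q:
\chi^2(P \,\|\, Q) = \int \frac{p(x)^2}{q(x)}\,dx - 1.

% Classical Gaussian baseline for a location shift \mu:
\chi^2\bigl(N(\mu,1) \,\|\, N(0,1)\bigr) = e^{\mu^2} - 1.
```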

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The explicit dependence on η implies that in applications the choice or design of the loading vector can be used to reduce estimation risk.
The same hybrid construction may extend to estimating other sparse functionals such as quadratic forms or norms.
When noise is observed to be asymmetric in practice, the worst-case asymmetric lower-bound construction supplies a conservative risk benchmark.
The techniques could be tested on sparse linear regression problems where the parameter of interest is a linear functional of the regression vector.

Load-bearing premise

The additive noise is symmetric and has exponentially decaying tails.

What would settle it

Simulate the hybrid estimator on data generated from symmetric exponential-tailed noise, vary the sparsity s and the loading vector η across several patterns, and check whether the observed risk matches the explicit rate formula given in the paper.
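A minimal Monte Carlo harness for such a check might look like the sketch below. Everything in it is an assumption made for illustration: Laplace noise stands in for "symmetric exponential-tailed noise", the estimator is a simplified hard-thresholding hybrid with hypothetical tuning parameters `loading_cutoff` and `threshold`, and the paper's rate formula, which the output would be compared against, is not reproduced.

```python
import numpy as np

def hybrid_estimate(y, eta, loading_cutoff, threshold):
    """Simplified hybrid: plug-in on large |eta_j|, hard thresholding elsewhere."""
    large = np.abs(eta) >= loading_cutoff
    kept = (~large) & (np.abs(y) > threshold)
    return np.sum(eta[large] * y[large]) + np.sum(eta[kept] * y[kept])

def simulate_risk(eta, s, signal, sigma, loading_cutoff, threshold,
                  n_rep=2000, seed=0):
    """Empirical squared-error risk for one s-sparse theta under Laplace noise."""
    rng = np.random.default_rng(seed)
    eta = np.asarray(eta, dtype=float)
    d = eta.size
    theta = np.zeros(d)
    theta[:s] = signal                 # s nonzero coordinates (illustrative choice)
    target = eta @ theta               # true value of L(theta)
    errors = np.empty(n_rep)
    for r in range(n_rep):
        y = theta + sigma * rng.laplace(size=d)   # symmetric, exponential tails
        errors[r] = hybrid_estimate(y, eta, loading_cutoff, threshold) - target
    return float(np.mean(errors ** 2))

# Usage: a decaying loading pattern, evaluated at several sparsity levels.
eta = 1.0 / np.sqrt(np.arange(1, 201))
risks = {s: simulate_risk(eta, s, signal=5.0, sigma=1.0,
                          loading_cutoff=0.3, threshold=3.0, n_rep=200)
         for s in (1, 5, 20)}
```

A single fixed θ only probes one point of the parameter space; a check of a minimax rate would also need to search over unfavorable choices of θ.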

Original abstract

We study nonasymptotic minimax estimation of the linear functional $L(\theta)=\eta^\top \theta$ for a high-dimensional $s$-sparse mean vector with an arbitrary loading vector $\eta$. For symmetric noise with exponentially decaying tails, we derive the sharp minimax rate, explicit in $s$, $\eta$, the tail parameter, and the noise level. The proposed estimator combines plug-in estimation for coordinates with large loadings and thresholding for coordinates with small loadings, and the matching lower bound is obtained via a loading-dependent sparse prior. For unknown sparsity, we construct an $\eta$-dependent Lepski-type procedure and show that, for a broad verifiable class of loading vectors, its risk matches the oracle rate up to the optimal logarithmic factor. Explicit examples illustrate how heterogeneity in $\eta$ changes both the minimax and adaptive rates. We also extend the analysis to non-symmetric noise, hypothesis testing, and estimation with unknown noise variance, where we show that asymmetry can increase the minimax rate in certain examples of $\eta$. Among these results, the two main technical novelties are the following. First, we extend the sharp lower-bound theory beyond the Gaussian setting via a new $\chi^2$ bound for generalized Gaussian distributions. Second, for possibly non-symmetric noise, we derive new lower bounds through a worst-case asymmetric construction.
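One way to write down the setting the abstract describes is sketched below. This is an assumed formalization: the symbol $d$ for the ambient dimension, the i.i.d. noise variables $\xi_j$, and the squared-error loss are choices made here for illustration and may differ from the paper's exact definitions.

```latex
% Assumed sparse sequence model (d = ambient dimension, \sigma = noise level):
y_j = \theta_j + \sigma \xi_j, \quad j = 1, \dots, d, \qquad \|\theta\|_0 \le s.

% Target functional and (assumed) squared-error minimax risk:
L(\theta) = \eta^\top \theta, \qquad
\mathcal{R}^*(s, \eta) = \inf_{\hat L} \, \sup_{\|\theta\|_0 \le s}
  \mathbb{E}_\theta \bigl(\hat L - L(\theta)\bigr)^2 .
```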

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper gives explicit non-asymptotic minimax rates for sparse linear functional estimation that depend on the loading vector, backed by a new chi-squared bound and a matching adaptive procedure.

Colleague, the main thing to know is that this work derives sharp rates for estimating L(θ) = η^T θ when θ is s-sparse, with the rates written out explicitly in terms of s, the arbitrary η, the tail index, and noise level, for symmetric noise with exponential tails. They match an upper bound (plug-in on large loadings plus thresholding on small ones) to a lower bound built from a loading-dependent sparse prior, and they add an η-dependent Lepski-type adaptive estimator that hits the oracle rate up to logs on a verifiable class of loadings.

The two technical pieces that stand out are the new χ² divergence bound for generalized Gaussians, which lets them move past standard Gaussian assumptions, and the worst-case asymmetric constructions that show how non-symmetry can raise the rate in some η examples. Explicit illustrations of how η heterogeneity changes both minimax and adaptive rates are also useful. The derivations appear to avoid circularity and rest on new but verifiable bounds rather than data-dependent fitting.

Soft spots are limited. The adaptive guarantee holds only on a broad but not fully general class of η, which is typical yet restricts immediate use. The non-symmetric and unknown-variance extensions read as solid but secondary. Without the full proofs in front of me I cannot check every constant in the divergence calculations, though the stress-test finds no internal contradictions or load-bearing gaps.

This is aimed at people working on high-dimensional sparse estimation and minimax functional estimation. A reader who cares about explicit rates under general noise and adaptive procedures will get concrete value from the constructions and examples. The work is grounded enough and the novelties are real enough that it deserves a serious referee.

Referee Report

2 major / 2 minor

Summary. The manuscript derives sharp non-asymptotic minimax rates for estimating the linear functional L(θ) = ηᵀθ where θ ∈ ℝᵖ is s-sparse and η is an arbitrary loading vector. For symmetric noise with exponentially decaying tails, an explicit upper bound is obtained by plug-in estimation on coordinates with large |ηᵢ| combined with thresholding on the remainder; this is matched by a lower bound constructed from a loading-dependent sparse prior, supported by a new χ²-divergence bound for generalized Gaussian distributions. For unknown sparsity an η-dependent Lepski-type procedure is shown to attain the oracle rate up to logarithmic factors on a verifiable class of loadings. The analysis is extended to non-symmetric noise (where asymmetry can strictly increase the rate for some η), hypothesis testing, and unknown noise variance, with explicit examples illustrating the effect of heterogeneity in η.

Significance. If the derivations hold, the work supplies the first sharp, fully explicit minimax rates for general linear functionals under sparsity that are non-asymptotic and valid beyond the Gaussian case. The new χ² bound for generalized Gaussians and the loading-dependent prior construction are technically substantive contributions that may be reusable. The adaptive result and the demonstration that asymmetry can worsen the rate in concrete examples add practical and conceptual value. The explicit dependence on s, η, tail index, and noise level, together with the verifiable class for adaptivity, strengthens the contribution.

major comments (2)

[§4] §4 (new χ² bound for generalized Gaussians): the bound is invoked to close the lower-bound argument for the symmetric exponential-tail case; it is load-bearing for the claimed sharpness, yet the manuscript does not appear to supply a self-contained proof or explicit constant tracking that would allow immediate verification of the exponential-tail regime.
[§5] §5 (loading-dependent sparse prior): the prior is constructed to depend on the specific η and s; while this yields the matching lower bound, the construction appears to rely on symmetry in an essential way, and it is unclear whether the same technique directly yields the stated non-symmetric lower bounds or whether a separate worst-case construction is required.

minor comments (2)

[§6] The definition of the “verifiable class” of η for which the adaptive procedure attains the oracle rate up to log factors should be stated more explicitly (e.g., as a concrete condition on the ordered loadings) so that readers can check membership without additional derivation.
[§2] Notation for the tail parameter and the generalized-Gaussian family is introduced in §2 but used without repeated reminder in later sections; a short table or boxed definition would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading, positive assessment, and recommendation for minor revision. We address the two major comments point by point below.

Referee: [§4] §4 (new χ² bound for generalized Gaussians): the bound is invoked to close the lower-bound argument for the symmetric exponential-tail case; it is load-bearing for the claimed sharpness, yet the manuscript does not appear to supply a self-contained proof or explicit constant tracking that would allow immediate verification of the exponential-tail regime.

Authors: We thank the referee for this observation. The χ² bound for generalized Gaussian distributions is established in Appendix A.3. To improve accessibility and enable immediate verification of the exponential-tail regime, we will insert a concise proof sketch together with explicit constant tracking into the main text of Section 4 in the revised manuscript.
Revision: yes

Referee: [§5] §5 (loading-dependent sparse prior): the prior is constructed to depend on the specific η and s; while this yields the matching lower bound, the construction appears to rely on symmetry in an essential way, and it is unclear whether the same technique directly yields the stated non-symmetric lower bounds or whether a separate worst-case construction is required.

Authors: The loading-dependent sparse prior is constructed specifically for the symmetric-noise setting. For non-symmetric noise we employ an independent worst-case asymmetric construction, as already indicated in the abstract and developed in Section 6. We will add a short clarifying sentence at the end of Section 5 that explicitly distinguishes the two constructions and cross-references the asymmetric argument.
Revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper's central derivations for the sharp minimax rate of L(θ) = ηᵀθ under s-sparsity rely on explicit upper bounds via plug-in and thresholding estimators, matched to lower bounds from a loading-dependent sparse prior and a new χ² divergence bound for generalized Gaussian distributions. These constructions are presented as technical novelties in Sections 3–5 with explicit dependence on s, η, tail parameters, and noise level, without reducing to fitted inputs from the same data, self-definitional equations, or load-bearing self-citations. The adaptive Lepski-type procedure in Section 6 attains the oracle rate up to log factors on a verifiable class of η, and extensions to non-symmetric noise use worst-case asymmetric constructions. No steps match the enumerated circularity patterns; the argument is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claims rest on the assumed noise tail and symmetry properties plus the sparsity structure of θ; no free parameters or invented entities are introduced in the abstract.

axioms (1)

Domain assumption: Noise is symmetric with exponentially decaying tails.
Invoked to obtain the sharp minimax rate and to extend the lower-bound theory via the new χ² bound.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

We study nonasymptotic minimax estimation of the linear functional L(θ)=η^⊤θ for a high-dimensional s-sparse mean vector... sharp minimax rate, explicit in s, η, the tail parameter, and the noise level. The proposed estimator combines plug-in estimation for coordinates with large loadings and thresholding for coordinates with small loadings...

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

