pith. sign in

arxiv: 2509.25595 · v3 · submitted 2025-09-29 · 🧮 math.ST · stat.TH

Minimax and adaptive estimation of general linear functionals under sparsity

Pith reviewed 2026-05-18 11:55 UTC · model grok-4.3

classification 🧮 math.ST stat.TH
keywords minimax estimationlinear functionalsparsityadaptive estimationhigh-dimensional statisticssub-exponential noiseLepski procedurechi-squared bound
0
0 comments X

The pith

For symmetric noise with exponentially decaying tails, the sharp minimax rate for estimating a general linear functional of an s-sparse vector is explicit in sparsity, loadings, tail decay, and noise level.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies nonasymptotic minimax estimation of the linear functional L(θ) = η^T θ where θ is an s-sparse vector in high dimensions and η is an arbitrary loading vector. For symmetric noise with exponentially decaying tails, it derives the sharp minimax rate expressed directly in terms of s, the components of η, the tail parameter, and the noise level. An estimator that switches between plug-in estimation on coordinates with large |η_i| and thresholding on the remaining coordinates attains the upper bound. A matching lower bound is constructed from a sparse prior that depends on the specific values in η. When sparsity is unknown, an η-dependent Lepski-type procedure recovers the oracle rate up to a logarithmic factor for a wide class of loading vectors.

Core claim

For symmetric noise with exponentially decaying tails, the sharp minimax rate for estimating L(θ) = η^T θ with s-sparse θ is explicit in s, η, the tail parameter, and the noise level. The rate is attained by a hybrid estimator that performs plug-in estimation for coordinates with large loadings and thresholding for coordinates with small loadings. The matching lower bound follows from a loading-dependent sparse prior construction. For unknown sparsity an η-dependent Lepski-type adaptive estimator achieves the oracle rate up to the optimal logarithmic factor over a broad verifiable class of loading vectors.

What carries the argument

Hybrid estimator that applies plug-in estimation to coordinates with large |η_i| and thresholding to coordinates with small |η_i|, together with a loading-dependent sparse prior used to prove the lower bound.

If this is right

  • Heterogeneity across the entries of the loading vector η produces different minimax and adaptive rates, as illustrated by explicit examples.
  • The adaptive procedure matches the oracle minimax rate up to a logarithmic factor for a broad class of loading vectors when sparsity is unknown.
  • Asymmetry of the noise distribution can strictly increase the minimax rate for some choices of η.
  • The new χ² bound extends sharp lower-bound constructions to generalized Gaussian distributions beyond the Gaussian case.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The explicit dependence on η implies that in applications the choice or design of the loading vector can be used to reduce estimation risk.
  • The same hybrid construction may extend to estimating other sparse functionals such as quadratic forms or norms.
  • When noise is observed to be asymmetric in practice, the worst-case asymmetric lower-bound construction supplies a conservative risk benchmark.
  • The techniques could be tested on sparse linear regression problems where the parameter of interest is a linear functional of the regression vector.

Load-bearing premise

The additive noise is symmetric and has exponentially decaying tails.

What would settle it

Simulate the hybrid estimator on data generated from symmetric exponential-tailed noise, vary the sparsity s and the loading vector η across several patterns, and check whether the observed risk matches the explicit rate formula given in the paper.

read the original abstract

We study nonasymptotic minimax estimation of the linear functional $L(\theta)=\eta^\top \theta$ for a high-dimensional $s$-sparse mean vector with an arbitrary loading vector $\eta$. For symmetric noise with exponentially decaying tails, we derive the sharp minimax rate, explicit in $s$, $\eta$, the tail parameter, and the noise level. The proposed estimator combines plug-in estimation for coordinates with large loadings and thresholding for coordinates with small loadings, and the matching lower bound is obtained via a loading-dependent sparse prior. For unknown sparsity, we construct an $\eta$-dependent Lepski-type procedure and show that, for a broad verifiable class of loading vectors, its risk matches the oracle rate up to the optimal logarithmic factor. Explicit examples illustrate how heterogeneity in $\eta$ changes both the minimax and adaptive rates. We also extend the analysis to non-symmetric noise, hypothesis testing, and estimation with unknown noise variance, where we show that asymmetry can increase the minimax rate in certain examples of $\eta$. Among these results, the two main technical novelties are the following. First, we extend the sharp lower-bound theory beyond the Gaussian setting via a new $\chi^2$ bound for generalized Gaussian distributions. Second, for possibly non-symmetric noise, we derive new lower bounds through a worst-case asymmetric construction.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript derives sharp non-asymptotic minimax rates for estimating the linear functional L(θ) = ηᵀθ where θ ∈ ℝᵖ is s-sparse and η is an arbitrary loading vector. For symmetric noise with exponentially decaying tails, an explicit upper bound is obtained by plug-in estimation on coordinates with large |ηᵢ| combined with thresholding on the remainder; this is matched by a lower bound constructed from a loading-dependent sparse prior, supported by a new χ²-divergence bound for generalized Gaussian distributions. For unknown sparsity an η-dependent Lepski-type procedure is shown to attain the oracle rate up to logarithmic factors on a verifiable class of loadings. The analysis is extended to non-symmetric noise (where asymmetry can strictly increase the rate for some η), hypothesis testing, and unknown noise variance, with explicit examples illustrating the effect of heterogeneity in η.

Significance. If the derivations hold, the work supplies the first sharp, fully explicit minimax rates for general linear functionals under sparsity that are non-asymptotic and valid beyond the Gaussian case. The new χ² bound for generalized Gaussians and the loading-dependent prior construction are technically substantive contributions that may be reusable. The adaptive result and the demonstration that asymmetry can worsen the rate in concrete examples add practical and conceptual value. The explicit dependence on s, η, tail index, and noise level, together with the verifiable class for adaptivity, strengthens the contribution.

major comments (2)
  1. [§4] §4 (new χ² bound for generalized Gaussians): the bound is invoked to close the lower-bound argument for the symmetric exponential-tail case; it is load-bearing for the claimed sharpness, yet the manuscript does not appear to supply a self-contained proof or explicit constant tracking that would allow immediate verification of the exponential-tail regime.
  2. [§5] §5 (loading-dependent sparse prior): the prior is constructed to depend on the specific η and s; while this yields the matching lower bound, the construction appears to rely on symmetry in an essential way, and it is unclear whether the same technique directly yields the stated non-symmetric lower bounds or whether a separate worst-case construction is required.
minor comments (2)
  1. [§6] The definition of the “verifiable class” of η for which the adaptive procedure attains the oracle rate up to log factors should be stated more explicitly (e.g., as a concrete condition on the ordered loadings) so that readers can check membership without additional derivation.
  2. [§2] Notation for the tail parameter and the generalized-Gaussian family is introduced in §2 but used without repeated reminder in later sections; a short table or boxed definition would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading, positive assessment, and recommendation for minor revision. We address the two major comments point by point below.

read point-by-point responses
  1. Referee: [§4] §4 (new χ² bound for generalized Gaussians): the bound is invoked to close the lower-bound argument for the symmetric exponential-tail case; it is load-bearing for the claimed sharpness, yet the manuscript does not appear to supply a self-contained proof or explicit constant tracking that would allow immediate verification of the exponential-tail regime.

    Authors: We thank the referee for this observation. The χ² bound for generalized Gaussian distributions is established in Appendix A.3. To improve accessibility and enable immediate verification of the exponential-tail regime, we will insert a concise proof sketch together with explicit constant tracking into the main text of Section 4 in the revised manuscript. revision: yes

  2. Referee: [§5] §5 (loading-dependent sparse prior): the prior is constructed to depend on the specific η and s; while this yields the matching lower bound, the construction appears to rely on symmetry in an essential way, and it is unclear whether the same technique directly yields the stated non-symmetric lower bounds or whether a separate worst-case construction is required.

    Authors: The loading-dependent sparse prior is constructed specifically for the symmetric-noise setting. For non-symmetric noise we employ an independent worst-case asymmetric construction, as already indicated in the abstract and developed in Section 6. We will add a short clarifying sentence at the end of Section 5 that explicitly distinguishes the two constructions and cross-references the asymmetric argument. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper's central derivations for the sharp minimax rate of L(θ) = ηᵀθ under s-sparsity rely on explicit upper bounds via plug-in and thresholding estimators, matched to lower bounds from a loading-dependent sparse prior and a new χ² divergence bound for generalized Gaussian distributions. These constructions are presented as technical novelties in Sections 3–5 with explicit dependence on s, η, tail parameters, and noise level, without reducing to fitted inputs from the same data, self-definitional equations, or load-bearing self-citations. The adaptive Lepski-type procedure in Section 6 attains the oracle rate up to log factors on a verifiable class of η, and extensions to non-symmetric noise use worst-case asymmetric constructions. No steps match the enumerated circularity patterns; the argument is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claims rest on the assumed noise tail and symmetry properties plus the sparsity structure of θ; no free parameters or invented entities are introduced in the abstract.

axioms (1)
  • domain assumption Noise is symmetric with exponentially decaying tails.
    Invoked to obtain the sharp minimax rate and to extend the lower-bound theory via the new χ² bound.

pith-pipeline@v0.9.0 · 5763 in / 1238 out tokens · 48619 ms · 2026-05-18T11:55:36.501331+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear
    ?
    unclear

    Relation between the paper passage and the cited Recognition theorem.

    We study nonasymptotic minimax estimation of the linear functional L(θ)=η^⊤θ for a high-dimensional s-sparse mean vector... sharp minimax rate, explicit in s, η, the tail parameter, and the noise level. The proposed estimator combines plug-in estimation for coordinates with large loadings and thresholding for coordinates with small loadings...

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

33 extracted references · 33 canonical work pages

  1. [1]

    2, 615–639

    Jelena Bradic, Jianqing Fan, and Yinchu Zhu,Testability of high-dimensional linear models with nonsparse structures, The Annals of Statistics50(2022), no. 2, 615–639

  2. [2]

    Tony Cai and Zijian Guo,Confidence intervals for high-dimensional linear regres- sion: Minimax rates and adaptivity, Annals of Statistics45(2017), no

    T. Tony Cai and Zijian Guo,Confidence intervals for high-dimensional linear regres- sion: Minimax rates and adaptivity, Annals of Statistics45(2017), no. 2, 615–646

  3. [3]

    4, 1807–1836

    ,Accuracy assessment for high-dimensional linear regression, The Annals of Statistics46(2018), no. 4, 1807–1836

  4. [4]

    Tony Cai and Mark G

    T. Tony Cai and Mark G. Low,Minimax estimation of linear functionals over non- convex parameter spaces, The Annals of Statistics32(2004), no. 2, 552 – 576. 23

  5. [5]

    5, 2311 – 2343

    ,On adaptive estimation of linear functionals, The Annals of Statistics33 (2005), no. 5, 2311 – 2343

  6. [6]

    4, 669–719

    Tianxi Cai, T Tony Cai, and Zijian Guo,Optimal statistical inference for individu- alized treatment effects in high-dimensional models, Journal of the Royal Statistical Society Series B: Statistical Methodology83(2021), no. 4, 669–719

  7. [7]

    4, 2744–2787

    Alexandra Carpentier, Olivier Collier, Laetitia Comminges, Alexandre B Tsybakov, and Yuhao Wang,Estimation of theℓ 2-norm and testing in sparse linear regression with unknown variance, Bernoulli28(2022), no. 4, 2744–2787

  8. [8]

    3, 2127–2153

    Julien Chhor, Rajarshi Mukherjee, and Subhabrata Sen,Sparse signal detection in heteroscedastic Gaussian sequence models: sharp minimax rates, Bernoulli30(2024), no. 3, 2127–2153

  9. [9]

    Tsybakov,Minimax estima- tion of linear and quadratic functionals on sparsity classes, The Annals of Statistics 45(2017), no

    Olivier Collier, La¨ etitia Comminges, and Alexandre B. Tsybakov,Minimax estima- tion of linear and quadratic functionals on sparsity classes, The Annals of Statistics 45(2017), no. 3, 923 – 958

  10. [10]

    6A, 3130–3150

    Olivier Collier, La¨ etitia Comminges, Alexandre B Tsybakov, and Nicolas Verze- len,Optimal adaptive estimation of linear functionals under sparsity, The Annals of Statistics46(2018), no. 6A, 3130–3150

  11. [11]

    3, 1347–1377

    L Comminges, O Collier, M Ndaoud, and AB Tsybakov,Adaptive robust estimation in sparse vector model, The Annals of Statistics49(2021), no. 3, 1347–1377

  12. [12]

    1, 41–67

    David L Donoho, Iain M Johnstone, Jeffrey C Hoch, and Alan S Stern,Maximum entropy and the nearly black object, Journal of the Royal Statistical Society: Series B (Methodological)54(1992), no. 1, 41–67

  13. [13]

    1, 53–65

    Georgii Ksenofontovich Golubev,The method of risk envelope in estimation of linear functionals, Problems of Information Transmission40(2004), no. 1, 53–65

  14. [14]

    4, 392–408

    Yu Golubev and B Levit,An oracle approach to adaptive estimation of linear func- tionals in a Gaussian model, Mathematical Methods of Statistics13(2004), no. 4, 392–408

  15. [15]

    1, 18–32

    Ildar Abdullovich Ibragimov and Rafail Zalmanovich Khas’ minskii,On nonparamet- ric estimation of the value of a linear functional in Gaussian white noise, Theory of Probability & Its Applications29(1985), no. 1, 18–32

  16. [16]

    16, Springer Science & Business Media, 1981

    Ildar Abdulovich Ibragimov and Rafail Zalmanovich Has’ Minskii,Statistical estima- tion: asymptotic theory, vol. 16, Springer Science & Business Media, 1981

  17. [17]

    I. M. Johnstone,Gaussian estimation: Sequence and wavelet models, Draft manuscript, available athttps://imjohnstone.su.domains/GE_08_09_17.pdf, 2017

  18. [18]

    4, 2873–2969

    Subhodh Kotekal and Chao Gao,Minimax rates for sparse signal detection under correlation, Information and Inference: A Journal of the IMA12(2023), no. 4, 2873–2969. 24

  19. [19]

    3, 1095–1122

    ,Sparsity meets correlation in Gaussian sequence model, The Annals of Statis- tics53(2025), no. 3, 1095–1122

  20. [20]

    4, 1389–1456

    Arun Kumar Kuchibhotla and Abhishek Chakrabortty,Moving beyond sub- Gaussianity in high-dimensional statistics: Applications in covariance estimation and linear regression, Information and Inference: A Journal of the IMA11(2022), no. 4, 1389–1456

  21. [21]

    none, 993 – 1020

    B´ eatrice Laurent, Carenne Lude˜ na, and Cl´ ementine Prieur,Adaptive estimation of linear functionals by model selection, Electronic Journal of Statistics2(2008), no. none, 993 – 1020

  22. [22]

    LeCam,Convergence of estimates under dimensionality restrictions, The Annals of Statistics1(1973), no

    L. LeCam,Convergence of estimates under dimensionality restrictions, The Annals of Statistics1(1973), no. 1, 38 – 53

  23. [23]

    Mengchu Li, Yudong Chen, Tengyao Wang, and Yi Yu,Robust mean change point testing in high-dimensional data with heavy tails, arXiv preprint arXiv:2305.18987 (2023)

  24. [24]

    Samworth,Minimax rates in sparse, high- dimensional change point detection, The Annals of Statistics49(2021), no

    Haoyang Liu, Chao Gao, and Richard J. Samworth,Minimax rates in sparse, high- dimensional change point detection, The Annals of Statistics49(2021), no. 2, 1081 – 1112

  25. [25]

    Petrov,Limit theorems of probability theory

    Valentin V. Petrov,Limit theorems of probability theory. Sequences of independent random variables, Oxf. Stud. Probab., vol. 4, Oxford: Clarendon Press, 1995

  26. [26]

    6, 2420–2469

    Alexandre B Tsybakov,Pointwise and sup-norm sharp adaptive estimation of func- tions on the sobolev classes, The Annals of Statistics26(1998), no. 6, 2420–2469

  27. [27]

    Tsybakov,Introduction to nonparametric estimation, Springer Series in Statistics, Springer, 2009

    Alexandre B. Tsybakov,Introduction to nonparametric estimation, Springer Series in Statistics, Springer, 2009

  28. [28]

    Minimax and adaptive estimation of general linear functionals under sparsity

    Roman Vershynin,High-dimensional probability: An introduction with applications in data science, vol. 47, Cambridge university press, 2018. 25 Supplement to “Minimax and adaptive estimation of general linear functionals under sparsity” Section A provides detailed proofs of all upper bound results in the paper. Section B provides the proof for the uniquene...

  29. [29]

    1 + 2 α 1/α#vuut dX j=1 η2 j exp(−β1/|ηj|α) ≤

    such that ν:= ∆− X j∈U2 EZ2 j ≥∆/2≥ ζ 288 Φadp(s′;η) log(es′) 12 We apply Lemma 9 withp= 6 to obtain P( X j∈U2 Xj > ν)≤ 288 ζ ·(1 + 2 6)23 6p C ∗ 12 Φadp(s′;η) log(es′) −6 (s′)10(log(es′))−1 [Φadp(s′;η)] 6 + exp  − 2( ζ 288 Φadp(s′;η) log(es′) )2 82e6 ·64 p C ∗ 4(s′)−2(log(es′))−1 [Φadp(s′;η)] 2   ≤Cζ −6(log(es′))5(s′)−10 + exp −C ′ζ2(s′)2/log(es ′) ,...

  30. [30]

    Whens≲d γd/2, both the logarithmic factors are equivalent to log(d) so the first term dominates

  31. [31]

    In this case, the two terms ared 2γλ+γd ands 2 log2/α(d)

    Whend γd/2 ≪s≪d γλ+γd/2 (which is≪ √ dby assumption), the second logarithmic factor remains the same as log(d) but the first factor becomes≍ dγd/2 s α . In this case, the two terms ared 2γλ+γd ands 2 log2/α(d). This suggests that the first term dominates if and only ifs≲ d2γλ+γd log2/α(d). 28 D.3 Exponentially decaying loading vector Some of the calculati...

  32. [32]

    Sinceη j ∈(1/2,1) for anyj < j 0, we haveλ o =β 1/α ≍log 1/α(1 +j α/2 0 /sα).By Equation (54) withγ= 2, we have ν2(s)≍ X j<j0 η2 j exp(−β/|ηj|α)≍ X j<j0 exp(−β/|ηj|α)≍s 2

    Supposeβ >1, by (55), we have X j<j0 exp(−β/|ηj|α)≍s 2. Sinceη j ∈(1/2,1) for anyj < j 0, we haveλ o =β 1/α ≍log 1/α(1 +j α/2 0 /sα).By Equation (54) withγ= 2, we have ν2(s)≍ X j<j0 η2 j exp(−β/|ηj|α)≍ X j<j0 exp(−β/|ηj|α)≍s 2. Therefore, we have Φo(s;η)≍s 2λ2 o +ν 2 ≍s 2λ2 o ≍s 2 log1/α(1 +j α/2 0 /sα)

  33. [33]

    By Equa- tion (54) withγ= 2, we have ν2 ≍ X j<j0 η2 j exp(−β/|ηj|α)≍ X j<j0 exp(−β/|ηj|α)≍j 0

    Supposeβ≤1, we haveλ o ≍log 1/2 + (j0/s2) and therefore,s 2λ2 o ≤j 0. By Equa- tion (54) withγ= 2, we have ν2 ≍ X j<j0 η2 j exp(−β/|ηj|α)≍ X j<j0 exp(−β/|ηj|α)≍j 0. Therefore, we have Φo(s;η)≍s 2λ2 o +ν 2 ≍j 0 ≍s 2 log1/α(1 +j α/2 0 /sα) Combining the two cases, we have Φo(s;η)≍Φ o(s;1 j0). It then follows that Φadp(s;η)≍ Φadp(s;1 j0). 30