Bayesian High-dimensional Grouped-regression using Sparse Projection-posterior
Pith reviewed 2026-05-25 08:26 UTC · model grok-4.3
The pith
Sparse projection maps convert dense Bayesian samples into sparse posteriors that achieve optimal contraction rates and consistent group selection.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By immersing dense posterior draws into a lower-dimensional sparse space through penalty-based projection maps, the induced posteriors attain the minimax optimal contraction rates for estimation and prediction while also being consistent for selecting the correct groups; the debiased Group LASSO map further ensures that credible sets have exact coverage.
What carries the argument
Sparsity-inducing projection maps (Group LASSO Projection Posterior, Group SCAD Projection Posterior, Adaptive Group LASSO Projection Posterior) that embed dense samples into structured sparse parameter spaces.
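To make the projection step concrete, here is a minimal sketch of one possible Group LASSO projection of a single dense draw. It assumes the map sends a dense draw theta to the minimiser over b of (1/(2n))·||X theta − X b||² + lam · Σ_g sqrt(m_g)·||b_g||, where m_g is the size of group g. The objective scaling, the sqrt(m_g) weights, the proximal-gradient solver, and every function and argument name are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def group_soft_threshold(v, t):
    """Block soft-thresholding: shrink the Euclidean norm of v by t."""
    norm = np.linalg.norm(v)
    if norm <= t:
        return np.zeros_like(v)
    return (1.0 - t / norm) * v

def group_lasso_projection(X, theta, groups, lam, n_iter=500):
    """Map a dense draw `theta` to a group-sparse vector (hypothetical helper).

    Approximately minimises, by proximal gradient descent,
        (1 / (2n)) * ||X theta - X b||^2 + lam * sum_g sqrt(|g|) * ||b_g||.
    `groups` is a list of integer index arrays that partition the coefficients.
    """
    n = X.shape[0]
    target = X @ theta
    # Step size 1 / L, where L = ||X||_2^2 / n is the gradient's Lipschitz constant.
    step = n / np.linalg.norm(X, 2) ** 2
    b = np.zeros_like(theta, dtype=float)
    for _ in range(n_iter):
        grad = X.T @ (X @ b - target) / n
        z = b - step * grad
        for g in groups:
            # Proximal step: whole groups are set exactly to zero when small.
            z[g] = group_soft_threshold(z[g], step * lam * np.sqrt(len(g)))
        b = z
    return b
```

Applying `group_lasso_projection` to each dense draw in turn would give a sample from the induced sparse posterior under these assumptions; the groups with nonzero blocks in a given projected draw form that draw's selected model.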
Load-bearing premise
The projection maps carry dense posterior samples into the sparse space without distorting the contraction rates or selection properties that the proofs rely on.
What would settle it
Empirical posterior contraction rates slower than the minimax rate, or failure of the procedure to select the true groups with probability approaching one, would falsify the optimality and consistency claims.
Original abstract
We present a novel Bayesian approach for high-dimensional grouped regression under sparsity. We leverage a sparse projection method that uses a sparsity-inducing map to derive an induced posterior on a lower-dimensional parameter space. Our method introduces three distinct projection maps based on popular penalty functions: the Group LASSO Projection Posterior, Group SCAD Projection Posterior, and Adaptive Group LASSO Projection Posterior. Each projection map is constructed to immerse dense posterior samples into a structured, sparse space, allowing for effective group selection and estimation in high-dimensional settings. We derive optimal posterior contraction rates for estimation and prediction, proving that the methods are model selection consistent. Additionally, we propose a Debiased Group LASSO Projection Map, which ensures exact coverage of credible sets. Our methodology is particularly suited for applications in nonparametric additive models, where we apply it with B-spline expansions to capture complex relationships between covariates and response. Extensive simulations validate our theoretical findings, demonstrating the robustness of our approach across different settings. Finally, we illustrate the practical utility of our method with an application to brain MRI volume data from the Alzheimer's Disease Neuroimaging Initiative (ADNI), where our model identifies key brain regions associated with Alzheimer's progression.
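The additive-model use case in the abstract turns each covariate into one group of B-spline coefficients. The sketch below shows one way such a grouped design matrix could be built. The knot placement (equally spaced on the rescaled range), the cubic degree, the number of basis functions, and the helper names `bspline_basis` and `additive_design` are assumptions for illustration; identifiability constraints such as centring are omitted.

```python
import numpy as np

def bspline_basis(x, n_basis, degree=3):
    """Evaluate `n_basis` B-spline basis functions on [0, 1] (Cox-de Boor).

    Uses a clamped knot vector with equally spaced interior knots;
    requires n_basis >= degree + 1 and all x in [0, 1].
    """
    x = np.asarray(x, dtype=float)
    n_interior = n_basis - degree - 1
    interior = np.linspace(0.0, 1.0, n_interior + 2)[1:-1]
    knots = np.concatenate([np.zeros(degree + 1), interior, np.ones(degree + 1)])
    # Degree 0: indicators of the half-open knot intervals.
    B = np.zeros((x.size, len(knots) - 1))
    for j in range(len(knots) - 1):
        B[:, j] = (knots[j] <= x) & (x < knots[j + 1])
    # Close the last non-empty interval so that x == 1 is covered.
    B[x == 1.0, n_basis - 1] = 1.0
    for d in range(1, degree + 1):
        B_new = np.zeros((x.size, len(knots) - 1 - d))
        for j in range(len(knots) - 1 - d):
            left_den = knots[j + d] - knots[j]
            right_den = knots[j + d + 1] - knots[j + 1]
            if left_den > 0:
                B_new[:, j] += (x - knots[j]) / left_den * B[:, j]
            if right_den > 0:
                B_new[:, j] += (knots[j + d + 1] - x) / right_den * B[:, j + 1]
        B = B_new
    return B

def additive_design(Z, n_basis=6, degree=3):
    """Stack one B-spline block per covariate; each block is one group."""
    n, p = Z.shape
    blocks, groups = [], []
    for j in range(p):
        z = Z[:, j]
        u = (z - z.min()) / (z.max() - z.min())  # rescale covariate to [0, 1]
        blocks.append(bspline_basis(u, n_basis, degree))
        groups.append(np.arange(j * n_basis, (j + 1) * n_basis))
    return np.hstack(blocks), groups
```

With this layout, selecting a group corresponds to declaring that the covariate has a nonzero additive component, which is what makes group-level sparsity the natural notion for additive models.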
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a Bayesian framework for high-dimensional grouped regression that applies three sparsity-inducing projection maps (Group LASSO, Group SCAD, and Adaptive Group LASSO) to dense posterior samples, yielding induced posteriors on a lower-dimensional sparse parameter space. It claims to establish optimal posterior contraction rates for estimation and prediction, prove model selection consistency, and introduce a debiased Group LASSO projection map that achieves exact frequentist coverage for credible sets. The approach is extended to nonparametric additive models via B-spline bases, supported by simulation studies and an application to ADNI brain MRI data for identifying regions linked to Alzheimer's progression.
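The "dense posterior samples" in the summary have to come from somewhere before any projection is applied. As a hedged illustration, the sketch below draws them from a conjugate Gaussian posterior, assuming a prior theta ~ N(0, (sigma²/a)·I) and a known error standard deviation sigma. The prior, the treatment of sigma as fixed, and the function name `dense_posterior_draws` are assumptions made for this example and may differ from the paper's construction.

```python
import numpy as np

def dense_posterior_draws(X, y, a, sigma, n_draws, rng):
    """Draw from the conjugate posterior N(mean, sigma^2 * (X'X + a I)^{-1}).

    Assumes theta ~ N(0, (sigma^2 / a) I) and y | theta ~ N(X theta, sigma^2 I)
    with sigma known. Returns an array of shape (n_draws, p).
    """
    p = X.shape[1]
    A = X.T @ X + a * np.eye(p)          # positive definite for any a > 0
    mean = np.linalg.solve(A, X.T @ y)   # ridge-type posterior mean
    L = np.linalg.cholesky(A)            # A = L L'
    z = rng.standard_normal((p, n_draws))
    # Solving L' u = z gives u with covariance (L L')^{-1} = A^{-1}.
    noise = sigma * np.linalg.solve(L.T, z)
    return (mean[:, None] + noise).T
```

Because `A` is invertible whenever `a > 0`, the sampler also works when the number of coefficients exceeds the sample size; the draws are dense (no exact zeros), which is why a separate sparsifying map is needed for group selection.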
Significance. If the projection maps are shown to control the distance to the original posterior at a rate faster than the claimed contraction rate, the work would supply a computationally tractable route to Bayesian sparse grouped inference with theoretical guarantees, which is relevant for applications such as neuroimaging where grouped predictors arise naturally.
major comments (2)
- [Theoretical results on contraction rates and selection consistency] The central theoretical claims (optimal contraction rates and model selection consistency) rest on the three projection maps immersing the dense posterior into the sparse subspace while keeping total variation or Hellinger distance o(ε_n), where ε_n is the target contraction rate. The manuscript must supply explicit bounds on this projection error (including dependence on dimension p, group size, and sparsity level) in the section containing the main theorems; without such control the optimality statements do not follow.
- [Debiased Group LASSO Projection Map and credible-set coverage] For the Debiased Group LASSO Projection Map, the argument for exact credible-set coverage requires that the debiasing step exactly cancels the bias induced by the projection while preserving the posterior contraction; the manuscript should state the precise condition on the projection map under which this cancellation holds and verify it does not inflate the remainder term beyond the claimed rate.
minor comments (2)
- Notation for the three projection maps (e.g., how the penalty functions are applied to the posterior samples) should be introduced with a single consistent definition early in the methods section rather than scattered across subsections.
- Simulation tables would benefit from reporting both estimation error and selection metrics (e.g., false-positive group rate) side-by-side for all three maps to allow direct comparison of their finite-sample behavior.
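The side-by-side reporting suggested in the last minor comment could be computed per replicate along these lines. This is a small sketch; the function name, the argument names, and the choice of metrics (false-positive group rate, true-positive group rate, exact recovery) are illustrative assumptions rather than the paper's reporting conventions.

```python
def group_selection_metrics(selected, true_groups, n_groups):
    """Compare a set of selected group indices with the true active groups."""
    selected, true_groups = set(selected), set(true_groups)
    tp = len(selected & true_groups)   # active groups that were selected
    fp = len(selected - true_groups)   # inactive groups that were selected
    fn = len(true_groups - selected)   # active groups that were missed
    n_null = n_groups - len(true_groups)
    return {
        "false_positive_rate": fp / n_null if n_null else 0.0,
        "true_positive_rate": tp / len(true_groups) if true_groups else 0.0,
        "exact_recovery": fp == 0 and fn == 0,
    }
```

Averaging `exact_recovery` over simulation replicates gives an empirical estimate of the probability of selecting exactly the true groups, the quantity that selection consistency says should approach one.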
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive comments. We address the two major comments below.
Point-by-point responses
- Referee: [Theoretical results on contraction rates and selection consistency] The central theoretical claims (optimal contraction rates and model selection consistency) rest on the three projection maps immersing the dense posterior into the sparse subspace while keeping total variation or Hellinger distance o(ε_n), where ε_n is the target contraction rate. The manuscript must supply explicit bounds on this projection error (including dependence on dimension p, group size, and sparsity level) in the section containing the main theorems; without such control the optimality statements do not follow.
  Authors: We agree that the main theorems section should contain explicit bounds on the projection error to make the optimality claims fully transparent. The supplementary proofs already establish that each of the three maps (Group LASSO, SCAD, Adaptive Group LASSO) produces a total-variation distance of order o(ε_n) under the stated assumptions on group size and sparsity; we will move the key bound (with its explicit dependence on p, group dimension, and sparsity level s) into the statement of Theorem 3.1 and the surrounding discussion in the revised manuscript.
  Revision: yes
- Referee: [Debiased Group LASSO Projection Map and credible-set coverage] For the Debiased Group LASSO Projection Map, the argument for exact credible-set coverage requires that the debiasing step exactly cancels the bias induced by the projection while preserving the posterior contraction; the manuscript should state the precise condition on the projection map under which this cancellation holds and verify it does not inflate the remainder term beyond the claimed rate.
  Authors: We will add an explicit statement of the required condition on the projection map (a uniform Lipschitz bound with respect to the group-norm that is satisfied by the debiased Group LASSO map under our eigenvalue and group-size assumptions) immediately before the coverage result. The proof already shows that the additional remainder introduced by debiasing is absorbed into the o(ε_n) term; we will include a short verification paragraph in the main text confirming that the contraction rate is unaffected.
  Revision: yes
Circularity Check
No circularity: rates derived for induced posteriors from explicitly constructed maps
full rationale
The paper defines three sparsity-inducing projection maps from standard penalties (Group LASSO, SCAD, Adaptive Group LASSO) and a debiased variant, then claims to derive contraction rates and selection consistency for the resulting induced posteriors. No equation or step reduces a claimed rate or consistency statement to a fitted quantity or self-citation by construction; the maps are presented as immersing the original posterior into a sparse space, with the theoretical results asserted to follow from that construction under standard Bayesian nonparametric arguments. The provided abstract and skeptic summary contain no load-bearing self-citation chain or self-definitional loop, so the derivation chain is treated as self-contained.