Bayesian High-dimensional Grouped-regression using Sparse Projection-posterior

Samhita Pal; Subhashis Ghosal

arxiv: 2411.15713 · v3 · pith:2NSFY5RCnew · submitted 2024-11-24 · 📊 stat.ME

Pith reviewed 2026-05-25 08:26 UTC · model grok-4.3

classification 📊 stat.ME

keywords Bayesian high-dimensional regression · grouped sparsity · projection posterior · posterior contraction rates · model selection consistency · debiased credible sets · nonparametric additive models

The pith

Sparse projection maps convert dense Bayesian samples into sparse posteriors that achieve optimal contraction rates and consistent group selection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a Bayesian method for high-dimensional grouped regression under sparsity by applying sparsity-inducing projection maps to dense posterior samples, producing induced posteriors on lower-dimensional spaces. Three maps are constructed from the Group LASSO, SCAD, and Adaptive Group LASSO penalties to enforce structure while preserving essential posterior features. From this construction the authors derive optimal posterior contraction rates for both estimation and prediction, and establish that the resulting procedures are model selection consistent. A debiased variant of the Group LASSO map is introduced to guarantee exact frequentist coverage of the resulting credible sets. The framework is then specialized to nonparametric additive models via B-spline expansions and tested on simulated data and Alzheimer's brain-imaging measurements.

Core claim

By immersing dense posterior draws into a lower-dimensional sparse space through penalty-based projection maps, the induced posteriors attain the minimax optimal contraction rates for estimation and prediction while also being consistent for selecting the correct groups; the debiased Group LASSO map further ensures that credible sets have exact coverage.

What carries the argument

Sparsity-inducing projection maps (Group LASSO Projection Posterior, Group SCAD Projection Posterior, Adaptive Group LASSO Projection Posterior) that embed dense samples into structured sparse parameter spaces.
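
To make the idea of a sparsity-inducing projection concrete, here is a minimal Python sketch of a group-LASSO-type map applied to one dense draw. The summary above does not state the formula, so the sketch assumes the map sends a dense draw β to the minimizer over u of (1/2n)‖Xβ − Xu‖² + λ Σ_k √p_k ‖u_k‖₂. The function names, the √p_k weights, and the proximal-gradient solver are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def group_soft_threshold(v, t):
    """Shrink the whole block v toward zero; return exact zeros if ||v|| <= t."""
    norm = np.linalg.norm(v)
    if norm <= t:
        return np.zeros_like(v)
    return (1.0 - t / norm) * v


def group_lasso_projection(X, beta_dense, groups, lam, n_iter=500):
    """Project one dense draw onto a group-sparse vector (assumed objective).

    Minimizes (1/2n)||X beta_dense - X u||^2 + lam * sum_k sqrt(p_k) ||u_k||_2
    by proximal gradient descent. `groups` is a list of index lists.
    """
    beta_dense = np.asarray(beta_dense, dtype=float)
    n = X.shape[0]
    target = X @ beta_dense
    # Step size 1/L, where L = sigma_max(X)^2 / n bounds the gradient's Lipschitz constant.
    step = n / (np.linalg.norm(X, 2) ** 2)
    u = np.zeros_like(beta_dense)
    for _ in range(n_iter):
        grad = X.T @ (X @ u - target) / n
        z = u - step * grad
        for idx in groups:
            u[idx] = group_soft_threshold(z[idx], step * lam * np.sqrt(len(idx)))
    return u
```

Applying the function to every draw from the dense posterior would give draws from an induced sparse posterior; groups whose projected block is exactly zero count as unselected for that draw.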

Load-bearing premise

The projection maps map dense posterior samples into the sparse space without distorting the contraction rates or selection properties that the proofs rely on.

What would settle it

Empirical posterior contraction rates slower than the minimax rate, or failure of the procedure to select the true groups with probability approaching one, would falsify the optimality and consistency claims.
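
The selection half of this check can be sketched in a few lines of Python. The sketch assumes the projected posterior draws are already available as an array with one row per draw, and that the true active groups are known, as they would be in a simulation. The function names and the `tol` threshold are hypothetical.

```python
import numpy as np


def group_norms(draws, groups):
    """Euclidean norm of every coefficient group in every draw (draws x groups)."""
    draws = np.asarray(draws, dtype=float)
    return np.stack([np.linalg.norm(draws[:, idx], axis=1) for idx in groups], axis=1)


def group_inclusion_frequencies(draws, groups, tol=0.0):
    """Fraction of draws in which each group is nonzero."""
    return (group_norms(draws, groups) > tol).mean(axis=0)


def support_recovery_probability(draws, groups, true_groups, tol=0.0):
    """Fraction of draws whose selected group set equals the true set exactly."""
    selected = group_norms(draws, groups) > tol
    truth = np.zeros(len(groups), dtype=bool)
    truth[list(true_groups)] = True
    return float((selected == truth).all(axis=1).mean())
```

Under the consistency claim, the value returned by `support_recovery_probability` should approach one as the sample size grows; values that stay bounded away from one across replications would count against it.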

Figures

Figures reproduced from arXiv: 2411.15713 by Samhita Pal, Subhashis Ghosal.

**Figure 1.** MSE comparisons for the four pairs of (K, n), namely (50, 100), (50, 500), (100, 100), and (100, 500), when s0 = 10 groups are active, replicated 100 times.

**Figure 2.** F1-score comparisons for the four pairs of (…

**Figure 3.** Boxplot of MSEs and Forestplot of F1 scores of all competing methods under the…

**Figure 4.** Plot of the distributions of the average prediction error versus the model sparsity…

**Figure 5.** Computation time comparison between centralized and distributed implementa…

**Figure 6.** F1-score comparisons for the four pairs of (…

Original abstract

We present a novel Bayesian approach for high-dimensional grouped regression under sparsity. We leverage a sparse projection method that uses a sparsity-inducing map to derive an induced posterior on a lower-dimensional parameter space. Our method introduces three distinct projection maps based on popular penalty functions: the Group LASSO Projection Posterior, Group SCAD Projection Posterior, and Adaptive Group LASSO Projection Posterior. Each projection map is constructed to immerse dense posterior samples into a structured, sparse space, allowing for effective group selection and estimation in high-dimensional settings. We derive optimal posterior contraction rates for estimation and prediction, proving that the methods are model selection consistent. Additionally, we propose a Debiased Group LASSO Projection Map, which ensures exact coverage of credible sets. Our methodology is particularly suited for applications in nonparametric additive models, where we apply it with B-spline expansions to capture complex relationships between covariates and response. Extensive simulations validate our theoretical findings, demonstrating the robustness of our approach across different settings. Finally, we illustrate the practical utility of our method with an application to brain MRI volume data from the Alzheimer's Disease Neuroimaging Initiative (ADNI), where our model identifies key brain regions associated with Alzheimer's progression.
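
The link between additive models and grouped regression can be illustrated with a short Python sketch: each covariate is expanded in a B-spline basis, and the resulting columns form one coefficient group. The clamped knot layout on [0, 1), the column centering, and all function names are assumptions made for illustration; the abstract does not specify these details.

```python
import numpy as np


def clamped_knots(n_interior, degree):
    """Knot vector on [0, 1] with boundary knots repeated `degree` extra times."""
    inner = np.linspace(0.0, 1.0, n_interior + 2)
    return np.concatenate([np.zeros(degree), inner, np.ones(degree)])


def bspline_basis(x, knots, degree):
    """Evaluate all B-spline basis functions at x via the Cox-de Boor recursion.

    Assumes x lies in [0, 1); intervals are half-open, so x = 1 gives zeros.
    """
    x = np.asarray(x, dtype=float)
    t = np.asarray(knots, dtype=float)
    B = np.zeros((x.size, t.size - 1))
    for i in range(t.size - 1):
        B[:, i] = (t[i] <= x) & (x < t[i + 1])  # degree-0 indicators
    for d in range(1, degree + 1):
        B_new = np.zeros((x.size, t.size - 1 - d))
        for i in range(t.size - 1 - d):
            left_den = t[i + d] - t[i]
            right_den = t[i + d + 1] - t[i + 1]
            if left_den > 0:
                B_new[:, i] += (x - t[i]) / left_den * B[:, i]
            if right_den > 0:
                B_new[:, i] += (t[i + d + 1] - x) / right_den * B[:, i + 1]
        B = B_new
    return B


def additive_design(covariates, n_interior=4, degree=3):
    """Stack one centered B-spline block per covariate; return (design, groups)."""
    covariates = np.asarray(covariates, dtype=float)
    knots = clamped_knots(n_interior, degree)
    blocks, groups, start = [], [], 0
    for j in range(covariates.shape[1]):
        block = bspline_basis(covariates[:, j], knots, degree)
        block = block - block.mean(axis=0)  # centering, an identifiability choice
        blocks.append(block)
        groups.append(list(range(start, start + block.shape[1])))
        start += block.shape[1]
    return np.hstack(blocks), groups
```

Selecting a group in the resulting design corresponds to declaring the matching covariate's additive component nonzero, which is why group selection is the natural target in this setting.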

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper builds projection posteriors for high-dimensional grouped regression via group-LASSO, SCAD and adaptive group-LASSO maps, plus a debiased variant, and claims optimal contraction rates plus selection consistency.

The central contribution is the construction of three sparsity-inducing projection maps that turn samples from a dense posterior into an induced posterior supported on sparse grouped coefficients. The authors also add a debiased group-LASSO map meant to give exact credible-set coverage. The theoretical part asserts that these maps deliver optimal posterior contraction rates for estimation and prediction and that the induced posteriors are model-selection consistent. The method is then used inside a B-spline additive model and checked on simulations and on ADNI brain-volume data.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a Bayesian framework for high-dimensional grouped regression that applies three sparsity-inducing projection maps (Group LASSO, SCAD, and Adaptive Group LASSO) to dense posterior samples, yielding induced posteriors on a lower-dimensional sparse parameter space. It claims to establish optimal posterior contraction rates for estimation and prediction, prove model selection consistency, and introduce a debiased Group LASSO projection map that achieves exact frequentist coverage for credible sets. The approach is extended to nonparametric additive models via B-spline bases, supported by simulation studies and an application to ADNI brain MRI data for identifying regions linked to Alzheimer's progression.

Significance. If the projection maps are shown to control the distance to the original posterior at a rate faster than the claimed contraction rate, the work would supply a computationally tractable route to Bayesian sparse grouped inference with theoretical guarantees, which is relevant for applications such as neuroimaging where grouped predictors arise naturally.

major comments (2)

[Theoretical results on contraction rates and selection consistency] The central theoretical claims (optimal contraction rates and model selection consistency) rest on the three projection maps immersing the dense posterior into the sparse subspace while keeping total variation or Hellinger distance o(ε_n), where ε_n is the target contraction rate. The manuscript must supply explicit bounds on this projection error (including dependence on dimension p, group size, and sparsity level) in the section containing the main theorems; without such control the optimality statements do not follow.
[Debiased Group LASSO Projection Map and credible-set coverage] For the Debiased Group LASSO Projection Map, the argument for exact credible-set coverage requires that the debiasing step exactly cancels the bias induced by the projection while preserving the posterior contraction; the manuscript should state the precise condition on the projection map under which this cancellation holds and verify it does not inflate the remainder term beyond the claimed rate.

minor comments (2)

Notation for the three projection maps (e.g., how the penalty functions are applied to the posterior samples) should be introduced with a single consistent definition early in the methods section rather than scattered across subsections.
Simulation tables would benefit from reporting both estimation error and selection metrics (e.g., false-positive group rate) side-by-side for all three maps to allow direct comparison of their finite-sample behavior.
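
The selection metrics named in the last comment could be computed along these lines. This is a minimal Python sketch with a hypothetical function name, taking a selected group set and a true group set as index collections; it is not taken from the manuscript's simulation code.

```python
def group_selection_metrics(selected, truth, n_groups):
    """Group-level F1 score and false-positive group rate for one fitted model."""
    selected, truth = set(selected), set(truth)
    tp = len(selected & truth)   # active groups that were selected
    fp = len(selected - truth)   # inactive groups that were selected
    fn = len(truth - selected)   # active groups that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    n_inactive = n_groups - len(truth)
    fpr = fp / n_inactive if n_inactive else 0.0
    return {"f1": f1, "false_positive_group_rate": fpr}
```

Reporting both numbers next to the estimation error for each of the three maps would show whether a map gains F1 by selecting more aggressively or by making fewer false selections.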

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments. We address the two major comments below.

Referee: [Theoretical results on contraction rates and selection consistency] The central theoretical claims (optimal contraction rates and model selection consistency) rest on the three projection maps immersing the dense posterior into the sparse subspace while keeping total variation or Hellinger distance o(ε_n), where ε_n is the target contraction rate. The manuscript must supply explicit bounds on this projection error (including dependence on dimension p, group size, and sparsity level) in the section containing the main theorems; without such control the optimality statements do not follow.

Authors: We agree that the main theorems section should contain explicit bounds on the projection error to make the optimality claims fully transparent. The supplementary proofs already establish that each of the three maps (Group LASSO, SCAD, Adaptive Group LASSO) produces a total-variation distance of order o(ε_n) under the stated assumptions on group size and sparsity; we will move the key bound (with its explicit dependence on p, group dimension, and sparsity level s) into the statement of Theorem 3.1 and the surrounding discussion in the revised manuscript. revision: yes
Referee: [Debiased Group LASSO Projection Map and credible-set coverage] For the Debiased Group LASSO Projection Map, the argument for exact credible-set coverage requires that the debiasing step exactly cancels the bias induced by the projection while preserving the posterior contraction; the manuscript should state the precise condition on the projection map under which this cancellation holds and verify it does not inflate the remainder term beyond the claimed rate.

Authors: We will add an explicit statement of the required condition on the projection map (a uniform Lipschitz bound with respect to the group-norm that is satisfied by the debiased Group LASSO map under our eigenvalue and group-size assumptions) immediately before the coverage result. The proof already shows that the additional remainder introduced by debiasing is absorbed into the o(ε_n) term; we will include a short verification paragraph in the main text confirming that the contraction rate is unaffected. revision: yes

Circularity Check

0 steps flagged

No circularity: rates derived for induced posteriors from explicitly constructed maps

The paper defines three sparsity-inducing projection maps from standard penalties (Group LASSO, SCAD, Adaptive Group LASSO) and a debiased variant, then claims to derive contraction rates and selection consistency for the resulting induced posteriors. No equation or step reduces a claimed rate or consistency statement to a fitted quantity or self-citation by construction; the maps are presented as immersing the original posterior into a sparse space, with the theoretical results asserted to follow from that construction under standard Bayesian nonparametric arguments. The provided abstract and skeptic summary contain no load-bearing self-citation chain or self-definitional loop, so the derivation chain is treated as self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no free parameters, axioms, or invented entities can be identified from the provided text.

Reference graph

Works this paper leans on

7 extracted references · 7 canonical work pages

[1] Michael Komodromos, Marina Evangelou, Sarah Filippi, and Kolyan Ray. Group spike and slab variational Bayes. arXiv preprint arXiv:2309.10378.

[2] Samhita Pal and Subhashis Ghosal. Bayesian high-dimensional linear regression with sparse projection-posterior. arXiv preprint arXiv:2410.16577.

[3] Zemei Xu, Daniel F. Schmidt, Enes Makalic, Guoqi Qian, and John L. Hopper. Bayesian grouped horseshoe regression with application to additive models. In AI 2016: Advances in Artificial Intelligence: 29th Australasian Joint Conference, Hobart, TAS, Australia, December 5-8, 2016, Proceedings 29, pages 229–240. Springer, 2016.

[4] Pal and Ghosal [2024], cited in the passage: "As the immersion posterior for $\sigma^2$ described in Section 2.2 is consistent, as in Pal and Ghosal [2024], it suffices to prove the results conditionally on given $\sigma^*$, uniformly in $\sigma^*$ in a shrinking neighborhood of the true $\sigma_0$, say, $U_n$, such that $\Pi(\sigma^* \in U_n \mid Y) \to 1$ in probability as $n \to \infty$. Since group sparsity is a special case of the general sparse high-dimensional linear regression problem, the same map used in Pal and Ghosal …"

[5] Honda [2021], cited in the passage: "… $+ \Delta^{DGL}\| \le n^{-1/2}\|Y - X\hat\beta^{R}\| \cdot \|\hat\Theta X^{T}\|_{op} + \sqrt{n}\,\|\Delta^{DGL}\|$, which is $o_P(1)$ because the first term is a product of two quantities, $n^{-1/2}\|Y - X\hat\beta^{R}\|$ and $\|\hat\Theta X^{T}\|_{op}$, of which the first vanishes by Lemma 8 and the second can be shown to be bounded using Proposition 5 in Honda [2021], which says that $n^{-1}\hat\Theta X^{T} X \hat\Theta^{T} \to \Theta$ blockwise in probability. The second term $\sqrt{n}\,\|\Delta^{DGL}\|$ has been proven to be $o(1)$ in Proposition 3 of Honda [2021]."

[6] Pal and Ghosal [2024], cited in the passage: "… that $\sup_{\sigma^* \in U_n} \Pi\big(\chi^2_k \ge p_k(1 + a) \mid Y, \sigma^*\big) \le e^{-x}$. Lemma 7 (Lemma 1 of Pal and Ghosal [2024]). Under Assumption 3.3, $\max_{j=1,\ldots,n}\big(1 - \frac{d_j^2}{d_j^2 + a_n}\big) = \max_{j=1,\ldots,n} \frac{a_n}{d_j^2 + a_n} = o(n^{-1})$, where $d_1, \ldots, d_n$ are the singular values of $X$. Lemma 8 (Lemma 5 of Pal and Ghosal [2024]). Under Assumption 3.3, $n^{-1/2}\|Y - X\hat\theta^{R}\|_1 = o_P(1)$. Lemma 9 (Proposition 3 of Honda […"

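[7] A 2008 work (its name does not appear in the extracted passage), cited in: "Proof of Lemma 2. Using Markov inequality, we have, $\sup_{\sigma^* \in U_n} \Pi\big(\|\beta^*_{S_0} - \beta^0_{S_0}\|_\infty > \min_{k \in S_0}\|\beta^0_k\|_\infty \mid Y, \sigma^*\big) \le \frac{\sup_{\sigma^* \in U_n} E\big(\|\beta^*_{S_0} - \beta^0_{S_0}\|_\infty \mid Y, \sigma^*\big)}{\min_{k \in S_0}\|\beta^0_k\|_\infty} \le \frac{1}{\alpha}\sup_{\sigma^* \in U_n} E\big(\|n^{-1} C_{n(11)}^{-1} X_{S_0^c}^{T}\eta\|_\infty \mid Y, \sigma^*\big) + \lambda_n \|C_{n(11)}^{-1}\xi_{S_0}\|_\infty = I + II$, say. Now, $I = \frac{1}{\min_{k \in S_0}\|\beta^0_k\|_\infty}\sup_{\sigma^* \in U_n} E\big(\|n^{-1} C_{n(11)}^{-1} X_{S_0^c}^{T}(\eta - \mu) + n^{-1} C_{n(11)}^{-1} X_{S_0}^{T}\mu\|_\infty \mid Y, \sigma^*\big) \le \ldots$"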
