arxiv: 2605.00835 · v1 · submitted 2026-04-04 · 💻 cs.LG

Sparse Regression under Correlation and Weak Signals: A Reproducible Benchmark of Classical and Bayesian Methods

Hao Xiao

Pith reviewed 2026-05-13 18:18 UTC · model grok-4.3

classification 💻 cs.LG

keywords sparse regression · Bayesian methods · Lasso · Horseshoe prior · Spike-and-Slab · benchmark · prediction error · variable selection

The pith

Bayesian methods cut prediction error by a third or more (MSE 72 vs. 108-267) relative to classical sparse regression when features correlate and signals weaken.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper runs a large-scale reproducible benchmark of six sparse regression methods on synthetic data that deliberately includes correlated predictors up to rho 0.9, four signal-to-noise ratios, and dimensions up to 100, plus one real dataset. Over 2600 experiments show Bayesian approaches deliver markedly lower mean squared error than OLS, Ridge, Lasso, or Elastic Net. The Horseshoe prior additionally produces interval estimates whose coverage stays close to the nominal 95 percent level. For identifying active variables without needing full posteriors, Lasso matches the performance of the more computationally intensive Spike-and-Slab prior.

Core claim

In synthetic data with controlled correlation and weak signals, Bayesian sparse regression achieves MSE of 72 compared with 108-267 for classical methods, the Horseshoe prior reaches 94.8 percent coverage for 95 percent intervals, Spike-and-Slab under-covers at 91.9 percent, and Lasso and Spike-and-Slab tie at F1 score approximately 0.47 for variable selection.
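The three metrics named in the core claim can be sketched as follows. This is a minimal illustration, not the paper's evaluation code: the function names are hypothetical, coverage is assumed to be measured over coefficients, and the `tol` threshold for calling a coefficient "active" is an assumption.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared prediction error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def interval_coverage(beta_true, lower, upper):
    """Fraction of true coefficients that fall inside [lower, upper]."""
    beta_true = np.asarray(beta_true, dtype=float)
    inside = (np.asarray(lower) <= beta_true) & (beta_true <= np.asarray(upper))
    return float(np.mean(inside))

def support_f1(beta_true, beta_hat, tol=1e-8):
    """F1 score for recovering the set of nonzero coefficients."""
    true_active = np.abs(np.asarray(beta_true, dtype=float)) > tol
    est_active = np.abs(np.asarray(beta_hat, dtype=float)) > tol
    tp = int(np.sum(true_active & est_active))
    fp = int(np.sum(~true_active & est_active))
    fn = int(np.sum(true_active & ~est_active))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Toy values, chosen only to exercise the functions.
beta_true = [2.0, 0.0, 0.0, -1.0, 0.0]
beta_hat = [1.8, 0.3, 0.0, 0.0, 0.0]
lower = [1.0, -1.0, -1.0, -0.5, -1.0]
upper = [3.0, 1.0, 1.0, 0.5, 1.0]

f1 = support_f1(beta_true, beta_hat)                  # one hit, one false positive, one miss -> 0.5
cov = interval_coverage(beta_true, lower, upper)      # four of five intervals contain the truth -> 0.8
err = mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

An F1 near 0.47, as reported for Lasso and Spike-and-Slab, means that false positives and misses together outnumber correct detections by a little more than two to one.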

What carries the argument

A controlled simulation benchmark that generates data under three covariance structures, four SNR levels, and increasing p to measure prediction error, coverage, and selection metrics across penalized classical estimators and Bayesian hierarchical priors.
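A data generator of the kind described above might look like the sketch below. It is written under stated assumptions rather than taken from the paper's repository: the Toeplitz form `rho ** |i - j|`, the block size of 5, the standard-normal active coefficients, and the SNR definition (signal variance divided by noise variance) are all illustrative choices, and `simulate` is a hypothetical name.

```python
import numpy as np

def toeplitz_cov(p, rho):
    """Toeplitz covariance with Sigma[i, j] = rho ** |i - j|."""
    idx = np.arange(p)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def block_cov(p, rho, block_size=5):
    """Correlation rho inside each block, zero across blocks, unit diagonal."""
    block_id = np.arange(p) // block_size
    same_block = block_id[:, None] == block_id[None, :]
    sigma = np.where(same_block, float(rho), 0.0)
    np.fill_diagonal(sigma, 1.0)
    return sigma

def simulate(n, p, k, rho, snr, cov="toeplitz", seed=0):
    """Draw (X, y, beta) with k active coefficients at a target SNR."""
    rng = np.random.default_rng(seed)
    sigma = toeplitz_cov(p, rho) if cov == "toeplitz" else block_cov(p, rho)
    X = rng.multivariate_normal(np.zeros(p), sigma, size=n)

    beta = np.zeros(p)
    active = rng.choice(p, size=k, replace=False)
    beta[active] = rng.normal(0.0, 1.0, size=k)

    # Assumed SNR definition: Var(X beta) / noise variance.
    signal_var = beta @ sigma @ beta
    noise_sd = np.sqrt(signal_var / snr)
    y = X @ beta + rng.normal(0.0, noise_sd, size=n)
    return X, y, beta

X, y, beta = simulate(n=200, p=20, k=3, rho=0.9, snr=1.0)
```

Fixing `seed` makes each configuration reproducible, which is what allows a spread across seeds to be reported separately from the spread across designs.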

If this is right

Bayesian methods become the default choice when prediction accuracy matters most and MCMC runtime is acceptable.
The Horseshoe prior supplies usable uncertainty estimates without sacrificing predictive performance.
Lasso remains the practical option for variable selection tasks that do not require posterior distributions.
Gaps between the two families widen as feature correlation rises and signal strength falls.
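One mechanism behind the sensitivity to correlation is ill-conditioning of the Gram matrix, which an ℓ2 penalty directly repairs. The sketch below compares OLS with closed-form ridge on a Toeplitz design; it covers only the classical side of the comparison, and the sample size, penalty `lam`, and true coefficients are arbitrary illustrative values rather than settings from the paper.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge estimate (no intercept); lam = 0 reduces to OLS."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

rng = np.random.default_rng(0)
n, p, rho, lam = 100, 10, 0.9, 10.0

# Toeplitz design with Sigma[i, j] = rho ** |i - j|.
idx = np.arange(p)
sigma = rho ** np.abs(idx[:, None] - idx[None, :])
X = rng.multivariate_normal(np.zeros(p), sigma, size=n)

beta = np.zeros(p)
beta[:2] = [1.0, -1.0]  # illustrative sparse truth
y = X @ beta + rng.normal(0.0, 1.0, size=n)

gram = X.T @ X
cond_ols = np.linalg.cond(gram)
cond_ridge = np.linalg.cond(gram + lam * np.eye(p))

b_ols = ridge_fit(X, y, 0.0)
b_ridge = ridge_fit(X, y, lam)

print(f"condition number: OLS {cond_ols:.1f}, ridge {cond_ridge:.1f}")
print(f"coefficient norm: OLS {np.linalg.norm(b_ols):.3f}, "
      f"ridge {np.linalg.norm(b_ridge):.3f}")
```

Adding `lam` to every eigenvalue of the Gram matrix always lowers its condition number and shrinks the coefficient norm, which is consistent with the Figure 2 caption's remark that Ridge and Elastic Net hold up thanks to their ℓ2 component.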

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Extending the same controlled design to additional real-world datasets with documented correlation patterns would test whether the MSE advantage persists outside synthetic constructions.
Faster Bayesian approximations that retain the accuracy of full MCMC could narrow the runtime gap while preserving the coverage benefits.
The observed undercoverage of Spike-and-Slab suggests targeted adjustments to its continuous relaxation may improve calibration.

Load-bearing premise

The synthetic covariance structures and SNR levels chosen capture the performance trade-offs that would appear in real high-dimensional data with unknown correlation patterns.

What would settle it

New experiments on real datasets with comparable dimensionality, measured correlations, and weak signals that produce higher MSE for Bayesian methods or coverage rates far from 95 percent would disprove the reported superiority.

Figures

Figures reproduced from arXiv: 2605.00835 by Hao Xiao.

**Figure 2.** MSE degradation with increasing ρ. Left: Block Correlated. Right: Toeplitz. OLS falls apart first: its MSE climbs steeply beyond ρ = 0.3. Ridge and Elastic Net hold up reasonably well thanks to their ℓ2 component. Lasso degrades noticeably at ρ ≥ 0.6: correlated features confuse its variable selection, and the resulting coefficient instability hurts prediction. The Horseshoe prior is the most robust here, w…

**Figure 3.** MSE heatmap: model × correlation strength.

**Figure 4.** Support recovery F1 vs. SNR. A more Bayesian selection rule (e.g., based on credible intervals excluding zero) would likely improve its F1.

**Figure 5.** Coefficient L2 error by model and dimensionality. Axes: model (OLS, Ridge, Lasso, Elastic Net, Horseshoe, Spike-and-Slab) and dimensionality p (20, 50, 100).

**Figure 6.** Empirical 95% HDI coverage. Dashed red line: nominal level.

**Figure 7.** Mean fit time per experiment (log scale).

**Figure 8.** Test MSE spread across random seeds. Classical methods are nearly deterministic given fixed data …
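The Figure 4 caption mentions a selection rule based on credible intervals that exclude zero. A minimal sketch of such a rule is given below. It uses equal-tailed intervals as a simplification (the paper reports HDIs), the function names are hypothetical, and the synthetic normal draws merely stand in for real MCMC output.

```python
import numpy as np

def equal_tailed_interval(draws, level=0.95):
    """Per-coefficient equal-tailed interval from draws of shape (n_draws, p)."""
    alpha = 1.0 - level
    lower = np.quantile(draws, alpha / 2, axis=0)
    upper = np.quantile(draws, 1.0 - alpha / 2, axis=0)
    return lower, upper

def select_by_interval(draws, level=0.95):
    """Mark a coefficient active when its interval excludes zero."""
    lower, upper = equal_tailed_interval(draws, level)
    return (lower > 0) | (upper < 0)

# Stand-in posterior draws for three coefficients (not real sampler output).
rng = np.random.default_rng(0)
draws = rng.normal(loc=[2.0, 0.0, -1.5], scale=0.3, size=(4000, 3))

selected = select_by_interval(draws)
print(selected)  # first and third coefficients are selected, the second is not
```

Compared with thresholding a posterior mean, this rule ties selection to posterior uncertainty, so a wide interval around a moderate estimate does not count as a detection.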

Abstract

Choosing between classical and Bayesian sparse regression methods involves a real trade-off: penalized estimators like Lasso run in milliseconds but give no uncertainty estimates, while Horseshoe and Spike-and-Slab priors produce full posteriors but need MCMC chains that take minutes per fit. Surprisingly few studies compare these two families head-to-head under the conditions that actually make sparse regression hard -- correlated features, weak signals, and growing dimensionality. We benchmark six methods (OLS, Ridge, Lasso, Elastic Net, Horseshoe, Spike-and-Slab) on synthetic data with three covariance structures (rho up to 0.9), four SNR levels, and p in {20, 50, 100}, plus the Diabetes dataset, totalling over 2,600 experiments. The results are clear on some points and nuanced on others. Bayesian methods win on prediction error (MSE 72 vs. 108-267), and the Horseshoe delivers near-nominal 95% coverage (94.8%). But Spike-and-Slab, despite narrower intervals, under-covers at 91.9% -- its continuous relaxation likely plays a role. For variable selection, Lasso and Spike-and-Slab tie at F1 ~ 0.47, making Lasso the practical default when posteriors are not needed. Code and data are available at https://github.com/xiao98/sparse-bayesian-regression-bench.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper conducts a large-scale reproducible benchmark of six sparse regression methods (OLS, Ridge, Lasso, Elastic Net, Horseshoe, Spike-and-Slab) under conditions of correlated predictors and weak signals. Using synthetic data generated with three covariance structures (rho up to 0.9), four SNR levels, and p ∈ {20, 50, 100}, along with the Diabetes dataset for a total of over 2,600 experiments, it concludes that Bayesian methods achieve superior prediction performance (MSE of 72 compared to 108-267 for classical methods), the Horseshoe prior yields near-nominal coverage (94.8%), while Lasso and Spike-and-Slab perform similarly in variable selection with F1 scores around 0.47.

Significance. If these empirical findings hold, the work provides practical insights into the trade-offs between computational efficiency and uncertainty quantification in sparse regression, particularly valuable for high-dimensional settings with feature correlations. The open code and data at the GitHub repository constitute a clear strength, enabling direct verification of metrics such as coverage rates and F1 scores.

major comments (2)

The reported MSE advantage for Bayesian methods (72 vs. 108-267) and coverage rates rest on three fixed covariance structures and four SNR levels with p in {20,50,100}; the manuscript does not test robustness to other patterns such as block-diagonal correlations or heavy-tailed noise, which could change the observed rankings.
Validation on the Diabetes dataset: only one real dataset is included, providing weak external support for generalizability of the Bayesian advantages to high-dimensional data with unknown correlation structures.

minor comments (2)

Abstract: missing space after 'fit.' ('minutes per fit.Surprisingly' should read 'minutes per fit. Surprisingly').
The exact default hyperparameter values for Horseshoe and Spike-and-Slab (beyond the GitHub link) should be stated explicitly in the methods to support full reproducibility.
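For orientation on which hyperparameters are at stake, the standard horseshoe hierarchy of Carvalho, Polson, and Scott can be written as below. Whether the paper uses exactly this parameterization is not stated in this summary, and the global scale $\tau_0$ is a placeholder symbol, not a value taken from the paper.

```latex
\begin{aligned}
y \mid X, \beta, \sigma &\sim \mathcal{N}\!\left(X\beta,\ \sigma^2 I_n\right), \\
\beta_j \mid \lambda_j, \tau &\sim \mathcal{N}\!\left(0,\ \lambda_j^2 \tau^2\right), \qquad j = 1, \dots, p, \\
\lambda_j &\sim \mathrm{C}^{+}(0, 1), \\
\tau &\sim \mathrm{C}^{+}(0, \tau_0).
\end{aligned}
```

Here $\mathrm{C}^{+}$ denotes the half-Cauchy distribution. The local scales $\lambda_j$ let individual coefficients escape shrinkage, while $\tau$ sets the overall sparsity level, so reporting $\tau_0$ and the prior on $\sigma$ would cover most of what a replication needs for this model.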

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and positive recommendation for minor revision. We address each major comment below and have incorporated revisions to strengthen the discussion of scope and limitations.

Point-by-point responses

Referee: The reported MSE advantage for Bayesian methods (72 vs. 108-267) and coverage rates rest on three fixed covariance structures and four SNR levels with p in {20,50,100}; the manuscript does not test robustness to other patterns such as block-diagonal correlations or heavy-tailed noise, which could change the observed rankings.

Authors: We agree that additional correlation structures and noise distributions would provide a more complete robustness check. The three covariance structures (including equicorrelated designs with rho up to 0.9) were selected to represent common high-correlation regimes that challenge sparse recovery. In the revised manuscript we add a dedicated limitations paragraph that explicitly acknowledges the absence of block-diagonal and heavy-tailed settings and identifies them as valuable directions for future work.

revision: partial

Referee: Validation on the Diabetes dataset: only one real dataset is included, providing weak external support for generalizability of the Bayesian advantages to high-dimensional data with unknown correlation structures.

Authors: We acknowledge that a single real dataset offers limited external validation. The Diabetes dataset was chosen because it is a standard benchmark in the sparse regression literature and exhibits moderate correlation. Our primary contribution rests on the controlled synthetic experiments that systematically vary correlation strength and SNR. We will revise the discussion section to note this limitation and to emphasize that broader real-data validation remains an important avenue for follow-up studies.

revision: partial

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The study uses standard linear regression assumptions and off-the-shelf prior implementations; no new parameters are fitted to produce the reported claims, and no new entities are postulated.
