Spectral Collapsed Gibbs Sampler for Bayesian Sparse Regression
Pith reviewed 2026-05-08 08:17 UTC · model grok-4.3
The pith
Bayesian sparse regression can draw the global scale parameter directly from its collapsed conditional density.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
While updating the global scale τ after marginalizing the coefficients had previously required a Metropolis step, careful linear algebraic manipulations together with a single spectral decomposition performed at the beginning of each Gibbs scan permit rapid evaluation of the collapsed density across hundreds of values of τ. These evaluations are combined with adaptive quadrature and inverse transform sampling to generate exact draws from the collapsed conditional distribution of τ, removing any need for proposal tuning.
What carries the argument
per-Gibbs-scan spectral decomposition that reduces repeated collapsed-density evaluations to negligible cost
If this is right
- No Metropolis proposal tuning is required for the global scale update
- The overall Gibbs chain exhibits faster convergence and reduced autocorrelation
- Posterior inference remains feasible for design matrices with tens or hundreds of thousands of rows
- The same linear-algebraic device applies to logistic regression under the horseshoe prior
Where Pith is reading between the lines
- The same decomposition strategy could accelerate collapsed updates in other hierarchical models that marginalize large parameter blocks
- Direct sampling of τ may shorten required burn-in periods enough to change practical workflow for very large problems
- Numerical stability of the spectral step for ill-conditioned design matrices remains an open implementation question
Load-bearing premise
The collapsed conditional density of τ is sufficiently regular that numerical integration followed by inverse transform sampling produces reliable exact draws without instability or hidden computational expense.
What would settle it
On a small simulated dataset whose true marginal posterior for τ can be computed exactly, draw thousands of independent samples with the proposed method and verify that their empirical distribution matches the known target.
Figures
read the original abstract
Sparse regression based on global-local shrinkage priors are increasingly used for Bayesian modeling of modern high-dimensional data, but scaling up the Gibbs sampler for posterior inference remains a challenge. While much effort has gone into speeding up the high-dimensional coefficient update step, insufficient attention has been given to the potential poor mixing of the global scale parameter $\tau$ and of the overall sampler. One proposed remedy has been to marginalize out the coefficients when updating $\tau$. Here we show that, while this collapsed update was previously thought to require a Metropolis step, we can in fact sample directly and efficiently from the collapsed density. This is made possible by careful linear algebraic manipulations and a strategic per-Gibbs-scan spectral decomposition, allowing subsequent evaluations of the collapsed density across hundreds of values of $\tau$ at negligible cost. We combine this computational trick with adaptive numerical integration and inverse transform sampling to construct a direct sampler. This eliminates the need to tune Metropolis proposals and yields faster convergence and improved mixing. We demonstrate our method on two big data applications, fitting logistic regression under the horseshoe prior to datasets with design matrices of size 120,000 x 1,379 and 1,980 x 17,848.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents a spectral collapsed Gibbs sampler for Bayesian sparse regression under global-local shrinkage priors such as the horseshoe. The central algorithmic claim is that the collapsed conditional density p(τ | ·) for the global scale can be sampled directly via a per-Gibbs-scan spectral decomposition of the scaled Gram matrix (yielding eigenvalues μ_i), which permits O(1) evaluations of the log-density log p(τ) ∝ const − ½ ∑ log(1 + τ² μ_i) − ½ ∑ (u_i' y)² / (1 + τ² μ_i) for hundreds of τ values after one O(min(n,p)³) factorization; this is then combined with adaptive numerical integration and inverse-transform sampling to produce exact draws without Metropolis tuning, yielding improved mixing. The method is illustrated on two large logistic-regression examples (n=120k/p=1.4k and n=2k/p=18k).
Significance. If the numerical claims hold, the work supplies a practical and theoretically grounded advance for scaling collapsed Gibbs sampling in high-dimensional sparse regression. The strategic reuse of a single spectral factorization per scan to enable cheap, repeated density evaluations is a clear computational contribution that directly addresses the documented poor mixing of τ. The approach relies only on standard linear-algebra identities and the form of the shrinkage prior, requires no additional tuning parameters, and is shown to deliver faster convergence on realistically sized data sets. These strengths would make the sampler attractive for practitioners working with global-local priors.
major comments (2)
- [§3.2] §3.2 (spectral representation of the collapsed density): the claim that subsequent density evaluations occur “at negligible cost” and support reliable inverse-transform sampling rests on the assumption that the eigenvalue-based formula remains numerically stable when some μ_i approach zero (as occurs when D^{1/2}X'X D^{1/2} becomes rank-deficient). No floating-point error analysis, condition-number monitoring, or safeguards against cancellation/overflow are provided, yet the two reported data sets are precisely those in which conditioning varies most across iterations.
- [§4.3] §4.3 (adaptive quadrature and inverse-transform step): the manuscript states that adaptive numerical integration is used to obtain the CDF for inverse-transform sampling but supplies neither error bounds, tail-mass diagnostics, nor verification that the resulting draws exactly target the collapsed density. Because any systematic bias in the τ draws would undermine the claimed mixing improvement, this verification is load-bearing for the central algorithmic claim.
minor comments (2)
- [§3.1] The dual versus primal forms of the Gram matrix are used interchangeably in the text without an explicit statement of when each is computationally preferable (n ≪ p versus p ≪ n).
- [§5] Figure 3 (trace plots) would benefit from overlaying the effective sample size or autocorrelation time for both the new sampler and the Metropolis baseline to quantify the mixing gain.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive feedback. The comments highlight important numerical considerations that we will address in revision to strengthen the presentation of the sampler.
read point-by-point responses
-
Referee: [§3.2] §3.2 (spectral representation of the collapsed density): the claim that subsequent density evaluations occur “at negligible cost” and support reliable inverse-transform sampling rests on the assumption that the eigenvalue-based formula remains numerically stable when some μ_i approach zero (as occurs when D^{1/2}X'X D^{1/2} becomes rank-deficient). No floating-point error analysis, condition-number monitoring, or safeguards against cancellation/overflow are provided, yet the two reported data sets are precisely those in which conditioning varies most across iterations.
Authors: We agree that numerical stability of the eigenvalue formula requires explicit discussion. While the algebraic identity is exact and our experiments on the reported data sets showed no visible artifacts, floating-point cancellation can occur for small μ_i when τ is large. In revision we will add a short subsection on numerical considerations, including monitoring of the smallest eigenvalues, use of stable primitives such as log1p, and a simple safeguard that clamps negligible μ_i to a small positive threshold. This will be accompanied by a brief condition-number diagnostic in the implementation. revision: yes
-
Referee: [§4.3] §4.3 (adaptive quadrature and inverse-transform step): the manuscript states that adaptive numerical integration is used to obtain the CDF for inverse-transform sampling but supplies neither error bounds, tail-mass diagnostics, nor verification that the resulting draws exactly target the collapsed density. Because any systematic bias in the τ draws would undermine the claimed mixing improvement, this verification is load-bearing for the central algorithmic claim.
Authors: We concur that explicit verification of the quadrature and inverse-transform procedure is necessary to substantiate the claim of exact sampling from the collapsed density. The current manuscript relies on a standard adaptive quadrature routine with default tolerances but does not report error bounds or empirical checks. In the revision we will include (i) quadrature error bounds derived from the adaptive integrator, (ii) tail-mass diagnostics, and (iii) simulation-based validation (e.g., Kolmogorov-Smirnov tests against rejection sampling in low-dimensional cases) confirming that the generated τ draws are unbiased for the target. These additions will appear in §4.3 and the supplementary material. revision: yes
Circularity Check
No significant circularity; derivation is a standard linear-algebraic optimization
full rationale
The paper's central construction derives an efficient expression for the collapsed conditional p(τ | ·) by rewriting the marginal likelihood in terms of the eigenvalues of the scaled Gram matrix (X D X' or dual), then precomputing those eigenvalues once per Gibbs scan. This step rests on the matrix determinant lemma and Woodbury identity applied to the Gaussian integral, which are external facts independent of the target sampler. No parameter is fitted to data and then renamed as a prediction; no self-citation supplies a uniqueness theorem or ansatz that the present work merely renames; and the subsequent adaptive quadrature plus inverse-transform sampling is a generic numerical procedure whose correctness does not presuppose the sampler's output. The method is therefore self-contained against external linear-algebra benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- standard math The joint posterior admits a collapsed conditional for the global scale τ after integrating out the regression coefficients.
- domain assumption A strategic per-Gibbs-scan spectral decomposition renders evaluations of the collapsed density for many τ values computationally cheap.
Reference graph
Works this paper leans on
-
[1]
2019, Journal of Open Source Software, 4, 1143, doi: 10.21105/joss.01143
Ravin Kumar and Colin Carroll and Ari Hartikainen and Osvaldo Martin , title =. 2019 , publisher =. doi:10.21105/joss.01143 , url =
-
[2]
Andrieu, Christophe and Thoms, Johannes , journal=. A tutorial on adaptive. 2008 , publisher=
work page 2008
-
[3]
Bhattacharya, Anirban and Chakraborty, Antik and Mallick, Bani K , journal=. Fast sampling with. 2016 , publisher=
work page 2016
-
[4]
Bhattacharya, Suman and Khare, Kshitij and Pal, Subhadip , journal=. Geometric ergodicity of. 2022 , publisher=
work page 2022
-
[5]
James O. Berger and Jos. The Annals of Statistics , number =. 2009 , doi =
work page 2009
-
[6]
Artificial Intelligence and Statistics , pages=
Handling sparsity via the horseshoe , author=. Artificial Intelligence and Statistics , pages=. 2009 , organization=
work page 2009
-
[7]
Nature Reviews Genetics , volume=
Developing and evaluating polygenic risk prediction models for stratified disease prevention , author=. Nature Reviews Genetics , volume=. 2016 , publisher=
work page 2016
-
[8]
Journal of the American Statistical Association , volume=
Computational and inferential difficulties with mixture posterior distributions , author=. Journal of the American Statistical Association , volume=. 2000 , publisher=
work page 2000
-
[9]
ACM Transactions on Modeling and Computer Simulation (TOMACS) , volume=
Random variate generation by numerical inversion when only the density is known , author=. ACM Transactions on Modeling and Computer Simulation (TOMACS) , volume=. 2010 , publisher=
work page 2010
- [10]
- [11]
-
[12]
Filippone, Maurizio and Girolami, Mark , journal=. Pseudo-marginal. 2014 , publisher=
work page 2014
- [13]
-
[14]
Weak convergence and optimal scaling of random walk
Gelman, Andrew and Gilks, Walter R and Roberts, Gareth O , journal=. Weak convergence and optimal scaling of random walk. 1997 , publisher=
work page 1997
-
[15]
Gelman, Andrew and Carlin, John B. and Stern, Hal S. and Dunson, David B. and Vehtari, Aki and Rubin, Donald B. , year =. Bayesian Data Analysis , edition =
-
[16]
Geman, Stuart and Geman, Donald , journal=. Stochastic relaxation,. 1984 , publisher=
work page 1984
-
[17]
Hahn, P Richard and He, Jingyu and Lopes, Hedibert F , journal=. Efficient sampling for. 2019 , publisher=
work page 2019
- [18]
-
[19]
Journal of Machine Learning Research , year =
James Johndrow and Paulo Orenstein and Anirban Bhattacharya , title =. Journal of Machine Learning Research , year =
-
[20]
and Arshad, Faaizah and Blacketer, Clair and Bowring, Mary G
Khera, Rohan and Dhingra, Lovedeep Singh and Aminorroaya, Arya and Li, Kelly and Zhou, Jin J. and Arshad, Faaizah and Blacketer, Clair and Bowring, Mary G. and Bu, Fan and Cook, Michael and Dorr, David A. and Duarte-Salles, Talita and DuVall, Scott L. and Falconer, Thomas and French, Tina E. and Hanchrow, Elizabeth E. and Horban, Scott and Lau, Wallis C. ...
-
[21]
Lee, Se Yoon and Lei, Bowen and Mallick, Bani , journal=. Estimation of. 2020 , publisher=
work page 2020
-
[22]
IEEE Signal Processing Letters , volume=
A simple sampler for the horseshoe estimator , author=. IEEE Signal Processing Letters , volume=. 2015 , publisher=
work page 2015
-
[23]
The Journal of Chemical Physics , volume=
Equation of state calculations by fast computing machines , author=. The Journal of Chemical Physics , volume=. 1953 , publisher=
work page 1953
-
[24]
Prior-preconditioned conjugate gradient method for accelerated Gibbs sampling in “large n, large p” Bayesian sparse regression , author=. Journal of the American Statistical Association , volume=. 2023 , publisher=
work page 2023
-
[25]
Shrinkage with shrunken shoulders:
Nishimura, Akihiko and Suchard, Marc A , journal=. Shrinkage with shrunken shoulders:. 2023 , publisher=
work page 2023
-
[26]
doi:10.5281/zenodo.7773502 , url =
Oliver Pain , title =. doi:10.5281/zenodo.7773502 , url =
-
[27]
Technical University of Denmark , volume=
The matrix cookbook , author=. Technical University of Denmark , volume=
-
[28]
Electronic Journal of Statistics , number =
Juho Piironen and Aki Vehtari , title =. Electronic Journal of Statistics , number =. 2017 , doi =
work page 2017
-
[29]
Plummer, Martyn and Best, Nicky and Cowles, Kate and Vines, Karen and others , journal=
-
[30]
Bayesian inference for logistic models using
Polson, Nicholas G and Scott, James G and Windle, Jesse , journal=. Bayesian inference for logistic models using. 2013 , publisher=
work page 2013
-
[31]
Journal of the Royal Statistical Society: Series B: Statistical Methodology , pages=
The bayesian bridge , author=. Journal of the Royal Statistical Society: Series B: Statistical Methodology , pages=. 2014 , publisher=
work page 2014
-
[32]
The American Journal of Human Genetics , year =
Making the Most of Clumping and Thresholding for Polygenic Scores , author =. The American Journal of Human Genetics , year =. doi:10.1016/j.ajhg.2019.11.001 , issn =
-
[33]
Numerical recipes 3rd edition: The art of scientific computing , author=. 2007 , publisher=
work page 2007
-
[34]
The central role of the propensity score in observational studies for causal effects , author=. Biometrika , volume=. 1983 , publisher=
work page 1983
- [35]
-
[36]
International Journal of Epidemiology , volume=
Evaluating large-scale propensity score performance through real-world and synthetic data experiments , author=. International Journal of Epidemiology , volume=. 2018 , publisher=
work page 2018
-
[37]
Journal of the Royal Statistical Society: Series B: Statistical Methodology , volume=
Regression shrinkage and selection via the lasso , author=. Journal of the Royal Statistical Society: Series B: Statistical Methodology , volume=. 1996 , publisher=
work page 1996
- [38]
-
[39]
Comprehensive proteomics and platform validation of urinary biomarkers for bladder cancer diagnosis and staging , author=. BMC Medicine , volume=. 2023 , publisher=
work page 2023
-
[40]
Aki Vehtari and Andrew Gelman and Daniel Simpson and Bob Carpenter and Paul-Christian B. Bayesian Analysis , number =. 2021 , doi =
work page 2021
-
[41]
Cell Reports Medicine , volume=
Gene-environment interactions explain a substantial portion of variability of common neuropsychiatric disorders , author=. Cell Reports Medicine , volume=. 2022 , publisher=
work page 2022
-
[42]
Zhang, Haoyu , publisher =. 2022 , version =. doi:10.7910/DVN/COXHAP , url =
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.