pith. sign in

arxiv: 2605.05528 · v1 · submitted 2026-05-07 · 📊 stat.ME · stat.CO

Spectral Collapsed Gibbs Sampler for Bayesian Sparse Regression

Pith reviewed 2026-05-08 08:17 UTC · model grok-4.3

classification 📊 stat.ME stat.CO
keywords Bayesian sparse regressioncollapsed Gibbs samplerglobal-local shrinkagespectral decompositionhorseshoe priordirect samplinginverse transform samplinghigh-dimensional data
0
0 comments X

The pith

Bayesian sparse regression can draw the global scale parameter directly from its collapsed conditional density.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that the global scale parameter τ in models using global-local shrinkage priors can be sampled directly and efficiently once the high-dimensional regression coefficients are integrated out. This direct sampling becomes practical through linear algebraic rearrangements that support one spectral decomposition per Gibbs scan, after which the collapsed density can be evaluated for many candidate values of τ at almost no extra cost. Adaptive numerical integration then converts those evaluations into exact draws via inverse transform sampling. The resulting sampler requires no Metropolis proposal tuning and produces faster convergence together with improved mixing of the full Markov chain. The approach is demonstrated on logistic regression problems whose design matrices reach sizes such as 120,000 by 1,379.

Core claim

While updating the global scale τ after marginalizing the coefficients had previously required a Metropolis step, careful linear algebraic manipulations together with a single spectral decomposition performed at the beginning of each Gibbs scan permit rapid evaluation of the collapsed density across hundreds of values of τ. These evaluations are combined with adaptive quadrature and inverse transform sampling to generate exact draws from the collapsed conditional distribution of τ, removing any need for proposal tuning.

What carries the argument

per-Gibbs-scan spectral decomposition that reduces repeated collapsed-density evaluations to negligible cost

If this is right

  • No Metropolis proposal tuning is required for the global scale update
  • The overall Gibbs chain exhibits faster convergence and reduced autocorrelation
  • Posterior inference remains feasible for design matrices with tens or hundreds of thousands of rows
  • The same linear-algebraic device applies to logistic regression under the horseshoe prior

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same decomposition strategy could accelerate collapsed updates in other hierarchical models that marginalize large parameter blocks
  • Direct sampling of τ may shorten required burn-in periods enough to change practical workflow for very large problems
  • Numerical stability of the spectral step for ill-conditioned design matrices remains an open implementation question

Load-bearing premise

The collapsed conditional density of τ is sufficiently regular that numerical integration followed by inverse transform sampling produces reliable exact draws without instability or hidden computational expense.

What would settle it

On a small simulated dataset whose true marginal posterior for τ can be computed exactly, draw thousands of independent samples with the proposed method and verify that their empirical distribution matches the known target.

Figures

Figures reproduced from arXiv: 2605.05528 by Akihiko Nishimura, Andrew Chin, Xiyu Ding.

Figure 1
Figure 1. Figure 1: Comparison of the collapsed and uncollapsed Gibbs samplers view at source ↗
Figure 2
Figure 2. Figure 2: Illustration of the adaptive numerical integration scheme view at source ↗
Figure 3
Figure 3. Figure 3: Relative costs of the spectral and Metropolis updates of view at source ↗
Figure 4
Figure 4. Figure 4: Evolution of the convergence diagnostic Rˆ (left) and trace plots of τ from differ￾ent initializations (right). The spectral collapsed sampler reaches R <ˆ 1.01 and converges to a high density region in fewer iterations than the optimally tuned Metropolis collapsed sampler. The two collapsed samplers have comparable per-iteration costs in this exam￾ple, with the spectral sampler costing no more than 5% add… view at source ↗
Figure 5
Figure 5. Figure 5: Comparison of the collapsed and uncollapsed densities when view at source ↗
Figure 6
Figure 6. Figure 6: As in Figure 4, the spectral collapsed method finds the high view at source ↗
Figure 7
Figure 7. Figure 7: Trace plots of the global scale over the last 10% of iteratio view at source ↗
read the original abstract

Sparse regression based on global-local shrinkage priors are increasingly used for Bayesian modeling of modern high-dimensional data, but scaling up the Gibbs sampler for posterior inference remains a challenge. While much effort has gone into speeding up the high-dimensional coefficient update step, insufficient attention has been given to the potential poor mixing of the global scale parameter $\tau$ and of the overall sampler. One proposed remedy has been to marginalize out the coefficients when updating $\tau$. Here we show that, while this collapsed update was previously thought to require a Metropolis step, we can in fact sample directly and efficiently from the collapsed density. This is made possible by careful linear algebraic manipulations and a strategic per-Gibbs-scan spectral decomposition, allowing subsequent evaluations of the collapsed density across hundreds of values of $\tau$ at negligible cost. We combine this computational trick with adaptive numerical integration and inverse transform sampling to construct a direct sampler. This eliminates the need to tune Metropolis proposals and yields faster convergence and improved mixing. We demonstrate our method on two big data applications, fitting logistic regression under the horseshoe prior to datasets with design matrices of size 120,000 x 1,379 and 1,980 x 17,848.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents a spectral collapsed Gibbs sampler for Bayesian sparse regression under global-local shrinkage priors such as the horseshoe. The central algorithmic claim is that the collapsed conditional density p(τ | ·) for the global scale can be sampled directly via a per-Gibbs-scan spectral decomposition of the scaled Gram matrix (yielding eigenvalues μ_i), which permits O(1) evaluations of the log-density log p(τ) ∝ const − ½ ∑ log(1 + τ² μ_i) − ½ ∑ (u_i' y)² / (1 + τ² μ_i) for hundreds of τ values after one O(min(n,p)³) factorization; this is then combined with adaptive numerical integration and inverse-transform sampling to produce exact draws without Metropolis tuning, yielding improved mixing. The method is illustrated on two large logistic-regression examples (n=120k/p=1.4k and n=2k/p=18k).

Significance. If the numerical claims hold, the work supplies a practical and theoretically grounded advance for scaling collapsed Gibbs sampling in high-dimensional sparse regression. The strategic reuse of a single spectral factorization per scan to enable cheap, repeated density evaluations is a clear computational contribution that directly addresses the documented poor mixing of τ. The approach relies only on standard linear-algebra identities and the form of the shrinkage prior, requires no additional tuning parameters, and is shown to deliver faster convergence on realistically sized data sets. These strengths would make the sampler attractive for practitioners working with global-local priors.

major comments (2)
  1. [§3.2] §3.2 (spectral representation of the collapsed density): the claim that subsequent density evaluations occur “at negligible cost” and support reliable inverse-transform sampling rests on the assumption that the eigenvalue-based formula remains numerically stable when some μ_i approach zero (as occurs when D^{1/2}X'X D^{1/2} becomes rank-deficient). No floating-point error analysis, condition-number monitoring, or safeguards against cancellation/overflow are provided, yet the two reported data sets are precisely those in which conditioning varies most across iterations.
  2. [§4.3] §4.3 (adaptive quadrature and inverse-transform step): the manuscript states that adaptive numerical integration is used to obtain the CDF for inverse-transform sampling but supplies neither error bounds, tail-mass diagnostics, nor verification that the resulting draws exactly target the collapsed density. Because any systematic bias in the τ draws would undermine the claimed mixing improvement, this verification is load-bearing for the central algorithmic claim.
minor comments (2)
  1. [§3.1] The dual versus primal forms of the Gram matrix are used interchangeably in the text without an explicit statement of when each is computationally preferable (n ≪ p versus p ≪ n).
  2. [§5] Figure 3 (trace plots) would benefit from overlaying the effective sample size or autocorrelation time for both the new sampler and the Metropolis baseline to quantify the mixing gain.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive feedback. The comments highlight important numerical considerations that we will address in revision to strengthen the presentation of the sampler.

read point-by-point responses
  1. Referee: [§3.2] §3.2 (spectral representation of the collapsed density): the claim that subsequent density evaluations occur “at negligible cost” and support reliable inverse-transform sampling rests on the assumption that the eigenvalue-based formula remains numerically stable when some μ_i approach zero (as occurs when D^{1/2}X'X D^{1/2} becomes rank-deficient). No floating-point error analysis, condition-number monitoring, or safeguards against cancellation/overflow are provided, yet the two reported data sets are precisely those in which conditioning varies most across iterations.

    Authors: We agree that numerical stability of the eigenvalue formula requires explicit discussion. While the algebraic identity is exact and our experiments on the reported data sets showed no visible artifacts, floating-point cancellation can occur for small μ_i when τ is large. In revision we will add a short subsection on numerical considerations, including monitoring of the smallest eigenvalues, use of stable primitives such as log1p, and a simple safeguard that clamps negligible μ_i to a small positive threshold. This will be accompanied by a brief condition-number diagnostic in the implementation. revision: yes

  2. Referee: [§4.3] §4.3 (adaptive quadrature and inverse-transform step): the manuscript states that adaptive numerical integration is used to obtain the CDF for inverse-transform sampling but supplies neither error bounds, tail-mass diagnostics, nor verification that the resulting draws exactly target the collapsed density. Because any systematic bias in the τ draws would undermine the claimed mixing improvement, this verification is load-bearing for the central algorithmic claim.

    Authors: We concur that explicit verification of the quadrature and inverse-transform procedure is necessary to substantiate the claim of exact sampling from the collapsed density. The current manuscript relies on a standard adaptive quadrature routine with default tolerances but does not report error bounds or empirical checks. In the revision we will include (i) quadrature error bounds derived from the adaptive integrator, (ii) tail-mass diagnostics, and (iii) simulation-based validation (e.g., Kolmogorov-Smirnov tests against rejection sampling in low-dimensional cases) confirming that the generated τ draws are unbiased for the target. These additions will appear in §4.3 and the supplementary material. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation is a standard linear-algebraic optimization

full rationale

The paper's central construction derives an efficient expression for the collapsed conditional p(τ | ·) by rewriting the marginal likelihood in terms of the eigenvalues of the scaled Gram matrix (X D X' or dual), then precomputing those eigenvalues once per Gibbs scan. This step rests on the matrix determinant lemma and Woodbury identity applied to the Gaussian integral, which are external facts independent of the target sampler. No parameter is fitted to data and then renamed as a prediction; no self-citation supplies a uniqueness theorem or ansatz that the present work merely renames; and the subsequent adaptive quadrature plus inverse-transform sampling is a generic numerical procedure whose correctness does not presuppose the sampler's output. The method is therefore self-contained against external linear-algebra benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Abstract-only review; the method rests on standard properties of the multivariate normal and horseshoe prior plus the novel claim that a single spectral decomposition per scan suffices for cheap density evaluations.

axioms (2)
  • standard math The joint posterior admits a collapsed conditional for the global scale τ after integrating out the regression coefficients.
    Foundation for the collapsed update described in the abstract.
  • domain assumption A strategic per-Gibbs-scan spectral decomposition renders evaluations of the collapsed density for many τ values computationally cheap.
    Central efficiency claim enabling direct sampling.

pith-pipeline@v0.9.0 · 5508 in / 1397 out tokens · 96652 ms · 2026-05-08T08:17:30.080857+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

42 extracted references · 42 canonical work pages

  1. [1]

    2019, Journal of Open Source Software, 4, 1143, doi: 10.21105/joss.01143

    Ravin Kumar and Colin Carroll and Ari Hartikainen and Osvaldo Martin , title =. 2019 , publisher =. doi:10.21105/joss.01143 , url =

  2. [2]

    A tutorial on adaptive

    Andrieu, Christophe and Thoms, Johannes , journal=. A tutorial on adaptive. 2008 , publisher=

  3. [3]

    Fast sampling with

    Bhattacharya, Anirban and Chakraborty, Antik and Mallick, Bani K , journal=. Fast sampling with. 2016 , publisher=

  4. [4]

    Geometric ergodicity of

    Bhattacharya, Suman and Khare, Kshitij and Pal, Subhadip , journal=. Geometric ergodicity of. 2022 , publisher=

  5. [5]

    Berger and Jos

    James O. Berger and Jos. The Annals of Statistics , number =. 2009 , doi =

  6. [6]

    Artificial Intelligence and Statistics , pages=

    Handling sparsity via the horseshoe , author=. Artificial Intelligence and Statistics , pages=. 2009 , organization=

  7. [7]

    Nature Reviews Genetics , volume=

    Developing and evaluating polygenic risk prediction models for stratified disease prevention , author=. Nature Reviews Genetics , volume=. 2016 , publisher=

  8. [8]

    Journal of the American Statistical Association , volume=

    Computational and inferential difficulties with mixture posterior distributions , author=. Journal of the American Statistical Association , volume=. 2000 , publisher=

  9. [9]

    ACM Transactions on Modeling and Computer Simulation (TOMACS) , volume=

    Random variate generation by numerical inversion when only the density is known , author=. ACM Transactions on Modeling and Computer Simulation (TOMACS) , volume=. 2010 , publisher=

  10. [10]

    1986 , Address =

    Non-Uniform Random Variate Generation , Author =. 1986 , Address =

  11. [11]

    A Robust

    Dun, Yuzheng and Chatterjee, Nilanjan and Jin, Jin and Nishimura, Akihiko , journal=. A Robust

  12. [12]

    Pseudo-marginal

    Filippone, Maurizio and Girolami, Mark , journal=. Pseudo-marginal. 2014 , publisher=

  13. [13]

    Efficient

    Gelman, Andrew and Roberts, Gareth O and Gilks, Walter R , journal=. Efficient. 1996 , publisher=

  14. [14]

    Weak convergence and optimal scaling of random walk

    Gelman, Andrew and Gilks, Walter R and Roberts, Gareth O , journal=. Weak convergence and optimal scaling of random walk. 1997 , publisher=

  15. [15]

    and Stern, Hal S

    Gelman, Andrew and Carlin, John B. and Stern, Hal S. and Dunson, David B. and Vehtari, Aki and Rubin, Donald B. , year =. Bayesian Data Analysis , edition =

  16. [16]

    Stochastic relaxation,

    Geman, Stuart and Geman, Donald , journal=. Stochastic relaxation,. 1984 , publisher=

  17. [17]

    Efficient sampling for

    Hahn, P Richard and He, Jingyu and Lopes, Hedibert F , journal=. Efficient sampling for. 2019 , publisher=

  18. [18]

    Harville , title =

    David A. Harville , title =

  19. [19]

    Journal of Machine Learning Research , year =

    James Johndrow and Paulo Orenstein and Anirban Bhattacharya , title =. Journal of Machine Learning Research , year =

  20. [20]

    and Arshad, Faaizah and Blacketer, Clair and Bowring, Mary G

    Khera, Rohan and Dhingra, Lovedeep Singh and Aminorroaya, Arya and Li, Kelly and Zhou, Jin J. and Arshad, Faaizah and Blacketer, Clair and Bowring, Mary G. and Bu, Fan and Cook, Michael and Dorr, David A. and Duarte-Salles, Talita and DuVall, Scott L. and Falconer, Thomas and French, Tina E. and Hanchrow, Elizabeth E. and Horban, Scott and Lau, Wallis C. ...

  21. [21]

    Estimation of

    Lee, Se Yoon and Lei, Bowen and Mallick, Bani , journal=. Estimation of. 2020 , publisher=

  22. [22]

    IEEE Signal Processing Letters , volume=

    A simple sampler for the horseshoe estimator , author=. IEEE Signal Processing Letters , volume=. 2015 , publisher=

  23. [23]

    The Journal of Chemical Physics , volume=

    Equation of state calculations by fast computing machines , author=. The Journal of Chemical Physics , volume=. 1953 , publisher=

  24. [24]

    large n, large p

    Prior-preconditioned conjugate gradient method for accelerated Gibbs sampling in “large n, large p” Bayesian sparse regression , author=. Journal of the American Statistical Association , volume=. 2023 , publisher=

  25. [25]

    Shrinkage with shrunken shoulders:

    Nishimura, Akihiko and Suchard, Marc A , journal=. Shrinkage with shrunken shoulders:. 2023 , publisher=

  26. [26]

    doi:10.5281/zenodo.7773502 , url =

    Oliver Pain , title =. doi:10.5281/zenodo.7773502 , url =

  27. [27]

    Technical University of Denmark , volume=

    The matrix cookbook , author=. Technical University of Denmark , volume=

  28. [28]

    Electronic Journal of Statistics , number =

    Juho Piironen and Aki Vehtari , title =. Electronic Journal of Statistics , number =. 2017 , doi =

  29. [29]

    Plummer, Martyn and Best, Nicky and Cowles, Kate and Vines, Karen and others , journal=

  30. [30]

    Bayesian inference for logistic models using

    Polson, Nicholas G and Scott, James G and Windle, Jesse , journal=. Bayesian inference for logistic models using. 2013 , publisher=

  31. [31]

    Journal of the Royal Statistical Society: Series B: Statistical Methodology , pages=

    The bayesian bridge , author=. Journal of the Royal Statistical Society: Series B: Statistical Methodology , pages=. 2014 , publisher=

  32. [32]

    The American Journal of Human Genetics , year =

    Making the Most of Clumping and Thresholding for Polygenic Scores , author =. The American Journal of Human Genetics , year =. doi:10.1016/j.ajhg.2019.11.001 , issn =

  33. [33]

    2007 , publisher=

    Numerical recipes 3rd edition: The art of scientific computing , author=. 2007 , publisher=

  34. [34]

    Biometrika , volume=

    The central role of the propensity score in observational studies for causal effects , author=. Biometrika , volume=. 1983 , publisher=

  35. [35]

    Handbook of

    Tadesse, Mahlet G and Vannucci, Marina , year=. Handbook of

  36. [36]

    International Journal of Epidemiology , volume=

    Evaluating large-scale propensity score performance through real-world and synthetic data experiments , author=. International Journal of Epidemiology , volume=. 2018 , publisher=

  37. [37]

    Journal of the Royal Statistical Society: Series B: Statistical Methodology , volume=

    Regression shrinkage and selection via the lasso , author=. Journal of the Royal Statistical Society: Series B: Statistical Methodology , volume=. 1996 , publisher=

  38. [38]

    1997 , publisher=

    Numerical Linear Algebra , author=. 1997 , publisher=

  39. [39]

    BMC Medicine , volume=

    Comprehensive proteomics and platform validation of urinary biomarkers for bladder cancer diagnosis and staging , author=. BMC Medicine , volume=. 2023 , publisher=

  40. [40]

    Bayesian Analysis , number =

    Aki Vehtari and Andrew Gelman and Daniel Simpson and Bob Carpenter and Paul-Christian B. Bayesian Analysis , number =. 2021 , doi =

  41. [41]

    Cell Reports Medicine , volume=

    Gene-environment interactions explain a substantial portion of variability of common neuropsychiatric disorders , author=. Cell Reports Medicine , volume=. 2022 , publisher=

  42. [42]

    2022 , version =

    Zhang, Haoyu , publisher =. 2022 , version =. doi:10.7910/DVN/COXHAP , url =