Variable Fusion and Selection via a Spike-and-Slab Approach with Nonlocal Priors
Pith reviewed 2026-05-07 15:34 UTC · model grok-4.3
The pith
A spike-and-slab Bayesian method performs variable fusion and selection together in linear regression by using a tailored nonlocal prior as the slab component within the BMA framework.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is twofold. First, a prior on the latent variables that represent the model structure enables efficient Gibbs sampling over a discrete space accommodating both variable selection and fusion. Second, a nonlocal prior constructed specifically for variable fusion serves as the slab distribution and is claimed to deliver desirable model selection properties within Bayesian model averaging.
What carries the argument
The prior distribution placed on latent variables that encode the fusion and selection pattern, together with the nonlocal slab prior adapted for fusion.
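One way to picture such a latent encoding is as a label vector with one entry per covariate. The sketch below is an assumption made for illustration, not the paper's construction: label 0 marks an excluded covariate, and covariates sharing a positive label share one coefficient. Because a shared coefficient multiplies the sum of its columns, the fused model is an ordinary regression on group-summed columns. The function name `fused_design` and the label convention are hypothetical.

```python
import numpy as np

def fused_design(X, labels):
    """Collapse the columns of X according to a selection-and-fusion label vector.

    Assumed convention (hypothetical, for illustration only):
      labels[j] == 0  -> covariate j is excluded (coefficient fixed at zero)
      labels[j] == g  -> covariate j belongs to fused group g (g >= 1)
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    groups = np.unique(labels[labels > 0])  # sorted distinct group ids
    # Indicator matrix Z (p x G): Z[j, k] = 1 when covariate j is in the k-th group.
    Z = (labels[:, None] == groups[None, :]).astype(float)
    # X @ Z sums the columns within each group and drops excluded columns.
    return X @ Z, groups

# Covariates 0 and 1 are fused, covariate 2 is excluded, covariate 3 stands alone.
X = np.arange(20, dtype=float).reshape(5, 4)
X_fused, groups = fused_design(X, [1, 1, 0, 2])
```

Under this convention the number of columns of `X_fused` equals the number of distinct nonzero coefficient values, which is the effective parameter count of the model.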
If this is right
- Both fusion and selection are handled inside a single BMA procedure rather than sequentially.
- The nonlocal slab construction preserves the strong model-selection advantages of nonlocal priors while supporting equality constraints among coefficients.
- Gibbs sampling guided by the latent-variable prior makes exploration of the combined model space feasible without enumerating it.
- Theoretical results on model selection properties carry over from the nonlocal prior literature to the fused setting.
Where Pith is reading between the lines
- The approach may reduce effective parameter count in data with redundant predictors, potentially improving prediction stability without explicit regularization tuning.
- Extension to generalized linear models could in principle retain the same latent-variable prior and nonlocal slab, but the likelihood and every sampler step that depends on it would need to be re-derived.
- In settings with many candidate fusions the Gibbs sampler's mixing rate becomes the practical bottleneck, suggesting diagnostics on chain convergence as a necessary check.
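A convergence check of the kind mentioned in the last point could track the number of distinct fused groups across Gibbs iterations and estimate its effective sample size. The sketch below is a minimal illustration under stated assumptions: the label convention (0 for excluded, positive integers for groups) is hypothetical, and the estimator truncates the autocorrelation sum at the first non-positive lag, which is a simple rule and not necessarily the diagnostic the authors use.

```python
import numpy as np

def count_groups(labels):
    """Number of distinct fused groups; label 0 (excluded) is not a group."""
    return len({int(lab) for lab in labels if int(lab) != 0})

def effective_sample_size(chain):
    """Estimate ESS = N / (1 + 2 * sum of autocorrelations).

    The sum is truncated at the first non-positive sample autocorrelation,
    a simple heuristic chosen here for brevity.
    """
    x = np.asarray(chain, dtype=float)
    n = x.size
    x = x - x.mean()
    denom = np.dot(x, x)
    if denom == 0.0:
        # A chain that never moves carries no information about mixing.
        return float("nan")
    rho_sum = 0.0
    for lag in range(1, n):
        rho = np.dot(x[:-lag], x[lag:]) / denom
        if rho <= 0.0:
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)
```

Applied to the trace `[count_groups(z) for z in sampled_labels]`, where `sampled_labels` stands for hypothetical sampler output, an effective sample size far below the number of iterations would indicate slow movement between fusion patterns.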
Load-bearing premise
The devised prior on latent variables allows the Gibbs sampler to explore the joint model space efficiently enough to reach good models, and the tailored nonlocal slab prior produces appropriate model selection behavior.
What would settle it
A controlled simulation in which the true data-generating process has known groups of identical nonzero coefficients; if the posterior mass fails to concentrate on those exact fused groups and instead spreads to incorrect fusions or selections, the practical performance claim would be refuted.
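A simulation of this kind can be sketched as follows. Everything here is an illustrative assumption, not the paper's experimental design: the group sizes, coefficient values, independent standard normal covariates, and the helper names are all hypothetical. The last helper measures how much posterior mass a set of sampled label vectors places on the exact true structure, up to relabeling of groups.

```python
import numpy as np

def simulate_fused_regression(n=200, group_sizes=(3, 2), group_values=(2.0, -1.5),
                              n_zero=5, sigma=1.0, seed=0):
    """Draw (X, y) from a linear model whose nonzero coefficients come in equal-valued groups."""
    rng = np.random.default_rng(seed)
    beta = np.concatenate(
        [np.full(s, v) for s, v in zip(group_sizes, group_values)] + [np.zeros(n_zero)]
    )
    # True structure: group g gets label g + 1, excluded covariates get label 0.
    labels = np.concatenate(
        [np.full(s, g + 1) for g, s in enumerate(group_sizes)]
        + [np.zeros(n_zero, dtype=int)]
    )
    X = rng.standard_normal((n, beta.size))
    y = X @ beta + sigma * rng.standard_normal(n)
    return X, y, beta, labels

def canonical(labels):
    """Relabel groups by order of first appearance, keeping 0 (excluded) fixed."""
    mapping, out = {}, []
    for lab in labels:
        lab = int(lab)
        out.append(0 if lab == 0 else mapping.setdefault(lab, len(mapping) + 1))
    return out

def posterior_mass_on_truth(sampled_labels, true_labels):
    """Fraction of sampled label vectors matching the true structure exactly."""
    target = canonical(true_labels)
    hits = sum(canonical(z) == target for z in sampled_labels)
    return hits / len(sampled_labels)
```

A value of `posterior_mass_on_truth` that stays small as `n` grows, with the remaining mass spread over incorrect fusions or selections, would be the failure signature described above.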
Original abstract
Variable fusion in linear regression models is a statistical method that identifies covariates making similar contributions to the response variable and imposes the same coefficient values on them. Many methods for variable fusion also incorporate variable selection for practical reasons. In this paper, within the Bayesian model averaging (BMA) framework, we propose a spike-and-slab-based Bayesian method that performs both variable fusion and selection. This is challenging in the BMA framework because one must construct a discrete model space that accommodates both selection and fusion and assign suitable priors over that space. In the proposed method, we present a way to explore a model space for variable fusion and selection based on Gibbs sampling by devising a prior distribution for latent variables representing the model. Furthermore, among non-local priors with superior model selection properties, we construct a prior tailored for variable fusion and use it as the slab distribution. We examine the effectiveness of the proposed method through theoretical and empirical studies.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a spike-and-slab Bayesian method for simultaneous variable selection and fusion in linear regression within the BMA framework. It defines a prior on latent variables to encode both inclusion/exclusion and equality constraints across coefficients, uses Gibbs sampling to traverse the resulting discrete model space, and adapts a nonlocal prior as the slab component to exploit its selection properties. Theoretical analysis and empirical studies are invoked to support the method's effectiveness.
Significance. If the Gibbs sampler mixes reliably and the tailored nonlocal slab prior preserves its consistency advantages under fusion constraints, the work would contribute a principled Bayesian approach to grouped variable selection that integrates fusion directly into the model space rather than via post-processing. The explicit handling of the combined discrete space via latent indicators is a conceptual strength, though its value depends on computational tractability.
major comments (3)
- [Abstract] Abstract: The claim that the method's effectiveness is examined 'through theoretical and empirical studies' is not supported by any details on derivations, simulation designs, or results. Without these, it is impossible to verify whether the math and data back the central assertions about model selection properties and practical performance.
- [Gibbs sampling procedure] Section describing the Gibbs sampling procedure: The central construction places a prior on latent indicators that jointly encodes selection (spike/slab) and fusion (equality constraints), then relies on Gibbs sampling to explore the space. The cardinality is at most the product of the Bell number B_p (for partitions) and 2^p (for selection patterns), which grows super-exponentially in p, yet the manuscript provides no mixing-time bounds, no high-dimensional diagnostics (e.g., autocorrelation of the number of distinct groups or effective sample sizes), and no comparison to standard spike-and-slab samplers. Slow mixing would render the BMA posterior approximation unreliable and undermine both the theoretical claims and the empirical results.
- [Nonlocal slab prior construction] Section on the nonlocal slab prior: The paper constructs a nonlocal prior 'tailored for variable fusion' to serve as the slab distribution and asserts superior model selection properties. However, no explicit form, adaptation steps from standard nonlocal priors, or proof that the fusion constraints preserve the desired consistency or selection behavior are supplied, leaving the load-bearing claim about improved properties unsupported.
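The model-space size raised in the second major comment can be made concrete for small p. The sketch below assumes, without confirmation from the manuscript, that a model is a choice of excluded coefficients plus a partition of the remaining ones into fused groups, with any two coefficients allowed to fuse. Under that assumption the count is the Bell number B_{p+1}, and the product of B_p and 2^p is an upper bound on the count, not the count itself. The function names are hypothetical.

```python
from itertools import product
from math import comb

def bell_numbers(n_max):
    """Return [B_0, ..., B_{n_max}] via B_{n+1} = sum_k C(n, k) * B_k."""
    bell = [1]
    for n in range(n_max):
        bell.append(sum(comb(n, k) * bell[k] for k in range(n + 1)))
    return bell

def count_models(p):
    """Count models: choose k selected coefficients, then partition them into fused groups."""
    bell = bell_numbers(p)
    return sum(comb(p, k) * bell[k] for k in range(p + 1))

def enumerate_models(p):
    """Brute-force check for small p: label 0 = excluded, positive labels = fused groups."""
    seen = set()
    for raw in product(range(p + 1), repeat=p):
        mapping, canon = {}, []
        for lab in raw:
            # Relabel groups by order of first appearance so equal structures coincide.
            canon.append(0 if lab == 0 else mapping.setdefault(lab, len(mapping) + 1))
        seen.add(tuple(canon))
    return seen

for p in range(1, 6):
    print(p, count_models(p), bell_numbers(p)[p] * 2 ** p)
```

If the paper restricts fusion to adjacent covariates or uses a different encoding, the count changes, so these numbers illustrate the growth rate only under the stated assumption.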
minor comments (1)
- [Abstract] Abstract: The sentence 'devise a prior distribution for latent variables representing the model' is imprecise; it should explicitly state what the latent variables encode (inclusion indicators and partition indicators) to improve clarity.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments on our manuscript. We address each major comment below and indicate the revisions made to strengthen the paper.
-
Referee: [Abstract] Abstract: The claim that the method's effectiveness is examined 'through theoretical and empirical studies' is not supported by any details on derivations, simulation designs, or results. Without these, it is impossible to verify whether the math and data back the central assertions about model selection properties and practical performance.
Authors: We agree that the abstract would benefit from greater specificity to support the stated claims. In the revised manuscript, we have updated the abstract to briefly outline the key theoretical results on posterior consistency for both selection and fusion, along with the main features of the simulation designs (e.g., varying dimensions and performance metrics) and real-data applications. This provides the necessary context while remaining concise. revision: yes
-
Referee: [Gibbs sampling procedure] Section describing the Gibbs sampling procedure: The central construction places a prior on latent indicators that jointly encodes selection (spike/slab) and fusion (equality constraints), then relies on Gibbs sampling to explore the space. The cardinality is at most the product of the Bell number B_p (for partitions) and 2^p (for selection patterns), which grows super-exponentially in p, yet the manuscript provides no mixing-time bounds, no high-dimensional diagnostics (e.g., autocorrelation of the number of distinct groups or effective sample sizes), and no comparison to standard spike-and-slab samplers. Slow mixing would render the BMA posterior approximation unreliable and undermine both the theoretical claims and the empirical results.
Authors: We acknowledge the referee's concern regarding the large cardinality of the model space and the need to assess sampler reliability. Theoretical mixing-time bounds are not derived in this work, as obtaining such bounds for this discrete space is technically challenging and lies beyond the current scope; we note this limitation explicitly. In the revision, we have added empirical mixing diagnostics in Section 4, including autocorrelation functions and effective sample sizes for the number of distinct groups, as well as a direct comparison of mixing behavior to a standard spike-and-slab sampler. These additions support the practical performance of the sampler in the reported settings. revision: partial
-
Referee: [Nonlocal slab prior construction] Section on the nonlocal slab prior: The paper constructs a nonlocal prior 'tailored for variable fusion' to serve as the slab distribution and asserts superior model selection properties. However, no explicit form, adaptation steps from standard nonlocal priors, or proof that the fusion constraints preserve the desired consistency or selection behavior are supplied, leaving the load-bearing claim about improved properties unsupported.
Authors: We thank the referee for highlighting this gap in presentation. The original manuscript provided only a high-level description of the tailored prior. In the revised version, we have expanded the relevant section to give the explicit mathematical form (an adaptation of the product-moment nonlocal prior applied to the distinct coefficient values within each partition), detail the adaptation steps from standard nonlocal priors, and include a proof that the consistency and selection properties are preserved under the fusion constraints, since the prior density still vanishes at the origin for the slab components of included variables. revision: yes
- Deriving theoretical mixing-time bounds for the Gibbs sampler over the combined selection-and-fusion model space.
Circularity Check
No circularity: new prior constructions are definitional, not reductions to inputs
The paper proposes a new spike-and-slab model with a latent-variable prior enabling Gibbs exploration of the combined fusion-selection space and a tailored nonlocal slab prior. These are explicit constructions of priors and samplers rather than derivations that reduce by construction to fitted parameters, self-citations, or renamed known results. No load-bearing step equates a claimed prediction or uniqueness result to its own inputs; the method is self-contained as a modeling proposal whose properties are then studied theoretically and empirically.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: The data follow a linear regression model with additive Gaussian noise.
- domain assumption: Nonlocal priors possess superior model selection properties compared with local priors.
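For orientation, the two assumptions can be written out in standard notation. The symbols below (n observations, design matrix X, noise variance \sigma^2, prior scale \tau) are assumed here and may differ from the paper's. The second line is the first-order moment prior from the nonlocal-prior literature, shown only to illustrate what "nonlocal" means; the fusion-tailored slab proposed in the paper is not reproduced on this page.

```latex
% Assumption 1: Gaussian linear regression
y = X\beta + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \sigma^2 I_n).

% Illustration for assumption 2: first-order moment (nonlocal) prior on one coefficient
\pi_{\mathrm{MOM}}(\beta_j \mid \tau, \sigma^2)
  = \frac{\beta_j^2}{\tau\sigma^2}\,
    \mathcal{N}(\beta_j;\, 0,\, \tau\sigma^2),
\qquad \pi_{\mathrm{MOM}}(0 \mid \tau, \sigma^2) = 0.
```

The density vanishes at the null value \beta_j = 0, whereas a local prior such as \mathcal{N}(0, \tau\sigma^2) is strictly positive there. This separation between the null and alternative is what the nonlocal-prior literature credits for faster accumulation of evidence in favor of a true null.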