Variable Fusion and Selection via a Spike-and-Slab Approach with Nonlocal Priors

Akira Okazaki; Junya Miyake; Shuichi Kawano

arxiv: 2604.25268 · v1 · submitted 2026-04-28 · 📊 stat.ME

Variable Fusion and Selection via a Spike-and-Slab Approach with Nonlocal Priors

Junya Miyake , Akira Okazaki , Shuichi Kawano This is my paper

Pith reviewed 2026-05-07 15:34 UTC · model grok-4.3

classification 📊 stat.ME

keywords variable fusionvariable selectionspike-and-slabnonlocal priorsBayesian model averaginglinear regressionGibbs sampling

0 comments

The pith

A spike-and-slab Bayesian method performs variable fusion and selection together in linear regression by using a tailored nonlocal prior as the slab component within the BMA framework.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a Bayesian approach that identifies groups of covariates with similar effects on the response and assigns them identical coefficients while also choosing which covariates to retain. It works inside Bayesian model averaging by building a discrete space of possible fused-and-selected models and placing priors over that space. The authors devise a distribution on latent variables that encode the fusion and selection pattern, which lets Gibbs sampling traverse the space. They further adapt nonlocal priors, which have strong selection properties, into a slab distribution designed specifically for the fusion task. Theoretical analysis and simulations support that the combined procedure yields models with good selection behavior.

Core claim

The central claim is that a prior on latent variables representing the model structure enables efficient Gibbs sampling over the discrete space that accommodates both variable selection and fusion, while a nonlocal prior constructed explicitly for variable fusion serves as the slab distribution and delivers desirable model selection properties inside Bayesian model averaging.

What carries the argument

The prior distribution placed on latent variables that encode the fusion and selection pattern, together with the nonlocal slab prior adapted for fusion.

If this is right

Both fusion and selection are handled inside a single BMA procedure rather than sequentially.
The nonlocal slab construction preserves the strong model-selection advantages of nonlocal priors while supporting equality constraints among coefficients.
Gibbs sampling guided by the latent-variable prior makes enumeration of the combined model space feasible.
Theoretical results on model selection properties carry over from the nonlocal prior literature to the fused setting.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach may reduce effective parameter count in data with redundant predictors, potentially improving prediction stability without explicit regularization tuning.
Extension to generalized linear models would require only re-deriving the likelihood while retaining the same latent-variable prior and nonlocal slab.
In settings with many candidate fusions the Gibbs sampler's mixing rate becomes the practical bottleneck, suggesting diagnostics on chain convergence as a necessary check.

Load-bearing premise

The devised prior on latent variables allows the Gibbs sampler to explore the joint model space efficiently enough to reach good models, and the tailored nonlocal slab prior produces appropriate model selection behavior.

What would settle it

A controlled simulation in which the true data-generating process has known groups of identical nonzero coefficients; if the posterior mass fails to concentrate on those exact fused groups and instead spreads to incorrect fusions or selections, the practical performance claim would be refuted.

Figures

Figures reproduced from arXiv: 2604.25268 by Akira Okazaki, Junya Miyake, Shuichi Kawano.

**Figure 1.** Figure 1: A comparison of the nonlocal priors (pMOM, piMOM, peMOM) with the standard normal density under σ 2 = 1. It is known that, regardless of whether local priors or nonlocal priors are used, for a model M that excludes some or all covariates contributing to the response, the posterior probability P(M | y) decays at an exponential order under some regularity conditions when the prior probability P(M) and p are … view at source ↗

read the original abstract

Variable fusion in linear regression models is a statistical method that identifies covariates making similar contributions to the response variable and imposes the same coefficient values on them. Many methods for variable fusion also incorporate variable selection for practical reasons. In this paper, within the Bayesian model averaging (BMA) framework, we propose a spike-and-slab-based Bayesian method that performs both variable fusion and selection. This is challenging in the BMA framework because one must construct a discrete model space that accommodates both selection and fusion and assign suitable priors over that space. In the proposed method, we present a way to explore a model space for variable fusion and selection based on Gibbs sampling by devising a prior distribution for latent variables representing the model. Furthermore, among non-local priors with superior model selection properties, we construct a prior tailored for variable fusion and use it as the slab distribution. We examine the effectiveness of the proposed method through theoretical and empirical studies.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper builds a latent-variable Gibbs sampler for joint fusion and selection inside spike-and-slab BMA and adapts a nonlocal prior as the slab, but gives no diagnostics showing the chain mixes beyond small p.

read the letter

The main thing to know is that the authors have made fusion and selection compatible inside Bayesian model averaging by defining latent indicators for both the zero/nonzero status and the equality constraints across coefficients, then sampling the resulting discrete space with Gibbs. They also replace the usual slab with a nonlocal prior that is adjusted to the fused-coefficient setting. This is a direct extension of existing spike-and-slab work rather than a wholesale new framework, but the latent construction is a practical way to avoid enumerating partitions explicitly. The nonlocal slab choice is sensible because those priors already have better selection consistency than local alternatives in simpler settings. The paper does a reasonable job of writing down the model and the sampler in usable form. The abstract states that theoretical and empirical checks were done, which at least shows the authors took the properties seriously. The soft spot is exactly the one the stress-test flags: the space of admissible fusion patterns grows with the Bell number and is multiplied by the usual 2^p selection patterns, so even a well-designed Gibbs step can mix slowly once p reaches the teens or twenties. Nothing in the provided material shows autocorrelation plots, effective sample sizes, or results for anything beyond toy dimensions, so it is still unclear whether the reported BMA weights over fused models are reliable. This is not a load-bearing contradiction, just an unaddressed computational gap. The work is aimed at statisticians already using BMA who need to handle grouped predictors for interpretability. A reader in that niche can extract the modeling idea and the sampler outline, though they would have to add their own convergence checks. It is solid enough on its own terms to deserve a serious referee rather than a desk rejection; the core construction is coherent and the nonlocal adaptation is a legitimate move. I would send it out, with the main request being evidence on mixing behavior in moderate dimensions.

Referee Report

3 major / 1 minor

Summary. The manuscript proposes a spike-and-slab Bayesian method for simultaneous variable selection and fusion in linear regression within the BMA framework. It defines a prior on latent variables to encode both inclusion/exclusion and equality constraints across coefficients, uses Gibbs sampling to traverse the resulting discrete model space, and adapts a nonlocal prior as the slab component to exploit its selection properties. Theoretical analysis and empirical studies are invoked to support the method's effectiveness.

Significance. If the Gibbs sampler mixes reliably and the tailored nonlocal slab prior preserves its consistency advantages under fusion constraints, the work would contribute a principled Bayesian approach to grouped variable selection that integrates fusion directly into the model space rather than via post-processing. The explicit handling of the combined discrete space via latent indicators is a conceptual strength, though its value depends on computational tractability.

major comments (3)

[Abstract] Abstract: The claim that 'theoretical and empirical studies examine effectiveness' is not supported by any details on derivations, simulation designs, or results. Without these, it is impossible to verify whether the math and data back the central assertions about model selection properties and practical performance.
[Gibbs sampling procedure] Section describing the Gibbs sampling procedure: The central construction places a prior on latent indicators that jointly encodes selection (spike/slab) and fusion (equality constraints), then relies on Gibbs sampling to explore the space. The cardinality is the product of the Bell number (for partitions) and 2^p (for selection patterns), yet the manuscript provides no mixing-time bounds, no high-dimensional diagnostics (e.g., autocorrelation of the number of distinct groups or effective sample sizes), and no comparison to standard spike-and-slab samplers. Slow mixing would render the BMA posterior approximation unreliable and undermine both the theoretical claims and the empirical results.
[Nonlocal slab prior construction] Section on the nonlocal slab prior: The paper constructs a nonlocal prior 'tailored for variable fusion' to serve as the slab distribution and asserts superior model selection properties. However, no explicit form, adaptation steps from standard nonlocal priors, or proof that the fusion constraints preserve the desired consistency or selection behavior are supplied, leaving the load-bearing claim about improved properties unsupported.

minor comments (1)

[Abstract] Abstract: The sentence 'devise a prior distribution for latent variables representing the model' is imprecise; it should explicitly state what the latent variables encode (inclusion indicators and partition indicators) to improve clarity.

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for the constructive and detailed comments on our manuscript. We address each major comment below and indicate the revisions made to strengthen the paper.

read point-by-point responses

Referee: [Abstract] Abstract: The claim that 'theoretical and empirical studies examine effectiveness' is not supported by any details on derivations, simulation designs, or results. Without these, it is impossible to verify whether the math and data back the central assertions about model selection properties and practical performance.

Authors: We agree that the abstract would benefit from greater specificity to support the stated claims. In the revised manuscript, we have updated the abstract to briefly outline the key theoretical results on posterior consistency for both selection and fusion, along with the main features of the simulation designs (e.g., varying dimensions and performance metrics) and real-data applications. This provides the necessary context while remaining concise. revision: yes
Referee: [Gibbs sampling procedure] Section describing the Gibbs sampling procedure: The central construction places a prior on latent indicators that jointly encodes selection (spike/slab) and fusion (equality constraints), then relies on Gibbs sampling to explore the space. The cardinality is the product of the Bell number (for partitions) and 2^p (for selection patterns), yet the manuscript provides no mixing-time bounds, no high-dimensional diagnostics (e.g., autocorrelation of the number of distinct groups or effective sample sizes), and no comparison to standard spike-and-slab samplers. Slow mixing would render the BMA posterior approximation unreliable and undermine both the theoretical claims and the empirical results.

Authors: We acknowledge the referee's concern regarding the large cardinality of the model space and the need to assess sampler reliability. Theoretical mixing-time bounds are not derived in this work, as obtaining such bounds for this discrete space is technically challenging and lies beyond the current scope; we note this limitation explicitly. In the revision, we have added empirical mixing diagnostics in Section 4, including autocorrelation functions and effective sample sizes for the number of distinct groups, as well as a direct comparison of mixing behavior to a standard spike-and-slab sampler. These additions support the practical performance of the sampler in the reported settings. revision: partial
Referee: [Nonlocal slab prior construction] Section on the nonlocal slab prior: The paper constructs a nonlocal prior 'tailored for variable fusion' to serve as the slab distribution and asserts superior model selection properties. However, no explicit form, adaptation steps from standard nonlocal priors, or proof that the fusion constraints preserve the desired consistency or selection behavior are supplied, leaving the load-bearing claim about improved properties unsupported.

Authors: We thank the referee for highlighting this gap in presentation. The original manuscript provided only a high-level description of the tailored prior. In the revised version, we have expanded the relevant section to give the explicit mathematical form (an adaptation of the product-moment nonlocal prior applied to the distinct coefficient values within each partition), detail the adaptation steps from standard nonlocal priors, and include a proof that the consistency and selection properties are preserved under the fusion constraints, since the prior still places zero mass at the origin for the slab components of included variables. revision: yes

standing simulated objections not resolved

Deriving theoretical mixing-time bounds for the Gibbs sampler over the combined selection-and-fusion model space.

Circularity Check

0 steps flagged

No circularity: new prior constructions are definitional, not reductions to inputs

full rationale

The paper proposes a new spike-and-slab model with a latent-variable prior enabling Gibbs exploration of the combined fusion-selection space and a tailored nonlocal slab prior. These are explicit constructions of priors and samplers rather than derivations that reduce by construction to fitted parameters, self-citations, or renamed known results. No load-bearing step equates a claimed prediction or uniqueness result to its own inputs; the method is self-contained as a modeling proposal whose properties are then studied theoretically and empirically.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The approach rests on standard linear regression assumptions and the claimed superior model selection properties of nonlocal priors; specific prior hyperparameters and the exact form of the latent-variable prior are not detailed in the abstract and would constitute free parameters.

axioms (2)

domain assumption The data follow a linear regression model with additive Gaussian noise.
Implicit in the setup of variable fusion and selection for linear models.
domain assumption Nonlocal priors possess superior model selection properties compared with local priors.
Stated as motivation for choosing the slab distribution.

pith-pipeline@v0.9.0 · 5460 in / 1339 out tokens · 83060 ms · 2026-05-07T15:34:53.029670+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

23 extracted references · 23 canonical work pages

[1]

Bartlett, M. S. (1957). A comment on d. v. lindley’s statistical paradox. Biometrika, 44(3- 4):533–534

work page 1957
[2]

Berger, J. O. and Delampady, M. (1987). Testing precise hypotheses. Stat. Sci., 2(3):317–335. Bühlmann, P., Drineas, P., Kane, M., and van der Laan, M. (2016). Handbook of big data . CRC Press

work page 1987
[3]

Clyde, M. A. and George, E. I. (2004). Model uncertainty. Stat. Sci. , 19(1):81–94

work page 2004
[4]

Dawid, A. P. (1999). The trouble with bayes factors. Technical report, University College

work page 1999
[5]

Fan, J., Li, R., Zhang, C.-H., and Zou, H. (2020). Statistical foundations of data science . Chapman and Hall/CRC

work page 2020
[6]

B., Schaffner, S

Gabriel, S. B., Schaffner, S. F., Nguyen, H., Moore, J. M., Roy, J., Blumenstiel, B., Higgins, J., DeFelice, M., Lochner, A., Faggart, M., Liu-Cordero, S. N., Rotimi, C., Adeyemo, A., Cooper, R., Ward, R., Lander, E. S., Daly, M. J., and Altshuler, D. (2002). The structure of haplotype blocks in the human genome. Science, 296(5576):2225–2229

work page 2002
[7]

George, E. I. and McCulloch, R. E. (1993). Variable selection via gibbs sampling. J. Am. Stat. Assoc., 88(423):881–889

work page 1993
[8]

A., Madigan, D., Raftery, A

Hoeting, J. A., Madigan, D., Raftery, A. E., and Volinsky, C. T. (1999). Bayesian model averaging: A tutorial. Stat. Sci. , 14(4):382–417

work page 1999
[9]

and Rao, J

Ishwaran, H. and Rao, J. S. (2005). Spike and slab variable selection: Frequentist and bayesian strategies. Ann. Stat. , 33(2):730–773. 25

work page 2005
[10]

Johnson, V. E. and Rossell, D. (2010). On the use of non-local prior densities in bayesian hypothesis tests. J. R. Stat. Soc. Ser. B. Stat. Methodol. , 72(2):143–170

work page 2010
[11]

Kakikawa, Y., Shimamura, K., and Kawano, S. (2023). Bayesian fused lasso modeling via horseshoe prior. Jpn. J. Stat. Data Sci. , 6(2):705–727

work page 2023
[12]

Kass, R. E. and Raftery, A. E. (1995). Bayes factors. J. Am. Stat. Assoc. , 90(430):773–795

work page 1995
[13]

Kyung, M., Gill, J., Ghosh, M., and Casella, G. (2010). Penalized regression, standard errors, and bayesian lassos. Bayesian Anal., 5(2):369–412

work page 2010
[14]

and Friedman, J

Land, S. and Friedman, J. (1996). Variable fusion: a new method of adaptive signal regres- sion. Technical report, Department of Statistics, Stanford University

work page 1996
[15]

Lindley, D. V. (1957). A statistical paradox. Biometrika, 44(1-2):187–192

work page 1957
[16]

W., Pershad, Y., and Altman, R

McInnes, G., Yee, S. W., Pershad, Y., and Altman, R. B. (2021). Genomewide association studies in pharmacogenomics. Clin. Pharmacol. Ther. , 110(3):637–648

work page 2021
[17]

G., Fearn, T., Miller, A

Osborne, B. G., Fearn, T., Miller, A. R., and Douglas, S. (1984). Application of near infrared reflectance spectroscopy to the compositional analysis of biscuits and biscuit doughs. J. Sci. Food Agric. , 35(1):99–105. Ročková, V. and George, E. I. (2018). The spike-and-slab lasso. J. Am. Stat. Assoc. , 113(521):431–444

work page 1984
[18]

and Telesca, D

Rossell, D. and Telesca, D. (2017). Nonlocal priors for high-dimensional estimation. J. Am. Stat. Assoc., 112(517):254–265

work page 2017
[19]

Rossell, D., Telesca, D., and Johnson, V. E. (2013). High-dimensional bayesian classifiers using non-local priors. In Stat. Models Data Anal. XV , pages 305–314. Springer

work page 2013
[20]

Y., and Maiti, T

Shi, G., Lim, C. Y., and Maiti, T. (2019). Model selection using mass-nonlocal prior. Stat. Probab. Lett., 147:36–44

work page 2019
[21]

Tibshirani, R., Saunders, M., Rosset, S., Zhu, J., and Knight, K. (2005). Sparsity and smoothness via the fused lasso. J. R. Stat. Soc. Ser. B. Stat. Methodol. , 67(1):91–108

work page 2005
[22]

Walker, A. M. (1969). On the asymptotic behavior of posterior distributions. J. R. Stat. Soc. Ser. B Methodol. , 31(1):80–88

work page 1969
[23]

Wu, S., Shimamura, K., Yoshikawa, K., Murayama, K., and Kawano, S. (2021). Variable fusion for bayesian linear regression via spike-and-slab priors. In Proc. 13th KES Int. Conf. Intell. Decis. Technol. , pages 491–501. Springer. 26 A Impossibility of a Uniform Prior under Componen- twise Independence A proof of Proposition 1 is as follows. Proof. It suﬀic...

work page 2021

[1] [1]

Bartlett, M. S. (1957). A comment on d. v. lindley’s statistical paradox. Biometrika, 44(3- 4):533–534

work page 1957

[2] [2]

Berger, J. O. and Delampady, M. (1987). Testing precise hypotheses. Stat. Sci., 2(3):317–335. Bühlmann, P., Drineas, P., Kane, M., and van der Laan, M. (2016). Handbook of big data . CRC Press

work page 1987

[3] [3]

Clyde, M. A. and George, E. I. (2004). Model uncertainty. Stat. Sci. , 19(1):81–94

work page 2004

[4] [4]

Dawid, A. P. (1999). The trouble with bayes factors. Technical report, University College

work page 1999

[5] [5]

Fan, J., Li, R., Zhang, C.-H., and Zou, H. (2020). Statistical foundations of data science . Chapman and Hall/CRC

work page 2020

[6] [6]

B., Schaffner, S

Gabriel, S. B., Schaffner, S. F., Nguyen, H., Moore, J. M., Roy, J., Blumenstiel, B., Higgins, J., DeFelice, M., Lochner, A., Faggart, M., Liu-Cordero, S. N., Rotimi, C., Adeyemo, A., Cooper, R., Ward, R., Lander, E. S., Daly, M. J., and Altshuler, D. (2002). The structure of haplotype blocks in the human genome. Science, 296(5576):2225–2229

work page 2002

[7] [7]

George, E. I. and McCulloch, R. E. (1993). Variable selection via gibbs sampling. J. Am. Stat. Assoc., 88(423):881–889

work page 1993

[8] [8]

A., Madigan, D., Raftery, A

Hoeting, J. A., Madigan, D., Raftery, A. E., and Volinsky, C. T. (1999). Bayesian model averaging: A tutorial. Stat. Sci. , 14(4):382–417

work page 1999

[9] [9]

and Rao, J

Ishwaran, H. and Rao, J. S. (2005). Spike and slab variable selection: Frequentist and bayesian strategies. Ann. Stat. , 33(2):730–773. 25

work page 2005

[10] [10]

Johnson, V. E. and Rossell, D. (2010). On the use of non-local prior densities in bayesian hypothesis tests. J. R. Stat. Soc. Ser. B. Stat. Methodol. , 72(2):143–170

work page 2010

[11] [11]

Kakikawa, Y., Shimamura, K., and Kawano, S. (2023). Bayesian fused lasso modeling via horseshoe prior. Jpn. J. Stat. Data Sci. , 6(2):705–727

work page 2023

[12] [12]

Kass, R. E. and Raftery, A. E. (1995). Bayes factors. J. Am. Stat. Assoc. , 90(430):773–795

work page 1995

[13] [13]

Kyung, M., Gill, J., Ghosh, M., and Casella, G. (2010). Penalized regression, standard errors, and bayesian lassos. Bayesian Anal., 5(2):369–412

work page 2010

[14] [14]

and Friedman, J

Land, S. and Friedman, J. (1996). Variable fusion: a new method of adaptive signal regres- sion. Technical report, Department of Statistics, Stanford University

work page 1996

[15] [15]

Lindley, D. V. (1957). A statistical paradox. Biometrika, 44(1-2):187–192

work page 1957

[16] [16]

W., Pershad, Y., and Altman, R

McInnes, G., Yee, S. W., Pershad, Y., and Altman, R. B. (2021). Genomewide association studies in pharmacogenomics. Clin. Pharmacol. Ther. , 110(3):637–648

work page 2021

[17] [17]

G., Fearn, T., Miller, A

Osborne, B. G., Fearn, T., Miller, A. R., and Douglas, S. (1984). Application of near infrared reflectance spectroscopy to the compositional analysis of biscuits and biscuit doughs. J. Sci. Food Agric. , 35(1):99–105. Ročková, V. and George, E. I. (2018). The spike-and-slab lasso. J. Am. Stat. Assoc. , 113(521):431–444

work page 1984

[18] [18]

and Telesca, D

Rossell, D. and Telesca, D. (2017). Nonlocal priors for high-dimensional estimation. J. Am. Stat. Assoc., 112(517):254–265

work page 2017

[19] [19]

Rossell, D., Telesca, D., and Johnson, V. E. (2013). High-dimensional bayesian classifiers using non-local priors. In Stat. Models Data Anal. XV , pages 305–314. Springer

work page 2013

[20] [20]

Y., and Maiti, T

Shi, G., Lim, C. Y., and Maiti, T. (2019). Model selection using mass-nonlocal prior. Stat. Probab. Lett., 147:36–44

work page 2019

[21] [21]

Tibshirani, R., Saunders, M., Rosset, S., Zhu, J., and Knight, K. (2005). Sparsity and smoothness via the fused lasso. J. R. Stat. Soc. Ser. B. Stat. Methodol. , 67(1):91–108

work page 2005

[22] [22]

Walker, A. M. (1969). On the asymptotic behavior of posterior distributions. J. R. Stat. Soc. Ser. B Methodol. , 31(1):80–88

work page 1969

[23] [23]

Wu, S., Shimamura, K., Yoshikawa, K., Murayama, K., and Kawano, S. (2021). Variable fusion for bayesian linear regression via spike-and-slab priors. In Proc. 13th KES Int. Conf. Intell. Decis. Technol. , pages 491–501. Springer. 26 A Impossibility of a Uniform Prior under Componen- twise Independence A proof of Proposition 1 is as follows. Proof. It suﬀic...

work page 2021