
arxiv: 2604.27791 · v1 · submitted 2026-04-30 · 📊 stat.ME

Reversible Jump MCMC With No Regrets: Bayesian Variable Selection Using Mixtures of Mutually Singular Distributions

Pith reviewed 2026-05-07 05:44 UTC · model grok-4.3

classification 📊 stat.ME
keywords Bayesian variable selection · reversible jump MCMC · mixtures of distributions · spike-and-slab prior · fixed-dimensional MCMC · model selection · Metropolis-Hastings

The pith

Bayesian variable selection can be performed with standard fixed-dimensional MCMC by representing models as mixtures of mutually singular distributions in one parameter space.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that Bayesian variable selection, which normally requires reversible jump MCMC to handle changing model dimensions, can instead be done by embedding all models into a single fixed-dimensional space. It does this by partitioning the space into mutually singular subspaces, one for each model, so that sampling from the resulting mixture distribution exactly recovers the desired posterior over models and parameters. This formulation matches the classic spike-and-slab prior interpretation exactly and, when constructed properly, produces the same Metropolis-Hastings acceptance probabilities as reversible jump methods. On a ten-predictor benchmark the approach matches the exact posterior inclusion probabilities from full enumeration while delivering comparable or better effective sample size per second than a tuned reversible jump sampler. The method is further demonstrated on mixed-effects logistic regression and factor-loading selection, showing that variable selection becomes feasible inside ordinary MCMC software.

Core claim

Mixtures of mutually singular distributions embed competing models inside a fixed-dimensional parameter space by partitioning it into subspaces that are mutually singular, one per model. When the subspaces and densities are chosen appropriately, the resulting mixture posterior is identical to the target posterior that places discrete probability on models and continuous probability on their parameters. Under these same constructions the Metropolis-Hastings acceptance ratio for moves between models is identical to the ratio used in reversible jump MCMC. Consequently, standard fixed-dimensional MCMC algorithms can be used for Bayesian variable selection without any dimension-changing jumps.

What carries the argument

Mixtures of mutually singular distributions (MoMS), which partition a single fixed-dimensional parameter space into mutually singular subspaces (one per model) so that sampling from the mixture recovers the exact model-and-parameter posterior.
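
For a single regression coefficient, the construction can be sketched as follows. The notation is an assumption made for illustration and is not taken from the paper: $w$ is a prior inclusion probability, $\delta_0$ is a point mass at zero, and $\pi_{\text{slab}}$ is a continuous slab density.

```latex
\pi(\beta_i) \;=\; (1 - w)\,\delta_0(\beta_i) \;+\; w\,\pi_{\text{slab}}(\beta_i)\,\mathbb{1}_{\mathbb{R}\setminus\{0\}}(\beta_i),
\qquad
P(\gamma_i = 1 \mid y) \;=\; P(\beta_i \neq 0 \mid y).
```

The two components are mutually singular because the first is supported on $\{0\}$ and the second on $\mathbb{R}\setminus\{0\}$, so the inclusion indicator $\gamma_i$ can be read off from whether $\beta_i$ is exactly zero.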

If this is right

  • The MoMS posterior is identical to the spike-and-slab posterior used in Bayesian variable selection.
  • When the subspaces and proposal kernels are aligned correctly, MoMS and reversible jump MCMC share the identical Metropolis-Hastings acceptance probability.
  • On a ten-predictor linear regression benchmark both MoMS and a carefully tuned reversible jump sampler recover posterior inclusion probabilities that match full enumeration.
  • MoMS achieves effective sample size per second that is comparable to or higher than that of a well-engineered reversible jump implementation.
  • The same fixed-dimensional construction applies directly to mixed-effects logistic regression and to factor-loading selection in multidimensional item response models.
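
The efficiency metric in the list above, effective sample size per second, can be illustrated with a small sketch. The estimator the paper uses is not stated on this page; the version below assumes a simple rule that truncates the autocorrelation sum at the first non-positive lag, and the names `effective_sample_size`, `ess_per_second`, and `ar1_chain` are hypothetical.

```python
import time
import numpy as np

def effective_sample_size(chain):
    """ESS = n / tau, with tau summed up to the first non-positive autocorrelation."""
    x = np.asarray(chain, dtype=float)
    n = len(x)
    x = x - x.mean()
    var = x @ x / n
    if var == 0.0:
        return float("nan")  # a constant chain carries no autocorrelation information
    tau = 1.0
    for k in range(1, n):
        rho = (x[:-k] @ x[k:]) / (n * var)  # lag-k autocorrelation estimate
        if rho <= 0.0:
            break
        tau += 2.0 * rho
    return n / tau

def ess_per_second(sampler, *args, **kwargs):
    """Run a sampler that returns a 1-D chain and divide its ESS by wall-clock time."""
    start = time.perf_counter()
    chain = sampler(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return effective_sample_size(chain) / elapsed

def ar1_chain(n, phi, seed=0):
    """Toy autocorrelated chain, used here only to exercise the estimator."""
    rng = np.random.default_rng(seed)
    x = np.empty(n)
    x[0] = rng.normal()
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal()
    return x

ess_iid = effective_sample_size(ar1_chain(5000, 0.0))
ess_ar = effective_sample_size(ar1_chain(20000, 0.9))
rate = ess_per_second(ar1_chain, 20000, 0.9)
```

For an independent chain the ESS should sit close to the chain length, while for an AR(1) chain with coefficient 0.9 the theoretical integrated autocorrelation time is (1 + 0.9) / (1 - 0.9) = 19, so the ESS should be roughly the chain length divided by 19.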

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Practitioners can now implement Bayesian variable selection inside existing fixed-dimensional MCMC packages without writing custom dimension-changing proposals.
  • The approach may extend to other model-selection settings that currently rely on reversible jump, such as change-point detection or graphical model selection.
  • Because the method stays inside a single fixed-dimensional space, it may simplify convergence diagnostics and parallelization relative to reversible jump samplers.

Load-bearing premise

The parameter space can be partitioned into mutually singular subspaces, one per model, such that the resulting mixture posterior is exactly the target posterior over models and parameters.

What would settle it

On a small problem whose true posterior can be obtained by full enumeration, draw samples from the MoMS posterior and check whether the estimated model probabilities and parameter marginals differ from the enumerated values beyond Monte Carlo error.
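
The check described above can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the paper's benchmark: simulated data with three predictors, known noise variance `sigma2`, independent N(0, `tau2`) slabs, and prior inclusion probability `prior_inc`. All function names are hypothetical, and the sampler is a Gibbs update on the spike-and-slab full conditional, standing in for whichever fixed-dimensional sampler is being tested.

```python
import itertools
import numpy as np

def log_marginal(y, X, gamma, sigma2=1.0, tau2=1.0):
    """Log marginal likelihood of y for the predictors selected by gamma."""
    Xg = X[:, np.asarray(gamma, dtype=bool)]
    cov = sigma2 * np.eye(len(y)) + tau2 * Xg @ Xg.T
    _, logdet = np.linalg.slogdet(cov)
    quad = y @ np.linalg.solve(cov, y)
    return -0.5 * (len(y) * np.log(2 * np.pi) + logdet + quad)

def enumerate_pip(y, X, prior_inc=0.5, sigma2=1.0, tau2=1.0):
    """Exact posterior inclusion probabilities by visiting all 2^p models."""
    p = X.shape[1]
    models = list(itertools.product([0, 1], repeat=p))
    logpost = np.array([
        log_marginal(y, X, g, sigma2, tau2)
        + sum(g) * np.log(prior_inc) + (p - sum(g)) * np.log(1 - prior_inc)
        for g in models
    ])
    w = np.exp(logpost - logpost.max())
    w /= w.sum()
    return w @ np.array(models, dtype=float)

def gibbs_pip(y, X, n_iter=20000, burn=1000, prior_inc=0.5,
              sigma2=1.0, tau2=1.0, seed=0):
    """Inclusion frequencies from a Gibbs sampler on a fixed-length beta."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    beta = np.zeros(p)
    counts = np.zeros(p)
    for it in range(n_iter):
        for i in range(p):
            r = y - X @ beta + X[:, i] * beta[i]  # residual without predictor i
            v = 1.0 / (X[:, i] @ X[:, i] / sigma2 + 1.0 / tau2)
            m = v * (X[:, i] @ r) / sigma2
            log_odds = (np.log(prior_inc) - np.log(1 - prior_inc)
                        + 0.5 * np.log(v / tau2) + 0.5 * m * m / v)
            p_inc = 1.0 / (1.0 + np.exp(-log_odds))
            # Slab draw with probability p_inc, otherwise the point mass at zero.
            beta[i] = rng.normal(m, np.sqrt(v)) if rng.random() < p_inc else 0.0
        if it >= burn:
            counts += beta != 0.0
    return counts / (n_iter - burn)

# Simulated data (an assumption for illustration only).
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(40, 3))
y = X @ np.array([1.0, 0.0, 0.3]) + data_rng.normal(size=40)

pip_exact = enumerate_pip(y, X)
pip_mc = gibbs_pip(y, X)
max_gap = np.max(np.abs(pip_exact - pip_mc))
```

If the sampler targets the right posterior, `max_gap` should shrink toward zero at the usual Monte Carlo rate as `n_iter` grows; a gap that persists across long runs and several seeds would indicate a mismatch.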

Original abstract

Bayesian variable selection requires sampling from a posterior distribution that combines discrete model indicators with continuously varying parameters, a challenge often addressed through reversible jump Markov chain Monte Carlo (RJMCMC). Despite its generality, RJMCMC is widely regarded as difficult to design and implement correctly. We present mixtures of mutually singular (MoMS) distributions as a transparent alternative in which competing models are represented within a single fixed-dimensional parameter space partitioned into mutually singular subspaces. We show that this formulation reproduces the exact spike-and-slab interpretation of Bayesian variable selection and that, under appropriate constructions, MoMS and RJMCMC share the same Metropolis--Hastings acceptance probability. On a benchmark dataset with ten predictors, both methods recover posterior inclusion probabilities that match full enumeration, while MoMS achieves comparable or superior effective sample size per second relative to a carefully engineered RJMCMC scheme. We further illustrate the approach in a mixed-effects logistic regression for a sleep-and-memory experiment and in factor-loading selection for a multidimensional generalized partial credit model. Together, these results show that Bayesian variable selection can be carried out within standard fixed-dimensional Markov chain Monte Carlo methodology -- without regret.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes mixtures of mutually singular distributions (MoMS) as an alternative to reversible jump MCMC for Bayesian variable selection. It asserts that MoMS exactly recovers the spike-and-slab posterior in a fixed-dimensional space and that, with appropriately constructed proposals, the Metropolis-Hastings acceptance probabilities coincide with those of RJMCMC. The approach is validated empirically on a benchmark with ten predictors where posterior inclusion probabilities match those from full enumeration, and demonstrated in two applied settings: a mixed-effects logistic regression and factor loading selection in a generalized partial credit model.

Significance. Should the theoretical claims be substantiated with complete derivations, the contribution would be notable for potentially simplifying the implementation of Bayesian variable selection by embedding it in standard MCMC samplers without explicit dimension jumping. The paper earns credit for the direct empirical comparison to exhaustive enumeration on the benchmark dataset and for extending the method to mixed-effects and item-response models. However, the measure-theoretic challenges in transitioning between singular subspaces may limit the extent to which the method truly avoids the complexities of RJMCMC.

major comments (2)
  1. [Abstract] The central claim that MoMS reproduces the exact spike-and-slab interpretation and shares the Metropolis-Hastings acceptance probability with RJMCMC is presented without an accompanying derivation or proof sketch. A detailed proof is load-bearing for the equivalence assertion and should be provided explicitly, perhaps in a dedicated theoretical section.
  2. [MoMS target and proposal kernels] The partition of the parameter space into mutually singular subspaces implies that any proposal absolutely continuous w.r.t. Lebesgue measure on R^p assigns zero probability to other model subspaces. Therefore, the 'appropriate constructions' for proposals must incorporate singular components analogous to RJMCMC's dimension-matching proposals. This raises the question whether the method genuinely permits 'standard fixed-dimensional MCMC' or merely relocates the engineering effort into the proposal design. Clarify the explicit form of the proposal kernel and demonstrate that it does not require the same Jacobian and matching calculations as RJMCMC.
minor comments (1)
  1. [Empirical results] Specify the exact benchmark dataset used (e.g., its name or source) to facilitate reproducibility of the posterior inclusion probability comparisons.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their thorough and constructive review. The comments highlight important aspects of the theoretical presentation and implementation details that we will address to improve clarity and rigor. We respond to each major comment below and indicate the planned revisions.

Point-by-point responses
  1. Referee: [Abstract] The central claim that MoMS reproduces the exact spike-and-slab interpretation and shares the Metropolis-Hastings acceptance probability with RJMCMC is presented without an accompanying derivation or proof sketch. A detailed proof is load-bearing for the equivalence assertion and should be provided explicitly, perhaps in a dedicated theoretical section.

    Authors: We agree that the equivalence claims are central and benefit from a more explicit, self-contained derivation. Although the current manuscript develops the key arguments across Sections 2 and 3, we acknowledge that a consolidated proof would make the contribution more accessible. In the revised version we will add a dedicated theoretical section (new Section 2.3) containing a complete, step-by-step derivation: first showing that the MoMS target measure exactly recovers the spike-and-slab posterior, and second proving that the Metropolis-Hastings acceptance probability equals that of the corresponding RJMCMC transition under the stated proposal constructions. A concise proof sketch will also be inserted into the abstract to foreground the main result.
    revision: yes

  2. Referee: [MoMS target and proposal kernels] The partition of the parameter space into mutually singular subspaces implies that any proposal absolutely continuous w.r.t. Lebesgue measure on R^p assigns zero probability to other model subspaces. Therefore, the 'appropriate constructions' for proposals must incorporate singular components analogous to RJMCMC's dimension-matching proposals. This raises the question whether the method genuinely permits 'standard fixed-dimensional MCMC' or merely relocates the engineering effort into the proposal design. Clarify the explicit form of the proposal kernel and demonstrate that it does not require the same Jacobian and matching calculations as RJMCMC.

    Authors: We appreciate this incisive observation. The ambient parameter space remains fixed-dimensional (R^p), so the sampler operates entirely within a single Euclidean space without explicit dimension changes; this is what permits the use of standard fixed-dimensional MCMC code and software. The proposal kernels are mixtures that place positive mass on the singular subspaces via Dirac components (e.g., by deterministically setting coordinates corresponding to excluded variables to zero), but because the dimension never changes there are no Jacobian determinants or dimension-matching functions to compute. In the revision we will add an explicit subsection (new Section 3.2) that writes the proposal kernel in closed form, supplies pseudocode for its implementation, and derives the resulting acceptance ratio to show that it simplifies without the auxiliary variables or Jacobian terms required by RJMCMC. This demonstrates that the proposal engineering is both simpler and fully contained within ordinary fixed-dimensional MCMC machinery.
    revision: yes
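
The kind of proposal described in this response can be sketched as an add-delete Metropolis-Hastings step. This is an illustration under assumptions, not the closed-form kernel the authors promise for Section 3.2: the slab is N(0, `slab_sd`^2), the add proposal is N(0, `prop_sd`^2), the delete proposal is a point mass at zero, and `add_delete_step` and `log_lik` are hypothetical names.

```python
import numpy as np

def log_norm_pdf(x, sd):
    """Log density of N(0, sd^2) at x."""
    return -0.5 * np.log(2 * np.pi * sd ** 2) - 0.5 * (x / sd) ** 2

def add_delete_step(beta, log_lik, rng, prior_inc=0.5, slab_sd=1.0, prop_sd=1.0):
    """One add-delete Metropolis-Hastings update on a fixed-length vector beta."""
    i = rng.integers(len(beta))
    new = beta.copy()
    if beta[i] == 0.0:
        # Add move: draw the coordinate from a continuous proposal.
        new[i] = rng.normal(0.0, prop_sd)
        log_ratio = (log_lik(new) - log_lik(beta)
                     + np.log(prior_inc) - np.log(1 - prior_inc)
                     + log_norm_pdf(new[i], slab_sd) - log_norm_pdf(new[i], prop_sd))
    else:
        # Delete move: Dirac proposal that sets the coordinate to exactly zero.
        new[i] = 0.0
        log_ratio = (log_lik(new) - log_lik(beta)
                     + np.log(1 - prior_inc) - np.log(prior_inc)
                     + log_norm_pdf(beta[i], prop_sd) - log_norm_pdf(beta[i], slab_sd))
    return new if np.log(rng.random()) < log_ratio else beta

# One-predictor toy problem with unit noise variance (assumed for illustration).
rng = np.random.default_rng(0)
x = rng.normal(size=10)
y = 0.3 * x + rng.normal(size=10)

def log_lik(b):
    return -0.5 * np.sum((y - x * b[0]) ** 2)

beta = np.zeros(1)
included = 0
n_steps = 50000
for _ in range(n_steps):
    beta = add_delete_step(beta, log_lik, rng)
    included += int(beta[0] != 0.0)
pip_mc = included / n_steps

# Closed-form inclusion probability for this conjugate toy problem.
v = 1.0 / (x @ x + 1.0)
m = v * (x @ y)
odds = np.sqrt(v) * np.exp(0.5 * m * m / v)
pip_exact = odds / (1.0 + odds)
```

In this sketch the vector `beta` never changes length, and the acceptance ratio contains the same likelihood, prior, and proposal terms that a birth-death reversible jump move with an identity mapping would use. Whether that amounts to less engineering than RJMCMC is the point the referee raises, and the sketch does not settle it.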

Circularity Check

0 steps flagged

No significant circularity: equivalence follows from explicit construction of the MoMS measure

Full rationale

The paper defines the MoMS target as a mixture of mutually singular components supported on subspaces whose dimensions match the model sizes, then shows by direct construction that this measure coincides with the standard spike-and-slab posterior and that the Metropolis-Hastings ratio matches that of RJMCMC when the proposal kernels are correspondingly dimension-matching. These steps are algebraic identities derived from the chosen partition and the definition of the mixture density with respect to the appropriate dominating measure; they do not reduce to a fitted parameter renamed as a prediction, nor to a load-bearing self-citation whose own justification is internal to the present work. The central claim therefore remains a self-contained re-expression of the target posterior rather than a tautology. No equations or sections in the manuscript reduce the reported equivalence to an input that is itself defined in terms of the output.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claim rests on the existence of a partition of the parameter space into mutually singular subspaces whose mixture recovers the exact target posterior; this is a domain assumption about measure-theoretic construction rather than a fitted parameter.

axioms (2)
  • domain assumption: The target posterior over models and parameters can be represented exactly as a mixture of distributions supported on mutually singular subspaces.
    Invoked when the authors state that MoMS reproduces the spike-and-slab interpretation.
  • standard math: Standard Metropolis-Hastings theory applies directly once the state space is fixed-dimensional.
    Used to claim that MoMS and RJMCMC share the same acceptance probability under appropriate constructions.
invented entities (1)
  • Mixtures of mutually singular distributions (MoMS) · no independent evidence
    purpose: Represent competing models inside a single fixed-dimensional parameter space for MCMC sampling.
    New representational device introduced to avoid dimension-changing moves.

pith-pipeline@v0.9.0 · 5516 in / 1423 out tokens · 23037 ms · 2026-05-07T05:44:03.440547+00:00 · methodology

