Reversible Jump MCMC With No Regrets: Bayesian Variable Selection Using Mixtures of Mutually Singular Distributions
Pith reviewed 2026-05-07 05:44 UTC · model grok-4.3
The pith
Bayesian variable selection can be performed with standard fixed-dimensional MCMC by representing models as mixtures of mutually singular distributions in one parameter space.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Mixtures of mutually singular distributions embed competing models inside a fixed-dimensional parameter space by partitioning it into subspaces that are mutually singular, one per model. When the subspaces and densities are chosen appropriately, the resulting mixture posterior is identical to the target posterior that places discrete probability on models and continuous probability on their parameters. Under these same constructions the Metropolis-Hastings acceptance ratio for moves between models is identical to the ratio used in reversible jump MCMC. Consequently, standard fixed-dimensional MCMC algorithms can be used for Bayesian variable selection without any dimension-changing jumps.
What carries the argument
Mixtures of mutually singular distributions (MoMS), which partition a single fixed-dimensional parameter space into mutually singular subspaces (one per model) so that sampling from the mixture recovers the exact model-and-parameter posterior.
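For a single coefficient, the construction can be written out explicitly. The following is a sketch under assumed notation that does not come from the paper: $w$ is the prior inclusion probability, $g$ the slab density, $L$ the likelihood, $\delta_0$ the point mass at zero, and $\lambda$ Lebesgue measure on $\mathbb{R}$.

```latex
\begin{align*}
\nu &= \delta_0 + \lambda
  && \text{(dominating measure)} \\
f(\beta) &= (1-w)\,\mathbf{1}\{\beta = 0\} + w\,g(\beta)\,\mathbf{1}\{\beta \neq 0\}
  && \text{(prior density w.r.t. } \nu\text{)} \\
\pi(\mathrm{d}\beta \mid y) &\propto L(\beta)\,f(\beta)\,\nu(\mathrm{d}\beta)
  && \text{(posterior)} \\
P(\beta \neq 0 \mid y) &=
  \frac{w \int L(\beta)\,g(\beta)\,\mathrm{d}\beta}
       {(1-w)\,L(0) + w \int L(\beta)\,g(\beta)\,\mathrm{d}\beta}
  && \text{(posterior inclusion probability)}
\end{align*}
```

The two components are mutually singular because the first is concentrated on $\{0\}$, a Lebesgue-null set, while the second assigns that set zero mass. The last line is the usual spike-and-slab posterior inclusion probability, obtained here without leaving the one-dimensional space.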
If this is right
- The MoMS posterior is identical to the spike-and-slab posterior used in Bayesian variable selection.
- When the subspaces and proposal kernels are aligned correctly, MoMS and reversible jump MCMC share the identical Metropolis-Hastings acceptance probability.
- On a ten-predictor linear regression benchmark both MoMS and a carefully tuned reversible jump sampler recover posterior inclusion probabilities that match full enumeration.
- MoMS achieves effective sample size per second that is comparable to or higher than that of a well-engineered reversible jump implementation.
- The same fixed-dimensional construction applies directly to mixed-effects logistic regression and to factor-loading selection in multidimensional item response models.
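The efficiency comparison in the list above is stated in effective sample size (ESS) per second. As a rough illustration of that metric, the sketch below estimates ESS from a single chain by summing autocorrelations up to the first non-positive lag and then divides by wall-clock time. The function names and the truncation rule are assumptions made for this example; the paper's own ESS estimator is not specified here and may differ.

```python
import time
import numpy as np


def effective_sample_size(draws):
    """Estimate ESS = n / tau, truncating the autocorrelation sum at the
    first non-positive lag (a simple heuristic, assumed for illustration)."""
    x = np.asarray(draws, dtype=float)
    n = x.size
    centered = x - x.mean()
    # Autocovariance via FFT, zero-padded to avoid circular wrap-around.
    spectrum = np.fft.rfft(centered, 2 * n)
    acov = np.fft.irfft(spectrum * np.conj(spectrum))[:n] / n
    if acov[0] == 0.0:
        return float("nan")  # constant chain: ESS is undefined
    rho = acov / acov[0]
    tau = 1.0  # integrated autocorrelation time
    for k in range(1, n):
        if rho[k] <= 0.0:
            break
        tau += 2.0 * rho[k]
    return n / tau


def ess_per_second(run_sampler):
    """Time a zero-argument sampler and return ESS of its output per second."""
    start = time.perf_counter()
    draws = run_sampler()
    elapsed = time.perf_counter() - start
    return effective_sample_size(draws) / elapsed
```

Because the sum stops at the first non-positive autocorrelation, this estimator never reports more than n effective draws, so it is conservative for chains with negative autocorrelation.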
Where Pith is reading between the lines
- Practitioners can now implement Bayesian variable selection inside existing fixed-dimensional MCMC packages without writing custom dimension-changing proposals.
- The approach may extend to other model-selection settings that currently rely on reversible jump, such as change-point detection or graphical model selection.
- Because the method stays inside a single fixed-dimensional space, it may simplify convergence diagnostics and parallelization relative to reversible jump samplers.
Load-bearing premise
The parameter space can be partitioned into mutually singular subspaces, one per model, such that the resulting mixture posterior is exactly the target posterior over models and parameters.
What would settle it
On a small problem whose true posterior can be obtained by full enumeration, draw samples from the MoMS posterior and check whether the estimated model probabilities and parameter marginals differ from the enumerated values beyond Monte Carlo error.
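A check of this kind can be sketched on a toy problem. The example below is an illustration under stated assumptions, not the paper's benchmark or implementation: a three-predictor linear regression with known noise variance `sigma2`, independent N(0, `tau2`) slabs, and prior inclusion probability `w`, all hypothetical choices. It enumerates all 2^p models in closed form and compares the resulting inclusion probabilities with those from a coordinate-wise Gibbs sampler that keeps excluded coefficients at exactly zero inside a fixed-length vector.

```python
import itertools
import numpy as np


def enumerate_inclusion(X, y, w=0.5, tau2=1.0, sigma2=1.0):
    """Exact posterior inclusion probabilities by enumerating all 2^p models."""
    n, p = X.shape
    models, log_post = [], []
    for gamma in itertools.product([0, 1], repeat=p):
        g = np.array(gamma, dtype=bool)
        # Marginally y ~ N(0, sigma2 * I + tau2 * X_g X_g^T) under model g.
        cov = sigma2 * np.eye(n) + tau2 * X[:, g] @ X[:, g].T
        _, logdet = np.linalg.slogdet(cov)
        log_ml = -0.5 * (logdet + y @ np.linalg.solve(cov, y))
        log_prior = g.sum() * np.log(w) + (p - g.sum()) * np.log(1 - w)
        models.append(g)
        log_post.append(log_ml + log_prior)
    log_post = np.array(log_post)
    probs = np.exp(log_post - log_post.max())
    probs /= probs.sum()
    return probs @ np.array(models, dtype=float)


def gibbs_inclusion(X, y, n_iter=10000, burn=1000, w=0.5, tau2=1.0,
                    sigma2=1.0, seed=1):
    """Spike-and-slab Gibbs sampler on a fixed-length coefficient vector."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta = np.zeros(p)
    counts = np.zeros(p)
    for it in range(n_iter):
        for j in range(p):
            beta[j] = 0.0
            resid = y - X @ beta  # residual with coordinate j excluded
            a = X[:, j] @ X[:, j] / sigma2 + 1.0 / tau2
            b = X[:, j] @ resid / sigma2
            # Log posterior odds of beta_j != 0 given the other coordinates.
            log_odds = (np.log(w / (1 - w)) - 0.5 * np.log(tau2 * a)
                        + 0.5 * b * b / a)
            if rng.random() < 1.0 / (1.0 + np.exp(-log_odds)):
                beta[j] = rng.normal(b / a, np.sqrt(1.0 / a))
        if it >= burn:
            counts += beta != 0.0
    return counts / (n_iter - burn)


# Simulated toy data (an assumption for this sketch; not the paper's dataset).
data_rng = np.random.default_rng(0)
X = data_rng.normal(size=(40, 3))
y = X @ np.array([0.5, 0.0, 0.0]) + data_rng.normal(size=40)

pip_exact = enumerate_inclusion(X, y)
pip_mcmc = gibbs_inclusion(X, y)
print("enumeration:", np.round(pip_exact, 3))
print("sampler:    ", np.round(pip_mcmc, 3))
```

A discrepancy well beyond Monte Carlo error between the two printed vectors would indicate that the sampler is not targeting the enumerated posterior.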
original abstract
Bayesian variable selection requires sampling from a posterior distribution that combines discrete model indicators with continuously varying parameters, a challenge often addressed through reversible jump Markov chain Monte Carlo (RJMCMC). Despite its generality, RJMCMC is widely regarded as difficult to design and implement correctly. We present mixtures of mutually singular (MoMS) distributions as a transparent alternative in which competing models are represented within a single fixed-dimensional parameter space partitioned into mutually singular subspaces. We show that this formulation reproduces the exact spike-and-slab interpretation of Bayesian variable selection and that, under appropriate constructions, MoMS and RJMCMC share the same Metropolis-Hastings acceptance probability. On a benchmark dataset with ten predictors, both methods recover posterior inclusion probabilities that match full enumeration, while MoMS achieves comparable or superior effective sample size per second relative to a carefully engineered RJMCMC scheme. We further illustrate the approach in a mixed-effects logistic regression for a sleep-and-memory experiment and in factor-loading selection for a multidimensional generalized partial credit model. Together, these results show that Bayesian variable selection can be carried out within standard fixed-dimensional Markov chain Monte Carlo methodology, without regret.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes mixtures of mutually singular distributions (MoMS) as an alternative to reversible jump MCMC for Bayesian variable selection. It asserts that MoMS exactly recovers the spike-and-slab posterior in a fixed-dimensional space and that, with appropriately constructed proposals, the Metropolis-Hastings acceptance probabilities coincide with those of RJMCMC. The approach is validated empirically on a benchmark with ten predictors where posterior inclusion probabilities match those from full enumeration, and demonstrated in two applied settings: a mixed-effects logistic regression and factor loading selection in a generalized partial credit model.
Significance. Should the theoretical claims be substantiated with complete derivations, the contribution would be notable for potentially simplifying the implementation of Bayesian variable selection by embedding it in standard MCMC samplers without explicit dimension jumping. The paper earns credit for the direct empirical comparison to exhaustive enumeration on the benchmark dataset and for extending the method to mixed-effects and item-response models. However, the measure-theoretic challenges in transitioning between singular subspaces may limit the extent to which the method truly avoids the complexities of RJMCMC.
major comments (2)
- [Abstract] The central claim that MoMS reproduces the exact spike-and-slab interpretation and shares the Metropolis-Hastings acceptance probability with RJMCMC is presented without an accompanying derivation or proof sketch. A detailed proof is load-bearing for the equivalence assertion and should be provided explicitly, perhaps in a dedicated theoretical section.
- [MoMS target and proposal kernels] The partition of the parameter space into mutually singular subspaces implies that any proposal absolutely continuous w.r.t. Lebesgue measure on R^p assigns zero probability to other model subspaces. Therefore, the 'appropriate constructions' for proposals must incorporate singular components analogous to RJMCMC's dimension-matching proposals. This raises the question whether the method genuinely permits 'standard fixed-dimensional MCMC' or merely relocates the engineering effort into the proposal design. Clarify the explicit form of the proposal kernel and demonstrate that it does not require the same Jacobian and matching calculations as RJMCMC.
minor comments (1)
- [Empirical results] Specify the exact benchmark dataset used (e.g., its name or source) to facilitate reproducibility of the posterior inclusion probability comparisons.
Simulated Author's Rebuttal
We thank the referee for their thorough and constructive review. The comments highlight important aspects of the theoretical presentation and implementation details that we will address to improve clarity and rigor. We respond to each major comment below and indicate the planned revisions.
point-by-point responses
- Referee: [Abstract] The central claim that MoMS reproduces the exact spike-and-slab interpretation and shares the Metropolis-Hastings acceptance probability with RJMCMC is presented without an accompanying derivation or proof sketch. A detailed proof is load-bearing for the equivalence assertion and should be provided explicitly, perhaps in a dedicated theoretical section.
  Authors: We agree that the equivalence claims are central and benefit from a more explicit, self-contained derivation. Although the current manuscript develops the key arguments across Sections 2 and 3, we acknowledge that a consolidated proof would make the contribution more accessible. In the revised version we will add a dedicated theoretical section (new Section 2.3) containing a complete, step-by-step derivation: first showing that the MoMS target measure exactly recovers the spike-and-slab posterior, and second proving that the Metropolis-Hastings acceptance probability equals that of the corresponding RJMCMC transition under the stated proposal constructions. A concise proof sketch will also be inserted into the abstract to foreground the main result.
  revision: yes
- Referee: [MoMS target and proposal kernels] The partition of the parameter space into mutually singular subspaces implies that any proposal absolutely continuous w.r.t. Lebesgue measure on R^p assigns zero probability to other model subspaces. Therefore, the 'appropriate constructions' for proposals must incorporate singular components analogous to RJMCMC's dimension-matching proposals. This raises the question whether the method genuinely permits 'standard fixed-dimensional MCMC' or merely relocates the engineering effort into the proposal design. Clarify the explicit form of the proposal kernel and demonstrate that it does not require the same Jacobian and matching calculations as RJMCMC.
  Authors: We appreciate this incisive observation. The ambient parameter space remains fixed-dimensional (R^p), so the sampler operates entirely within a single Euclidean space without explicit dimension changes; this is what permits the use of standard fixed-dimensional MCMC code and software. The proposal kernels are mixtures that place positive mass on the singular subspaces via Dirac components (e.g., by deterministically setting coordinates corresponding to excluded variables to zero), but because the dimension never changes there are no Jacobian determinants or dimension-matching functions to compute. In the revision we will add an explicit subsection (new Section 3.2) that writes the proposal kernel in closed form, supplies pseudocode for its implementation, and derives the resulting acceptance ratio to show that it simplifies without the auxiliary variables or Jacobian terms required by RJMCMC. This demonstrates that the proposal engineering is both simpler and fully contained within ordinary fixed-dimensional MCMC machinery.
  revision: yes
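The kind of Dirac-component proposal described in this response can be sketched for a single coefficient. This is a minimal illustration under assumptions that are not taken from the paper: observations y_i ~ N(beta, 1), a N(0, `tau2`) slab, prior inclusion probability `w`, and a hypothetical normal proposal for the add move centred at the sample mean. The delete move sets the coordinate to exactly zero. The acceptance ratio contains the proposal density but no Jacobian term, because the proposed value is used as is.

```python
import math
import numpy as np


def log_normal_pdf(x, mean, var):
    """Log density of N(mean, var) at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def add_delete_chain(y, n_iter=40000, w=0.5, tau2=1.0, seed=0):
    """Add/delete Metropolis-Hastings on one coefficient in a fixed space.

    Returns the fraction of iterations with beta != 0."""
    rng = np.random.default_rng(seed)
    n = len(y)
    # Hypothetical proposal for the add move: N(mean(y), 1/n).
    q_mean, q_var = float(np.mean(y)), 1.0 / n

    def log_target(beta):
        # Unnormalised posterior density w.r.t. (point mass at 0 + Lebesgue).
        log_lik = -0.5 * float(np.sum((y - beta) ** 2))
        if beta == 0.0:
            return math.log(1 - w) + log_lik
        return math.log(w) + log_normal_pdf(beta, 0.0, tau2) + log_lik

    beta, included = 0.0, 0
    for _ in range(n_iter):
        if beta == 0.0:
            # Add move: draw a nonzero value from the proposal.
            prop = rng.normal(q_mean, math.sqrt(q_var))
            log_ratio = (log_target(prop) - log_target(0.0)
                         - log_normal_pdf(prop, q_mean, q_var))
        else:
            # Delete move: deterministically set the coordinate to zero.
            prop = 0.0
            log_ratio = (log_target(0.0) + log_normal_pdf(beta, q_mean, q_var)
                         - log_target(beta))
        if rng.random() < math.exp(min(0.0, log_ratio)):
            beta = prop
        included += beta != 0.0
    return included / n_iter


def exact_inclusion(y, w=0.5, tau2=1.0):
    """Closed-form posterior inclusion probability for the same toy model."""
    a = len(y) + 1.0 / tau2
    b = float(np.sum(y))
    log_bf = -0.5 * math.log(tau2 * a) + 0.5 * b * b / a
    odds = w / (1 - w) * math.exp(log_bf)
    return odds / (1 + odds)


# Simulated data (an assumption for this sketch).
y = np.random.default_rng(42).normal(0.3, 1.0, size=20)
pip_chain = add_delete_chain(y)
pip_exact = exact_inclusion(y)
print(round(pip_chain, 3), round(pip_exact, 3))
```

In this toy setting the chain's inclusion frequency can be compared directly with the closed-form value, which is one way to check that the singular proposal components are handled correctly.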
Circularity Check
No significant circularity: equivalence follows from explicit construction of the MoMS measure
full rationale
The paper defines the MoMS target as a mixture of mutually singular components supported on subspaces whose dimensions match the model sizes, then shows by direct construction that this measure coincides with the standard spike-and-slab posterior and that the Metropolis-Hastings ratio matches that of RJMCMC when the proposal kernels are correspondingly dimension-matching. These steps are algebraic identities derived from the chosen partition and the definition of the mixture density with respect to the appropriate dominating measure; they do not reduce to a fitted parameter renamed as a prediction, nor to a load-bearing self-citation whose own justification is internal to the present work. The central claim therefore remains a self-contained re-expression of the target posterior rather than a tautology. No equations or sections in the manuscript reduce the reported equivalence to an input that is itself defined in terms of the output.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: The target posterior over models and parameters can be represented exactly as a mixture of distributions supported on mutually singular subspaces.
- standard math: Standard Metropolis-Hastings theory applies directly once the state space is fixed-dimensional.
invented entities (1)
- Mixtures of mutually singular distributions (MoMS): no independent evidence