pith. sign in

arxiv: 2601.21765 · v2 · pith:3KA772X6new · submitted 2026-01-29 · 📊 stat.CO · stat.ME· stat.ML

Mean-field Variational Bayes for Sparse Probit Regression

Pith reviewed 2026-05-21 14:55 UTC · model grok-4.3

classification 📊 stat.CO stat.MEstat.ML
keywords variational Bayesprobit regressionvariable selectionspike-and-slab priormean-field approximationBayesian computationcoordinate ascent
0
0 comments X

The pith

Mean-field variational Bayes yields closed-form updates for spike-and-slab probit regression, delivering posterior inclusion probabilities and estimates orders of magnitude faster than MCMC.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a mean-field variational Bayes approximation to the posterior in a probit regression model that places a spike-and-slab prior on the coefficients. All variational factors receive closed-form updates and the evidence lower bound is available in closed form, which permits an efficient coordinate-ascent algorithm. A reader would care because the method supplies both variable-selection probabilities and coefficient estimates inside a single scalable framework, addressing the prohibitive run times of MCMC in high-dimensional binary data. Simulations and real-data examples show that the resulting selections and predictions remain comparable to MCMC while running far more quickly.

Core claim

We consider Bayesian variable selection for binary outcomes under a probit link with a spike-and-slab prior on the regression coefficients. We develop a mean-field variational Bayes approximation in which all variational factors admit closed-form updates and the evidence lower bound is available in closed form. This allows an efficient coordinate ascent variational inference algorithm whose output produces posterior inclusion probabilities and parameter estimates, enabling interpretable selection and prediction.

What carries the argument

Mean-field variational family with independent factors for each regression coefficient and its inclusion indicator, admitting closed-form coordinate ascent updates for the probit spike-and-slab model.

If this is right

  • The algorithm returns both posterior inclusion probabilities and point estimates inside one run, supporting joint selection and prediction.
  • Because every update is closed form, the procedure scales to regimes where full MCMC sampling becomes impractical.
  • The availability of a closed-form evidence lower bound supplies a built-in criterion for comparing models or tuning hyperparameters.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same variational construction could be adapted to logistic regression or other generalized linear models by deriving analogous closed-form updates.
  • One could examine whether the variational inclusion probabilities remain calibrated when the number of predictors greatly exceeds the sample size beyond the regimes tested in the paper.
  • Because the evidence lower bound is tractable, the method might be embedded inside a larger search over different prior hyperparameters without additional MCMC cost.

Load-bearing premise

The chosen mean-field family is flexible enough and the coordinate ascent updates converge to a sufficiently accurate approximation of the true posterior.

What would settle it

A dataset in which MCMC and the variational method produce materially different sets of selected variables or markedly different out-of-sample predictive performance would falsify the claim of comparable accuracy.

Figures

Figures reproduced from arXiv: 2601.21765 by Augusto Fasano, Giovanni Rebaudo.

Figure 1
Figure 1. Figure 1: For p = 200, n = 1000, posterior inclusion probabilities (PIPs) as a function of the true parameter values γ 0 j β 0 j estimated by MCMC and MFVB across the 50 simulated datasets. For graphical purposes, the distance between 0 and 1 is not to scale and we spread the true γ 0 j β 0 j ∈ {±3, ±1, 0} in a neighborhood of their actual values. 9 [PITH_FULL_IMAGE:figures/full_fig_p010_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: For p = 1000, n = 500, posterior inclusion probabilities (PIPs) as a function of the true parameter values γ 0 j β 0 j estimated by MCMC and MFVB across the 20 simulated datasets. For graphical purposes, the distance between 0 and 1 is not to scale and we spread the true regression parameters γ 0 j β 0 j in a neighborhood of their actual values. 10 [PITH_FULL_IMAGE:figures/full_fig_p011_2.png] view at source ↗
read the original abstract

We consider Bayesian variable selection for binary outcomes under a probit link with a spike-and-slab prior on the regression coefficients. Motivated by the computational challenges encountered by Markov chain Monte Carlo (MCMC) samplers in high-dimensional regimes, we develop a mean-field variational Bayes approximation in which all variational factors admit closed-form updates, and the evidence lower bound is available in closed form. This, in turn, allows the development of an efficient coordinate ascent variational inference algorithm to find the optimal values of the variational parameters. The approach produces posterior inclusion probabilities and parameter estimates, enabling interpretable selection and prediction within a single framework. As shown in both simulated and real data applications, the proposed method successfully identifies the important variables and is orders of magnitude faster than MCMC, while maintaining comparable accuracy.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript develops a mean-field variational Bayes procedure for Bayesian variable selection in probit regression under a spike-and-slab prior. All variational factors are chosen to admit closed-form coordinate-ascent updates, the ELBO is available in closed form, and the resulting algorithm yields posterior inclusion probabilities together with point estimates. The authors report that the procedure identifies important variables on both simulated and real data, runs orders of magnitude faster than MCMC, and achieves comparable predictive accuracy.

Significance. If the variational inclusion probabilities remain reliable under realistic correlation structures, the method supplies a practical, scalable alternative to MCMC for high-dimensional binary regression with interpretable selection. The closed-form updates and explicit ELBO constitute a clear computational advantage over sampling-based approaches.

major comments (1)
  1. [§3.2] §3.2 (Variational family and coordinate ascent): The mean-field factorization q(β,γ,z) = ∏ q(β_j) q(γ_j) q(z_i) enforces independence among the inclusion indicators γ_j. When design-matrix columns are linearly dependent or highly correlated, the true spike-and-slab posterior exhibits negative dependence between γ_j and γ_k; the variational marginals can therefore systematically over- or under-state inclusion probabilities. This directly affects the headline claim that the method “successfully identifies the important variables.” A concrete diagnostic—e.g., comparison of variational versus MCMC inclusion probabilities on a design with pairwise correlations >0.7—would be needed to substantiate the claim.
minor comments (2)
  1. [§4] The abstract and §4 state that the method is “orders of magnitude faster” and maintains “comparable accuracy,” yet no tables report wall-clock times, effective sample sizes, or quantitative error metrics (e.g., AUC, Brier score) with standard errors across replications. Adding such summaries would strengthen the empirical section.
  2. Notation for the variational parameters (e.g., μ_j, σ_j^2, π_j) is introduced without an explicit summary table; a short table collecting all variational parameters and their update equations would improve readability.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their careful reading and insightful comments on our manuscript. We address the major comment below and plan to incorporate revisions to strengthen the paper.

read point-by-point responses
  1. Referee: [§3.2] §3.2 (Variational family and coordinate ascent): The mean-field factorization q(β,γ,z) = ∏ q(β_j) q(γ_j) q(z_i) enforces independence among the inclusion indicators γ_j. When design-matrix columns are linearly dependent or highly correlated, the true spike-and-slab posterior exhibits negative dependence between γ_j and γ_k; the variational marginals can therefore systematically over- or under-state inclusion probabilities. This directly affects the headline claim that the method “successfully identifies the important variables.” A concrete diagnostic—e.g., comparison of variational versus MCMC inclusion probabilities on a design with pairwise correlations >0.7—would be needed to substantiate the claim.

    Authors: We agree that the mean-field assumption of independence among the γ_j is a limitation that may lead to inaccuracies in inclusion probabilities when predictors are highly correlated. This is an inherent feature of the mean-field variational family. Our current numerical experiments primarily use designs with moderate correlations, where the method performs well in identifying important variables. To directly address this concern, we will add a new simulation study featuring a design matrix with pairwise correlations greater than 0.7. In this study, we will compare the variational inclusion probabilities against those obtained from long MCMC runs, providing the requested diagnostic. This addition will help evaluate the robustness of the approach under realistic correlation structures and clarify the conditions under which the method reliably identifies important variables. revision: yes

Circularity Check

0 steps flagged

Standard mean-field coordinate ascent derivation with no circular reductions to inputs

full rationale

The paper derives closed-form variational updates for the mean-field factors in a spike-and-slab probit model by applying the standard coordinate ascent variational inference algorithm to the evidence lower bound; these updates follow directly from the chosen factorized variational family and the model likelihood/prior without any fitted parameters being relabeled as predictions or any load-bearing steps reducing to self-citations or self-definitions. Performance claims on variable selection and speed are evaluated empirically on simulated and real data rather than being forced by the derivation equations themselves. No instances of the enumerated circularity patterns are present.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Review based solely on abstract; no explicit free parameters, axioms, or invented entities are stated beyond standard variational inference assumptions and the spike-and-slab prior structure.

pith-pipeline@v0.9.0 · 5658 in / 997 out tokens · 40727 ms · 2026-05-21T14:55:58.912945+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

15 extracted references · 15 canonical work pages

  1. [1]

    Bayesian analysis of binary and polychotomous response data

    Albert, J. H. and Chib, S. (1993). “Bayesian analysis of binary and polychotomous response data”.J. Am. Stat. Assoc.88, 669–679

  2. [2]

    Bayesian conjugacy in probit, tobit, multinomial probit and extensions: a review and new results

    Anceschi, N., Fasano, A., Durante, D., and Zanella, G. (2023). “Bayesian conjugacy in probit, tobit, multinomial probit and extensions: a review and new results”.J. Am. Stat. Assoc. 118, 1451–1469

  3. [3]

    Variational inference: a review for statisticians

    Blei, D. M., Kucukelbir, A., and McAuliffe, J. D. (2017). “Variational inference: a review for statisticians”.J. Am. Stat. Assoc.112, 859–877

  4. [4]

    Leave Pima Indians alone: binary regression as a bench- mark for Bayesian computation

    Chopin, N. and Ridgway, J. (2017). “Leave Pima Indians alone: binary regression as a bench- mark for Bayesian computation”.Stat. Sci.32, 64–87

  5. [5]

    Mean-field variational approximate Bayesian infer- ence for latent variable models

    Consonni, G. and Marin, J.-M. (2007). “Mean-field variational approximate Bayesian infer- ence for latent variable models”.Comput. Stat. Data Anal.52, 790–798

  6. [6]

    Conjugate Bayes for probit regression via unified skew-normal distri- butions

    Durante, D. (2019). “Conjugate Bayes for probit regression via unified skew-normal distri- butions”.Biometrika106, 765–779

  7. [7]

    Scalable and accurate variational Bayes for high-dimensional binary regression models

    Fasano, A., Durante, D., and Zanella, G. (2022). “Scalable and accurate variational Bayes for high-dimensional binary regression models”.Biometrika109, 901–919

  8. [8]

    Bayesian auxiliary variable models for binary and multinomial regression

    Holmes, C. C. and Held, L. (2006). “Bayesian auxiliary variable models for binary and multinomial regression”.Bayesian Anal.1, 145–168

  9. [9]

    Bayesian parameter estimation via variational methods

    Jaakkola, T. S. and Jordan, M. I. (2000). “Bayesian parameter estimation via variational methods”.Stat. Comput.10, 25–37

  10. [10]

    A variational Bayes approach to variable selection

    Ormerod, J. T., You, C., and M¨ uller, S. (2017). “A variational Bayes approach to variable selection”.Electron. J. Stat.11, 3549–3594

  11. [11]

    Bayesian inference for logistic models using P´ olya–Gamma latent variables

    Polson, N. G., Scott, J. G., and Windle, J. (2013). “Bayesian inference for logistic models using P´ olya–Gamma latent variables”.J. Am. Stat. Assoc.108, 1339–1349

  12. [12]

    Objective automatic assessment of rehabilitative speech treatment in Parkinson’s disease

    Tsanas, A., Little, M. A., Fox, C., and Ramig, L. O. (2013). “Objective automatic assessment of rehabilitative speech treatment in Parkinson’s disease”.IEEE Trans. Neural Syst. Rehabil. Eng.22, 181–190. SUPPLEMENTARY MATERIALS S.7

  13. [13]

    Ultimate P´ olya gamma samplers– efficient MCMC for possibly imbalanced binary and categorical data

    Zens, G., Fr¨ uhwirth-Schnatter, S., and Wagner, H. (2024). “Ultimate P´ olya gamma samplers– efficient MCMC for possibly imbalanced binary and categorical data”.J. Am. Stat. Assoc. 119, 2548–2559

  14. [14]

    Scalable variable selection and model averaging for latent regression models using approximate variational Bayes

    Zens, G. and Steel, M. F. J. (2025). “Scalable variable selection and model averaging for latent regression models using approximate variational Bayes”.Preprint at arXiv:2509.11751

  15. [15]

    Scalable expectation propagation for mixed-effects regression

    Zhou, J., Ormerod, J. T., and Grazian, C. (2024). “Scalable expectation propagation for mixed-effects regression”.Preprint at arXiv:2409.14646