Mean-field Variational Bayes for Sparse Probit Regression
Pith reviewed 2026-05-21 14:55 UTC · model grok-4.3
The pith
Mean-field variational Bayes yields closed-form updates for spike-and-slab probit regression, delivering posterior inclusion probabilities and estimates orders of magnitude faster than MCMC.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We consider Bayesian variable selection for binary outcomes under a probit link with a spike-and-slab prior on the regression coefficients. We develop a mean-field variational Bayes approximation in which all variational factors admit closed-form updates and the evidence lower bound is available in closed form. This allows an efficient coordinate ascent variational inference algorithm whose output produces posterior inclusion probabilities and parameter estimates, enabling interpretable selection and prediction.
What carries the argument
Mean-field variational family with independent factors for each regression coefficient and its inclusion indicator, admitting closed-form coordinate ascent updates for the probit spike-and-slab model.
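For concreteness, the model and variational family can be written out as below. This is a sketch under assumptions: the factorization is the one quoted in the referee report, while the data-augmented form of the probit likelihood and the symbols \sigma_\beta^2 (slab variance) and \rho (prior inclusion probability) are one common parameterization that this page does not confirm as the paper's own.

```latex
\begin{aligned}
y_i &= \mathbb{1}(z_i > 0), \qquad
z_i \mid \beta, \gamma \sim \mathcal{N}\Big(\textstyle\sum_{j=1}^{p} x_{ij}\,\gamma_j\,\beta_j,\; 1\Big), \\
\beta_j &\sim \mathcal{N}(0, \sigma_\beta^2), \qquad
\gamma_j \sim \mathrm{Bernoulli}(\rho), \\
q(\beta, \gamma, z) &= \prod_{j=1}^{p} q(\beta_j)\, q(\gamma_j) \prod_{i=1}^{n} q(z_i).
\end{aligned}
```

Under this assumed form, each factor is a standard distribution (Gaussian for \beta_j, Bernoulli for \gamma_j, truncated Gaussian for z_i), which is what makes closed-form coordinate updates plausible.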
If this is right
- The algorithm returns both posterior inclusion probabilities and point estimates in a single run, supporting joint selection and prediction.
- Because every update is closed form, the procedure scales to regimes where full MCMC sampling becomes impractical.
- The availability of a closed-form evidence lower bound supplies a built-in criterion for comparing models or tuning hyperparameters.
Where Pith is reading between the lines
- The same variational construction could be adapted to logistic regression or other generalized linear models by deriving analogous closed-form updates.
- One could examine whether the variational inclusion probabilities remain calibrated when the number of predictors greatly exceeds the sample size, beyond the regimes tested in the paper.
- Because the evidence lower bound is tractable, the method might be embedded inside a larger search over different prior hyperparameters without additional MCMC cost.
Load-bearing premise
The chosen mean-field family is flexible enough and the coordinate ascent updates converge to a sufficiently accurate approximation of the true posterior.
What would settle it
A dataset in which MCMC and the variational method produce materially different sets of selected variables or markedly different out-of-sample predictive performance would falsify the claim of comparable accuracy.
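A check of this kind could be scripted roughly as follows. The function name `compare_selection`, the 0.5 inclusion threshold, and the example probabilities are all illustrative assumptions; none of them come from the paper.

```python
def compare_selection(pip_vb, pip_mcmc, threshold=0.5):
    """Compare two vectors of posterior inclusion probabilities (PIPs).

    A variable counts as selected when its PIP exceeds `threshold`
    (0.5 is an assumed cut-off, not taken from the paper).
    """
    sel_vb = {j for j, v in enumerate(pip_vb) if v > threshold}
    sel_mcmc = {j for j, v in enumerate(pip_mcmc) if v > threshold}
    union = sel_vb | sel_mcmc
    # Jaccard overlap of the two selected sets (1.0 when both are empty).
    jaccard = len(sel_vb & sel_mcmc) / len(union) if union else 1.0
    # Largest absolute gap between the two PIP vectors.
    max_gap = max(abs(a - b) for a, b in zip(pip_vb, pip_mcmc))
    return {
        "jaccard": jaccard,
        "max_gap": max_gap,
        "disagreements": sorted(sel_vb ^ sel_mcmc),
    }


# Made-up numbers, purely to show the call.
report = compare_selection([0.98, 0.03, 0.61, 0.10], [0.95, 0.05, 0.42, 0.12])
print(report)
```

A low Jaccard overlap or a large maximum gap on a real dataset would be the kind of evidence described above; the out-of-sample comparison would need a separate predictive metric.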
Original abstract
We consider Bayesian variable selection for binary outcomes under a probit link with a spike-and-slab prior on the regression coefficients. Motivated by the computational challenges encountered by Markov chain Monte Carlo (MCMC) samplers in high-dimensional regimes, we develop a mean-field variational Bayes approximation in which all variational factors admit closed-form updates, and the evidence lower bound is available in closed form. This, in turn, allows the development of an efficient coordinate ascent variational inference algorithm to find the optimal values of the variational parameters. The approach produces posterior inclusion probabilities and parameter estimates, enabling interpretable selection and prediction within a single framework. As shown in both simulated and real data applications, the proposed method successfully identifies the important variables and is orders of magnitude faster than MCMC, while maintaining comparable accuracy.
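To make the idea of closed-form coordinate ascent concrete, here is a minimal sketch in plain Python. It is not the paper's algorithm: it assumes a latent-variable probit model with linear predictor sum_j x_ij * gamma_j * beta_j, a N(0, slab_var) prior on each beta_j, and a Bernoulli(prior_incl) prior on each gamma_j, and it derives the updates for a fully factorized q under those assumptions. The function names, hyperparameter names, and simulated data are all hypothetical, and the stopping rule tracks changes in the inclusion probabilities rather than the evidence lower bound.

```python
import math
import random

SQRT2 = math.sqrt(2.0)


def expit(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def latent_mean(m, y):
    """E[z] for z ~ N(m, 1) truncated to z > 0 (y = 1) or z <= 0 (y = 0).

    Accurate for moderate |m| only; no tail-safe formula is used here.
    """
    s = 1.0 if y == 1 else -1.0
    pdf = math.exp(-0.5 * m * m) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * math.erfc(-s * m / SQRT2)  # Phi(s * m)
    return m + s * pdf / max(cdf, 1e-300)


def cavi_probit_spike_slab(X, y, slab_var=1.0, prior_incl=0.5,
                           max_iter=200, tol=1e-6):
    n, p = len(X), len(X[0])
    col_ss = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    mu = [0.0] * p        # means of q(beta_j)
    s2 = [slab_var] * p   # variances of q(beta_j)
    w = [prior_incl] * p  # q(gamma_j = 1)
    logit_prior = math.log(prior_incl / (1.0 - prior_incl))
    for _ in range(max_iter):
        # q(z_i): truncated normal centred at the current linear predictor.
        fit = [sum(X[i][j] * w[j] * mu[j] for j in range(p)) for i in range(n)]
        ez = [latent_mean(fit[i], y[i]) for i in range(n)]
        w_old = list(w)
        for j in range(p):
            # x_j^T r_j, where r_j is the residual excluding predictor j.
            xr = sum(X[i][j] * (ez[i] - fit[i] + X[i][j] * w[j] * mu[j])
                     for i in range(n))
            s2[j] = 1.0 / (w[j] * col_ss[j] + 1.0 / slab_var)
            mu_new = s2[j] * w[j] * xr
            eta = (logit_prior
                   - 0.5 * col_ss[j] * (mu_new ** 2 + s2[j])
                   + mu_new * xr)
            w_new = expit(eta)
            # Keep the fitted values in sync with the updated factors.
            delta = w_new * mu_new - w[j] * mu[j]
            for i in range(n):
                fit[i] += X[i][j] * delta
            mu[j], w[j] = mu_new, w_new
        if max(abs(a - b) for a, b in zip(w, w_old)) < tol:
            break
    return w, mu, s2


# Simulated data: only the first of four predictors carries signal.
random.seed(0)
n, p = 200, 4
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [1 if 2.0 * row[0] + random.gauss(0, 1) > 0 else 0 for row in X]
w, mu, s2 = cavi_probit_spike_slab(X, y)
print([round(v, 3) for v in w])
```

Every step inside the loop is a direct formula, with no sampling and no inner optimization, which is the property the abstract credits for the speed advantage over MCMC. The values in `w` play the role of posterior inclusion probabilities, and `w[j] * mu[j]` is the corresponding point estimate of the effective coefficient.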
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript develops a mean-field variational Bayes procedure for Bayesian variable selection in probit regression under a spike-and-slab prior. All variational factors are chosen to admit closed-form coordinate-ascent updates, the ELBO is available in closed form, and the resulting algorithm yields posterior inclusion probabilities together with point estimates. The authors report that the procedure identifies important variables on both simulated and real data, runs orders of magnitude faster than MCMC, and achieves comparable predictive accuracy.
Significance. If the variational inclusion probabilities remain reliable under realistic correlation structures, the method supplies a practical, scalable alternative to MCMC for high-dimensional binary regression with interpretable selection. The closed-form updates and explicit ELBO constitute a clear computational advantage over sampling-based approaches.
major comments (1)
- [§3.2] §3.2 (Variational family and coordinate ascent): The mean-field factorization q(β,γ,z) = ∏ q(β_j) q(γ_j) q(z_i) enforces independence among the inclusion indicators γ_j. When design-matrix columns are linearly dependent or highly correlated, the true spike-and-slab posterior exhibits negative dependence between γ_j and γ_k; the variational marginals can therefore systematically over- or under-state inclusion probabilities. This directly affects the headline claim that the method “successfully identifies the important variables.” A concrete diagnostic—e.g., comparison of variational versus MCMC inclusion probabilities on a design with pairwise correlations >0.7—would be needed to substantiate the claim.
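One simple way to build the kind of design this comment asks for is a single shared factor, which gives every pair of columns the same population correlation. The sketch below is an illustration only: the function names, the value 0.8, and the sample size are assumptions, and the paper's simulation settings are not reproduced here.

```python
import math
import random


def equicorrelated_design(n, p, rho, seed=0):
    """Draw an n-by-p design whose columns share pairwise correlation rho.

    Each entry is sqrt(rho) * f_i + sqrt(1 - rho) * e_ij, with f_i and e_ij
    independent standard normals, so every column has unit variance.
    """
    rng = random.Random(seed)
    a, b = math.sqrt(rho), math.sqrt(1.0 - rho)
    X = []
    for _ in range(n):
        f = rng.gauss(0.0, 1.0)
        X.append([a * f + b * rng.gauss(0.0, 1.0) for _ in range(p)])
    return X


def sample_corr(X, j, k):
    """Pearson correlation between columns j and k of X."""
    n = len(X)
    mj = sum(row[j] for row in X) / n
    mk = sum(row[k] for row in X) / n
    cov = sum((row[j] - mj) * (row[k] - mk) for row in X)
    vj = sum((row[j] - mj) ** 2 for row in X)
    vk = sum((row[k] - mk) ** 2 for row in X)
    return cov / math.sqrt(vj * vk)


X_corr = equicorrelated_design(n=5000, p=3, rho=0.8)
pairs = [(0, 1), (0, 2), (1, 2)]
print([round(sample_corr(X_corr, j, k), 3) for j, k in pairs])
```

Fitting both the variational method and a long MCMC run to outcomes simulated from such a design, then comparing the two sets of inclusion probabilities, would give the requested diagnostic.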
minor comments (2)
- [§4] The abstract and §4 state that the method is “orders of magnitude faster” and maintains “comparable accuracy,” yet no tables report wall-clock times, effective sample sizes, or quantitative error metrics (e.g., AUC, Brier score) with standard errors across replications. Adding such summaries would strengthen the empirical section.
- Notation for the variational parameters (e.g., μ_j, σ_j^2, π_j) is introduced without an explicit summary table; a short table collecting all variational parameters and their update equations would improve readability.
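The two error metrics named in the first minor comment are cheap to compute from predicted probabilities. The following is a minimal, dependency-free sketch; the function names and the toy inputs are illustrative and are not results from the paper.

```python
def brier_score(y_true, prob):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - t) ** 2 for t, p in zip(y_true, prob)) / len(y_true)


def auc(y_true, prob):
    """Area under the ROC curve via pairwise comparison (ties count one half)."""
    pos = [p for t, p in zip(y_true, prob) if t == 1]
    neg = [p for t, p in zip(y_true, prob) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Toy inputs to show the calls.
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
print(brier_score([1, 0], [0.9, 0.2]))
```

Reporting the mean and standard error of these quantities across simulation replications, alongside wall-clock times, would address the comment. The pairwise AUC is quadratic in the sample size, so a rank-based version would be preferable for large test sets.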
Simulated Author's Rebuttal
We thank the referee for their careful reading and insightful comments on our manuscript. We address the major comment below and plan to incorporate revisions to strengthen the paper.
- Referee: [§3.2] §3.2 (Variational family and coordinate ascent): The mean-field factorization q(β,γ,z) = ∏ q(β_j) q(γ_j) q(z_i) enforces independence among the inclusion indicators γ_j. When design-matrix columns are linearly dependent or highly correlated, the true spike-and-slab posterior exhibits negative dependence between γ_j and γ_k; the variational marginals can therefore systematically over- or under-state inclusion probabilities. This directly affects the headline claim that the method “successfully identifies the important variables.” A concrete diagnostic (e.g., comparison of variational versus MCMC inclusion probabilities on a design with pairwise correlations >0.7) would be needed to substantiate the claim.
  Authors: We agree that the mean-field assumption of independence among the γ_j is a limitation that may lead to inaccuracies in inclusion probabilities when predictors are highly correlated. This is an inherent feature of the mean-field variational family. Our current numerical experiments primarily use designs with moderate correlations, where the method performs well in identifying important variables. To directly address this concern, we will add a new simulation study featuring a design matrix with pairwise correlations greater than 0.7. In this study, we will compare the variational inclusion probabilities against those obtained from long MCMC runs, providing the requested diagnostic. This addition will help evaluate the robustness of the approach under realistic correlation structures and clarify the conditions under which the method reliably identifies important variables.
  Revision: yes
Circularity Check
Standard mean-field coordinate ascent derivation with no circular reductions to inputs
The paper derives closed-form variational updates for the mean-field factors in a spike-and-slab probit model by applying the standard coordinate ascent variational inference algorithm to the evidence lower bound; these updates follow directly from the chosen factorized variational family and the model likelihood/prior without any fitted parameters being relabeled as predictions or any load-bearing steps reducing to self-citations or self-definitions. Performance claims on variable selection and speed are evaluated empirically on simulated and real data rather than being forced by the derivation equations themselves. No instances of the enumerated circularity patterns are present.
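The generic rule behind such a derivation is standard and can be stated compactly. Here \theta = (\theta_1, \dots, \theta_K) is a placeholder for all unknowns (in this setting the coefficients, inclusion indicators, and any latent variables), and the notation is generic rather than the paper's.

```latex
\begin{aligned}
\mathrm{ELBO}(q) &= \mathbb{E}_{q}\big[\log p(y, \theta)\big] - \mathbb{E}_{q}\big[\log q(\theta)\big],
\qquad q(\theta) = \prod_{k=1}^{K} q_k(\theta_k), \\
q_k^{*}(\theta_k) &\propto \exp\Big\{ \mathbb{E}_{q_{-k}}\big[\log p(y, \theta)\big] \Big\}.
\end{aligned}
```

Each update maximizes the ELBO over one factor with the others held fixed, so the bound never decreases. Nothing in this rule guarantees that the resulting fixed point is close to the true posterior, which is why the accuracy claims have to be checked empirically.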
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · tag: unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "We develop a mean-field variational Bayes approximation in which all variational factors admit closed-form updates... coordinate ascent variational inference algorithm"
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · tag: unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "optimal variational factor q(γⱼ) ... wⱼ = expit(ηⱼ) with ηⱼ defined by (6)"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.