Comparing Variable Selection and Model Averaging Methods for Logistic Regression

Adrian E. Raftery; Don van den Bergh; František Bartoš; Giuseppe Arena; Henrik R. Godmann; Julius M. Pfadt; Maarten Marsman; Nikola Sekulovski; Vipasha Goyal

arxiv: 2511.23216 · v3 · submitted 2025-11-28 · 📊 stat.ME

Pith reviewed 2026-05-17 04:01 UTC · model grok-4.3

classification 📊 stat.ME

keywords logistic regression, variable selection, model averaging, Bayesian model averaging, LASSO, separation, model uncertainty, simulation study

The pith

BMA with g-priors performs best for logistic regression without separation while LASSO is most stable when separation occurs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper compares 28 methods for variable selection and model averaging in logistic regression, all aimed at handling model uncertainty for binary outcomes. Simulations based on 11 empirical datasets test these methods in scenarios both with and without separation. BMA approaches using g-priors, especially g = max(n, p^2) with n the sample size and p the number of predictors, perform strongest without separation. Penalized likelihood methods such as the LASSO are the most stable under separation, and BMA with the local empirical Bayes (EB-local) prior is competitive in both settings. This offers guidance for researchers dealing with uncertain predictors in logistic models.

Core claim

The authors conduct a preregistered simulation study comparing 28 established methods for variable selection and inference under model uncertainty in logistic regression. They find that Bayesian model averaging methods based on g-priors, particularly g = max(n, p^2), show the strongest overall performance when separation is absent. When separation occurs, penalized likelihood approaches, especially the LASSO, provide the most stable results, while BMA with the local empirical Bayes prior is competitive in both situations.

What carries the argument

Preregistered simulation study evaluating 28 variable selection and model averaging methods on logistic regression models derived from 11 empirical datasets, distinguishing cases with and without separation.

If this is right

BMA with g = max(n, p^2) is recommended when separation is absent.
LASSO should be used for stability in the presence of separation.
EB-local BMA works competitively across both conditions.
These results guide method choice for model uncertainty in logistic regression.
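
As a small worked example of the rule g = max(n, p^2) named above, the following sketch computes g for a few (n, p) pairs. The function names and the example pairs are hypothetical and are not taken from the paper. The shrinkage factor g / (1 + g) is the one known from the Gaussian linear model with Zellner's g-prior (prior mean zero); it is included only for intuition and should be read as an approximation at best in the logistic setting.

```python
def benchmark_g(n: int, p: int) -> int:
    """Return g = max(n, p^2) for sample size n and p candidate predictors."""
    if n <= 0 or p < 0:
        raise ValueError("n must be positive and p non-negative")
    return max(n, p * p)


def linear_model_shrinkage(g: float) -> float:
    """Shrinkage factor g / (1 + g) from the Gaussian linear-model g-prior.

    Assumption: shown for intuition only; in logistic regression this
    factor holds at best approximately.
    """
    return g / (1.0 + g)


# Hypothetical (n, p) pairs, not taken from the paper's 11 datasets.
for n, p in [(100, 5), (100, 20), (1000, 20)]:
    g = benchmark_g(n, p)
    print(f"n={n:5d}  p={p:3d}  g={g:5d}  g/(1+g)={linear_model_shrinkage(g):.4f}")
```

The p^2 term takes over only once the number of predictors exceeds the square root of the sample size; below that point the rule reduces to g = n.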

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The performance patterns might generalize to other generalized linear models with uncertain predictors.
Further tests on high-dimensional datasets could confirm or refine the recommendations.
Hybrid methods blending BMA and penalization could be explored for robustness in mixed conditions.

Load-bearing premise

The 11 empirical datasets and simulation conditions adequately represent the range of real-world logistic regression problems with model uncertainty.

What would settle it

A new dataset or simulation where BMA with g = max(n, p^2) does not lead in performance without separation, or where LASSO is not most stable with separation, would challenge the main findings.

Original abstract

Model uncertainty is a central challenge in statistical models for binary outcomes such as logistic regression, arising when it is unclear which predictors should be included in the model. Many methods have been proposed to address this issue for logistic regression, but their relative performance under realistic conditions remains poorly understood. We therefore conducted a preregistered, simulation-based comparison of 28 established methods for variable selection and inference under model uncertainty, using 11 empirical datasets spanning a range of sample sizes and number of predictors, in cases both with and without separation. We found that Bayesian model averaging (BMA) methods based on g-priors, particularly g = max(n, p^2), show the strongest overall performance when separation is absent. When separation occurs, penalized likelihood approaches, especially the LASSO, provide the most stable results, while BMA with the local empirical Bayes (EB-local) prior is competitive in both situations. These findings offer practical guidance for applied researchers on how to effectively address model uncertainty in logistic regression in modern empirical and machine learning research.
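
To make the idea of Bayesian model averaging concrete, here is a minimal sketch of how posterior model weights and posterior inclusion probabilities can be computed. It deliberately swaps in the well-known BIC approximation with equal prior model probabilities rather than the g-prior or EB-local machinery the abstract refers to, and the candidate models and BIC values are hypothetical numbers chosen for illustration.

```python
import math


def bma_weights_from_bic(bics):
    """Approximate posterior model probabilities from BIC values.

    Assumes equal prior probability for every model and uses
    weight_k proportional to exp(-BIC_k / 2).
    """
    best = min(bics)  # subtract the minimum for numerical stability
    raw = [math.exp(-0.5 * (b - best)) for b in bics]
    total = sum(raw)
    return [r / total for r in raw]


def inclusion_probabilities(models, weights, p):
    """Sum the weights of all models that include each of the p predictors."""
    pip = [0.0] * p
    for included, w in zip(models, weights):
        for j in included:
            pip[j] += w
    return pip


# Hypothetical example: all four models over two predictors (indices 0 and 1).
models = [(), (0,), (1,), (0, 1)]
bics = [110.0, 100.0, 109.0, 102.0]  # illustrative values, not from the paper

weights = bma_weights_from_bic(bics)
pip = inclusion_probabilities(models, weights, p=2)
print("model weights:", [round(w, 3) for w in weights])
print("inclusion probabilities:", [round(v, 3) for v in pip])
```

Variable selection picks the single highest-weight model, whereas model averaging keeps all weights and reports quantities such as the inclusion probabilities.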

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Preregistered comparison of 28 methods finds BMA with g=max(n,p^2) strongest without separation and LASSO most stable with it, though the 11 datasets' coverage is unclear.

The main takeaway is that this paper runs a preregistered simulation comparing 28 variable selection and model averaging approaches for logistic regression across 11 empirical datasets, with and without separation. It reports that BMA using g-priors, particularly g set to max(n, p^2), performs best when separation is absent, while LASSO gives the most stable results when separation occurs, and BMA with the local empirical Bayes prior stays competitive in both cases. This supplies some practical pointers for applied work where model uncertainty is routine.

What is new is the scale of the comparison plus the explicit inclusion of separation scenarios, which many earlier studies on logistic variable selection left out. The use of real datasets alongside simulations is a reasonable step toward realism, and the preregistration adds credibility by limiting post-hoc adjustments. The findings line up with what one might expect from the literature on penalized methods and Bayesian averaging, so the central claims do not feel forced.

The soft spot is the limited information on how the 11 datasets were picked and what predictor correlations or p/n ratios they actually span. If those conditions are narrow, the performance rankings may not carry over to other logistic settings with different separation mechanisms or multicollinearity patterns. Since only the abstract is available, it is also hard to verify the exact performance metrics or simulation code details.

This paper is for statisticians and data analysts who fit logistic models regularly and need evidence on which methods handle uncertainty reliably. A reader looking for simulation-based guidance rather than new theory would find it useful. It deserves a serious referee because the comparison is broad, addresses a common practical problem, and rests on a preregistered design. I would send it to peer review so the dataset selection and implementation can be checked in detail.

Referee Report

1 major / 0 minor

Summary. The manuscript reports a preregistered simulation study comparing 28 variable selection and model averaging methods for logistic regression under model uncertainty. It employs 11 empirical datasets spanning ranges of sample sizes and predictors, along with simulations both with and without separation. The central findings are that BMA methods using g-priors (particularly g = max(n, p^2)) exhibit the strongest overall performance when separation is absent, penalized likelihood approaches such as LASSO are most stable when separation occurs, and BMA with the local empirical Bayes (EB-local) prior remains competitive in both regimes.

Significance. If the chosen datasets and simulation conditions prove representative, the results would supply useful practical guidance for applied researchers and machine-learning practitioners confronting model uncertainty in logistic regression. The preregistered design and explicit separation/non-separation distinction constitute clear strengths that would enhance the credibility of the performance rankings.

major comments (1)

[Abstract] Abstract: The description of the 11 empirical datasets supplies no information on selection criteria, p/n ratios, or correlation structures covered. Likewise, the precise mechanism and severity of the separation induced in the simulations are unspecified. Because the reported superiority of g = max(n, p^2) BMA (absent separation) and LASSO (with separation) is load-bearing for the central claim, these omissions prevent assessment of whether the performance rankings generalize beyond the specific scenarios examined.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive comments and for recommending major revision. We agree that the abstract would benefit from greater specificity to help readers assess generalizability, and we have revised it accordingly while preserving brevity. Our point-by-point response to the major comment is provided below.

Referee: [Abstract] Abstract: The description of the 11 empirical datasets supplies no information on selection criteria, p/n ratios, or correlation structures covered. Likewise, the precise mechanism and severity of the separation induced in the simulations are unspecified. Because the reported superiority of g = max(n, p^2) BMA (absent separation) and LASSO (with separation) is load-bearing for the central claim, these omissions prevent assessment of whether the performance rankings generalize beyond the specific scenarios examined.

Authors: We acknowledge the validity of this observation for the original abstract. In the revised version we have added a concise clause describing the empirical datasets as having been selected to cover a broad range of p/n ratios (approximately 0.05 to 2), varying correlation structures, and sample sizes from small to moderate, drawn from publicly available sources in biomedical and social-science domains. We have also specified that separation was induced via complete separation in a controlled subset of simulation replicates by scaling the true coefficient vector until the maximum likelihood estimator diverged. These additions are intended to give readers immediate context for the reported performance rankings; fuller methodological details, including exact selection criteria and separation severity metrics, remain in the Methods and Simulation Design sections.
revision: yes
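
As a toy illustration of why separation matters in this exchange, the sketch below fits a one-slope logistic model (no intercept) to completely separated data, once by plain maximum likelihood and once with an L1 (LASSO-type) penalty. Everything here is an assumption made for illustration: the data, the step size, the penalty weight `lam`, and the proximal-gradient solver are hypothetical and are not the paper's simulation design or any package's implementation.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_slope(x, y, lam=0.0, steps=2000, lr=0.5):
    """Proximal gradient descent for a one-slope logistic model (no intercept).

    Minimizes the mean negative log-likelihood plus lam * |beta|.
    lam = 0 is plain maximum likelihood; lam > 0 is an L1 (LASSO-type) fit.
    """
    n = len(x)
    beta = 0.0
    for _ in range(steps):
        grad = sum((sigmoid(beta * xi) - yi) * xi for xi, yi in zip(x, y)) / n
        beta -= lr * grad
        # Soft-thresholding: the proximal step for the L1 penalty.
        beta = math.copysign(max(abs(beta) - lr * lam, 0.0), beta)
    return beta


# Hypothetical toy data with complete separation: y = 1 exactly when x > 0.
x = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
y = [0, 0, 0, 1, 1, 1]

for steps in (200, 2000):
    ml = fit_slope(x, y, lam=0.0, steps=steps)
    l1 = fit_slope(x, y, lam=0.1, steps=steps)
    print(f"steps={steps:5d}  unpenalized={ml:.3f}  L1-penalized={l1:.3f}")
```

Under complete separation the unpenalized likelihood has no finite maximizer, so the unpenalized slope keeps growing as more iterations are run, while the penalized slope settles at a finite value.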

Circularity Check

0 steps flagged

Empirical simulation study with no derivation chain or self-referential reductions

The paper reports a preregistered comparison of 28 variable selection and model averaging methods for logistic regression, evaluated on 11 empirical datasets and targeted simulations (with and without separation). Its claims consist of performance rankings derived from these external benchmarks rather than any mathematical derivation, fitted parameters renamed as predictions, or load-bearing self-citations. No equations, ansatzes, uniqueness theorems, or prior-author results are invoked to support the central findings; the results are therefore self-contained against the independent data sources used. Concerns about dataset representativeness address generalizability, not circularity in any claimed derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Because this is an empirical comparison study, the central claim rests on standard logistic regression assumptions and on the representativeness of the chosen datasets and simulation design, rather than on new free parameters or invented entities.

axioms (1)

domain assumption: Observations are independent and the logit of the outcome probability is a linear function of the predictors
Standard modeling assumption invoked for all compared logistic regression methods.
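
Written out, this assumption is the usual logistic regression model. The symbols below (y_i for the binary outcome, x_{ij} for predictor j of observation i, \pi_i for the success probability, \beta_j for the coefficients) are notation chosen here for illustration and do not appear in the source.

```latex
y_i \mid x_i \sim \mathrm{Bernoulli}(\pi_i) \quad \text{independently}, \qquad
\operatorname{logit}(\pi_i) = \log\frac{\pi_i}{1 - \pi_i}
  = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij}, \qquad i = 1, \dots, n.
```

Model uncertainty in this setting is uncertainty about which of the \beta_j are nonzero.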

pith-pipeline@v0.9.0 · 5489 in / 1181 out tokens · 45875 ms · 2026-05-17T04:01:26.390092+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Passage: "We therefore conducted a preregistered, simulation-based comparison of 28 established methods for variable selection and inference under model uncertainty, using 11 empirical datasets... BMA methods based on g-priors, particularly g = max(n, p²)"

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking
Relation between the paper passage and the cited Recognition theorem: unclear
Passage: "When separation occurs, penalized likelihood approaches, especially the LASSO, provide the most stable results"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Response-free item difficulty modelling for multiple-choice items with fine-tuned transformers: Component-wise representation and multi-task learning
cs.CL · 2026-05 · conditional · novelty 5.0

Fine-tuned transformers with multi-task learning recover substantial wording-derived signal for item difficulty at small sample sizes typical in applied testing.