A Bayes-Factor-Guided Approach to Post-Double Selection with Bootstrapped Multiple Imputation
Pith reviewed 2026-05-10 14:38 UTC · model grok-4.3
The pith
Treating detections of each variable across bootstrap-imputation iterations as Bernoulli trials lets a likelihood ratio accumulate into an approximate Bayes factor that supplies both an inclusion threshold and an automatic stopping rule.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that detection outcomes across bootstrap and multiple-imputation iterations can be treated as independent Bernoulli trials, so that a sequential likelihood-ratio statistic can be formed whose value supplies both a variable-inclusion criterion and a stopping rule for further iterations, all under an approximate Bayes-factor interpretation of the accumulated evidence.
What carries the argument
The sequential likelihood-ratio process on Bernoulli detection outcomes, read as an approximate Bayes factor for variable relevance.
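One way to write this process down, as a sketch rather than the paper's own notation: let $d_{j,t} \in \{0,1\}$ indicate whether variable $j$ is detected in iteration $t$, and let $p_0 < 0.5$ and $p_1 > 0.5$ be the detection probabilities assumed for an irrelevant and a relevant variable (the referee report mentions such a pair; the symbols $d_{j,t}$, $S_{j,T}$, and $\Lambda_{j,T}$ are assumptions introduced here). If the trials are treated as independent, the likelihood ratio after $T$ iterations would be:

```latex
\Lambda_{j,T}
  = \prod_{t=1}^{T}
    \left(\frac{p_1}{p_0}\right)^{d_{j,t}}
    \left(\frac{1-p_1}{1-p_0}\right)^{1-d_{j,t}},
\qquad
\log \Lambda_{j,T}
  = S_{j,T}\,\log\frac{p_1}{p_0}
  + \bigl(T - S_{j,T}\bigr)\,\log\frac{1-p_1}{1-p_0},
\qquad
S_{j,T} = \sum_{t=1}^{T} d_{j,t}.
```

For two simple hypotheses the likelihood ratio coincides with the Bayes factor, so in this sketch any approximation enters through the modeling assumptions (independence, constant detection probability) rather than through the algebra.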
If this is right
- The selected model contains only those variables whose accumulated evidence crosses a pre-chosen Bayes-factor threshold.
- The total number of bootstrap-imputation iterations is determined by the evidence process itself rather than fixed in advance.
- Final models are less dense than those produced by taking the union of selected variables across all iterations.
- Performance relative to union and other aggregation methods is demonstrated across 126 simulation scenarios and one real-data example.
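The inclusion and stopping behaviour listed above can be sketched for a single variable as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function name `sequential_bf`, the values `p0=0.2` and `p1=0.8`, the thresholds `upper=10` and `lower=0.1`, and the optional `max_iter` cap are all hypothetical choices made here.

```python
import math


def sequential_bf(detections, p0=0.2, p1=0.8, upper=10.0, lower=0.1, max_iter=None):
    """Accumulate a Bernoulli log-likelihood ratio over detection outcomes.

    detections: iterable of 0/1 outcomes for ONE variable, one per iteration.
    p0, p1, upper, lower, max_iter: illustrative values, not taken from the paper.
    Returns (decision, iterations_used, bayes_factor).
    """
    log_hit = math.log(p1 / p0)                # evidence added by a detection
    log_miss = math.log((1 - p1) / (1 - p0))   # evidence added by a non-detection
    log_upper, log_lower = math.log(upper), math.log(lower)

    log_bf = 0.0
    t = 0
    for t, d in enumerate(detections, start=1):
        log_bf += log_hit if d else log_miss
        if log_bf >= log_upper:                # enough evidence for relevance
            return "include", t, math.exp(log_bf)
        if log_bf <= log_lower:                # enough evidence against relevance
            return "exclude", t, math.exp(log_bf)
        if max_iter is not None and t >= max_iter:
            break                              # hypothetical safety cap
    return "undecided", t, math.exp(log_bf)


print(sequential_bf([1, 1, 1, 1]))   # stops early with "include"
print(sequential_bf([1, 0, 1, 0]))   # evidence cancels, stays "undecided"
```

Because the loop returns as soon as a boundary is crossed, the number of iterations consumed depends on the evidence; the `max_iter` cap is included only because the minor comments ask whether such a cap exists in practice.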
Where Pith is reading between the lines
- The same Bernoulli-trial evidence accumulation could be applied to other repeated-perturbation schemes such as cross-validation folds or random feature subsamples.
- In high-dimensional or computationally expensive settings the data-driven stopping rule may reduce total run time by halting once evidence is sufficient.
- The approach offers a template for turning any repeated selection procedure into a sequential evidence-monitoring method.
Load-bearing premise
Detection outcomes across successive bootstrap-imputation iterations behave as independent Bernoulli trials with constant probability.
What would settle it
A simulation study in which the true relevant variables are known shows that the Bayes-factor procedure selects sets whose out-of-sample performance is no better than, or whose size differs markedly from, the sets obtained by simple union of selections across the same iterations.
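To make the comparison concrete, the sketch below applies a union rule and a Bayes-factor rule to the same detection table. Everything in it is an assumption for illustration: the table is hand-made toy input (not data or results from the paper), the names `union_rule` and `bf_rule` are hypothetical, and the values `p0=0.2`, `p1=0.8`, and `threshold=10` are arbitrary. For simplicity the Bayes-factor rule is evaluated on all iterations at once, without the sequential stopping step.

```python
import math


def union_rule(detections):
    """Select every variable detected in at least one iteration."""
    return {name for name, seq in detections.items() if any(seq)}


def bf_rule(detections, p0=0.2, p1=0.8, threshold=10.0):
    """Select variables whose Bernoulli likelihood ratio exceeds `threshold`.

    p0, p1, and threshold are illustrative values, not the paper's.
    """
    selected = set()
    for name, seq in detections.items():
        hits = sum(seq)
        misses = len(seq) - hits
        log_bf = hits * math.log(p1 / p0) + misses * math.log((1 - p1) / (1 - p0))
        if log_bf > math.log(threshold):
            selected.add(name)
    return selected


# Toy detection table: one 0/1 outcome per iteration for each variable.
detections = {
    "x1": [1, 1, 1, 0, 1, 1],
    "x2": [1, 1, 0, 1, 1, 1],
    "x3": [0, 0, 1, 0, 0, 0],
    "x4": [0, 1, 0, 0, 0, 0],
    "x5": [0, 0, 0, 0, 0, 0],
}
truth = {"x1", "x2"}  # hypothetical ground truth for the toy example

for label, chosen in [("union", union_rule(detections)), ("bf", bf_rule(detections))]:
    print(label, sorted(chosen), "false positives:", len(chosen - truth))
```

The toy input is constructed so that the two rules disagree; it shows what the comparison would measure, not which rule wins in practice.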
Original abstract
When variable selection methods are applied to bootstrapped and multiply imputed datasets, the set of selected variables typically varies across iterations. Aggregating results via the union rule can lead to overly dense models. We propose a sequential evidence aggregation procedure that models detection outcomes across perturbation iterations as Bernoulli trials and accumulates evidence for variable relevance through a likelihood-ratio process admitting an approximate Bayes-factor interpretation. The procedure provides both a variable inclusion criterion and a stopping rule that eliminates the need to fix the number of bootstrap-imputation iterations ex ante. A Monte Carlo study across 126 scenarios and an empirical illustration demonstrate the method's performance relative to existing aggregation approaches.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a sequential evidence aggregation method for post-double selection on bootstrapped multiply-imputed data. Selection indicators across iterations are modeled as i.i.d. Bernoulli trials; a likelihood-ratio process is accumulated and interpreted as an approximate Bayes factor that supplies both a variable-inclusion threshold and a data-driven stopping rule, removing the need to pre-specify the number of bootstrap-imputation replicates. Performance is assessed via a Monte Carlo study over 126 scenarios and one empirical example, with comparisons to union-rule and other aggregation baselines.
Significance. If the approximate Bayes-factor construction remains valid, the procedure supplies a principled, adaptive alternative to fixed-iteration or union-based aggregation in missing-data variable selection, potentially improving sparsity without sacrificing detection power. The scale of the simulation design (126 scenarios) is a clear strength and allows broad exploration of operating characteristics.
Major comments (3)
- [Sequential aggregation procedure] The modeling of detection outcomes as i.i.d. Bernoulli trials whose likelihood ratio yields an approximate Bayes factor (described in the sequential aggregation procedure) is load-bearing for both the inclusion criterion and the stopping rule. Because every replicate is generated from the same observed sample via resampling and draws from the identical posterior predictive, the indicators share finite-sample structure, imputation uncertainty, and collinearity; they are therefore dependent. Under dependence the product of marginal likelihoods no longer equals the joint likelihood ratio, the martingale property required for the stopping time fails, and the numerical value no longer corresponds to a Bayes factor. The Monte Carlo study reports no diagnostic for serial dependence and no sensitivity analysis that replaces the i.i.d. assumption with a Markov or exchangeable model.
- [Method description] No derivation or explicit justification of the Bayes-factor approximation is supplied, nor are the numerical thresholds for inclusion (e.g., BF > k) or stopping (e.g., BF crossing a boundary) stated. Without these quantities the Monte Carlo results cannot be reproduced or interpreted as evidence that the procedure controls error rates or improves sparsity relative to fixed-iteration baselines.
- [Monte Carlo study] The Monte Carlo design does not report standard errors or confidence bands on the reported performance metrics, nor does it examine sensitivity of the results to the specific choices of p0 < 0.5 and p1 > 0.5 used in the Bernoulli likelihoods. These omissions make it impossible to judge whether the claimed superiority over existing aggregation approaches is robust.
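The sensitivity to p0 and p1 raised in the third comment can be seen directly from the Bernoulli likelihood ratio. The sketch below is an illustration only: the helper names, the grid of (p0, p1) pairs, and the threshold of 10 are assumptions, not values reported in the paper. It computes how many consecutive detections are needed to reach the threshold, and how many detections a single non-detection cancels on the log scale.

```python
import math


def hits_to_threshold(p0, p1, threshold=10.0):
    """Smallest number of consecutive detections whose likelihood ratio reaches `threshold`."""
    return math.ceil(math.log(threshold) / math.log(p1 / p0))


def miss_cost_in_hits(p0, p1):
    """Number of detections that one non-detection offsets, on the log scale."""
    return -math.log((1 - p1) / (1 - p0)) / math.log(p1 / p0)


# Hypothetical grid of (p0, p1) pairs with p0 < 0.5 < p1.
for p0, p1 in [(0.4, 0.6), (0.3, 0.7), (0.2, 0.8), (0.1, 0.9)]:
    print(p0, p1, hits_to_threshold(p0, p1), round(miss_cost_in_hits(p0, p1), 3))
```

In this sketch, moving the pair from (0.4, 0.6) to (0.2, 0.8) cuts the number of consecutive detections required from six to two, which is why the choice plausibly affects both the selected set and the stopping time.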
Minor comments (2)
- Notation for the per-iteration selection indicator and the accumulated likelihood ratio should be introduced with a clear equation or algorithm box to improve readability.
- The abstract states that the procedure 'eliminates the need to fix the number of bootstrap-imputation iterations ex ante,' but the manuscript should clarify whether a maximum iteration cap is still imposed in practice and how it interacts with the stopping rule.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our paper. We address each of the major comments point by point below, providing clarifications and outlining the revisions we will make to strengthen the manuscript.
Referee: [Sequential aggregation procedure] The modeling of detection outcomes as i.i.d. Bernoulli trials whose likelihood ratio yields an approximate Bayes factor (described in the sequential aggregation procedure) is load-bearing for both the inclusion criterion and the stopping rule. Because every replicate is generated from the same observed sample via resampling and draws from the identical posterior predictive, the indicators share finite-sample structure, imputation uncertainty, and collinearity; they are therefore dependent. Under dependence the product of marginal likelihoods no longer equals the joint likelihood ratio, the martingale property required for the stopping time fails, and the numerical value no longer corresponds to a Bayes factor. The Monte Carlo study reports no diagnostic for serial dependence and no sensitivity analysis that replaces the i.i.d. assumption with a Markov or exchangeable model.
Authors: We acknowledge that the detection indicators are dependent due to the shared data source. The i.i.d. Bernoulli model is presented as an approximation that enables the sequential likelihood-ratio accumulation and its interpretation as an approximate Bayes factor. While the strict martingale property may not hold, the procedure is designed to provide a practical stopping rule and inclusion threshold that perform well in finite samples, as evidenced by our simulations. In the revision, we will add a discussion of the dependence issue, include diagnostics for serial correlation in the indicators, and perform a sensitivity analysis using a first-order Markov model for the sequence of detections.
Revision: partial
Referee: [Method description] No derivation or explicit justification of the Bayes-factor approximation is supplied, nor are the numerical thresholds for inclusion (e.g., BF > k) or stopping (e.g., BF crossing a boundary) stated. Without these quantities the Monte Carlo results cannot be reproduced or interpreted as evidence that the procedure controls error rates or improves sparsity relative to fixed-iteration baselines.
Authors: We will include a detailed derivation of the likelihood-ratio process and its approximate Bayes-factor interpretation in the revised methods section. We will also explicitly state the numerical thresholds used for variable inclusion and the stopping criterion (e.g., the specific BF values for moderate and strong evidence). This will enhance reproducibility and allow readers to better interpret the Monte Carlo results.
Revision: yes
Referee: [Monte Carlo study] The Monte Carlo design does not report standard errors or confidence bands on the reported performance metrics, nor does it examine sensitivity of the results to the specific choices of p0 < 0.5 and p1 > 0.5 used in the Bernoulli likelihoods. These omissions make it impossible to judge whether the claimed superiority over existing aggregation approaches is robust.
Authors: We agree that reporting standard errors and confidence intervals for the performance metrics would strengthen the presentation. We will add these to the Monte Carlo results. Additionally, we will include a sensitivity analysis varying the values of p0 and p1 to demonstrate the robustness of our findings to these choices.
Revision: yes
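A simple dependence diagnostic of the kind the rebuttal promises could look like the sketch below. It is one possible approach under stated assumptions, not the authors' planned analysis: the function name `markov_transition_estimates` is hypothetical, and the check only captures lag-1 structure. It estimates the two transition probabilities of a first-order Markov chain from one variable's 0/1 detection sequence; under independence with a constant detection probability, both estimates target the same value.

```python
def markov_transition_estimates(seq):
    """Estimate P(detect | previous detect) and P(detect | previous miss).

    seq: list of 0/1 detection outcomes for one variable, in iteration order.
    Returns (p11, p01); an entry is None when its conditioning state never occurs.
    """
    n_after_hit = n_after_miss = 0
    hits_after_hit = hits_after_miss = 0
    for prev, cur in zip(seq, seq[1:]):
        if prev:
            n_after_hit += 1
            hits_after_hit += cur
        else:
            n_after_miss += 1
            hits_after_miss += cur
    p11 = hits_after_hit / n_after_hit if n_after_hit else None
    p01 = hits_after_miss / n_after_miss if n_after_miss else None
    return p11, p01


# Hypothetical sequence with clustered detections.
p11, p01 = markov_transition_estimates([1, 1, 1, 0, 0, 0, 1, 1])
print(p11, p01)
```

A large gap between the two estimates would suggest that a detection in one iteration changes the odds of detection in the next, in which case the product of marginal Bernoulli likelihoods would misstate the accumulated evidence.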
Circularity Check
No circularity: new sequential procedure rests on explicit modeling assumptions with external Monte Carlo validation
The paper proposes a sequential evidence-aggregation method that explicitly models post-double-selection indicators across bootstrap-imputation replicates as Bernoulli trials and accumulates a likelihood-ratio statistic given an approximate Bayes-factor reading. This construction is introduced as a modeling choice rather than derived from the data; the stopping rule and inclusion threshold follow directly from the assumed i.i.d. Bernoulli likelihoods and the chosen p0/p1 thresholds. No equation reduces a claimed prediction to a parameter fitted on the same quantity, no self-citation supplies a uniqueness theorem, and the Monte Carlo study across 126 scenarios supplies independent performance checks. The derivation chain therefore remains self-contained and does not collapse to its inputs by construction.