Convergence fragility in probit Bayesian kernel machine regression implemented in the bkmr R package for binary-outcome environmental mixture analyses: a simulation study
Pith reviewed 2026-07-03 03:15 UTC · model grok-4.3
The pith
Completion of a probit BKMR fit in bkmr does not ensure MCMC convergence of the retained draws.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Of 431 prespecified simulation tasks using family = "binomial", hfun = 2, beta.true = 0.5, ind = 1:2, M = 4 and X = 3*cos(z1) + 2*rnorm(n), 430 returned fitted objects but only 30 achieved rank-normalized R-hat ≤ 1.01 together with bulk-ESS and tail-ESS both ≥ 400. The study therefore concludes that completion of probit BKMR fits in bkmr should not be equated with convergence of the retained MCMC draws and that analyses should report the number of chains, warmup and retained iterations, rank-normalized R-hat, bulk-ESS, and tail-ESS.
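The counts in the core claim translate into rates with simple arithmetic. The sketch below uses only the three numbers stated above (431 tasks, 430 completed fits, 30 passing the joint criterion); the variable and function names are illustrative and do not come from the paper.

```python
# Counts taken from the core claim above.
TASKS = 431       # prespecified simulation tasks
COMPLETED = 430   # tasks that returned a fitted object
PASSED = 30       # tasks with R-hat <= 1.01 and bulk-/tail-ESS >= 400


def pct(numerator, denominator):
    """Return a percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


completion_rate = pct(COMPLETED, TASKS)          # share of tasks that finished
pass_rate_all = pct(PASSED, TASKS)               # share of all tasks that passed
pass_rate_completed = pct(PASSED, COMPLETED)     # share of completed fits that passed
completed_but_failed = COMPLETED - PASSED        # finished without passing

print(completion_rate, pass_rate_all, pass_rate_completed, completed_but_failed)
```

The gap between the completion rate and the pass rate is the quantity the study's conclusion rests on.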
What carries the argument
Four-chain MCMC simulation with bkmr::SimData() for binary data generation and bkmr::kmbayes() for probit model fitting, evaluated by rank-normalized R-hat, bulk-ESS and tail-ESS.
If this is right
- Fit completion alone is not a reliable indicator that retained draws carry adequate effective posterior information in probit BKMR.
- Fixed iteration counts or default settings may leave many analyses without converged chains for binary outcomes.
- Reporting only successful model fits without diagnostics risks basing environmental mixture conclusions on under-converged samples.
- Users must document the number of chains, warmup and retained iterations plus the three convergence statistics for reproducible results.
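One way to make the reporting items in the last bullet concrete is a small record type that holds them together with the study's primary criterion. This is a sketch only: the class name, field names, and the example values are hypothetical, and only the thresholds (R-hat at most 1.01, both ESS values at least 400) come from the paper.

```python
from dataclasses import dataclass

RHAT_MAX = 1.01   # threshold stated in the study
ESS_MIN = 400     # threshold stated in the study for bulk- and tail-ESS


@dataclass(frozen=True)
class McmcReport:
    """Minimum set of items the study asks applied analyses to report."""
    chains: int
    warmup: int        # discarded iterations per chain
    retained: int      # retained iterations per chain
    rhat: float        # rank-normalized R-hat
    ess_bulk: float
    ess_tail: float

    def meets_primary_criterion(self):
        """True when R-hat and both ESS values satisfy the study's thresholds."""
        return (self.rhat <= RHAT_MAX
                and self.ess_bulk >= ESS_MIN
                and self.ess_tail >= ESS_MIN)

    def summary(self):
        """One-line report suitable for a methods section or log."""
        status = "adequate" if self.meets_primary_criterion() else "inadequate"
        return (f"{self.chains} chains, {self.warmup} warmup, "
                f"{self.retained} retained; R-hat={self.rhat:.3f}, "
                f"bulk-ESS={self.ess_bulk:.0f}, tail-ESS={self.ess_tail:.0f} "
                f"({status})")


# Hypothetical values for illustration; they are not results from the paper.
example = McmcReport(chains=4, warmup=5000, retained=5000,
                     rhat=1.03, ess_bulk=812.0, ess_tail=655.0)
print(example.summary())
```

Keeping the pass/fail check next to the reported numbers means a completed fit cannot be logged as adequate without the diagnostics being present.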
Where Pith is reading between the lines
- Convergence shortfalls may be more common with binary than continuous outcomes under the same kernel machine setup.
- Package maintainers could add automatic post-fit diagnostic summaries and warnings when thresholds are not met.
- Re-running non-converged tasks with altered initial values or longer chains offers a direct way to test whether the fragility is fixable within the current sampler.
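The re-running idea in the last bullet can be sketched as a loop that doubles the iteration count until the joint criterion is met or a budget is exhausted. The `fit` callback is a hypothetical stand-in for whatever fits the model and returns (R-hat, bulk-ESS, tail-ESS); the toy callback below is invented purely so the loop can run and says nothing about how bkmr actually behaves.

```python
RHAT_MAX = 1.01   # threshold stated in the study
ESS_MIN = 400     # threshold stated in the study


def rerun_until_converged(fit, start_iter, max_iter):
    """Double the iteration count until diagnostics pass or max_iter is exceeded.

    `fit` is a hypothetical callable: iterations -> (rhat, ess_bulk, ess_tail).
    Returns (converged, history), where history lists every attempt.
    """
    history = []
    iterations = start_iter
    while iterations <= max_iter:
        rhat, ess_bulk, ess_tail = fit(iterations)
        history.append((iterations, rhat, ess_bulk, ess_tail))
        if rhat <= RHAT_MAX and ess_bulk >= ESS_MIN and ess_tail >= ESS_MIN:
            return True, history
        iterations *= 2
    return False, history


def toy_fit(iterations):
    """Invented diagnostics that improve with chain length (illustration only)."""
    ess = iterations // 50
    return 1 + 50 / iterations, ess, ess


converged, history = rerun_until_converged(toy_fit, start_iter=5000, max_iter=80000)
never, short_history = rerun_until_converged(toy_fit, start_iter=5000, max_iter=10000)
```

If a task still fails at the largest budget, the shortfall is harder to attribute to chain length alone, which is the distinction the bullet is after.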
Load-bearing premise
The chosen simulation parameters and four-chain setup with the given seed produce representative cases for typical binary-outcome environmental mixture analyses.
What would settle it
Repeating the identical simulation protocol but with substantially more iterations per chain and observing that nearly all 431 tasks then satisfy the combined R-hat and ESS criteria would falsify the reported fragility.
Original abstract
Background. Bayesian kernel machine regression (BKMR) is widely used for exposure-mixture analyses with binary outcomes through a probit extension. Because a bkmr fit can complete without providing adequate effective posterior information, simulation studies should separate execution success from MCMC convergence diagnostics.
Methods. We evaluated the public bkmr probit workflow using bkmr::SimData() for data generation, bkmr::kmbayes() for model fitting, and posterior for convergence diagnostics. The balanced generator used family = "binomial", hfun = 2, beta.true = 0.5, ind = 1:2, and M = 4. SimData() generated the covariate as X = 3*cos(z1) + 2*rnorm(n). Four chains were initialized with chain-specific randomized starting values generated reproducibly from the fixed initial-value base seed 20260621. These values affected only the initial state of the sampler and did not alter the BKMR model, default priors, or Metropolis-Hastings proposals.
Results. Of 431 prespecified tasks, 430 returned fitted objects and one task had a numerical non-completion. Diagnostic adequacy was limited: the rank-normalized R-hat <= 1.01 threshold was met in 55/431 tasks, bulk-ESS >= 400 in 85/431, tail-ESS >= 400 in 44/431, and both ESS criteria in 44/431. The primary diagnostic criterion, R-hat at or below the 1.01 threshold with both bulk-ESS and tail-ESS >= 400, was met in 30/431 prespecified tasks, corresponding to 30/430 completed fits.
Conclusions. Completion of probit BKMR fits in bkmr should not be equated with convergence of the retained MCMC draws. Applied analyses should report the number of chains, warmup and retained iterations, rank-normalized R-hat, bulk-ESS, and tail-ESS rather than rely on a fixed iteration count or on fit completion alone.
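The covariate formula quoted in the abstract, X = 3*cos(z1) + 2*rnorm(n), can be illustrated outside R. The Python sketch below mirrors only that formula; it is not bkmr::SimData(). Drawing z1 from a standard normal is an assumption made here for illustration, and the function name and seed are hypothetical.

```python
import math
import random
from statistics import stdev


def simulate_covariate(z1, rng):
    """Return X = 3*cos(z1) + 2*e with e ~ N(0, 1), one value per element of z1."""
    return [3 * math.cos(z) + 2 * rng.gauss(0, 1) for z in z1]


rng = random.Random(1)                          # illustrative seed, not the study's
z1 = [rng.gauss(0, 1) for _ in range(5000)]     # assumption: standard-normal exposure
x = simulate_covariate(z1, rng)

# The part of X not explained by cos(z1) is the noise term with scale 2.
residual = [xi - 3 * math.cos(zi) for xi, zi in zip(x, z1)]
noise_sd = stdev(residual)
```

The covariate is therefore a nonlinear function of the first exposure plus independent noise, so it is correlated with the mixture by construction.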
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript reports a simulation study of the probit BKMR implementation in the bkmr R package. Using bkmr::SimData() with family="binomial", hfun=2, beta.true=0.5, ind=1:2, M=4 and X=3*cos(z1)+2*rnorm(n), and bkmr::kmbayes() with four chains initialized from seed 20260621, the authors executed 431 prespecified tasks. Of the 430 completed fits, only 30 satisfied the joint convergence criteria of rank-normalized R-hat ≤ 1.01 together with bulk-ESS ≥ 400 and tail-ESS ≥ 400. The central claim is that fit completion alone does not guarantee adequate MCMC convergence of retained draws and that applied analyses should report chain count, iterations, R-hat, and both ESS diagnostics.
Significance. The result, if it holds, supplies direct, reproducible evidence that standard convergence diagnostics are frequently not met even when bkmr probit fits complete. The simulation design relies on externally generated data and independent posterior diagnostics rather than self-referential quantities, and the large number of prespecified tasks (431) with fixed initialization strengthens the observation within the tested scenarios. This supports the practical recommendation to report full MCMC diagnostics instead of relying on completion status or fixed iteration counts.
minor comments (1)
- The abstract states that four chains were run but does not report the default or chosen values of iter, burnin, or thin; adding these numbers would allow readers to interpret the reported ESS thresholds directly from the methods description.
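The referee's point can be made concrete: an ESS threshold of 400 is only interpretable relative to the number of retained draws. The sketch below computes that number from iteration, burn-in, thinning, and chain settings. The example settings are hypothetical and are not the values used in the paper, which this page does not report.

```python
def retained_draws(iterations, burnin, thin, chains):
    """Total retained draws across chains after burn-in and thinning."""
    if not 0 <= burnin < iterations or thin < 1 or chains < 1:
        raise ValueError("invalid MCMC settings")
    per_chain = len(range(burnin, iterations, thin))
    return per_chain * chains


# Hypothetical settings for illustration only.
total = retained_draws(iterations=10000, burnin=5000, thin=5, chains=4)
required_efficiency = 400 / total   # ESS >= 400 as a fraction of retained draws

print(total, required_efficiency)
```

Under these illustrative settings the threshold asks for an effective sample size of one tenth of the retained draws; with fewer retained draws the same threshold is correspondingly harder to meet.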
Simulated Author's Rebuttal
We thank the referee for their positive assessment of the manuscript, including the recognition of its reproducible design and practical implications for reporting MCMC diagnostics in probit BKMR analyses. The recommendation to accept is appreciated.
Circularity Check
No significant circularity
The manuscript reports an empirical simulation study that generates data via bkmr::SimData, fits models via bkmr::kmbayes, and tallies the fraction of completed fits that satisfy external, pre-specified MCMC diagnostics (rank-normalized R-hat ≤ 1.01 together with bulk-ESS and tail-ESS ≥ 400). These counts are direct observations from independently seeded runs; they are not obtained by fitting any parameter to the target convergence metric, by redefining the metric in terms of itself, or by invoking a self-citation chain whose validity depends on the present results. The central claim therefore rests on external software behavior and standard diagnostic thresholds rather than on any internal reduction to the paper's own inputs.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The rank-normalized R-hat and effective sample size thresholds are valid indicators of MCMC convergence for this model.