Overcoming Selection Bias in Statistical Studies With Amortized Bayesian Inference
Pith reviewed 2026-05-10 03:41 UTC · model grok-4.3
The pith
Embedding the data selection process directly into generative simulators enables debiased amortized Bayesian inference without needing tractable likelihoods.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By embedding the selection mechanism directly into the generative simulator, the approach enables amortized Bayesian inference without requiring tractable likelihoods. This recasting of selection bias as part of the simulation process allows both debiased estimates and explicit tests for the presence of bias, with integrated diagnostics for simulated-versus-observed discrepancies and posterior calibration. The method recovers well-calibrated posterior distributions across three statistical applications with diverse selection mechanisms, including cases where likelihood-based approaches yield biased estimates.
What carries the argument
Bias-aware neural posterior estimation that incorporates the selection mechanism directly into the generative simulator.
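As a rough illustration of what "incorporating the selection mechanism into the simulator" can mean, the following minimal sketch uses a toy prevalence model. The model, the function names `simulate_selected` and `draw_training_pair`, the uniform prior, and the keep probabilities 0.9 and 0.3 are all assumptions made for illustration; none of them come from the paper. The simulator applies the selection step before returning data, so any estimator trained on its output only ever sees selected records.

```python
import random

def simulate_selected(theta, n, p_keep_pos, p_keep_neg, rng):
    """Simulate n binary outcomes and keep each with an outcome-dependent probability."""
    kept = []
    for _ in range(n):
        y = 1 if rng.random() < theta else 0            # outcome model
        p_keep = p_keep_pos if y == 1 else p_keep_neg   # hypothetical selection rule
        if rng.random() < p_keep:                       # selection indicator equals 1
            kept.append(y)
    return kept

def draw_training_pair(n, p_keep_pos, p_keep_neg, rng):
    """Draw one (parameter, selected data) pair for training an amortized estimator."""
    theta = rng.random()  # Uniform(0, 1) prior, an assumption of this sketch
    return theta, simulate_selected(theta, n, p_keep_pos, p_keep_neg, rng)

rng = random.Random(0)
true_theta = 0.2
data = simulate_selected(true_theta, 20000, 0.9, 0.3, rng)
naive_estimate = sum(data) / len(data)
# Large-sample value of the naive estimate under this selection rule
expected_naive = true_theta * 0.9 / (true_theta * 0.9 + (1 - true_theta) * 0.3)
print(f"true prevalence {true_theta}, naive estimate {naive_estimate:.3f}, "
      f"large-sample naive value {expected_naive:.3f}")
```

In this toy setting the naive sample mean of the selected records overstates the prevalence, because positives are kept more often than negatives. A neural posterior estimator would be trained on many pairs from `draw_training_pair`; that training step is omitted here.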
If this is right
- Debiased parameter estimates become available for models with intractable likelihoods or latent dynamics.
- Users can explicitly test whether selection bias is present by comparing posteriors with and without the embedded mechanism.
- Diagnostics detect mismatches between simulated and observed data as well as miscalibrated uncertainty.
- The framework succeeds in applications where inverse-probability weighting or explicit likelihood models produce biased results.
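The comparison of posteriors with and without the embedded mechanism can be sketched on a toy prevalence model. This sketch substitutes plain rejection ABC for neural posterior estimation, purely to keep the example dependency-free; the model, the keep probabilities, the tolerance, and the function names are hypothetical and not taken from the paper.

```python
import random

def positive_fraction(theta, n, p_keep_pos, p_keep_neg, rng):
    """Fraction of positives among the selected records of one simulated dataset."""
    kept, positives = 0, 0
    for _ in range(n):
        y = rng.random() < theta
        if rng.random() < (p_keep_pos if y else p_keep_neg):
            kept += 1
            positives += y
    return positives / kept if kept else 0.0

def abc_posterior(observed, n, p_keep_pos, p_keep_neg, rng, draws=1000, tol=0.05):
    """Rejection ABC: keep prior draws whose simulated summary is close to the observed one."""
    accepted = []
    for _ in range(draws):
        theta = rng.random()  # Uniform(0, 1) prior, an assumption of this sketch
        simulated = positive_fraction(theta, n, p_keep_pos, p_keep_neg, rng)
        if abs(simulated - observed) < tol:
            accepted.append(theta)
    return accepted

def mean(values):
    return sum(values) / len(values)

rng = random.Random(1)
n = 1000
observed = positive_fraction(0.2, n, 0.9, 0.3, rng)  # stand-in for real data, true prevalence 0.2

aware = abc_posterior(observed, n, 0.9, 0.3, rng)    # simulator with selection embedded
blind = abc_posterior(observed, n, 1.0, 1.0, rng)    # simulator that ignores selection

print(f"selection-aware posterior mean: {mean(aware):.3f}")
print(f"selection-blind posterior mean: {mean(blind):.3f}")
```

A clear gap between the two posteriors is the kind of signal such a test would look for. In this toy case the selection-aware posterior should concentrate near the true prevalence, while the selection-blind one tracks the inflated fraction seen in the selected data.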
Where Pith is reading between the lines
- The same embedding strategy could be used to handle other data-generating distortions such as confounding or measurement error by simulating them explicitly.
- In survey or epidemiological practice this could reduce reliance on post-hoc weighting and allow joint inference on both the parameters and the selection process itself.
- Sensitivity checks on the selection model specification would become a standard diagnostic step before trusting the posteriors.
Load-bearing premise
The selection process can be fully and accurately specified and embedded into the generative simulator without introducing new unmodeled biases or needing knowledge of unobserved variables.
What would settle it
Apply the method to a controlled dataset where the true selection rule is known and varied, then check whether the recovered posteriors remain calibrated and unbiased when the embedded selection model matches the true rule versus when it is deliberately misspecified.
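To see why the misspecified case matters, consider a large-sample sketch on a toy prevalence model. It uses closed-form algebra rather than the paper's neural estimator, and the keep ratio, function names, and numbers are assumptions chosen for illustration. If positives are kept `keep_ratio` times as often as negatives, the fraction of positives among selected records is a known function of the prevalence, and inverting it with the wrong ratio gives a biased answer.

```python
def selected_positive_fraction(theta, keep_ratio):
    """Large-sample fraction of positives among selected records.

    keep_ratio is P(select | y=1) / P(select | y=0), a hypothetical selection rule.
    """
    return theta * keep_ratio / (theta * keep_ratio + (1.0 - theta))

def implied_prevalence(fraction, assumed_ratio):
    """Invert the relation above under an assumed (possibly wrong) keep ratio."""
    return fraction / (fraction + assumed_ratio * (1.0 - fraction))

true_theta, true_ratio = 0.2, 3.0
observed_fraction = selected_positive_fraction(true_theta, true_ratio)

for assumed_ratio in (1.0, 2.0, 3.0, 4.0):
    estimate = implied_prevalence(observed_fraction, assumed_ratio)
    print(f"assumed ratio {assumed_ratio:.1f}: implied prevalence {estimate:.3f}, "
          f"bias {estimate - true_theta:+.3f}")
```

Only the assumed ratio that matches the true one recovers the true prevalence. Understating the ratio overstates the prevalence, and overstating it does the reverse. A full version of the proposed check would repeat this with the actual posterior estimator and also track interval coverage.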
Original abstract
Selection bias arises when the probability that an observation enters a dataset depends on variables related to the quantities of interest, leading to systematic distortions in estimation and uncertainty quantification. For example, in epidemiological or survey settings, individuals with certain outcomes may be more likely to be included, resulting in biased prevalence estimates with potentially substantial downstream impact. Classical corrections, such as inverse-probability weighting or explicit likelihood-based models of the selection process, rely on tractable likelihoods, which limits their applicability in complex stochastic models with latent dynamics or high-dimensional structure. Simulation-based inference enables Bayesian analysis without tractable likelihoods but typically assumes missingness at random and thus fails when selection depends on unobserved outcomes or covariates. Here, we develop a bias-aware simulation-based inference framework that explicitly incorporates selection into neural posterior estimation. By embedding the selection mechanism directly into the generative simulator, the approach enables amortized Bayesian inference without requiring tractable likelihoods. This recasting of selection bias as part of the simulation process allows us to both obtain debiased estimates and explicitly test for the presence of bias. The framework integrates diagnostics to detect discrepancies between simulated and observed data and to assess posterior calibration. The method recovers well-calibrated posterior distributions across three statistical applications with diverse selection mechanisms, including settings in which likelihood-based approaches yield biased estimates. These results recast the correction of selection bias as a simulation problem and establish simulation-based inference as a practical and testable strategy for parameter estimation under selection bias.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a bias-aware simulation-based inference framework that embeds the selection mechanism directly into the generative simulator to enable amortized neural posterior estimation (NPE) without requiring tractable likelihoods. It claims this recasts selection bias correction as a simulation problem, allows debiased estimates, explicit bias testing, and recovers well-calibrated posteriors across three statistical applications with diverse selection mechanisms (including cases where likelihood-based methods are biased). Diagnostics for data discrepancy and posterior calibration are integrated.
Significance. If the empirical results hold under the stated assumptions, the work would meaningfully extend simulation-based inference to handle selection bias in complex models common to epidemiology and surveys. It provides a practical alternative to inverse-probability weighting or explicit likelihood corrections when likelihoods are intractable. Credit is due for integrating calibration diagnostics and for demonstrating the approach on multiple applications. However, significance is limited by the untested requirement that the selection process (including dependence on unobserved variables) be known exactly and correctly specified in the simulator.
major comments (2)
- [§3 (Framework)] The central claim that embedding a known selection process P(select | y, x, z) into the simulator yields well-calibrated posteriors via amortized NPE holds only under correct specification. The manuscript provides no sensitivity analysis or robustness checks to misspecification of the selection model, which is load-bearing because any mismatch reintroduces bias; the three applications use hand-specified rules and thus do not test this case.
- [§5 (Applications and Results)] While well-calibrated posteriors are asserted, specific quantitative validation (e.g., posterior coverage rates, calibration plots, or comparison metrics against likelihood-based baselines) is not detailed sufficiently to evaluate whether the data support superiority or calibration in the presence of selection on unobserved variables.
minor comments (2)
- [Abstract] The phrase 'superiority over likelihood-based methods' is stated without accompanying metrics or conditions, reducing clarity.
- [§2 or §3] Notation for the generative model including the selection indicator should be introduced with an explicit equation early in the methods to aid readability.
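One way such an explicit equation could look is sketched below. It reuses the referee's P(select | y, x, z) and otherwise introduces assumed notation that is not taken from the paper: θ for the parameters, N for the number of simulated units, s_i for the selection indicator, and D_obs for the recorded data. Treating x as observed covariates and z as unobserved variables is also an assumption of this sketch.

```latex
\begin{align}
\theta &\sim p(\theta), \\
(y_i, x_i, z_i) &\sim p(y, x, z \mid \theta), \qquad i = 1, \dots, N, \\
s_i &\sim \mathrm{Bernoulli}\big(P(\text{select} \mid y_i, x_i, z_i)\big), \\
D_{\mathrm{obs}} &= \{(y_i, x_i) : s_i = 1\}.
\end{align}
```

Under this notation the inference target is the posterior p(θ | D_obs), and the amortized estimator is trained on pairs (θ, D_obs) drawn from the four steps above.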
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed review. The comments identify important areas where additional analysis and quantitative detail will strengthen the manuscript. We address each major comment point-by-point below, agree that revisions are warranted, and outline the specific changes we will make.
Point-by-point responses
- Referee: §3 (Framework): The central claim that embedding a known selection process P(select | y, x, z) into the simulator yields well-calibrated posteriors via amortized NPE holds only under correct specification. The manuscript provides no sensitivity analysis or robustness checks to misspecification of the selection model, which is load-bearing because any mismatch reintroduces bias; the three applications use hand-specified rules and thus do not test this case.
Authors: We agree that correct specification of the selection mechanism is a necessary assumption for the framework to produce well-calibrated posteriors, as is the case for any generative model in simulation-based inference. The manuscript states this modeling assumption explicitly. We did not include sensitivity analyses in the original submission, as the primary goal was to establish the approach under known selection processes. In the revision we will add a new subsection to §3 that discusses the consequences of misspecification and report a sensitivity study on one application (e.g., by perturbing selection probabilities in the epidemiological example) to quantify the resulting bias in posterior estimates. This will directly address the load-bearing nature of the assumption.
Revision: yes
- Referee: §5 (Applications and Results): While well-calibrated posteriors are asserted, specific quantitative validation (e.g., posterior coverage rates, calibration plots, or comparison metrics against likelihood-based baselines) is not detailed sufficiently to evaluate whether the data support superiority or calibration in the presence of selection on unobserved variables.
Authors: We thank the referee for noting that the quantitative evidence could be presented more explicitly. The original manuscript reports calibration diagnostics and shows that likelihood-based methods are biased while our approach recovers the ground-truth parameters in the presence of selection on unobserved variables. To improve rigor and transparency, the revised version will include explicit posterior coverage rates (50%, 90%, and 95% credible intervals) computed over repeated simulations, additional calibration plots for all three applications, and tabulated metrics (bias, RMSE, and interval coverage) comparing our method to likelihood-based baselines where feasible. These additions will appear in the main text of §5 and the supplementary material.
Revision: yes
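The metrics named in this response (bias, RMSE, and coverage of 50%, 90%, and 95% credible intervals over repeated simulations) can be computed along the following lines. This is a sketch under stated assumptions: the helper names `central_interval` and `summarize` are hypothetical, and an exact Beta posterior from a toy binomial model stands in for the trained posterior estimator, so it does not reproduce anything from the paper.

```python
import random

def central_interval(draws, level):
    """Empirical central credible interval from posterior draws."""
    ordered = sorted(draws)
    lo = ordered[int((1 - level) / 2 * (len(ordered) - 1))]
    hi = ordered[int((1 + level) / 2 * (len(ordered) - 1))]
    return lo, hi

def summarize(true_values, posterior_draws, levels=(0.5, 0.9, 0.95)):
    """Bias and RMSE of posterior means, plus interval coverage per level."""
    means = [sum(d) / len(d) for d in posterior_draws]
    errors = [m - t for m, t in zip(means, true_values)]
    result = {
        "bias": sum(errors) / len(errors),
        "rmse": (sum(e * e for e in errors) / len(errors)) ** 0.5,
    }
    for level in levels:
        hits = 0
        for t, d in zip(true_values, posterior_draws):
            lo, hi = central_interval(d, level)
            hits += lo <= t <= hi
        result[f"coverage_{level}"] = hits / len(true_values)
    return result

# Toy stand-in for a trained posterior estimator: exact Beta posterior draws.
rng = random.Random(2)
truths, draws = [], []
for _ in range(400):
    theta = rng.random()                              # ground truth from a Uniform(0, 1) prior
    k = sum(rng.random() < theta for _ in range(50))  # binomial data with 50 trials
    truths.append(theta)
    draws.append([rng.betavariate(1 + k, 1 + 50 - k) for _ in range(200)])

metrics = summarize(truths, draws)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

For a well-calibrated posterior, each empirical coverage should sit close to its nominal level when ground-truth parameters are drawn from the prior. Systematic undercoverage or a nonzero bias would point to miscalibration or to a mismatch between simulator and data.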
Circularity Check
No significant circularity; method is a direct application of standard simulation-based inference
Full rationale
The paper proposes embedding a known selection mechanism into the generative simulator and then applying amortized neural posterior estimation (NPE) to recover posteriors. This is a straightforward methodological extension of existing SBI techniques rather than a derivation that reduces to fitted inputs or self-referential definitions. Validation rests on three independent empirical applications with hand-specified selection rules and calibration diagnostics, none of which are shown to be tautological by construction in the provided text. No load-bearing equations, uniqueness theorems, or ansatzes are imported via self-citation in a way that collapses the central claim. The framework is therefore self-contained against external simulation benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: the selection process can be explicitly and accurately incorporated into the generative simulator.