Application of Propensity Score Models and Causal Estimators in Observational Studies under Model Misspecification

Antar Chandra Das; Apu Chandra Das; Ashim Chandra Das; Md Robiul Islam Talukder; Rakhi Chowdhury; Sakib Salam

arxiv: 2605.20633 · v1 · pith:BTAQP5YRnew · submitted 2026-05-20 · 📊 stat.ME · stat.AP

Apu Chandra Das, Sakib Salam, Md Robiul Islam Talukder, Ashim Chandra Das, Antar Chandra Das, Rakhi Chowdhury

Pith reviewed 2026-05-21 03:09 UTC · model grok-4.3

keywords propensity score · causal inference · model misspecification · inverse probability weighting · augmented inverse probability weighting · observational studies · simulation study

The pith

Augmented inverse probability weighting stays stable for causal estimates when models are misspecified

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests how response surface modeling, inverse probability weighting, and augmented inverse probability weighting perform when the propensity score or outcome models are wrong. It runs extensive simulations that vary the type and degree of misspecification, sample size, and covariate correlations, while comparing logistic regression to random forests, support vector machines, and linear discriminant analysis for estimating propensity scores. The simulations show that augmented inverse probability weighting keeps bias and variance low in most cases because it is doubly robust. Inverse probability weighting breaks down quickly with misspecified propensity scores or unstable machine-learning weights. Response surface modeling works only when the outcome model is correct. The same patterns appear in applications to the ACTG175 trial and Alzheimer's neuroimaging data.

Core claim

AIPW consistently provides robust and stable estimates across most scenarios due to its doubly robust property, whereas IPW is highly sensitive to PS misspecification and unstable PS estimates produced by flexible machine learning methods. RSM performs well only when the outcome model is correctly specified.

What carries the argument

The doubly robust property of the augmented inverse probability weighting estimator, which combines inverse probability weights with an outcome regression to remain consistent if either the propensity score model or the outcome model is correct.
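
For orientation, the three estimators can be written in their common textbook forms for the average treatment effect. This is a sketch under assumed notation, not necessarily the paper's own: $T_i$ is the treatment indicator, $Y_i$ the outcome, $X_i$ the covariates, $\hat e(X)$ the estimated propensity score, and $\hat m_t(X)$ the fitted outcome regression in arm $t$. The paper may use a variant such as normalized weights.

```latex
\begin{align*}
\hat\tau_{\mathrm{RSM}} &= \frac{1}{n}\sum_{i=1}^{n}\bigl[\hat m_1(X_i) - \hat m_0(X_i)\bigr] \\
\hat\tau_{\mathrm{IPW}} &= \frac{1}{n}\sum_{i=1}^{n}\left[\frac{T_i Y_i}{\hat e(X_i)} - \frac{(1-T_i)\,Y_i}{1-\hat e(X_i)}\right] \\
\hat\tau_{\mathrm{AIPW}} &= \frac{1}{n}\sum_{i=1}^{n}\left[\hat m_1(X_i) - \hat m_0(X_i)
  + \frac{T_i\,\bigl(Y_i - \hat m_1(X_i)\bigr)}{\hat e(X_i)}
  - \frac{(1-T_i)\,\bigl(Y_i - \hat m_0(X_i)\bigr)}{1-\hat e(X_i)}\right]
\end{align*}
```

The AIPW form is the RSM estimate plus inverse-probability-weighted residuals. If the outcome model is correct, the residual terms average to zero; if the propensity score is correct, the weighted residuals cancel the bias of the outcome model.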

If this is right

AIPW reduces sensitivity to errors in specifying the propensity score model.
Machine learning methods for propensity scores should be paired with doubly robust estimators rather than used with plain inverse probability weighting.
Response surface modeling delivers unbiased estimates only when the outcome model is correctly specified.
Real-data analyses gain reliability by comparing multiple estimators rather than relying on a single approach.
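
The pattern described in the points above can be illustrated with a small simulation. The following is a minimal sketch, not the paper's simulation design: the data-generating process, the intercept-only "misspecified" models, and all function names (`simulate`, `estimate_ate`, `fit_logistic`, `fit_ols`) are assumptions made for illustration, and only a plain logistic propensity model is used rather than the machine learning methods the paper compares.

```python
import numpy as np

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, t, n_iter=25):
    """Logistic regression via Newton-Raphson; X must contain an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ beta)
        grad = X.T @ (t - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def fit_ols(X, y):
    """Ordinary least squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def simulate(n, rng):
    """Hypothetical data-generating process with a true ATE of 2."""
    x = rng.normal(size=n)
    t = rng.binomial(1, expit(0.8 * x))          # treatment depends on x
    y = 1.0 + 2.0 * t + x + 0.5 * x**2 + rng.normal(size=n)
    return x, t, y

def estimate_ate(x, t, y, ps_correct=True, om_correct=True):
    """Return (RSM, IPW, AIPW) estimates; 'misspecified' means intercept-only."""
    ones = np.ones(len(y))
    Xp = np.column_stack([ones, x]) if ps_correct else ones[:, None]
    e = expit(Xp @ fit_logistic(Xp, t))          # estimated propensity score
    Xo = np.column_stack([ones, x, x**2]) if om_correct else ones[:, None]
    m1 = Xo @ fit_ols(Xo[t == 1], y[t == 1])     # outcome model, treated arm
    m0 = Xo @ fit_ols(Xo[t == 0], y[t == 0])     # outcome model, control arm
    rsm = np.mean(m1 - m0)
    ipw = np.mean(t * y / e - (1 - t) * y / (1 - e))
    aipw = np.mean(m1 - m0 + t * (y - m1) / e - (1 - t) * (y - m0) / (1 - e))
    return rsm, ipw, aipw

rng = np.random.default_rng(0)
x, t, y = simulate(20_000, rng)
for ps_ok in (True, False):
    for om_ok in (True, False):
        rsm, ipw, aipw = estimate_ate(x, t, y, ps_ok, om_ok)
        print(f"PS correct={ps_ok!s:5} OM correct={om_ok!s:5} "
              f"RSM={rsm:.3f} IPW={ipw:.3f} AIPW={aipw:.3f}")
```

In this toy setup AIPW should land near the true effect whenever at least one of the two models is correct, while it inherits the confounding bias when both are intercept-only. A single large sample is used here for brevity; a study like the one described would repeat this over many replications and sample sizes.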

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Analysts working with high-dimensional or complex observational data may default to doubly robust methods when employing flexible machine learning for confounding adjustment.
The results suggest that simulation-based comparisons under controlled misspecification can help select estimators before applying them to real studies.
Extending the evaluation to targeted maximum likelihood estimation or other doubly robust variants could test whether similar robustness holds beyond AIPW.

Load-bearing premise

The simulated scenarios with varying levels of PS and outcome model misspecification, sample sizes, and covariate correlation structures adequately capture the types and degrees of misspecification that occur in real observational data applications.

What would settle it

A new simulation or real dataset in which the true causal effect is known and AIPW exhibits larger bias or poorer coverage than IPW under severe double misspecification of both propensity score and outcome models.

Original abstract

Propensity score (PS) methods are widely used in observational studies to reduce confounding and estimate causal treatment effects. However, the validity of PS-based causal estimators depends heavily on correct model specification, and model misspecification may lead to substantial bias and instability. In this study, we systematically evaluate the performance of commonly used causal estimators, including response surface modeling (RSM), inverse probability weighting (IPW), and augmented inverse probability weighting (AIPW), under varying levels of PS and outcome model misspecification. We compare classical logistic regression with several machine learning approaches for PS estimation, including random forests (RF), support vector machines (SVM), and linear discriminant analysis (LDA). Extensive simulation studies were conducted under multiple scenarios defined by combinations of correctly specified and misspecified PS and outcome models, varying sample sizes, and different covariate correlation structures. Estimator performance was assessed using bias, absolute bias, root mean squared error, empirical standard error, and confidence interval width. Results demonstrate that AIPW consistently provides robust and stable estimates across most scenarios due to its doubly robust property, whereas IPW is highly sensitive to PS misspecification and unstable PS estimates produced by flexible machine learning methods. RSM performs well only when the outcome model is correctly specified. Real-world applications using the ACTG175 clinical trial and the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset further illustrate the practical implications of estimator choice and PS modeling strategy. Overall, our findings highlight the importance of integrating flexible machine learning approaches within doubly robust frameworks to improve causal effect estimation in observational studies.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

This is a standard simulation comparison confirming AIPW's double robustness under misspecification, with little new beyond prior literature.

The paper's core message is that AIPW gives more stable causal estimates than IPW when the propensity score model is misspecified, thanks to its double robustness, while response surface modeling needs a correct outcome model. This matches established theory but comes from a fresh set of simulations.

What the authors do well is run a broad simulation study comparing logistic regression to random forests, support vector machines, and linear discriminant analysis for propensity score estimation. They test combinations of correct and misspecified models for both the treatment and outcome, across sample sizes and covariate correlations. They evaluate using bias, absolute bias, root mean squared error, and confidence interval width. The applications to the ACTG175 dataset and ADNI data add a practical layer. The simulations appear systematic, which is a plus for this type of work.

The main soft spot is that these controlled misspecification scenarios might not match the subtler or more complex errors common in real observational studies, such as unmodeled interactions or effects that don't fit the ML methods used. The real-data examples can't verify the results since there's no ground truth for the causal effect. It would be good to have more information on replication counts and precise misspecification definitions to assess how reproducible the findings are.

This paper would interest applied statisticians and researchers in fields like medicine or epidemiology who work with observational data and worry about model assumptions. It engages with the literature in a straightforward way. I would bring it to a reading group to talk about practical choices in causal estimation. I probably wouldn't cite it myself, as the results reinforce known points rather than introduce new ones. Still, it deserves serious peer review because the topic is relevant and the approach is solid enough to warrant referee input on the details.

Referee Report

2 major / 2 minor

Summary. The manuscript evaluates the performance of response surface modeling (RSM), inverse probability weighting (IPW), and augmented inverse probability weighting (AIPW) for causal effect estimation in observational studies under varying degrees of propensity score (PS) and outcome model misspecification. Simulations compare logistic regression with machine learning methods (random forests, SVM, LDA) for PS estimation across combinations of correct/misspecified models, sample sizes, and covariate correlation structures, using metrics such as bias, absolute bias, RMSE, and CI width. Real-data illustrations are provided on the ACTG175 and ADNI datasets. The central claim is that AIPW yields robust estimates due to its doubly robust property while IPW is sensitive to PS misspecification and unstable ML-based PS estimates.
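
The performance metrics named in the summary can be computed from Monte Carlo output along the following lines. This is a minimal sketch with assumed definitions: the function name `summarize` is hypothetical, "absolute bias" is taken here as the absolute value of the mean error (the paper may instead average absolute errors), and the confidence interval is assumed to be a Wald interval with a normal critical value.

```python
import numpy as np

def summarize(estimates, std_errors, truth, z=1.96):
    """Summarize one estimator across Monte Carlo replications.

    estimates  : point estimates, one per replication
    std_errors : estimated standard errors, one per replication
    truth      : the known true effect in the simulation
    """
    estimates = np.asarray(estimates, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    err = estimates - truth
    return {
        "bias": err.mean(),
        "abs_bias": abs(err.mean()),               # assumed definition
        "rmse": np.sqrt(np.mean(err**2)),
        "emp_se": estimates.std(ddof=1),           # spread of the estimates
        "ci_width": np.mean(2.0 * z * std_errors), # mean Wald interval width
    }

# Tiny usage example with made-up numbers.
print(summarize([1.0, 2.0, 3.0], [0.5, 0.5, 0.5], truth=2.0))
```

Reporting the empirical standard error next to RMSE separates variance from bias, since the squared RMSE is approximately the squared bias plus the squared empirical standard error.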

Significance. If the simulation design adequately represents realistic misspecification patterns, the results would offer practical guidance for selecting doubly robust estimators when pairing flexible ML methods with PS-based causal inference. The inclusion of two real datasets adds applied relevance, though the absence of ground truth limits confirmatory power.

major comments (2)

[Simulation Study] Simulation section: the construction of misspecification scenarios (explicit combinations of correct/misspecified logistic or ML models) does not include omitted interactions, non-monotonic effects, or high-dimensional sparse signals that commonly arise in observational data; this directly affects whether the reported stability ordering (AIPW robust, IPW unstable) generalizes beyond the chosen simulation grid.
[Real-World Applications] Real-data applications: the ACTG175 and ADNI examples lack ground truth, so they cannot independently confirm the simulation-derived ranking of estimators; without additional benchmarks or sensitivity checks, these sections do not strengthen the central claim.

minor comments (2)

[Abstract and Methods] The abstract and methods would benefit from explicit statements of the number of Monte Carlo replications and the precise functional forms used to induce misspecification.
[Introduction] Notation for the doubly robust property and the definitions of the estimators could be introduced earlier with a short equation to aid readers.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We are grateful to the referee for their thorough review and valuable suggestions. We respond to the major comments point by point below, indicating where revisions will be made to the manuscript.

Referee: [Simulation Study] Simulation section: the construction of misspecification scenarios (explicit combinations of correct/misspecified logistic or ML models) does not include omitted interactions, non-monotonic effects, or high-dimensional sparse signals that commonly arise in observational data; this directly affects whether the reported stability ordering (AIPW robust, IPW unstable) generalizes beyond the chosen simulation grid.

Authors: We acknowledge that our simulation scenarios do not encompass all possible forms of misspecification, such as omitted interactions or non-monotonic effects. Our design emphasizes misspecification arising from the choice between parametric logistic regression and machine learning methods for the propensity score model, which is central to the paper's focus on integrating ML with causal estimators. In the revised manuscript, we will include additional simulation scenarios that incorporate omitted interactions and non-monotonic relationships in the data generating process to better evaluate the generalizability of the AIPW robustness. For high-dimensional sparse signals, we will discuss this as a limitation and suggest it for future work, as expanding to very high dimensions may require substantial additional computational resources. revision: partial
Referee: [Real-World Applications] Real-data applications: the ACTG175 and ADNI examples lack ground truth, so they cannot independently confirm the simulation-derived ranking of estimators; without additional benchmarks or sensitivity checks, these sections do not strengthen the central claim.

Authors: We agree with the referee that the real-data examples cannot confirm the simulation results due to the lack of ground truth. These applications are presented to demonstrate the implementation and potential discrepancies in estimates when applying the methods to real observational data. To strengthen this section, we will incorporate additional sensitivity checks, including alternative model specifications and bootstrap-based comparisons of estimator variability. We will also revise the text to emphasize that these examples serve to illustrate practical considerations rather than to validate the simulation findings. revision: yes
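
A bootstrap-based comparison of estimator variability, as mentioned in the response above, could look roughly like this. It is a sketch under stated assumptions, not the authors' procedure: `bootstrap_se` and `diff_in_means` are hypothetical names, the resampling is a plain nonparametric bootstrap over units, and the difference in means stands in for whichever estimator (RSM, IPW, or AIPW) is being examined.

```python
import numpy as np

def bootstrap_se(estimator, x, t, y, n_boot=200, seed=0):
    """Nonparametric bootstrap standard error of estimator(x, t, y)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)   # resample units with replacement
        stats[b] = estimator(x[idx], t[idx], y[idx])
    return stats.std(ddof=1)

def diff_in_means(x, t, y):
    """Placeholder estimator; x is unused but kept for a uniform signature."""
    return y[t == 1].mean() - y[t == 0].mean()

# Usage on made-up data.
rng = np.random.default_rng(42)
t = rng.binomial(1, 0.5, size=1000)
x = rng.normal(size=1000)
y = 1.0 * t + rng.normal(size=1000)
print(bootstrap_se(diff_in_means, x, t, y))
```

Any estimator that refits its propensity and outcome models should do so inside each bootstrap replicate, so that the reported variability includes the model-fitting step.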

Circularity Check

0 steps flagged

No significant circularity in empirical simulation and application study

The paper's claims rest on direct computation of bias, RMSE, and related metrics from explicitly constructed simulation scenarios (combinations of correct/misspecified logistic and ML models for PS and outcome) plus applications to external datasets ACTG175 and ADNI. These performance results are generated independently of the estimators themselves and do not reduce to fitted parameters or self-referential definitions. The doubly robust property of AIPW is invoked as a pre-existing theoretical fact rather than derived here, and no self-citations, ansatzes, or uniqueness theorems from the authors appear as load-bearing steps. The evaluation chain is therefore self-contained against the controlled inputs and external data.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The study relies on standard causal inference assumptions and controlled simulation designs rather than new free parameters or invented entities. Specific simulation settings function as design choices rather than fitted parameters.

axioms (1)

Domain assumption: no unmeasured confounding and positivity hold for causal identification in the observational data and simulations.
These are invoked implicitly as the foundation for propensity score methods to estimate causal effects in observational studies.
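
In potential-outcomes notation these two assumptions are usually stated as follows. The symbols are assumed here rather than taken from the paper: $Y(1)$ and $Y(0)$ are the potential outcomes, $T$ the treatment indicator, $X$ the measured covariates, and $e(X)$ the propensity score.

```latex
\begin{align*}
\text{No unmeasured confounding:}\quad & \bigl(Y(1),\,Y(0)\bigr) \perp T \mid X \\
\text{Positivity:}\quad & 0 < e(X) < 1, \qquad e(X) = \Pr(T = 1 \mid X)
\end{align*}
```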

pith-pipeline@v0.9.0 · 5840 in / 1387 out tokens · 41440 ms · 2026-05-21T03:09:39.736115+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "Results demonstrate that AIPW consistently provides robust and stable estimates across most scenarios due to its doubly robust property, whereas IPW is highly sensitive to PS misspecification"

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Paper passage: "Extensive simulation studies were conducted under multiple scenarios defined by combinations of correctly specified and misspecified PS and outcome models"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

44 extracted references · 44 canonical work pages

[1] Hernán, M.A.: A definition of causal effect for epidemiological research. Journal of Epidemiology & Community Health 58(4), 265–271 (2004)
[2] Imbens, G.W., Rubin, D.B.: Causal Inference in Statistics, Social, and Biomedical Sciences. Cambridge University Press, Cambridge (2015)
[3] Rubin, D.B.: Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5), 688 (1974)
[4] Splawa-Neyman, J., Dabrowska, D.M., Speed, T.P.: On the application of probability theory to agricultural experiments. Essay on principles. Section 9. Statistical Science, 465–472 (1990)
[5] Cochran, W.G.: The effectiveness of adjustment by subclassification in removing bias. Biometrics 24(2), 295–313 (1968)
[6] Rosenbaum, P.R., Rubin, D.B.: The central role of the propensity score in observational studies for causal effects. Biometrika 70(1), 41–55 (1983)
[7] Imbens, G.W.: Nonparametric estimation of average treatment effects under exogeneity. Review of Economics and Statistics 86(1), 4–29 (2004)
[8] Abadie, A., Imbens, G.W.: Bias-corrected matching estimators for average treatment effects. Journal of Business & Economic Statistics 29(1), 1–11 (2011)
[9] Hansen, B.B.: Full matching in an observational study of coaching for the SAT. Journal of the American Statistical Association 99(467), 609–618 (2004)
[10] Robins, J.M., Hernán, M.A., Brumback, B.: Marginal structural models and causal inference in epidemiology. LWW (2000)
[11] Robins, J.M., Rotnitzky, A., Zhao, L.P.: Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association 89(427), 846–866 (1994)
[12] Bang, H., Robins, J.M.: Doubly robust estimation in missing data and causal inference models. Biometrics 61(4), 962–973 (2005)
[13] Laan, M.J., Rose, S.: Targeted Learning: Causal Inference for Observational and Experimental Data. Springer, New York (2011)
[14] Wang, C., Li, H., Chen, W.-C., Lu, N., Tiwari, R., Xu, Y., Yue, L.Q.: Propensity score-integrated power prior approach for incorporating real-world evidence in single-arm clinical studies. Journal of Biopharmaceutical Statistics 29(5), 731–748 (2019)
[15] Lu, N., Wang, C., Chen, W.-C., Li, H., Song, C., Tiwari, R., Xu, Y., Yue, L.Q.: Propensity score-integrated power prior approach for augmenting the control arm of a randomized controlled trial. Journal of Biopharmaceutical Statistics 32(1), 158–169 (2022)
[16] Das, A.C., Salam, S., Roy, A., Chowdhury, R., Das, A.C., Das, A.C.: Improving operating characteristics of clinical trials by augmenting control arm using propensity score-weighted borrowing-by-parts power prior. arXiv preprint arXiv:2601.03480 (2026)
[17] Das, A.C., Gwon, Y., Bonangelino, P.: Propensity score-based borrowing-by-parts power prior for augmenting control arm in clinical trials: A two-stage approach. Statistics in Biosciences (2026) https://doi.org/10.1007/s12561-026-09513-z
[18] Austin, P.C.: An introduction to propensity score methods for reducing the effects of confounding in observational studies. Multivariate Behavioral Research 46(3), 399–424 (2011)
[19] Kang, J.D., Schafer, J.L.: Demystifying double robustness: A comparison of alternative strategies for estimating a population mean from incomplete data. Statistical Science 22(4), 523–539 (2007)
[20] Lee, B.K., Lessler, J., Stuart, E.A.: Improving propensity score weighting using machine learning. Statistics in Medicine 29(3), 337–346 (2010)
[21] McCaffrey, D.F., Griffin, B.A., Almirall, D., Slaughter, M.E., Ramchand, R., Burgette, L.F.: A tutorial on propensity score estimation for multiple treatments using generalized boosted models. Statistics in Medicine 32(19), 3388–3414 (2013)
[22] Waernbaum, I.: Model misspecification and robustness in causal inference. Statistics in Medicine 31(14), 1572–1581 (2012)
[23] Hainmueller, J.: Entropy balancing for causal effects. Political Analysis 20(1), 25–46 (2012)
[24] Kurz, C.F.: Augmented inverse probability weighting and the double robustness property. Medical Decision Making 42(2), 156–167 (2022) https://doi.org/10.1177/0272989X211027181
[25] Chen, S., Wu, H., Zhao, H.: A comparison of causal inference methods for evaluating multiple treatment groups. Journal of Nonparametric Statistics 37(4), 1317–1340 (2025) https://doi.org/10.1080/10485252.2025.2544936
[26] Holland, P.W.: Statistics and causal inference. Journal of the American Statistical Association 81(396), 945–960 (1986)
[27] Gelman, A.: Causality and statistical learning. University of Chicago Press, Chicago, IL (2011)
[28] Angrist, J.D., Imbens, G.W., Rubin, D.B.: Identification of causal effects using instrumental variables. Journal of the American Statistical Association 91(434), 444–455 (1996)
[29] Austin, P.C., Laupacis, A.: A tutorial on methods to estimating clinically and policy-meaningful measures of treatment effects in prospective observational studies: a review. The International Journal of Biostatistics 7(1), 6 (2011)
[30] Athey, S., Imbens, G.: Recursive partitioning for heterogeneous causal effects. Proceedings of the National Academy of Sciences 113(27), 7353–7360 (2016)
[31] Hill, J.L.: Bayesian nonparametric modeling for causal inference. Journal of Computational and Graphical Statistics 20(1), 217–240 (2011)
[32] King, G., Zeng, L.: The dangers of extreme counterfactuals. Political Analysis 14(2), 131–159 (2006)
[33] Glynn, A.N., Quinn, K.M.: An introduction to the augmented inverse propensity weighted estimator. Political Analysis 18(1), 36–56 (2010)
[34] Hirano, K., Imbens, G.W., Ridder, G.: Efficient estimation of average treatment effects using the estimated propensity score. Econometrica 71(4), 1161–1189 (2003)
[35] Tsiatis, A.A.: Semiparametric Theory and Missing Data. Springer, ??? (2006)
[36] Robins, J.M., Rotnitzky, A., Zhao, L.P.: Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. Journal of the American Statistical Association 90(429), 106–121 (1995)
[37] Hammer, S.M., Katzenstein, D.A., Hughes, M.D., Gundacker, H., Schooley, R.T., Haubrich, R.H., Henry, W.K., Lederman, M.M., Phair, J.P., Niu, M., et al.: A trial comparing nucleoside monotherapy with combination therapy in HIV-infected adults with CD4 cell counts from 200 to 500 per cubic millimeter. New England Journal of Medicine 335(15), 1081–1090 (1996)
[38] Kueper, J.K., Speechley, M., Montero-Odasso, M.: The Alzheimer’s Disease Assessment Scale–Cognitive Subscale (ADAS-Cog): modifications and responsiveness in pre-dementia populations. A narrative review. Journal of Alzheimer’s Disease 63(2), 423–444 (2018)
[39] Athey, S., Tibshirani, J., Wager, S.: Generalized random forests (2019)
[40] Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., Robins, J.: Double/debiased machine learning for treatment and structural parameters. Oxford University Press, Oxford, UK (2018)
[41] Tibshirani, R.: Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B: Statistical Methodology 58(1), 267–288 (1996)
[42] Meinshausen, N., Bühlmann, P.: Stability selection. Journal of the Royal Statistical Society Series B: Statistical Methodology 72(4), 417–473 (2010)
[43] Shah, R.D., Samworth, R.J.: Variable selection with error control: another look at stability selection. Journal of the Royal Statistical Society Series B: Statistical Methodology 75(1), 55–80 (2013)
[44] Das, A.C., Dai, R., Lokshin, A., Salam, S., Smith, L.: Identifying predictive combinations of biomarkers for early cancer detection with stability selection in combination with ensemble learning. Statistics in Biosciences, 1–29 (2026)

[1] [1]

Journal of Epidemiology & Community Health58(4), 265–271 (2004)

Hern´ an, M.A.: A definition of causal effect for epidemiological research. Journal of Epidemiology & Community Health58(4), 265–271 (2004)

work page 2004

[2] [2]

Cambridge University Press, Cambridge (2015)

Imbens, G.W., Rubin, D.B.: Causal Inference in Statistics, Social, and Biomedical Sciences. Cambridge University Press, Cambridge (2015)

work page 2015

[3] [3]

Journal of educational Psychology66(5), 688 (1974)

Rubin, D.B.: Estimating causal effects of treatments in randomized and nonran- domized studies. Journal of educational Psychology66(5), 688 (1974)

work page 1974

[4] [4]

essay on principles

Splawa-Neyman, J., Dabrowska, D.M., Speed, T.P.: On the application of proba- bility theory to agricultural experiments. essay on principles. section 9. Statistical Science, 465–472 (1990)

work page 1990

[5] [5]

Biometrics24(2), 295–313 (1968)

Cochran, W.G.: The effectiveness of adjustment by subclassification in removing bias. Biometrics24(2), 295–313 (1968)

work page 1968

[6] [6]

Biometrika70(1), 41–55 (1983)

Rosenbaum, P.R., Rubin, D.B.: The central role of the propensity score in observational studies for causal effects. Biometrika70(1), 41–55 (1983)

work page 1983

[7] [7]

Review of Economics and Statistics86(1), 4–29 (2004)

Imbens, G.W.: Nonparametric estimation of average treatment effects under exogeneity. Review of Economics and Statistics86(1), 4–29 (2004)

work page 2004

[8] [8]

Journal of Business & Economic Statistics29(1), 1–11 (2011)

Abadie, A., Imbens, G.W.: Bias-corrected matching estimators for average treatment effects. Journal of Business & Economic Statistics29(1), 1–11 (2011)

work page 2011

[9] [9]

Journal of the American Statistical Association99(467), 609–618 (2004)

Hansen, B.B.: Full matching in an observational study of coaching for the SAT. Journal of the American Statistical Association99(467), 609–618 (2004)

work page 2004

[10] [10]

Lww (2000)

Robins, J.M., Hernan, M.A., Brumback, B.: Marginal structural models and causal inference in epidemiology. Lww (2000)

work page 2000

[11] [11]

Journal of the American Statistical Association89(427), 846–866 (1994)

Robins, J.M., Rotnitzky, A., Zhao, L.P.: Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association89(427), 846–866 (1994)

work page 1994

[12] [12]

Biometrics61(4), 962–973 (2005)

Bang, H., Robins, J.M.: Doubly robust estimation in missing data and causal inference models. Biometrics61(4), 962–973 (2005)

work page 2005

[13] [13]

Springer, New York (2011)

Laan, M.J., Rose, S.: Targeted Learning: Causal Inference for Observational and Experimental Data. Springer, New York (2011)

work page 2011

[14] [14]

Journal of Biopharmaceutical Statistics29(5), 731–748 (2019)

Wang, C., Li, H., Chen, W.-C., Lu, N., Tiwari, R., Xu, Y., Yue, L.Q.: Propensity score-integrated power prior approach for incorporating real-world evidence in 21 single-arm clinical studies. Journal of Biopharmaceutical Statistics29(5), 731–748 (2019)

work page 2019

[15] [15]

Journal of Biopharmaceutical Statistics32(1), 158–169 (2022)

Lu, N., Wang, C., Chen, W.-C., Li, H., Song, C., Tiwari, R., Xu, Y., Yue, L.Q.: Propensity score-integrated power prior approach for augmenting the control arm of a randomized controlled trial. Journal of Biopharmaceutical Statistics32(1), 158–169 (2022)

work page 2022

[16] [16]

arXiv preprint (2026) arXiv:2601.03480

Das, A.C., Salam, S., Roy, A., Chowdhury, R., Das, A.C., Das, A.C.: Improv- ing operating characteristics of clinical trials by augmenting control arm using propensity score-weighted borrowing-by-parts power prior. arXiv preprint (2026) arXiv:2601.03480

work page arXiv 2026

[17] [17]

Statistics in Biosciences (2026) https://doi.org/10.1007/s12561-026-09513-z

Das, A.C., Gwon, Y., Bonangelino, P.: Propensity score-based borrowing-by-parts power prior for augmenting control arm in clinical trials: A two-stage approach. Statistics in Biosciences (2026) https://doi.org/10.1007/s12561-026-09513-z

work page doi:10.1007/s12561-026-09513-z 2026

[18] [18]

Multivariate Behavioral Research46(3), 399–424 (2011)

Austin, P.C.: An introduction to propensity score methods for reducing the effects of confounding in observational studies. Multivariate Behavioral Research46(3), 399–424 (2011)

work page 2011

[19] Kang, J.D., Schafer, J.L.: Demystifying double robustness: A comparison of alternative strategies for estimating a population mean from incomplete data. Statistical Science 22(4), 523–539 (2007)

[20] Lee, B.K., Lessler, J., Stuart, E.A.: Improving propensity score weighting using machine learning. Statistics in Medicine 29(3), 337–346 (2010)

[21] McCaffrey, D.F., Griffin, B.A., Almirall, D., Slaughter, M.E., Ramchand, R., Burgette, L.F.: A tutorial on propensity score estimation for multiple treatments using generalized boosted models. Statistics in Medicine 32(19), 3388–3414 (2013)

[22] Waernbaum, I.: Model misspecification and robustness in causal inference. Statistics in Medicine 31(14), 1572–1581 (2012)

[23] Hainmueller, J.: Entropy balancing for causal effects. Political Analysis 20(1), 25–46 (2012)

[24] Kurz, C.F.: Augmented inverse probability weighting and the double robustness property. Medical Decision Making 42(2), 156–167 (2022) https://doi.org/10.1177/0272989X211027181

[25] Chen, S., Wu, H., Zhao, H.: A comparison of causal inference methods for evaluating multiple treatment groups. Journal of Nonparametric Statistics 37(4), 1317–1340 (2025) https://doi.org/10.1080/10485252.2025.2544936

[26] Holland, P.W.: Statistics and causal inference. Journal of the American Statistical Association 81(396), 945–960 (1986)

[27] Gelman, A.: Causality and statistical learning. University of Chicago Press, Chicago, IL (2011)

[28] Angrist, J.D., Imbens, G.W., Rubin, D.B.: Identification of causal effects using instrumental variables. Journal of the American Statistical Association 91(434), 444–455 (1996)

[29] Austin, P.C., Laupacis, A.: A tutorial on methods to estimating clinically and policy-meaningful measures of treatment effects in prospective observational studies: a review. The International Journal of Biostatistics 7(1), 6 (2011)

[30] Athey, S., Imbens, G.: Recursive partitioning for heterogeneous causal effects. Proceedings of the National Academy of Sciences 113(27), 7353–7360 (2016)

[31] Hill, J.L.: Bayesian nonparametric modeling for causal inference. Journal of Computational and Graphical Statistics 20(1), 217–240 (2011)

[32] King, G., Zeng, L.: The dangers of extreme counterfactuals. Political Analysis 14(2), 131–159 (2006)

[33] Glynn, A.N., Quinn, K.M.: An introduction to the augmented inverse propensity weighted estimator. Political Analysis 18(1), 36–56 (2010)

[34] Hirano, K., Imbens, G.W., Ridder, G.: Efficient estimation of average treatment effects using the estimated propensity score. Econometrica 71(4), 1161–1189 (2003)

[35] Tsiatis, A.A.: Semiparametric Theory and Missing Data. Springer, ??? (2006)

[36] Robins, J.M., Rotnitzky, A., Zhao, L.P.: Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. Journal of the American Statistical Association 90(429), 106–121 (1995)

[37] Hammer, S.M., Katzenstein, D.A., Hughes, M.D., Gundacker, H., Schooley, R.T., Haubrich, R.H., Henry, W.K., Lederman, M.M., Phair, J.P., Niu, M., et al.: A trial comparing nucleoside monotherapy with combination therapy in HIV-infected adults with CD4 cell counts from 200 to 500 per cubic millimeter. New England Journal of Medicine 335(15), 1081–1090 (1996)

[38] Kueper, J.K., Speechley, M., Montero-Odasso, M.: The Alzheimer’s Disease Assessment Scale–Cognitive Subscale (ADAS-Cog): modifications and responsiveness in pre-dementia populations. A narrative review. Journal of Alzheimer’s Disease 63(2), 423–444 (2018)

[39] Athey, S., Tibshirani, J., Wager, S.: Generalized random forests (2019)

[40] Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., Robins, J.: Double/debiased machine learning for treatment and structural parameters. Oxford University Press, Oxford, UK (2018)

[41] Tibshirani, R.: Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B: Statistical Methodology 58(1), 267–288 (1996)

[42] Meinshausen, N., Bühlmann, P.: Stability selection. Journal of the Royal Statistical Society Series B: Statistical Methodology 72(4), 417–473 (2010)

[43] Shah, R.D., Samworth, R.J.: Variable selection with error control: another look at stability selection. Journal of the Royal Statistical Society Series B: Statistical Methodology 75(1), 55–80 (2013)

[44] Das, A.C., Dai, R., Lokshin, A., Salam, S., Smith, L.: Identifying predictive combinations of biomarkers for early cancer detection with stability selection in combination with ensemble learning. Statistics in Biosciences, 1–29 (2026)