A Sensitivity Approach to Causal Inference Under Limited Overlap
Pith reviewed 2026-05-17 05:40 UTC · model grok-4.3
The pith
Worst-case bounds on trimming bias allow sensitivity checks for causal inference with limited overlap.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors propose a sensitivity framework for causal inference under limited overlap that uses worst-case confidence bounds on the bias introduced by standard trimming practices. Under explicit assumptions for extrapolating counterfactual estimates from regions of overlap to those without, the framework assesses the level of irregularity in the outcome function required to invalidate the primary finding. Empirically, it demonstrates protection against spurious findings by quantifying uncertainty in limited-overlap regions.
What carries the argument
The central mechanism is a sensitivity framework that computes worst-case confidence bounds on trimming-induced bias to evaluate the robustness of causal estimates to limited overlap.
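To make the object of the analysis concrete, here is a minimal sketch of a trimmed inverse-propensity-weighted estimate, the kind of estimator whose bias the framework bounds. It is an illustration, not the authors' implementation: the function name `trimmed_ipw_ate`, the threshold `eps`, and the choice to drop units (rather than clip their weights) and to self-normalize the weights are all assumptions made here.

```python
import numpy as np


def trimmed_ipw_ate(y, z, e, eps=0.1):
    """Self-normalized IPW estimate of the ATE on units with eps <= e <= 1 - eps.

    y: outcomes, z: binary treatment indicators, e: propensity scores.
    Returns (estimate, fraction of units retained).
    Hypothetical helper for illustration; not taken from the paper.
    """
    y, z, e = (np.asarray(a, dtype=float) for a in (y, z, e))
    keep = (e >= eps) & (e <= 1.0 - eps)  # trimming rule: drop extreme propensities
    yk, zk, ek = y[keep], z[keep], e[keep]
    if zk.sum() == 0 or (1.0 - zk).sum() == 0:
        raise ValueError("retained sample needs both treated and control units")
    w1 = zk / ek                    # weights for treated units
    w0 = (1.0 - zk) / (1.0 - ek)    # weights for control units
    mu1 = np.sum(w1 * yk) / np.sum(w1)
    mu0 = np.sum(w0 * yk) / np.sum(w0)
    return mu1 - mu0, keep.mean()
```

The returned estimate targets the effect among retained units only; the difference between that quantity and the full-population effect is the trimming bias the framework is concerned with.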
If this is right
- Causal researchers can quantify how sensitive their results are to limited overlap through explicit bounds on trimming bias.
- The approach reports how irregular the outcome function must be to overturn a conclusion, so users can judge whether that degree of irregularity is plausible.
- Studies that trim weights can report uncertainty that accounts for extrapolation from overlap regions.
- It encourages more cautious interpretation of causal effects when common support is incomplete.
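The idea of reporting the irregularity needed to overturn a finding can be sketched as a breakdown-value calculation. The sketch below assumes, hypothetically, that the worst-case bias is bounded by `L * d_term`, where `L` is a regularity parameter such as a Lipschitz constant and `d_term` is a data-dependent constant; the paper's actual bound may take a different form, and the names here are illustrative.

```python
def breakdown_lipschitz(tau_hat, half_width, d_term):
    """Smallest L at which tau_hat +/- (half_width + L * d_term) reaches zero.

    Assumes a hypothetical bias bound of the form |bias| <= L * d_term.
    Returns 0.0 if the interval already covers zero without any bias allowance.
    """
    if d_term <= 0:
        raise ValueError("d_term must be positive")
    slack = abs(tau_hat) - half_width
    return max(slack, 0.0) / d_term


def sensitivity_table(tau_hat, half_width, d_term, grid):
    """For each L in grid, return (L, lower, upper, sign_preserved)."""
    rows = []
    for L in grid:
        radius = half_width + L * d_term  # sampling error plus worst-case bias
        lower, upper = tau_hat - radius, tau_hat + radius
        rows.append((L, lower, upper, lower > 0 or upper < 0))
    return rows
```

For example, `breakdown_lipschitz(2.0, 0.5, 0.25)` returns `6.0`: under the assumed linear bound, the finding keeps its sign for any `L` below 6.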
Where Pith is reading between the lines
- This framework could be adapted to sensitivity analyses for other bias sources in observational studies.
- Its use might encourage better study designs that prioritize sufficient overlap to lessen reliance on extrapolation.
- Domain knowledge about the outcome could be used to tighten the worst-case bounds in applications.
Load-bearing premise
The approach depends on explicit assumptions that permit extrapolating counterfactual estimates from overlap regions to regions without overlap.
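A worked example shows how such an assumption converts into a bias bound. The derivation below is a deliberately crude illustration, not the paper's result. It assumes both potential-outcome regressions are L-Lipschitz in a distance d, so the conditional effect τ(x) is 2L-Lipschitz; O denotes the retained (overlap) region, and X, X' are independent draws from the covariate distribution. All symbols are introduced here for illustration.

```latex
\begin{align*}
\tau &= \Pr(X \in \mathcal{O})\,\tau_{\mathcal{O}}
      + \Pr(X \notin \mathcal{O})\,\tau_{\mathcal{O}^c},
  \qquad \tau_{A} := \mathbb{E}[\tau(X) \mid X \in A], \\
\tau - \tau_{\mathcal{O}} &= \Pr(X \notin \mathcal{O})\,
      \bigl(\tau_{\mathcal{O}^c} - \tau_{\mathcal{O}}\bigr), \\
\lvert \tau - \tau_{\mathcal{O}} \rvert
  &\le \Pr(X \notin \mathcal{O})\;
      \mathbb{E}\bigl[\lvert \tau(X) - \tau(X') \rvert
        \,\big|\, X \notin \mathcal{O},\, X' \in \mathcal{O}\bigr] \\
  &\le 2L\,\Pr(X \notin \mathcal{O})\;
      \mathbb{E}\bigl[d(X, X') \,\big|\, X \notin \mathcal{O},\, X' \in \mathcal{O}\bigr].
\end{align*}
```

The bound covers only the gap between the full-population and trimmed estimands; sampling error in the trimmed estimate has to be added separately. Without a regularity assumption of this kind, the right-hand side is unbounded and no extrapolation is possible.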
What would settle it
A dataset with known true causal effects and controlled limited overlap where the computed sensitivity bounds fail to correctly indicate the irregularity level needed to invalidate the finding.
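A test of this kind could start from a simulation such as the one sketched below. The data-generating process is hypothetical and chosen only for illustration: a single covariate, a logistic propensity whose `steepness` controls how limited the overlap is, and a treatment effect with a kink of slope `kink` located in the region where control units are scarce.

```python
import numpy as np


def simulate_limited_overlap(n=2000, steepness=4.0, kink=0.0, base_effect=1.0, seed=0):
    """Simulate data with a known effect and controllable overlap.

    Returns (x, z, y, e, sample_ate), where sample_ate is the true average
    of the unit-level effects. Hypothetical design, not from the paper.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(-2.0, 2.0, size=n)
    e = 1.0 / (1.0 + np.exp(-steepness * x))  # larger steepness, less overlap
    z = rng.binomial(1, e)
    # The effect is irregular only for x > 1, where propensities are near 1.
    tau = base_effect + kink * np.maximum(x - 1.0, 0.0)
    y = x + tau * z + rng.normal(0.0, 1.0, size=n)
    return x, z, y, e, tau.mean()
```

With `kink=0` the effect is constant and trimming leaves the estimand unchanged; raising `kink` moves the true effect away from what the overlap region alone reveals, which is the situation the sensitivity bounds are meant to flag.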
Original abstract
Limited overlap between treated and control groups is a key challenge in observational analysis. Standard approaches like trimming importance weights can reduce variance but introduce a fundamental bias. We propose a sensitivity framework for contextualizing findings under limited overlap, where we assess how irregular the outcome function has to be in order for the main finding to be invalidated. Our approach is based on worst-case confidence bounds on the bias introduced by standard trimming practices, under explicit assumptions necessary to extrapolate counterfactual estimates from regions of overlap to those without. Empirically, we demonstrate how our sensitivity framework protects against spurious findings by quantifying uncertainty in regions with limited overlap.
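As a rough illustration of how the bias from trimming can be bounded, the sketch below computes a crude sample-level bound. It assumes both outcome regressions are Lipschitz with constant `lipschitz` in Euclidean distance, which makes the unit-level effect 2L-Lipschitz; the gap between the sample average effect and its average over retained units is then at most 2L times the trimmed fraction times the mean distance between trimmed and retained units. The function name and this particular bound are assumptions made here, and the paper's bounds are likely sharper.

```python
import numpy as np


def crude_trimming_bias_bound(x, keep, lipschitz):
    """Crude bound on |sample ATE - sample ATE among retained units|.

    x: covariates, shape (n,) or (n, d); keep: boolean mask of retained units.
    Illustrative only; uses O(n_out * n_in * d) memory for pairwise distances.
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    keep = np.asarray(keep, dtype=bool)
    x_in, x_out = x[keep], x[~keep]
    if len(x_out) == 0:
        return 0.0  # nothing trimmed, so no trimming bias
    if len(x_in) == 0:
        raise ValueError("no retained units to extrapolate from")
    # Pairwise Euclidean distances between trimmed and retained units.
    diffs = x_out[:, None, :] - x_in[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    frac_out = len(x_out) / len(x)
    return 2.0 * lipschitz * frac_out * dists.mean()
```

The bound grows with both the share of trimmed units and how far they sit from the retained sample, which matches the intuition that extrapolating further requires stronger smoothness.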
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a sensitivity framework for causal inference under limited overlap between treated and control groups. Standard trimming of importance weights reduces variance but introduces bias; the paper derives worst-case confidence bounds on this bias under explicit assumptions that permit extrapolation of counterfactual outcome estimates from overlap to non-overlap regions. The framework quantifies how irregular the outcome function must be to invalidate the main finding and includes empirical demonstrations that the approach protects against spurious conclusions.
Significance. If the derived bounds are valid and the extrapolation assumptions are plausible, the work offers a practical tool for reporting uncertainty attributable to limited overlap rather than discarding data or ignoring extrapolation risk. It strengthens sensitivity analysis in observational causal inference by making the required regularity conditions explicit and linking them directly to bias bounds on trimmed estimators.
major comments (1)
- [§3] §3: The worst-case bounds on trimming bias are obtained by restricting the outcome function to a regularity class (e.g., bounded Lipschitz constant) that enables extrapolation outside the overlap support. The manuscript provides no diagnostic or data-driven procedure to assess whether the observed data are consistent with this class. Violation of the assumed regularity (for instance, a discontinuity at the overlap boundary or an unbounded Lipschitz constant) would mean the reported bias interval no longer contains the true ATE, which directly undermines the central claim that the framework protects against spurious findings.
minor comments (1)
- [Abstract] The abstract states that the framework 'protects against spurious findings' but does not preview the specific datasets or simulation designs used in the empirical section; adding one sentence would improve readability.
Simulated Author's Rebuttal
We thank the referee for their constructive and insightful comments on our manuscript. The major comment raises an important point about the need for diagnostics to assess the regularity assumptions. We address this directly below and outline planned revisions.
- Referee: [§3] §3: The worst-case bounds on trimming bias are obtained by restricting the outcome function to a regularity class (e.g., bounded Lipschitz constant) that enables extrapolation outside the overlap support. The manuscript provides no diagnostic or data-driven procedure to assess whether the observed data are consistent with this class. Violation of the assumed regularity (for instance, a discontinuity at the overlap boundary or an unbounded Lipschitz constant) would mean the reported bias interval no longer contains the true ATE, which directly undermines the central claim that the framework protects against spurious findings.
Authors: We thank the referee for highlighting this issue. Our framework is explicitly a sensitivity analysis whose goal is to report the minimal degree of irregularity (e.g., the smallest Lipschitz constant) that would be required to overturn the trimmed estimator's conclusion. The bounds are therefore conditional on the outcome function belonging to the stated regularity class; they are not asserted to contain the true ATE unconditionally. This design directly quantifies how much extrapolation risk is needed to invalidate the finding, allowing domain experts to judge plausibility. We agree that the manuscript would be strengthened by explicit guidance on assessing the assumption. In the revised version we will add a subsection to §3 that (i) discusses visual and quantitative checks for smoothness of the estimated outcome regressions inside the overlap region, (ii) recommends reporting results across a range of regularity parameters, and (iii) clarifies the conditional interpretation of the bounds. These additions will make the scope and limitations of the method more transparent without altering the core technical contribution.
Revision: yes
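One possible form for a quantitative smoothness check inside the overlap region is sketched below. It is a hypothetical diagnostic, not one described in the manuscript: given covariates and fitted values from any outcome regression, it returns the largest pairwise slope, which lower-bounds the Lipschitz constant of the fitted function on the sample points. Fitted values are used rather than raw outcomes because noise would inflate pairwise slopes.

```python
import numpy as np


def empirical_lipschitz(x, fitted, min_dist=1e-8):
    """Largest |fitted_i - fitted_j| / ||x_i - x_j|| over pairs of sample points.

    x: covariates, shape (n,) or (n, d); fitted: fitted outcome values, shape (n,).
    Pairs closer than min_dist are skipped to avoid dividing by near-zero.
    Hypothetical diagnostic for illustration; uses O(n^2) memory.
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    fitted = np.asarray(fitted, dtype=float)
    diffs = x[:, None, :] - x[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    gaps = np.abs(fitted[:, None] - fitted[None, :])
    mask = dists > min_dist
    if not mask.any():
        raise ValueError("need at least two distinct covariate points")
    return float((gaps[mask] / dists[mask]).max())
```

A value well above the regularity parameter used in the sensitivity analysis would suggest the assumed class is too restrictive even where data are available. A small value does not validate the assumption outside the overlap region, which is exactly where it cannot be checked.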
Circularity Check
No circularity: derivation rests on explicit stated assumptions rather than self-referential reductions
The paper presents a sensitivity framework that derives worst-case confidence bounds on trimming bias under explicitly stated assumptions for extrapolating counterfactuals from overlap to non-overlap regions. Nothing in the abstract or the surrounding context (no equation, self-citation, or fitted parameter) reduces the bounds or central claims to the inputs by construction. The approach is self-contained: the extrapolation assumptions serve as independent premises rather than being smuggled in via prior self-work or redefined as predictions. This is the typical honest non-finding for a sensitivity analysis that foregrounds its modeling assumptions.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Assumptions necessary to extrapolate counterfactual estimates from regions of overlap to those without