A Sensitivity Approach to Causal Inference Under Limited Overlap

Hongseok Namkoong; Yian Huang; Yuanzhe Ma

arXiv: 2511.22003 · v2 · submitted 2025-11-27 · stat.ML · cs.LG · stat.ME

Yuanzhe Ma, Yian Huang, Hongseok Namkoong

Pith reviewed 2026-05-17 05:40 UTC · model grok-4.3

classification: stat.ML · cs.LG · stat.ME

keywords: causal inference, limited overlap, sensitivity analysis, trimming bias, importance weights, observational studies, counterfactual estimation

The pith

Worst-case bounds on trimming bias allow sensitivity checks for causal inference with limited overlap.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a sensitivity approach for observational causal inference when treated and control groups exhibit limited overlap. Trimming importance weights reduces variance but introduces bias, and the authors supply worst-case confidence bounds on that bias. This lets users determine how irregular the outcome function must be before the main causal finding is overturned. The method rests on explicit assumptions that support extrapolating counterfactual estimates from overlap regions to non-overlap regions. By quantifying uncertainty in low-overlap areas, the framework helps guard against spurious conclusions drawn from insufficient common support.

Core claim

The authors propose a sensitivity framework for causal inference under limited overlap that uses worst-case confidence bounds on the bias introduced by standard trimming practices. Under explicit assumptions for extrapolating counterfactual estimates from regions of overlap to those without, the framework assesses the level of irregularity in the outcome function required to invalidate the primary finding. Empirically, it demonstrates protection against spurious findings by quantifying uncertainty in limited-overlap regions.
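The variance-for-bias trade that trimming makes can be seen in a small simulation. The sketch below is illustrative only and is not taken from the paper: the propensity `0.02 + 0.96 * x`, the effect function, the trimming threshold `0.1`, and the helper names `simulate`, `ipw_mean`, and `compare` are all assumptions chosen for the example. By construction the full-population ATE is 1.2, while the ATE of the trimmed population is about 1.04, so the trimmed estimator is more stable but targets a different number.

```python
import numpy as np

def simulate(n, rng):
    """Toy data: propensity rises with x; the effect is larger where treated units are rare."""
    x = rng.uniform(0.0, 1.0, size=n)
    pi = 0.02 + 0.96 * x                   # hypothetical true propensity score
    z = rng.binomial(1, pi)                # treatment assignment
    tau = np.where(x < 0.1, 3.0, 1.0)      # hypothetical individual treatment effect
    y = x + z * tau + rng.normal(size=n)   # observed outcome
    return pi, z, y

def ipw_mean(pi, z, y, keep):
    """Inverse-propensity-weighted effect estimate, averaged over the kept units."""
    score = z * y / pi - (1 - z) * y / (1 - pi)
    return float(score[keep].mean())

def compare(reps=300, n=2000, threshold=0.1, seed=0):
    """Repeat the experiment and collect untrimmed and trimmed estimates."""
    rng = np.random.default_rng(seed)
    full, trimmed = [], []
    for _ in range(reps):
        pi, z, y = simulate(n, rng)
        q = np.minimum(pi, 1 - pi)         # overlap score q(x) = min{pi, 1 - pi}
        full.append(ipw_mean(pi, z, y, np.ones(n, dtype=bool)))
        trimmed.append(ipw_mean(pi, z, y, q >= threshold))
    return np.array(full), np.array(trimmed)

full, trimmed = compare()
print(f"untrimmed: mean={full.mean():.3f} sd={full.std():.3f}")
print(f"trimmed:   mean={trimmed.mean():.3f} sd={trimmed.std():.3f}")
```

The example uses the true propensity score for simplicity; with an estimated propensity the same qualitative pattern would be expected, but that is not checked here.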

What carries the argument

The central mechanism is a sensitivity framework that computes worst-case confidence bounds on trimming-induced bias to evaluate the robustness of causal estimates to limited overlap.
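One way to see what "trimming-induced bias" means is the following decomposition, which follows from the law of total expectation. The notation is an assumption made for this note: $\mathcal{O}$ denotes the overlap (untrimmed) region, $\tau_{+}$ and $\tau_{-}$ the average effects inside and outside it, and $p_{-}$ the probability mass outside it. The figure captions use $\tau_{-}$ and $\tau$ in this sense; $\tau_{+}$ and $p_{-}$ are introduced here.

```latex
\[
\tau = \mathbb{E}[\tau(X)], \qquad
\tau_{+} = \mathbb{E}[\tau(X) \mid X \in \mathcal{O}], \qquad
\tau_{-} = \mathbb{E}[\tau(X) \mid X \notin \mathcal{O}], \qquad
p_{-} = \mathbb{P}(X \notin \mathcal{O}).
\]
\[
\tau = (1 - p_{-})\,\tau_{+} + p_{-}\,\tau_{-}
\quad\Longrightarrow\quad
\tau_{+} - \tau = p_{-}\,(\tau_{+} - \tau_{-}).
\]
```

A trimmed estimator targets $\tau_{+}$, so its bias for the full ATE is driven by how much mass is trimmed and by how far $\tau_{-}$ can drift from $\tau_{+}$. The second factor is not identified from the overlap region alone, which is why an extrapolation assumption is needed to bound it.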

If this is right

If the framework holds, causal researchers can quantify how sensitive results are to limited overlap through explicit bias bounds.
The approach lets users specify the outcome irregularity degree needed to overturn conclusions.
Studies that trim weights can report uncertainty that accounts for extrapolation from overlap regions.
It encourages more cautious interpretation of causal effects when common support is incomplete.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This framework could be adapted to sensitivity analyses for other bias sources in observational studies.
Its use might encourage better study designs that prioritize sufficient overlap to lessen reliance on extrapolation.
Domain knowledge about the outcome could be used to tighten the worst-case bounds in applications.

Load-bearing premise

The approach depends on explicit assumptions that permit extrapolating counterfactual estimates from overlap regions to regions without overlap.

What would settle it

A dataset with known true causal effects and controlled limited overlap where the computed sensitivity bounds fail to correctly indicate the irregularity level needed to invalidate the finding.

Figures

Figures reproduced from arXiv: 2511.22003 by Hongseok Namkoong, Yian Huang, Yuanzhe Ma.

**Figure 1.** Left: data-generation process used in the simulation setup, where π(x) = P(Z = 1 | X = x) denotes the propensity score, q(x) = min{π(x), 1 − π(x)} measures whether a point has sufficient overlap, and f(x, z) represents the potential outcome for a unit with covariates x under treatment assignment z ∈ {0, 1}. The individual treatment effect is defined as τ(x) = f(x, 1) − f(x, 0). Right: Visualization of one…

**Figure 2.** Confidence intervals from AIPW (left) and its trimmed variant AIPWpartial (right) across different overlap levels, with the dotted red line representing the true estimand value; higher values on the x-axis mean more limited overlap. Left: AIPW yields very wide confidence intervals. Right: We follow standard heuristics to truncate data in a way such that AIPWpartial’s confidence interval has the smallest le…

**Figure 3.** Visualization of our method. In the overlap region, we use typical asymptotic confidence intervals. In the non-overlap region, we use the minimax approach to extrapolate from the overlap region. Our method allows the analyst to analyze the potential bias caused by ignoring samples with extreme propensity scores and see how this depends on the extrapolability of data from the non-overlap region to the overl…

**Figure 4.** For each point in the non-overlap region, we list the set of treated points from the overlap region used in its extrapolation. For example, the leftmost point i in the non-overlap region uses points 1, 2, and 3 for extrapolation. This means that point pairs (i, 1), (i, 2), (i, 3) have binding Lipschitz constraints for the program that defines the minimax estimator $\hat\tau_\delta^{\mathrm{FLCI}}(w)$ (2.5). … as a measure of overlap f…

**Figure 5.** Left: There are 2(k + 1)n samples in total and the middle region in pink is the overlap region. See Appendix C for details. Right: RMSE of the estimator $\hat\tau_\delta(w)$ vs δ with n = 25, k = 10, L = 1, η = 0.1, ξ = 0.01. Matching interpretation: We start by interpreting the minimax estimator as a nearest-neighbor estimator. Lemma 1. Let µ ≥ 0 and Λ ≥ 0 be the optimal dual variables corresponding to the Lipschitz con…

**Figure 6.** Left: Bias, variance, and the length of the confidence interval (2.6), $G(\delta) = \mathrm{cv}_\alpha\!\left(\frac{\mathrm{bias}(\hat\tau_\delta(w))}{\mathrm{sd}(\hat\tau_\delta(w))}\right) \cdot \mathrm{sd}(\hat\tau_\delta(w))$. Right: RMSE as we vary the distance parameter η; for each η, we compute the optimal (lowest) RMSE with respect to δ ≥ 0. … and for all i with $w_i = 0$, let $\eta_i = \min_j \{ |x_i - x_j| : w_j > 0 \}$ denote its distance to the region with positive weights. The following result shows that extrapolation…

**Figure 7.** Contextualization of the Lipschitz constant $\tilde{L}_{Z,p}$ (3.1) vs p. Left: Example 1. Right: PennUI dataset in Section 4. Section 3, Sensitivity analysis: We use the MP framework to facilitate a diagnostic analysis assessing how different levels of assumed smoothness affect the estimate of bias due to trimming. Our framework asks how strong the extrapolation assumption must be for the induced bias of the asymptotic estimat…

**Figure 8.** Data comes from Example 1. On the left, we have confidence intervals generated by MPϵ at different values of L with ϵ = 0.01. We see how increasing the value of L covers the estimand τ− and this enables the sensitivity analysis. On the right, we have results for M. As we can see, the confidence intervals M are much wider than MP. … knowledge and can be potentially transferred to other datasets. In…

**Figure 9.** We plot MPϵ with a fixed Lipschitz constant L = 14 with different values of non-overlap parameter η. In this plot, red points represent the non-overlap ATE at each value of the non-overlap parameter. Blue points/intervals represent the point estimate/confidence interval of MP. At every point, we assume the truncation threshold is chosen to minimize the length of the AIPWpartial interval. This shows the MP…

**Figure 10.** Probability of coverage and half length for L = 14, as we vary the non-overlap parameter η (D.1). The truncation threshold is always chosen to be the one such that AIPWpartial has the smallest length. Our method MP helps analyze the unreliability of AIPWpartial. … estimate the impact of financial incentives on reemployment [34]. Preliminaries: In the PennUI dataset, approximately 15,000 eligible claimants we…

**Figure 11.** Failure of AIPWpartial on observational data simulated from the PennUI dataset. Similar to…

**Figure 12.** Real data results: confidence intervals for MPϵ⋆ (left) and MPcombine (right) at different L for η = 0.01. The red dotted points are τ− and τ respectively. The data is the same as in…

**Figure 13.** Probability of coverage and half length for MP and AIPWpartial with L = 0.32, as we vary the non-overlap parameter η. Results are averaged over 100 runs. The larger the non-overlap parameter, the larger the non-overlap region is. The truncation threshold is always chosen to be the one such that AIPWpartial has the smallest length. We can see how unreliable AIPWpartial is and how our method MP can help miti…

**Figure 14.** Simulation data collection setup. … scenarios tested, MP achieves full (100%) coverage of the non-overlap region estimand τ−. Similarly, MPcombine consistently achieves valid coverage for the full ATE τ, while remaining considerably narrower than the fully conservative interval generated by the naive minimax method M. This illustrates a key strength of our framework: it balances the need for robust inference i…

**Figure 15.** Left/right: after Option 1/Option 2, the distribution of π(X). Option 1 is to collect data if x ∈ X2 = (0.4, 0.6) and Option 2 is to collect data if x ∈ X2 = (0, 0.1) ∪ (0.9, 1). Option 1 appears better from the propensity score perspective (after sampling, the propensity score is 0.03 for Option 1 vs 0.01 for Option 2). As we can see later, Option 2 is the better option. The simulation setup for this set…

**Figure 16.** A confidence sequence generated during continual sampling (x ∈ (0.40, 0.47), (0.47, 0.53), (0.53, 0.60), (0, 0.03), (0.03, 0.07), (0.07, 0.10)) in the non-overlap region with a fixed truncation threshold and L = 5.48. From…
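Two quantities defined in the captions above are easy to compute directly: the overlap score q(x) = min{π(x), 1 − π(x)} from Figure 1, and the distance η_i from a zero-weight point to the nearest positive-weight point from Figure 6. The following is a minimal sketch for one-dimensional covariates. The function names, the toy arrays, and the trimming threshold of 0.1 are assumptions made for illustration and do not come from the paper.

```python
import numpy as np

def overlap_score(pi):
    """q(x) = min{pi(x), 1 - pi(x)}: small values indicate limited overlap."""
    pi = np.asarray(pi, dtype=float)
    return np.minimum(pi, 1.0 - pi)

def distance_to_weighted_region(x, w):
    """eta_i = min_j |x_i - x_j| over points j with w_j > 0 (zero if w_i > 0)."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    support = x[w > 0]
    if support.size == 0:
        raise ValueError("no point has positive weight")
    # Pairwise distances via broadcasting, then the row-wise minimum.
    return np.abs(x[:, None] - support[None, :]).min(axis=1)

# Toy covariates and propensities (hypothetical values).
x = np.array([0.0, 0.05, 0.3, 0.5, 0.7, 0.95])
pi = np.array([0.01, 0.05, 0.3, 0.5, 0.7, 0.95])

q = overlap_score(pi)
keep = q >= 0.1                      # hypothetical trimming rule
eta = distance_to_weighted_region(x, keep.astype(float))
print(q, keep, eta)
```

Points with positive weight get η = 0 because each is its own nearest neighbour, which matches the caption's restriction of η_i to points with zero weight.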

Original abstract

Limited overlap between treated and control groups is a key challenge in observational analysis. Standard approaches like trimming importance weights can reduce variance but introduce a fundamental bias. We propose a sensitivity framework for contextualizing findings under limited overlap, where we assess how irregular the outcome function has to be in order for the main finding to be invalidated. Our approach is based on worst-case confidence bounds on the bias introduced by standard trimming practices, under explicit assumptions necessary to extrapolate counterfactual estimates from regions of overlap to those without. Empirically, we demonstrate how our sensitivity framework protects against spurious findings by quantifying uncertainty in regions with limited overlap.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper gives a sensitivity framework for trimming bias in limited-overlap causal inference via worst-case bounds, but the bounds depend on untested regularity assumptions for extrapolation outside the overlap region.

The main thing here is a sensitivity framework that puts worst-case confidence bounds on the bias from standard trimming when overlap is limited. It frames the problem as asking how irregular the outcome function has to be before the main finding gets overturned, under explicit assumptions that allow extrapolation of counterfactuals from the overlap region to the rest of the support. That is a direct response to a frequent practical headache in observational work, and the abstract indicates they demonstrate it empirically as a way to quantify uncertainty and guard against spurious results. The approach builds on existing trimming methods by adding this specific robustness layer rather than just reporting trimmed estimates alone. The citation pattern looks standard and does not appear to overclaim prior results from the abstract alone.

The central soft spot is the dependence on those extrapolation assumptions. If the true conditional expectation is more irregular than the assumed class, for example with a discontinuity at the overlap boundary or a larger deviation than the bound allows, the derived intervals no longer contain the true ATE. The abstract gives no indication of diagnostics or data-driven checks that would let a user assess whether the regularity class is plausible in a given dataset, so the sensitivity parameter could absorb both trimming bias and model misspecification. This is a real limitation but not necessarily fatal if the full paper supplies tight examples or ways to interpret the parameter.

The work is aimed at applied researchers in causal inference who trim propensity weights and want a structured way to report sensitivity to overlap problems. A reader who already works with trimmed estimators would get concrete value from seeing how the bounds behave. It deserves a serious referee to examine the derivations, any simulations, and whether the assumptions can be made operational in practice.

Referee Report

1 major / 1 minor

Summary. The manuscript proposes a sensitivity framework for causal inference under limited overlap between treated and control groups. Standard trimming of importance weights reduces variance but introduces bias; the paper derives worst-case confidence bounds on this bias under explicit assumptions that permit extrapolation of counterfactual outcome estimates from overlap to non-overlap regions. The framework quantifies how irregular the outcome function must be to invalidate the main finding and includes empirical demonstrations that the approach protects against spurious conclusions.

Significance. If the derived bounds are valid and the extrapolation assumptions are plausible, the work offers a practical tool for reporting uncertainty attributable to limited overlap rather than discarding data or ignoring extrapolation risk. It strengthens sensitivity analysis in observational causal inference by making the required regularity conditions explicit and linking them directly to bias bounds on trimmed estimators.

major comments (1)

[§3] The worst-case bounds on trimming bias are obtained by restricting the outcome function to a regularity class (e.g., bounded Lipschitz constant) that enables extrapolation outside the overlap support. The manuscript provides no diagnostic or data-driven procedure to assess whether the observed data are consistent with this class. Violation of the assumed regularity (for instance, a discontinuity at the overlap boundary or an unbounded Lipschitz constant) would mean the reported bias interval no longer contains the true ATE, which directly undermines the central claim that the framework protects against spurious findings.
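To make the role of the regularity class concrete, the sketch below computes the standard Lipschitz envelope: given function values at points inside the overlap region and a Lipschitz constant L, it returns the smallest and largest values any L-Lipschitz function through those points can take at a new location. This is a simplified illustration, not the paper's minimax procedure. It treats the overlap-region values as known without noise, and the function name and toy numbers are assumptions.

```python
import numpy as np

def lipschitz_envelope(x_obs, f_obs, x_new, L):
    """Pointwise bounds on an L-Lipschitz function that passes through (x_obs, f_obs).

    lower(x) = max_j f_j - L * |x - x_j|
    upper(x) = min_j f_j + L * |x - x_j|
    """
    x_obs = np.asarray(x_obs, dtype=float)
    f_obs = np.asarray(f_obs, dtype=float)
    x_new = np.atleast_1d(np.asarray(x_new, dtype=float))
    dist = np.abs(x_new[:, None] - x_obs[None, :])
    lower = (f_obs[None, :] - L * dist).max(axis=1)
    upper = (f_obs[None, :] + L * dist).min(axis=1)
    return lower, upper

# Hypothetical outcome values at three overlap-region points.
x_obs = [0.3, 0.5, 0.7]
f_obs = [1.0, 1.2, 1.1]

# Bounds at two points outside the overlap region, assuming L = 2.
lower, upper = lipschitz_envelope(x_obs, f_obs, [0.1, 0.9], L=2.0)
print(lower, upper)
```

The envelope widens linearly with distance from the overlap region and with L, and it is only meaningful when the observed values are themselves consistent with the assumed constant. If the true function jumps at the boundary, no finite L makes the envelope valid, which is the failure mode the comment describes.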

minor comments (1)

[Abstract] The abstract states that the framework 'protects against spurious findings' but does not preview the specific datasets or simulation designs used in the empirical section; adding one sentence would improve readability.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive and insightful comments on our manuscript. The major comment raises an important point about the need for diagnostics to assess the regularity assumptions. We address this directly below and outline planned revisions.

Referee: [§3] The worst-case bounds on trimming bias are obtained by restricting the outcome function to a regularity class (e.g., bounded Lipschitz constant) that enables extrapolation outside the overlap support. The manuscript provides no diagnostic or data-driven procedure to assess whether the observed data are consistent with this class. Violation of the assumed regularity (for instance, a discontinuity at the overlap boundary or an unbounded Lipschitz constant) would mean the reported bias interval no longer contains the true ATE, which directly undermines the central claim that the framework protects against spurious findings.

Authors: We thank the referee for highlighting this issue. Our framework is explicitly a sensitivity analysis whose goal is to report the minimal degree of irregularity (e.g., the smallest Lipschitz constant) that would be required to overturn the trimmed estimator's conclusion. The bounds are therefore conditional on the outcome function belonging to the stated regularity class; they are not asserted to contain the true ATE unconditionally. This design directly quantifies how much extrapolation risk is needed to invalidate the finding, allowing domain experts to judge plausibility. We agree that the manuscript would be strengthened by explicit guidance on assessing the assumption. In the revised version we will add a subsection to §3 that (i) discusses visual and quantitative checks for smoothness of the estimated outcome regressions inside the overlap region, (ii) recommends reporting results across a range of regularity parameters, and (iii) clarifies the conditional interpretation of the bounds. These additions will make the scope and limitations of the method more transparent without altering the core technical contribution.

Revision: yes
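The "smallest Lipschitz constant that overturns the conclusion" can be illustrated with a deliberately crude calculation. The sketch assumes that each potential-outcome function is L-Lipschitz, so the effect function τ(x) is 2L-Lipschitz, and bounds the effect at each trimmed point by the effect at its nearest overlap-region neighbour minus 2L times the distance. It is not the paper's minimax estimator: it ignores estimation noise except through an optional `half_width` term, and all names and numbers are hypothetical.

```python
import numpy as np

def _nearest(x_in, x_out):
    """Index of, and distance to, the nearest overlap point for each trimmed point."""
    dist = np.abs(x_out[:, None] - x_in[None, :])
    nn = dist.argmin(axis=1)
    return nn, dist[np.arange(len(x_out)), nn]

def ate_lower_bound(x_in, tau_in, x_out, L, half_width=0.0):
    """Worst-case lower bound on the full ATE when tau is 2L-Lipschitz."""
    x_in, tau_in, x_out = (np.asarray(a, dtype=float) for a in (x_in, tau_in, x_out))
    nn, eta = _nearest(x_in, x_out)
    worst_out = tau_in[nn] - 2.0 * L * eta   # most adverse effect at trimmed points
    n = len(x_in) + len(x_out)
    return (tau_in.sum() + worst_out.sum()) / n - half_width

def breakdown_constant(x_in, tau_in, x_out, half_width=0.0):
    """Smallest L at which the lower bound reaches zero (bound is linear in L)."""
    base = ate_lower_bound(x_in, tau_in, x_out, 0.0, half_width)
    if base <= 0.0:
        return 0.0
    _, eta = _nearest(np.asarray(x_in, dtype=float), np.asarray(x_out, dtype=float))
    slope = 2.0 * eta.sum() / (len(x_in) + len(x_out))
    return float("inf") if slope == 0.0 else base / slope

# Hypothetical effects at overlap points and two trimmed locations.
x_in, tau_in, x_out = [0.3, 0.5, 0.7], [1.0, 1.2, 1.1], [0.1, 0.9]
L_star = breakdown_constant(x_in, tau_in, x_out)
print(L_star, ate_lower_bound(x_in, tau_in, x_out, L_star))
```

A domain expert would then ask whether a Lipschitz constant as large as `L_star` is plausible for the outcome in question. If it is not, the positive finding survives the trimming under this simplified bound.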

Circularity Check

0 steps flagged

No circularity: derivation rests on explicit stated assumptions rather than self-referential reductions

The paper presents a sensitivity framework that derives worst-case confidence bounds on trimming bias under explicitly stated assumptions for extrapolating counterfactuals from overlap to non-overlap regions. No equations, self-citations, or fitted parameters are shown in the abstract or context that reduce the bounds or central claims to the inputs by construction. The approach is self-contained, with the extrapolation assumptions serving as independent premises rather than being smuggled in via prior self-work or redefined as predictions. This is the typical honest non-finding for a sensitivity analysis that foregrounds its modeling assumptions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework rests on domain assumptions for counterfactual extrapolation outside overlap regions and worst-case analysis of outcome irregularity; no free parameters or invented entities are mentioned in the abstract.

axioms (1)

Domain assumption: assumptions necessary to extrapolate counterfactual estimates from regions of overlap to those without. Explicitly invoked to justify the worst-case confidence bounds on trimming bias.

pith-pipeline@v0.9.0 · 5400 in / 1120 out tokens · 47221 ms · 2026-05-17T05:40:03.626837+00:00 · methodology

Reference graph

Works this paper leans on

55 extracted references · 55 canonical work pages · 1 internal anchor

[1] A. Abadie and G. W. Imbens. Large sample properties of matching estimators for average treatment effects. Econometrica, 74(1):235–267, 2006.
[2] A. Abadie and G. W. Imbens. Bias-corrected matching estimators for average treatment effects. Journal of Business & Economic Statistics, 29(1):1–11, 2011.
[3] T. B. Armstrong and M. Kolesár. Optimal inference in a class of regression models. Econometrica, 86(2):655–683, 2018.
[4] T. B. Armstrong and M. Kolesár. Finite-sample optimal estimation and inference on average treatment effects under unconfoundedness. Econometrica, 89(3):1141–1177, 2021.
[5] G. Beliakov. Interpolation of Lipschitz functions. Journal of Computational and Applied Mathematics, 196(1):20–44, 2006.
[6] M. Busso, J. DiNardo, and J. McCrary. New evidence on the finite sample properties of propensity score reweighting and matching estimators. The Review of Economics and Statistics, 96(5):885–897, 2014.
[7] T. T. Cai and M. G. Low. An adaptation theory for nonparametric confidence intervals. The Annals of Statistics, 32(5):1805–1840, 2004.
[8] V. Chernozhukov, D. Chetverikov, M. Demirer, E. Duflo, C. Hansen, W. Newey, and J. Robins. Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal, 21(1):C1–C68, 2018.
[9] S. R. Cole and M. A. Hernán. Constructing Inverse Probability Weights for Marginal Structural Models. American Journal of Epidemiology, 168(6):656–664, 2008.
[10] R. K. Crump, V. J. Hotz, G. W. Imbens, and O. A. Mitnik. Moving the goalposts: Addressing limited overlap in the estimation of average treatment effects by changing the estimand. Technical report, National Bureau of Economic Research, 2006.
[11] R. K. Crump, V. J. Hotz, G. W. Imbens, and O. A. Mitnik. Dealing with limited overlap in estimation of average treatment effects. Biometrika, 96(1):187–199, 2009.
[12] Y. Cui. Individualized Decision-Making Under Partial Identification: Three Perspectives, Two Optimality Results, and One Paradox. Harvard Data Science Review, (3), 2021.
[13] R. H. Dehejia and S. Wahba. Causal effects in nonexperimental studies: Reevaluating the evaluation of training programs. Journal of the American Statistical Association, 94(448):1053–1062, 1999.
[14] D. L. Donoho. Statistical estimation and optimal recovery. Annals of Statistics, 22(1):238–270, 1994.
[15] A. D’Amour, P. Ding, A. Feller, L. Lei, and J. Sekhon. Overlap in observational studies with high-dimensional covariates. Journal of Econometrics, 221(2):644–654, 2021.
[16] M. Frölich. Finite-sample properties of propensity-score matching and weighting estimators. Review of Economics and Statistics, 86(1):77–90, 2004.
[17] R. J. Glynn, M. Lunt, K. J. Rothman, C. Poole, S. Schneeweiss, and T. Stürmer. Comparison of alternative approaches to trim subjects in the tails of the propensity score distribution. Pharmacoepidemiology and Drug Safety, 2019.
[18] J. J. Heckman, H. Ichimura, and P. E. Todd. Matching As An Econometric Evaluation Estimator: Evidence from Evaluating a Job Training Programme. The Review of Economic Studies, 64(4):605–654, 1997.
[19] P. Heiler and E. Kazak. Valid inference for treatment effect parameters under irregular identification and many extreme propensity scores. Journal of Econometrics, 222(2), 2021.
[20] H. Hong, M. P. Leung, and J. Li. Inference on finite-population treatment effects under limited overlap. The Econometrics Journal, 23(1):32–47, 2019.
[21] S. R. Howard, A. Ramdas, J. McAuliffe, and J. Sekhon. Time-uniform, nonparametric, nonasymptotic confidence sequences. The Annals of Statistics, 49, 2021.
[22] Z. Hussain, M. Oberst, M.-C. Shih, and D. Sontag. Falsification before extrapolation in causal effect estimation. In Proceedings of the 36th International Conference on Neural Information Processing Systems, 2022. ISBN 9781713871088.
[23] Z. Hussain, M.-C. Shih, M. Oberst, I. Demirel, and D. Sontag. Falsification of internal and external validity in observational studies via conditional moment restrictions. In Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, pages 5869–5898, 2023.
[24] G. Imbens. Nonparametric estimation of average treatment effects under exogeneity: a review. The Review of Economics and Statistics, 86(1):4–29, 2004.
[25] G. W. Imbens and C. F. Manski. Confidence intervals for partially identified parameters. Econometrica, 72(6):1845–1857, 2004.
[26] C. Ju, J. Schwab, and M. J. van der Laan. On adaptive propensity score truncation in causal inference. Statistical Methods in Medical Research, 2019.
[27] A. B. Juditsky and A. S. Nemirovski. Nonparametric estimation by convex programming. The Annals of Statistics, 37:2278–2300, 2009.
[28] N. Kallus. Generalized optimal matching methods for causal inference. Journal of Machine Learning Research, 21(62):1–54, 2020.
[29] N. Kallus and A. Zhou. Minimax-optimal policy learning under unobserved confounding. Management Science, 67(5):2870–2890, 2021.
[30] J. D. Y. Kang and J. L. Schafer. Demystifying Double Robustness: A Comparison of Alternative Strategies for Estimating a Population Mean from Incomplete Data. Statistical Science, 22(4):523–539, 2007.
[31] S. Khan and D. Nekipelov. On uniform inference in nonlinear models with endogeneity. Journal of Econometrics, 240(2):105261, 2024.
[32] S. Khan and E. Tamer. Irregular identification, support conditions, and inverse weight estimation. Econometrica, 78(6):2021–2042, 2010.
[33] S. Khan, M. Saveski, and J. Ugander. Off-policy evaluation beyond overlap: Sharp partial identification under smoothness. In Proceedings of the 41st International Conference on Machine Learning, volume 235, pages 23734–23757, 2024.
[34] R. Koenker and Y. Bilias. Quantile regression for duration data: A reappraisal of the Pennsylvania reemployment bonus experiments. In Economic applications of quantile regression, pages 199–220. Springer, 2002.
[35] R. J. LaLonde. Evaluating the econometric evaluations of training programs with experimental data. The American Economic Review, 76(4):604–620, 1986.
[36] O. Landgren, D. S. Siegel, D. Auclair, A. Chari, M. Boedigheimer, T. Welliver, K. Mezzi, K. Iskander, and A. Jakubowiak. Carfilzomib-lenalidomide-dexamethasone versus bortezomib-lenalidomide-dexamethasone in patients with newly diagnosed multiple myeloma: results from the prospective, longitudinal, observational commpass study. Blood, 132:799, 2018.
[37] B. K. Lee, J. Lessler, and E. A. Stuart. Weight trimming and propensity score weighting. PLOS ONE, 6(3):1–6, 2011.
[38] B. Li, K. Ren, L. Shen, P. Hou, Z. Su, A. Di Bacco, J.-L. Hong, A. Galaznik, A. B. Dash, V. Crossland, P. Dolin, and S. Szalma. Comparing bortezomib-lenalidomide-dexamethasone (vrd) with carfilzomib-lenalidomide-dexamethasone (krd) in the patients with newly diagnosed multiple myeloma (ndmm) in two observational studies. Blood, 132:3298, 2018.
[39] F. Li, L. E. Thomas, and F. Li. Addressing Extreme Propensity Scores via the Overlap Weights. American Journal of Epidemiology, 188(1):250–257, 2018.
[40] X. Ma and J. Wang. Robust inference using inverse probability weighting. Journal of the American Statistical Association, 115(532):1851–1860, 2020.
[41] Y. Ma, P. H. Sant’Anna, Y. Sasaki, and T. Ura. Doubly robust estimators with weak overlap. arXiv:2304.08974 [stat.ME], 2023.
[42] C. F. Manski. Nonparametric bounds on treatment effects. The American Economic Review, 80(2):319–323, 1990.
[43] NIH. Relating clinical outcomes in multiple myeloma to personal assessment of genetic profile (com-mpass), 2016. URL https://clinicaltrials.gov/study/NCT01454297
[44] M. L. Petersen, K. E. Porter, S. Gruber, Y. Wang, and M. J. van der Laan. Diagnosing and responding to violations in the positivity assumption. Statistical Methods in Medical Research, 21(1):31–54, 2012.
[45] P. R. Rosenbaum and D. B. Rubin. The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1):41–55, 1983.
[46] C. Rothe. Robust confidence intervals for average treatment effects under limited overlap. Econometrica, 85(2):645–660, 2017.
[47] Y. Sasaki and T. Ura. Estimation and inference for moments of ratios with robustness against large trimming bias. Econometric Theory, 38(1):66–112, 2022.
[48] M. H. Schneider and S. A. Zenios. A comparative study of algorithms for matrix balancing. Operations Research, 38(3), 1990.
[49] J. A. Smith and P. E. Todd. Does matching overcome LaLonde’s critique of nonexperimental estimators? Journal of Econometrics, 125:305–353, 2005.
[50] J. Stoye. More on confidence intervals for partially identified parameters. Econometrica, 77(4):1299–1315, 2009.
[51] T. Stürmer, K. J. Rothman, J. Avorn, and R. J. Glynn. Treatment effects in the presence of unmeasured confounding: dealing with observations in the tails of the propensity score distribution, a simulation study. American Journal of Epidemiology, 172(7):843–854, 2010.
[52] S. Yang and P. Ding. Asymptotic inference of causal effects with observational studies trimmed by the estimated propensity scores. Biometrika, 105(2):487–493, 2018.

A Proof of Lemma 1

In this section, we closely follow the notation in [4]. We let $n_1 = |\{i : z_i = 1\}|$ and $n_0 = |\{i : z_i = 0\}|$ denote the total number of samples with $z = 1$ and $z = 0$. With slight ...

If $q \in W_{+}$, then we have $d_{i^*,q} \le d_{p,q}$ so that $|g_{p,1} - g_{q,1}| = f_{i^*,1} - f_{q,1} \le d_{i^*,q} \le d_{p,q}$.

If $q \in W_{-}$, $\eta_q \le \eta_{i^*}$, i.e., $i^*$ is in the middle of $p$ and $q$, then we also have $d_{i^*,q} \le d_{p,q}$ so that $|g_{p,1} - g_{q,1}| = f_{i^*,1} - f_{q,1} \le d_{i^*,q} \le d_{p,q}$.

If $q \in W_{-}$, $\eta_q > \eta_{i^*}$, in this case, $q \notin I$ implies $f_{q,1} \le f_{i^*,1} \le f_{p,1}$ so $|g_{p,1} - g_{q,1}| = f_{i^*,1} - f_{q,1} \le f_{p,1} - f_{q,1} \le d_{p,q}$.

• If $p \notin I$ and $q \in I$, this is similar to the previous case. Therefore $g$ is feasible and optimal. In summary, we have constructed a solution $g$ with $i < j \le i^* + 1 \implies \eta_i \le \eta_j$ and $g_{i,1} \ge g_{j,1}$. Proceeding in the same way as above, we can remove all tro...

[1] [1]

Abadie and G

A. Abadie and G. W. Imbens. Large sample properties of matching estimators for average treatment effects.Econometrica, 74(1):235–267, 2006

work page 2006

[2] [2]

Abadie and G

A. Abadie and G. W. Imbens. Bias-corrected matching estimators for average treatment effects. Journal of Business & Economic Statistics, 29(1):1–11, 2011

work page 2011

[3] [3]

T. B. Armstrong and M. Koles´ ar. Optimal inference in a class of regression models.Econo- metrica, 86(2):655–683, 2018

work page 2018

[4] [4]

T. B. Armstrong and M. Koles´ ar. Finite-sample optimal estimation and inference on average treatment effects under unconfoundedness.Econometrica, 89(3):1141–1177, 2021

work page 2021

[5] [5]

Beliakov

G. Beliakov. Interpolation of Lipschitz functions.Journal of Computational and Applied Mathematics, 196(1):20–44, 2006

work page 2006

[6] [6]

Busso, J

M. Busso, J. DiNardo, and J. McCrary. New evidence on the finite sample properties of propen- sity score reweighting and matching estimators.The Review of Economics and Statistics, 96 (5):885–897, 2014. 21

work page 2014

[7] [7]

T. T. Cai and M. G. Low. An adaptation theory for nonparametric confidence intervals.The Annals of statistics, 32(5):1805–1840, 2004

work page 2004

[8] [8]

Chernozhukov, D

V. Chernozhukov, D. Chetverikov, M. Demirer, E. Duflo, C. Hansen, W. Newey, and J. Robins. Double/debiased machine learning for treatment and structural parameters.The Econometrics Journal, 21(1):C1–C68, 2018

work page 2018

[9] S. R. Cole and M. A. Hernán. Constructing Inverse Probability Weights for Marginal Structural Models. American Journal of Epidemiology, 168(6):656–664, 2008.

[10] R. K. Crump, V. J. Hotz, G. W. Imbens, and O. A. Mitnik. Moving the goalposts: Addressing limited overlap in the estimation of average treatment effects by changing the estimand. Technical report, National Bureau of Economic Research, 2006.

[11] R. K. Crump, V. J. Hotz, G. W. Imbens, and O. A. Mitnik. Dealing with limited overlap in estimation of average treatment effects. Biometrika, 96(1):187–199, 2009.

[12] Y. Cui. Individualized Decision-Making Under Partial Identification: Three Perspectives, Two Optimality Results, and One Paradox. Harvard Data Science Review, (3), 2021.

[13] R. H. Dehejia and S. Wahba. Causal effects in nonexperimental studies: Reevaluating the evaluation of training programs. Journal of the American Statistical Association, 94(448):1053–1062, 1999.

[14] D. L. Donoho. Statistical estimation and optimal recovery. Annals of Statistics, 22(1):238–270, 1994.

[15] A. D’Amour, P. Ding, A. Feller, L. Lei, and J. Sekhon. Overlap in observational studies with high-dimensional covariates. Journal of Econometrics, 221(2):644–654, 2021.

[16] M. Frölich. Finite-sample properties of propensity-score matching and weighting estimators. Review of Economics and Statistics, 86(1):77–90, 2004.

[17] R. J. Glynn, M. Lunt, K. J. Rothman, C. Poole, S. Schneeweiss, and T. Stürmer. Comparison of alternative approaches to trim subjects in the tails of the propensity score distribution. Pharmacoepidemiology and Drug Safety, 2019.

[18] J. J. Heckman, H. Ichimura, and P. E. Todd. Matching As An Econometric Evaluation Estimator: Evidence from Evaluating a Job Training Programme. The Review of Economic Studies, 64(4):605–654, 10 1997.

[19] P. Heiler and E. Kazak. Valid inference for treatment effect parameters under irregular identification and many extreme propensity scores. Journal of Econometrics, 222(2), 2021.

[20] H. Hong, M. P. Leung, and J. Li. Inference on finite-population treatment effects under limited overlap. The Econometrics Journal, 23(1):32–47, 2019.

[21] S. R. Howard, A. Ramdas, J. McAuliffe, and J. Sekhon. Time-uniform, nonparametric, nonasymptotic confidence sequences. The Annals of Statistics, 49, 2021.

[22] Z. Hussain, M. Oberst, M.-C. Shih, and D. Sontag. Falsification before extrapolation in causal effect estimation. In Proceedings of the 36th International Conference on Neural Information Processing Systems, 2022. ISBN 9781713871088.

[23] Z. Hussain, M.-C. Shih, M. Oberst, I. Demirel, and D. Sontag. Falsification of internal and external validity in observational studies via conditional moment restrictions. In Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, pages 5869–5898, 2023.

[24] G. Imbens. Nonparametric estimation of average treatment effects under exogeneity: a review. The Review of Economics and Statistics, 86(1):4–29, 2004.

[25] G. W. Imbens and C. F. Manski. Confidence intervals for partially identified parameters. Econometrica, 72(6):1845–1857, 2004.

[26] C. Ju, J. Schwab, and M. J. van der Laan. On adaptive propensity score truncation in causal inference. Statistical Methods in Medical Research, 2019.

[27] A. B. Juditsky and A. S. Nemirovski. Nonparametric estimation by convex programming. The Annals of Statistics, 37:2278–2300, 2009.

[28] N. Kallus. Generalized optimal matching methods for causal inference. Journal of Machine Learning Research, 21(62):1–54, 2020.

[29] N. Kallus and A. Zhou. Minimax-optimal policy learning under unobserved confounding. Management Science, 67(5):2870–2890, 2021.

[30] J. D. Y. Kang and J. L. Schafer. Demystifying Double Robustness: A Comparison of Alternative Strategies for Estimating a Population Mean from Incomplete Data. Statistical Science, 22(4):523–539, 2007.

[31] S. Khan and D. Nekipelov. On uniform inference in nonlinear models with endogeneity. Journal of Econometrics, 240(2):105261, 2024.

[32] S. Khan and E. Tamer. Irregular identification, support conditions, and inverse weight estimation. Econometrica, 78(6):2021–2042, 2010.

[33] S. Khan, M. Saveski, and J. Ugander. Off-policy evaluation beyond overlap: Sharp partial identification under smoothness. In Proceedings of the 41st International Conference on Machine Learning, volume 235, pages 23734–23757, 2024.

[34] R. Koenker and Y. Bilias. Quantile regression for duration data: A reappraisal of the pennsylvania reemployment bonus experiments. In Economic applications of quantile regression, pages 199–220. Springer, 2002.

[35] R. J. LaLonde. Evaluating the econometric evaluations of training programs with experimental data. The American Economic Review, 76(4):604–620, 1986.

[36] O. Landgren, D. S. Siegel, D. Auclair, A. Chari, M. Boedigheimer, T. Welliver, K. Mezzi, K. Iskander, and A. Jakubowiak. Carfilzomib-lenalidomide-dexamethasone versus bortezomib-lenalidomide-dexamethasone in patients with newly diagnosed multiple myeloma: results from the prospective, longitudinal, observational commpass study. Blood, 132:799, 2018.

[37] B. K. Lee, J. Lessler, and E. A. Stuart. Weight trimming and propensity score weighting. PLOS ONE, 6(3):1–6, 03 2011.

[38] B. Li, K. Ren, L. Shen, P. Hou, Z. Su, A. Di Bacco, J.-L. Hong, A. Galaznik, A. B. Dash, V. Crossland, P. Dolin, and S. Szalma. Comparing bortezomib-lenalidomide-dexamethasone (vrd) with carfilzomib-lenalidomide-dexamethasone (krd) in the patients with newly diagnosed multiple myeloma (ndmm) in two observational studies. Blood, 132:3298, 2018.

[39] F. Li, L. E. Thomas, and F. Li. Addressing Extreme Propensity Scores via the Overlap Weights. American Journal of Epidemiology, 188(1):250–257, 2018.

[40] X. Ma and J. Wang. Robust inference using inverse probability weighting. Journal of the American Statistical Association, 115(532):1851–1860, 2020.

[41] Y. Ma, P. H. Sant’Anna, Y. Sasaki, and T. Ura. Doubly robust estimators with weak overlap. arXiv:2304.08974 [stat.ME], 2023.

[42] C. F. Manski. Nonparametric bounds on treatment effects. The American Economic Review, 80(2):319–323, 1990.

[43] NIH. Relating clinical outcomes in multiple myeloma to personal assessment of genetic profile (commpass), 2016. URL https://clinicaltrials.gov/study/NCT01454297

[44] M. L. Petersen, K. E. Porter, S. Gruber, Y. Wang, and M. J. van der Laan. Diagnosing and responding to violations in the positivity assumption. Statistical Methods in Medical Research, 21(1):31–54, 2012.

[45] P. R. Rosenbaum and D. B. Rubin. The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1):41–55, 1983.

[46] C. Rothe. Robust confidence intervals for average treatment effects under limited overlap. Econometrica, 85(2):645–660, 2017.

[47] Y. Sasaki and T. Ura. Estimation and inference for moments of ratios with robustness against large trimming bias. Econometric Theory, 38(1):66–112, 2022.

[48] M. H. Schneider and S. A. Zenios. A comparative study of algorithms for matrix balancing. Operations Research, 38(3), 1990.

[49] J. A. Smith and P. E. Todd. Does matching overcome LaLonde’s critique of nonexperimental estimators? Journal of Econometrics, 125:305–353, 2005.

[50] J. Stoye. More on confidence intervals for partially identified parameters. Econometrica, 77(4):1299–1315, 2009.

[51] T. Stürmer, K. J. Rothman, J. Avorn, and R. J. Glynn. Treatment effects in the presence of unmeasured confounding: dealing with observations in the tails of the propensity score distribution—a simulation study. American Journal of Epidemiology, 172(7):843–854, 2010.

[52] S. Yang and P. Ding. Asymptotic inference of causal effects with observational studies trimmed by the estimated propensity scores. Biometrika, 105(2):487–493, 2018.

A Proof of Lemma 1

In this section, we closely follow the notation in [4]. We let $n_1 = |\{i : z_i = 1\}|$ and $n_0 = |\{i : z_i = 0\}|$ denote the total number of samples with $z = 1$ and $z = 0$. With slight ...

[53] If $q \in W_+$, then we have $d_{i^*,q} \le d_{p,q}$, so that $|g_{p,1} - g_{q,1}| = f_{i^*,1} - f_{q,1} \le d_{i^*,q} \le d_{p,q}$.

[54] If $q \in W_-$ and $\eta_q \le \eta_{i^*}$, i.e., $i^*$ is in the middle of $p$ and $q$, then we also have $d_{i^*,q} \le d_{p,q}$, so that $|g_{p,1} - g_{q,1}| = f_{i^*,1} - f_{q,1} \le d_{i^*,q} \le d_{p,q}$.
