Bounding Treatment Effects by Pooling Limited Information across Observations

Martin Weidner; Sokbae Lee

arXiv: 2111.05243 · v8 · submitted 2021-11-09 · econ.EM · stat.ME

Pith reviewed 2026-05-24 13:31 UTC · model grok-4.3

classification: econ.EM · stat.ME

keywords: treatment effects · bounds · unconfoundedness · partial identification · propensity score · pooling · overlap violation

The pith

Bounds on average treatment effects that are valid under unconfoundedness can be built from sample averages in which each outcome's contribution depends on the treatment status of only a limited number of observations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper constructs bounds on average treatment effects, including the average treatment effect on the treated (ATT), that continue to hold when conditioning variables take many distinct values or when the overlap condition is violated. It achieves this by restricting how much information is pooled across observations: the bounds are sample averages in which each observed outcome's contribution depends on the treatment status of only a small number of observations. This intermediate approach sits between Manski bounds (no pooling) and the inverse propensity score weighting (IPW) estimator (unlimited pooling), and the authors supply corresponding inference procedures. Monte Carlo simulations and two empirical applications illustrate that the resulting bounds stay informative in practice.

Core claim

By forming bounds as sample averages over functions of observed outcomes, where each outcome's contribution depends on the treatment status of only a limited number of observations, one obtains bounds on average treatment effects that are valid under the unconfoundedness assumption and remain robust when the conditioning variables take many distinct values in the sample or when the overlap condition is violated.

What carries the argument

Limited information pooling: the device that restricts dependence of each outcome contribution to a small number of treatment statuses, interpolating between Manski bounds and inverse propensity weighting.
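To make the two endpoints of this interpolation concrete, the sketch below computes textbook worst-case (Manski-type) ATT bounds and a cell-based IPW estimate of the ATT on simulated data. It is not the paper's intermediate construction. The function names, the four-cell design, the binary outcome in [0, 1], and the true ATT of 0.2 are all assumptions made for illustration.

```python
import numpy as np


def manski_att_bounds(y, d, a_min, a_max):
    """Worst-case ATT bounds for an outcome known to lie in [a_min, a_max].

    No information is pooled: the unobserved E[Y(0) | D=1] is simply
    replaced by its logical extremes a_max and a_min.
    """
    mean_treated = y[d == 1].mean()
    return mean_treated - a_max, mean_treated - a_min


def ipw_att(y, d, x):
    """Inverse propensity score weighting ATT estimate for discrete x.

    The propensity score is estimated by the treated share in each cell,
    so every control's weight depends on all treatment indicators in its cell.
    """
    n_treated = d.sum()
    counterfactual_sum = 0.0
    for cell in np.unique(x):
        in_cell = x == cell
        p_hat = d[in_cell].mean()
        if p_hat == 1.0:
            raise ValueError(f"no controls in cell {cell}: IPW weight undefined")
        controls = in_cell & (d == 0)
        counterfactual_sum += (y[controls] * p_hat / (1.0 - p_hat)).sum()
    return y[d == 1].mean() - counterfactual_sum / n_treated


def simulate(n=2000, seed=0):
    """Hypothetical design: four covariate cells, true ATT equal to 0.2."""
    rng = np.random.default_rng(seed)
    x = rng.integers(0, 4, size=n)
    d = (rng.random(n) < np.array([0.2, 0.4, 0.6, 0.8])[x]).astype(int)
    # Potential outcomes depend on x only, so unconfoundedness holds by design.
    y0 = (rng.random(n) < 0.3 + 0.1 * x).astype(float)
    y1 = (rng.random(n) < 0.5 + 0.1 * x).astype(float)
    return np.where(d == 1, y1, y0), d, x


y, d, x = simulate()
lower, upper = manski_att_bounds(y, d, a_min=0.0, a_max=1.0)
print(f"Manski ATT bounds: [{lower:.3f}, {upper:.3f}]")
print(f"IPW ATT estimate:  {ipw_att(y, d, x):.3f}")
```

The worst-case bounds always have width `a_max - a_min`, whatever the data, while the IPW estimate is a single number that becomes undefined as soon as one cell contains treated units only. The paper's bounds are described as occupying the range between these two behaviours.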

If this is right

The bounds stay valid and can be computed even when the number of distinct covariate values approaches the sample size.
The same construction yields valid bounds when the propensity score is zero or one for some covariate cells.
Inference procedures accompany the bounds and deliver valid confidence intervals under the maintained assumptions.
In Monte Carlo designs that replicate high-dimensional covariates or overlap violations, the pooled bounds remain tighter than Manski bounds while retaining coverage.
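The first two points concern samples in which many covariate cells contain only treated or only control units. The sketch below, a rough illustration rather than anything taken from the paper, counts how often that happens as the number of distinct covariate values grows. The constant propensity score of 0.3 and the sample size of 100 echo the right-hand example of Figure 1; the function name and the uniform assignment of observations to cells are assumptions.

```python
import numpy as np


def share_degenerate_cells(n, n_cells, p=0.3, seed=0):
    """Share of occupied covariate cells whose empirical propensity is 0 or 1."""
    rng = np.random.default_rng(seed)
    x = rng.integers(0, n_cells, size=n)          # cell label of each observation
    d = (rng.random(n) < p).astype(float)         # treatment with constant propensity p
    treated = np.bincount(x, weights=d, minlength=n_cells)
    total = np.bincount(x, minlength=n_cells)
    occupied = total > 0
    degenerate = occupied & ((treated == 0) | (treated == total))
    return degenerate.sum() / occupied.sum()


for n_cells in (2, 10, 50, 10_000):
    share = share_degenerate_cells(n=100, n_cells=n_cells)
    print(f"{n_cells:>6} cells: share with empirical propensity in {{0, 1}} = {share:.2f}")
```

Even though the true propensity score is 0.3 everywhere, so overlap holds in the population, a plug-in cell propensity score is degenerate in almost every occupied cell once the number of cells is comparable to the sample size.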

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The limited-pooling construction may extend naturally to settings with continuous covariates by choosing the pooling window size as a tuning parameter.
Researchers could compare the width of these bounds against fully nonparametric estimates in data sets where overlap is known to hold, to quantify the price of robustness.
The same limited-dependence structure could be applied to other partial-identification problems that currently rely on either no pooling or full pooling.

Load-bearing premise

Treatment assignment is independent of potential outcomes conditional on the observed covariates.
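In symbols, the premise can be written in standard potential-outcomes notation as follows. The notation ($Y(0)$, $Y(1)$ for potential outcomes, $D$ for treatment, $X$ for covariates, $p(x)$ for the propensity score) is the textbook convention and is assumed here; the paper's own statement of the assumption may differ in detail.

```latex
% Unconfoundedness (selection on observables), textbook form
\bigl(Y(0),\, Y(1)\bigr) \;\perp\; D \;\big|\; X,
\qquad p(x) := \Pr(D = 1 \mid X = x).

% Consequence used by inverse propensity score weighting when p(X) < 1:
\mathrm{ATT} := E\bigl[\, Y(1) - Y(0) \mid D = 1 \,\bigr]
  = \frac{1}{E[D]}\, E\!\left[\, D\,Y \;-\; \frac{p(X)}{1 - p(X)}\,(1 - D)\,Y \right].
```

The second display shows why overlap matters for point identification: the weight $p(X)/(1 - p(X))$ is undefined wherever $p(X) = 1$.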

What would settle it

A data set in which treatment assignment is demonstrably dependent on potential outcomes even after conditioning on the covariates would render the bounds invalid.

Figures

Figures reproduced from arXiv: 2111.05243 by Martin Weidner, Sokbae Lee.

**Figure 1.** Two simple examples for samples of $(X_i, D_i)$, $i = 1, \ldots, n$, with $n = 100$. For the example on the left we have one-dimensional $X_i \sim U[0, 1]$ and $p(x) = x^4$. For the example on the right we have two-dimensional $X_i \sim U[0, 1]^2$ and $p(x) = 0.3$.

**Figure 2.** Weights $w^{(2)}(p, p_*)$ as a function of $p$, for different values of $p_*$. Surrounding text (section "Second-order ATT bounds"): So far we have focused on the ATE bounds. We now apply the same argument to derive ATT bounds. We want to generalize the first-order bounds in (4) by replacing $C^{(1)}(a)$ with $C^{(2)}(a) = D\,(Y - a) + [\lambda_0(X) + \lambda_1(X)\,p(X)]\,(1 - D)\,(Y - a)$ (15), where the coefficients $\lambda_0(x), \lambda_1(x) \in \mathbb{R}$ need to be determined such that E[P…

**Figure 3.** Weights $\tilde{w}^{(2)}(p, p_*)$ as a function of $p$, for different values of $p_*$. Surrounding text: where the weight function $\tilde{w}^{(2)} : [0, 1] \times (0, 1] \to (-\infty, 1]$ reads $\tilde{w}^{(2)}(p, p_*) := 1 - \frac{1}{p} \left( \frac{p - p_*}{1 - p_*} \right)^2$. Under Assumption 1 we calculate that E…

**Figure 4.** Weights $w^{(q)}(p, p_*)$ and $\tilde{w}^{(q)}(p, p_*)$ as a function of $p$, for $p_* = 0.4$ and $q \in \{1, 2, 3, 4\}$. Surrounding text: The proof is given in the appendix. To better understand the result of Proposition (2), consider the lower bound on E…

**Figure 5.** Sample weights $\hat{w}_1(x)$ plotted as a function of $n_1(x)/n(x)$ for $p_*(x) = 0.4$ (left), $p_*(x) = 0.5$ (middle) and $p_*(x) = 0.6$ (right). The corresponding population weights $w^{(q)}(p(x), p_*(x))$ are also plotted as a function of $p(x)$. Surrounding text: … $p_*(x) > \tfrac{1}{2}$ the absolute values of the weights $\hat{w}_0(x)$ and $\hat{v}(x)$ grow exponentially with $n_0(x)$. Only for $p_*(x) = 1/2$ are all the sample weights bounded, independent of the realization o…
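The second-order ATT weight quoted alongside Figure 3, read here as 1 − (1/p)·((p − p∗)/(1 − p∗))², can be evaluated directly. The short sketch below does so for a few values of p at p∗ = 0.4. The function name is hypothetical, and restricting inputs to p > 0 and p∗ < 1 (so that neither division is by zero) is an assumption of this sketch rather than a statement from the paper.

```python
def att_weight_second_order(p, p_star):
    """Second-order ATT weight 1 - (1/p) * ((p - p_star) / (1 - p_star))**2."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    if not 0.0 <= p_star < 1.0:
        raise ValueError("p_star must lie in [0, 1)")
    return 1.0 - ((p - p_star) / (1.0 - p_star)) ** 2 / p


# Evaluate the weight on a small grid of propensity scores for p_star = 0.4.
for p in (0.1, 0.4, 0.7, 1.0):
    print(p, att_weight_second_order(p, p_star=0.4))
```

The weight equals 1 when p coincides with the reference value p∗ and never exceeds 1, which matches the stated range (−∞, 1]; it turns negative once p is far enough below p∗.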

Original abstract

We provide novel bounds on average treatment effects (on the treated) that are valid under an unconfoundedness assumption. Our bounds are designed to be robust in challenging situations, for example, when the conditioning variables take on a large number of different values in the observed sample, or when the overlap condition is violated. This robustness is achieved by only using limited "pooling" of information across observations. Namely, the bounds are constructed as sample averages over functions of the observed outcomes such that the contribution of each outcome only depends on the treatment status of a limited number of observations. No information pooling across observations leads to so-called "Manski bounds", while unlimited information pooling leads to standard inverse propensity score weighting. We explore the intermediate range between these two extremes and provide corresponding inference methods. We show in Monte Carlo experiments and through two empirical applications that our bounds are indeed robust and informative in practice.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Lee and Weidner give ATT bounds that sit between Manski and full IPW by restricting how much each observation's outcome can depend on others' treatment status.

The main takeaway is that these bounds stay valid under unconfoundedness while staying informative when covariates take many values or overlap is weak. They achieve this by building sample averages where each term only uses treatment status from a limited number of other observations, rather than none (Manski) or all (IPW). The paper also supplies inference methods for the resulting bounds. Monte Carlo designs that enforce unconfoundedness show the bounds cover the true ATT, and the two empirical examples demonstrate they can be tighter than pure Manski bounds in practice. That combination of construction and checks is the useful part. One soft spot is that users still need guidance on how to pick the pooling level in applications; if the paper leaves that mostly to discretion, it could limit how automatically the method travels. The dependence restriction itself looks clean and avoids the circularity that sometimes appears in other partial identification work. This is for applied econometricians who routinely face high-dimensional covariates or thin overlap and want something between the two extremes. It shows honest engagement with the existing Manski and IPW literature and supplies reproducible Monte Carlo evidence. I would send it to peer review.

Referee Report

0 major / 2 minor

Summary. The paper proposes novel bounds on the average treatment effect on the treated (ATT) that remain valid under the standard unconfoundedness (selection on observables) assumption. The bounds are formed as sample averages of functions of observed outcomes where each outcome contributes information depending on the treatment status of only a limited number of other observations; this limited-pooling construction is designed to remain informative when the covariate support is large or overlap fails. The approach explicitly interpolates between Manski bounds (zero pooling) and inverse-propensity-score weighting (unrestricted pooling), supplies corresponding inference procedures, and is illustrated with Monte Carlo experiments that enforce unconfoundedness plus two empirical applications.

Significance. If the derivations and coverage results hold, the contribution is useful for applied work in settings where standard IPW or matching methods become unreliable due to sparse cells or lack of overlap. The explicit Monte Carlo designs that maintain unconfoundedness and report finite-sample coverage of the true ATT, together with the parameter-free character of the validity guarantee (which depends only on unconfoundedness, not on the chosen pooling level), are concrete strengths that would make the bounds a practical addition to the econometric toolkit for partial identification of treatment effects.

minor comments (2)

[Abstract] The abstract states that the bounds are 'constructed as sample averages over functions of the observed outcomes such that the contribution of each outcome only depends on the treatment status of a limited number of observations,' but does not indicate how the limited number is chosen in practice or whether it is data-driven; a brief clarification in the introduction would help readers understand the tuning parameter.
[Monte Carlo experiments] The Monte Carlo section reports coverage under designs that enforce unconfoundedness, but it would be helpful to see a small table or figure showing how coverage and bound width vary with the pooling parameter for the same design; this would make the robustness claim more transparent.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of the paper, the recognition of its practical strengths for applied work, and the recommendation for minor revision. No major comments appear in the report.

Circularity Check

0 steps flagged

No significant circularity

The paper derives bounds on the ATT directly from the unconfoundedness assumption by constructing sample averages of outcome functions whose dependence on other observations is restricted by design. This construction is explicit in the abstract and does not reduce any claimed bound to a fitted parameter or to a self-citation chain; the validity guarantee is stated to hold exactly when unconfoundedness holds, and Monte Carlo designs are supplied to verify coverage under that assumption. No load-bearing step equates a derived quantity to its own input by definition or renames a known result.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach rests on the standard unconfoundedness assumption from causal inference; no free parameters or invented entities are mentioned in the abstract.

axioms (1)

Domain assumption: unconfoundedness.
The bounds are stated to be valid under this assumption (abstract).

pith-pipeline@v0.9.0 · 5676 in / 1161 out tokens · 33791 ms · 2026-05-24T13:31:44.362337+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "bounds are constructed as sample averages over functions of the observed outcomes such that the contribution of each outcome only depends on the treatment status of a limited number of observations"

IndisputableMonolith/Foundation/DimensionForcing.lean · alexander_duality_circle_linking · unclear
Paper passage: "reference propensity score p*"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

32 extracted references · 32 canonical work pages

[1] Abadie, A. and G. W. Imbens (2006). Large sample properties of matching estimators for average treatment effects. Econometrica 74(1), 235–267.
[2] Abadie, A. and G. W. Imbens (2008). On the failure of the bootstrap for matching estimators. Econometrica 76(6), 1537–1557.
[3] Abadie, A. and G. W. Imbens (2011). Bias-corrected matching estimators for average treatment effects. Journal of Business & Economic Statistics 29(1), 1–11.
[4] Armstrong, T. B. and M. Kolesár (2021). Finite-sample optimal estimation and inference on average treatment effects under unconfoundedness. Econometrica 89(3), 1141–1177.
[5] Bhattacharya, J., A. M. Shaikh, and E. Vytlacil (2008). Treatment effect bounds under monotonicity assumptions: an application to Swan-Ganz catheterization. American Economic Review: Papers and Proceedings 98(2), 351–56.
[6] Bhattacharya, J., A. M. Shaikh, and E. Vytlacil (2012). Treatment effect bounds: An application to Swan-Ganz catheterization. Journal of Econometrics 168(2), 223–243.
[7] Bonhomme, S., T. Lamadon, and E. Manresa (2021). Discretizing unobserved heterogeneity. Econometrica. Forthcoming.
[8] Busso, M., J. DiNardo, and J. McCrary (2014). New evidence on the finite sample properties of propensity score reweighting and matching estimators. Review of Economics and Statistics 96(5), 885–897.
[9] Chernozhukov, V., S. Lee, and A. M. Rosen (2013). Intersection bounds: estimation and inference. Econometrica 81(2), 667–737.
[10] Connors, A. F., Jr., T. Speroff, N. V. Dawson, C. Thomas, F. E. Harrell, Jr., D. Wagner, N. Desbiens, L. Goldman, A. W. Wu, R. M. Califf, W. J. Fulkerson, Jr., H. Vidaillet, S. Broste, P. Bellamy, J. Lynn, and W. A. Knaus (1996). The Effectiveness of Right Heart Catheterization in the Initial Care of Critically Ill Patients. JAMA 276(11), 889–897.
[11] Crump, R. K., V. J. Hotz, G. W. Imbens, and O. A. Mitnik (2009). Dealing with limited overlap in estimation of average treatment effects. Biometrika 96(1), 187–199.
[12] D'Amour, A., P. Ding, A. Feller, L. Lei, and J. Sekhon (2021). Overlap in observational studies with high-dimensional covariates. Journal of Econometrics 221(2), 644–654.
[13] Everitt, B. S., S. Landau, M. Leese, and D. Stahl (2011). Cluster Analysis (5th ed.). John Wiley & Sons.
[14] Hirano, K. and G. W. Imbens (2001). Estimation of causal effects using propensity score weighting: An application to data on right heart catheterization. Health Services and Outcomes Research Methodology 2(3-4), 259–278.
[15] Hong, H., M. P. Leung, and J. Li (2019). Inference on finite-population treatment effects under limited overlap. Econometrics Journal 23, 32–47.
[16] Imbens, G. W. and C. F. Manski (2004). Confidence intervals for partially identified parameters. Econometrica 72(6), 1845–1857.
[17] Imbens, G. W. and D. B. Rubin (2015). Causal inference in statistics, social, and biomedical sciences. Cambridge University Press.
[18] Imbens, G. W. and J. M. Wooldridge (2009, March). Recent developments in the econometrics of program evaluation. Journal of Economic Literature 47(1), 5–86.
[19] Kaufman, L. and P. J. Rousseeuw (2005). Finding groups in data: an introduction to cluster analysis. John Wiley & Sons.
[20] Khan, S. and E. Tamer (2010). Irregular identification, support conditions, and inverse weight estimation. Econometrica 78(6), 2021–2042.
[21] Li, F., K. L. Morgan, and A. M. Zaslavsky (2018). Balancing covariates via propensity score weighting. Journal of the American Statistical Association 113(521), 390–400.
[22] Maechler, M., P. Rousseeuw, A. Struyf, M. Hubert, and K. Hornik (2021). cluster: Cluster Analysis Basics and Extensions. R package version 2.1.1. https://CRAN.R-project.org/package=cluster
[23] Manski, C. F. (1989). Anatomy of the selection problem. Journal of Human Resources, 343–360.
[24] Manski, C. F. (1990). Nonparametric bounds on treatment effects. American Economic Review 80(2), 319–323.
[25] Müllner, D. (2013). fastcluster: Fast hierarchical, agglomerative clustering routines for R and Python. Journal of Statistical Software 53(9), 1–18.
[26] Nethery, R. C., F. Mealli, and F. Dominici (2019). Estimating population average causal effects in the presence of non-overlap: The effect of natural gas compressor station exposure on cancer mortality. Annals of Applied Statistics 13(2), 1242–1267.
[27] Rothe, C. (2017). Robust confidence intervals for average treatment effects under limited overlap. Econometrica 85(2), 645–660.
[28] Sasaki, Y. and T. Ura (2021). Estimation and inference for moments of ratios with robustness against large trimming bias. Econometric Theory. https://doi.org/10.1017/S0266466621000025, forthcoming.
[29] Stone, C. J. (1980). Optimal rates of convergence for nonparametric estimators. Annals of Statistics 8(6), 1348–1360.
[30] Stoye, J. (2009). More on confidence intervals for partially identified parameters. Econometrica 77(4), 1299–1315.
[31] Stoye, J. (2020). A simple, short, but never-empty confidence interval for partially identified parameters. arXiv:2010.10484 [econ.EM], https://arxiv.org/abs/2010.10484
[32] Yang, S. and P. Ding (2018). Asymptotic inference of causal effects with observational studies trimmed by the estimated propensity scores. Biometrika 105(2), 487–493.
