The Identification Power of Combining Experimental and Observational Data for Distributional Treatment Effect Parameters
Pith reviewed 2026-05-18 23:22 UTC · model grok-4.3
The pith
Pairing randomized experiments with self-selected observational data generically tightens the nonparametric sharp bounds on distributional treatment effect parameters.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
For broad classes of distributional treatment effect parameters, nonparametric sharp bounds are derived from the combined experimental and observational data. Self-selection in the observational data is a key source of identification power beyond what randomization alone provides. Necessary and sufficient conditions are given under which the combined data strictly shrink the identified set, and such gains arise generically unless selection-on-observables holds in the observational data.
What carries the argument
Nonparametric sharp bounds obtained from the union of randomized experimental data and self-selected observational data, where the self-selection mechanism supplies additional identifying variation.
If this is right
- The identified set for the distribution of individual treatment effects generically shrinks when the observational sample exhibits self-selection unexplained by observables.
- The linear programming procedure permits incorporation of structural assumptions such as positive dependence between potential outcomes or the generalized Roy selection model.
- In empirical settings such as negative campaign advertisements, the combined data yield narrower ranges for heterogeneous treatment effects than experimental data alone.
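As a rough illustration of how a structural restriction can shrink bounds, the sketch below bounds the joint CDF of (Y(1), Y(0)) at a single point from the marginal CDF values u = F1(y1) and v = F0(y0). It assumes, for illustration only, that "positive dependence" is read as positive quadrant dependence (C(u, v) >= uv); the paper's exact restriction and its LP implementation may differ, and the function name is hypothetical.

```python
def joint_cdf_bounds(u: float, v: float, positive_dependence: bool = False) -> tuple[float, float]:
    """Bounds on P(Y1 <= y1, Y0 <= y0) given marginal CDF values u = F1(y1), v = F0(y0).

    Hypothetical helper: without restrictions these are the Frechet-Hoeffding bounds.
    With positive_dependence=True we assume positive quadrant dependence,
    which raises the lower bound to the independence value u * v.
    """
    if not (0.0 <= u <= 1.0 and 0.0 <= v <= 1.0):
        raise ValueError("CDF values must lie in [0, 1]")
    lower = max(u + v - 1.0, 0.0)  # W(u, v): countermonotone lower bound
    upper = min(u, v)              # M(u, v): comonotone upper bound
    if positive_dependence:
        # Assumed restriction: C(u, v) >= u * v, which always lies between W and M.
        lower = max(lower, u * v)
    return lower, upper


print(joint_cdf_bounds(0.75, 0.5))        # (0.25, 0.5)
print(joint_cdf_bounds(0.75, 0.5, True))  # (0.375, 0.5)
```

Because uv always lies between the two Fréchet–Hoeffding bounds, this restriction can only raise the lower bound; the upper bound is unchanged.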
Where Pith is reading between the lines
- The same logic of mixing randomized and self-selected samples could apply to other partial-identification problems that currently rely on one data source.
- Researchers could first test whether selection-on-observables holds in the observational data to forecast whether combining it with an experiment will tighten bounds.
- Data-collection designs that deliberately include both randomized and self-selected subsamples might become a standard way to obtain tighter distributional estimates for policy.
Load-bearing premise
The observational data must contain genuine self-selection that is not fully explained by the observables already present in the experimental sample.
What would settle it
Compute the sharp bounds from the experimental sample alone and from the combined data in a setting where selection-on-observables is known to fail and the paper's necessary and sufficient conditions for strict shrinkage hold; if the combined bounds are not strictly narrower, the identification-improvement claim is falsified.
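The comparison described above can be sketched in the simplest case: a binary outcome, no covariates, and experimental and observational samples drawn from one population. Under those assumptions (ours, not necessarily the paper's setup), the experiment gives P(Y1 = 1) and P(Y0 = 1), the observational sample gives P(D = 1), P(Y1 = 1 | D = 1) and P(Y0 = 1 | D = 0), and the law of total probability recovers the counterfactual conditionals P(Y1 = 1 | D = 0) and P(Y0 = 1 | D = 1). Applying Fréchet–Hoeffding bounds within each D stratum then bounds the share helped by treatment, P(Y1 = 1, Y0 = 0). All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Inputs:
    p1: float  # experimental P(Y1 = 1)
    p0: float  # experimental P(Y0 = 1)
    pi: float  # observational P(D = 1), assumed strictly between 0 and 1
    q1: float  # observational P(Y = 1 | D = 1) = P(Y1 = 1 | D = 1)
    q0: float  # observational P(Y = 1 | D = 0) = P(Y0 = 1 | D = 0)


def helped_bounds(a1: float, a0: float) -> tuple[float, float]:
    """Frechet-Hoeffding bounds on P(Y1 = 1, Y0 = 0) given P(Y1 = 1) = a1, P(Y0 = 1) = a0."""
    return max(0.0, a1 - a0), min(a1, 1.0 - a0)


def experimental_bounds(x: Inputs) -> tuple[float, float]:
    # Experimental data alone: only the two marginals are known.
    return helped_bounds(x.p1, x.p0)


def combined_bounds(x: Inputs) -> tuple[float, float]:
    # Law of total probability, assuming one common population for both samples.
    r1 = (x.p1 - x.pi * x.q1) / (1.0 - x.pi)  # P(Y1 = 1 | D = 0)
    r0 = (x.p0 - (1.0 - x.pi) * x.q0) / x.pi  # P(Y0 = 1 | D = 1)
    if not (0.0 <= r1 <= 1.0 and 0.0 <= r0 <= 1.0):
        raise ValueError("experimental and observational data are incompatible")
    lo1, hi1 = helped_bounds(x.q1, r0)  # stratum D = 1
    lo0, hi0 = helped_bounds(r1, x.q0)  # stratum D = 0
    # Average the stratum-specific bounds with weights P(D = d).
    return x.pi * lo1 + (1.0 - x.pi) * lo0, x.pi * hi1 + (1.0 - x.pi) * hi0


selected = Inputs(p1=0.5, p0=0.5, pi=0.5, q1=0.75, q0=0.75)  # selection on gains
unrelated = Inputs(p1=0.5, p0=0.5, pi=0.5, q1=0.5, q0=0.5)   # D unrelated to (Y1, Y0)
print(experimental_bounds(selected), combined_bounds(selected))    # (0.0, 0.5) (0.25, 0.5)
print(experimental_bounds(unrelated), combined_bounds(unrelated))  # (0.0, 0.5) (0.0, 0.5)
```

In this toy setting the stratified bounds are never wider than the experimental ones (max(0, ·) is convex and min(·, ·) is concave), and here they coincide when D is unrelated to the potential outcomes, a no-covariate analogue of selection-on-observables.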
This study investigates the identification power gained by combining experimental data, in which treatment is randomized, with observational data, in which treatment is self-selected, for distributional treatment effect (DTE) parameters. While experimental data identify average treatment effects, many DTE parameters, such as the distribution of individual treatment effects, are only partially identified. We examine whether and how combining these two data sources tightens the identified set for such parameters. For broad classes of DTE parameters, we derive nonparametric sharp bounds under the combined data and clarify the mechanism through which data combination improves identification relative to using experimental data alone. Our analysis highlights that self-selection in observational data is a key source of identification power. We establish necessary and sufficient conditions under which the combined data strictly shrink the identified set, and show that such gains arise generically unless selection-on-observables holds in the observational data. We also propose a linear programming approach to compute sharp bounds that can incorporate additional structural restrictions, such as positive dependence between potential outcomes and the generalized Roy selection model. An empirical application using data on negative campaign advertisements in the 2008 U.S. presidential election illustrates the practical relevance of the proposed approach.
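The abstract's contrast between point-identified averages and partially identified distributions can be made concrete with binary outcomes. In the sketch below (a toy case, with a hypothetical function name), the experiment fixes p1 = P(Y1 = 1) and p0 = P(Y0 = 1), hence the average effect p1 - p0, while the distribution of the individual effect Δ = Y1 - Y0 still depends on the unidentified joint probability t = P(Y1 = 1, Y0 = 0).

```python
def delta_distribution(p1: float, p0: float, t: float) -> dict[int, float]:
    """Distribution of Delta = Y1 - Y0 for binary outcomes (hypothetical helper).

    p1 = P(Y1 = 1) and p0 = P(Y0 = 1) come from the experiment;
    t = P(Y1 = 1, Y0 = 0) is not identified and must lie in the Frechet-Hoeffding interval.
    """
    lo, hi = max(0.0, p1 - p0), min(p1, 1.0 - p0)
    if not lo <= t <= hi:
        raise ValueError(f"t = {t} lies outside the identified interval [{lo}, {hi}]")
    helped = t              # P(Delta = +1) = P(Y1 = 1, Y0 = 0)
    hurt = t - (p1 - p0)    # P(Delta = -1) = P(Y1 = 0, Y0 = 1), implied by the marginals
    return {-1: hurt, 0: 1.0 - helped - hurt, 1: helped}


for t in (0.25, 0.5):  # endpoints of the identified interval when p1 = 0.75, p0 = 0.5
    d = delta_distribution(0.75, 0.5, t)
    print(d, "ATE =", d[1] - d[-1])  # ATE = 0.25 at both endpoints
```

Both calls return different distributions of Δ with the same average effect, which is exactly the gap that additional data can help narrow.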
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that combining randomized experimental data with observational data under endogenous self-selection yields nonparametric sharp bounds for a broad class of distributional treatment effect (DTE) parameters. It derives the identified set from the intersection of moment restrictions implied by random assignment and by the observational selection process, establishes necessary and sufficient conditions under which the combined data strictly shrink the identified set relative to experimental data alone, shows that such gains arise generically unless selection-on-observables holds in the observational sample, and proposes a linear-programming representation that computes the bounds and accommodates additional restrictions such as positive dependence or the generalized Roy model. An empirical illustration applies the method to negative campaign advertisements in the 2008 U.S. presidential election.
Significance. If the derivations are correct, the paper makes a useful contribution to the partial-identification literature by clarifying how self-selection in observational data supplies identifying power for DTE parameters that remain only partially identified from experimental data alone. The necessary-and-sufficient conditions and the LP formulation are practical strengths that allow researchers to assess the value of data combination and to impose economically motivated restrictions in a transparent way.
major comments (2)
- [§4.2] §4.2, the LP formulation: the claim that the linear program computes sharp bounds for the distribution of individual treatment effects rests on the maintained assumption that the support of (Y(0),Y(1)) is the same in both samples; if this common-support condition fails, the feasible set of the LP may exclude some distributions that are consistent with the combined data, so the reported bounds could be too narrow and hence neither valid nor sharp. A brief discussion or robustness check on support overlap would be needed to confirm the central identification result.
- [§3.3] §3.3, necessary and sufficient conditions: the proof that gains occur generically unless selection-on-observables holds in the observational data is stated for the case of binary treatment and continuous outcomes; it is not immediately clear whether the same argument extends without modification to the multi-valued or discrete-outcome settings that are also covered by the general DTE class. Clarifying the scope of the generic-gain result would strengthen the main theoretical claim.
minor comments (3)
- [§2] The notation for the potential-outcome distributions is introduced in §2 but used with slight variations in §4; a single consolidated definition would improve readability.
- Figure 1 (empirical bounds) would benefit from an additional panel or table that reports the experimental-only bounds alongside the combined-data bounds so that the identification gain is immediately visible to the reader.
- A few references to the partial-identification literature on DTE parameters (e.g., recent work on bounds for the distribution of treatment effects) appear to be missing from the introduction.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive comments on our manuscript. We address each major comment below. Where appropriate, we will revise the paper to incorporate clarifications and additional discussion.
-
Referee: [§4.2] §4.2, the LP formulation: the claim that the linear program computes sharp bounds for the distribution of individual treatment effects rests on the maintained assumption that the support of (Y(0),Y(1)) is the same in both samples; if this common-support condition fails, the feasible set of the LP may exclude some distributions that are consistent with the combined data, so the reported bounds would not be sharp. A brief discussion or robustness check on support overlap would be needed to confirm the central identification result.
Authors: We agree that the sharpness result for the linear program in Section 4.2 relies on the support of the joint distribution of potential outcomes (Y(0), Y(1)) being identical across the experimental and observational samples. This follows from our maintained assumption that both samples are drawn from the same underlying population, so the support of the potential outcomes is common by construction. Nevertheless, to address the referee's concern explicitly, we will add a short paragraph in Section 4.2 clarifying this assumption and noting that if the supports were to differ (for example, due to sampling from distinct subpopulations), imposing a common support could exclude admissible distributions from the LP feasible set, so the resulting bounds could be too narrow rather than sharp. We will also include a brief robustness discussion suggesting that researchers can restrict the LP to the overlapping support when such differences are suspected. revision: yes
-
Referee: [§3.3] §3.3, necessary and sufficient conditions: the proof that gains occur generically unless selection-on-observables holds in the observational data is stated for the case of binary treatment and continuous outcomes; it is not immediately clear whether the same argument extends without modification to the multi-valued or discrete-outcome settings that are also covered by the general DTE class. Clarifying the scope of the generic-gain result would strengthen the main theoretical claim.
Authors: The necessary-and-sufficient conditions for strict improvement from data combination are derived within the general framework of Section 3 that applies to the full class of DTE parameters, including multi-valued treatments and discrete outcomes. The generic-gain result is driven by the observation that selection-on-observables constitutes a measure-zero set in the space of admissible selection processes; this geometric argument does not depend on the cardinality of the treatment or the support of the outcome. The detailed proof in the appendix is presented for the binary-continuous case purely for expositional simplicity, but the same logic carries over directly once the appropriate moment restrictions are substituted. In the revision we will add a clarifying remark in Section 3.3 (and a corresponding sentence in the appendix) stating that the generic-gain result holds for the entire DTE class covered by the paper. revision: yes
Circularity Check
No significant circularity in the identification derivation
The paper derives nonparametric sharp bounds for broad classes of distributional treatment effect parameters by intersecting the moment restrictions implied by random assignment in the experimental sample with those implied by the selection process in the observational sample. These bounds and the necessary and sufficient conditions for strict shrinkage of the identified set are obtained directly from the differing information on the joint distribution of (Y(0), Y(1), D) under randomization versus endogenous selection; the construction does not reduce any target quantity to a fitted parameter or to a self-citation that itself depends on the present result. The linear-programming representation is a computational device for the same set of restrictions and does not introduce circularity. The derivation is therefore self-contained against the stated assumptions.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Treatment is randomly assigned in the experimental sample.
- domain assumption Observational treatment is self-selected and does not satisfy selection-on-observables.
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
unclear: relation between the paper passage and the cited Recognition theorem.
We derive nonparametric sharp bounds under the combined data... using copula bound analysis... supermodular functions or φ-indicator functions... linear programming approach
-
IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · absolute_floor_iff_bare_distinguishability
unclear: relation between the paper passage and the cited Recognition theorem.
$F^{*}_{p-}(y_1, y_0) := \mathbb{E}\left[M\left(F^{*}_{Y_1 \mid SX}(y_1 \mid S, X),\ F^{*}_{Y_0 \mid SX}(y_0 \mid S, X)\right)\right]$
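The quoted expression averages a copula-type bound over the conditioning variables. Reading M as the Fréchet–Hoeffding upper copula M(u, v) = min(u, v) (the usual meaning of that symbol; the passage itself does not define it) and treating (S, X) as discrete with known cell probabilities, the minimal sketch below compares the stratified quantity with M applied to the pooled marginals. Both are valid upper bounds on P(Y1 <= y1, Y0 <= y0). Because min is concave, the stratified value never exceeds the pooled one; it is strictly smaller exactly when the ordering of the two conditional CDF values flips across cells with positive probability, and equal, for example, when the conditional CDFs do not vary across cells. Function names are hypothetical.

```python
from collections.abc import Sequence


def _check(weights: Sequence[float], u: Sequence[float], v: Sequence[float]) -> None:
    if not (len(weights) == len(u) == len(v)):
        raise ValueError("weights, u and v must have the same length")
    if abs(sum(weights) - 1.0) > 1e-12:
        raise ValueError("cell probabilities must sum to one")


def stratified_upper(weights: Sequence[float], u: Sequence[float], v: Sequence[float]) -> float:
    """E[M(U, V)] over discrete (S, X) cells, with M(u, v) = min(u, v).

    u[k], v[k]: conditional CDF values F_{Y1|SX}(y1 | cell k), F_{Y0|SX}(y0 | cell k).
    """
    _check(weights, u, v)
    return sum(w * min(a, b) for w, a, b in zip(weights, u, v))


def pooled_upper(weights: Sequence[float], u: Sequence[float], v: Sequence[float]) -> float:
    """M applied to the unconditional CDFs: min(E[U], E[V])."""
    _check(weights, u, v)
    eu = sum(w * a for w, a in zip(weights, u))
    ev = sum(w * b for w, b in zip(weights, v))
    return min(eu, ev)


w = [0.5, 0.5]
print(stratified_upper(w, [0.75, 0.25], [0.25, 0.75]), pooled_upper(w, [0.75, 0.25], [0.25, 0.75]))  # 0.25 0.5
print(stratified_upper(w, [0.5, 0.5], [0.5, 0.5]), pooled_upper(w, [0.5, 0.5], [0.5, 0.5]))          # 0.5 0.5
```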
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.