The Identification Power of Combining Experimental and Observational Data for Distributional Treatment Effect Parameters

Shosei Sakaguchi

arXiv: 2508.12206 · v6 · submitted 2025-08-17 · econ.EM

Pith reviewed 2026-05-18 23:22 UTC · model grok-4.3

classification: econ.EM

keywords: distributional treatment effects, data combination, experimental data, observational data, sharp bounds, self-selection, identification, treatment heterogeneity

The pith

Pairing randomized experiments with self-selected observational data produces sharper nonparametric bounds on distributional treatment effect parameters.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Experimental data alone identifies average treatment effects but leaves many distributional parameters, such as the distribution of individual treatment effects, only partially identified. Adding observational data in which individuals self-select into treatment can tighten the identified sets because that self-selection carries private information not present in the randomized sample. The paper derives nonparametric sharp bounds for broad classes of these parameters and states necessary and sufficient conditions under which the combined data strictly shrinks the identified set. These gains occur generically unless selection on observables fully explains treatment choice in the observational sample. A linear-programming method is proposed to compute the bounds while allowing additional restrictions such as positive dependence between potential outcomes.
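
To see why randomization alone leaves such parameters only partially identified, here is a minimal sketch for binary outcomes. The function name `frechet_benefit_bounds` and the numbers are illustrative assumptions, not taken from the paper; the bounds are the classical Fréchet-Hoeffding inequalities applied to the share of units with Y(1)=1 and Y(0)=0.

```python
def frechet_benefit_bounds(p1, p0):
    """Bounds on P(Y(1)=1, Y(0)=0) when only the marginals are known.

    p1 = P(Y(1)=1) and p0 = P(Y(0)=1), both identified by a randomized
    experiment. The joint distribution of (Y(0), Y(1)) is never observed,
    so the share of units that benefit is only bounded.
    """
    lower = max(0.0, p1 - p0)
    upper = min(p1, 1.0 - p0)
    return lower, upper


# Hypothetical experimental marginals: the average effect is p1 - p0 = 0.1,
# yet the share of units that benefit is not pinned down.
lower, upper = frechet_benefit_bounds(0.6, 0.5)
print(f"share benefiting lies in [{lower:.2f}, {upper:.2f}]")
```

In this toy case the average effect is point identified while the share of beneficiaries can lie anywhere in a wide interval, which is the gap the combined data are meant to narrow.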

Core claim

For broad classes of distributional treatment effect parameters, nonparametric sharp bounds are derived from the combined experimental and observational data. Self-selection in the observational data supplies the key source of identification power beyond randomization alone. Necessary and sufficient conditions are given under which the combined data strictly improves identification, and such gains arise unless selection-on-observables holds in the observational data.

What carries the argument

Nonparametric sharp bounds obtained from the union of randomized experimental data and self-selected observational data, where the self-selection mechanism supplies additional identifying variation.
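
One way to picture the additional identifying variation is the following sketch. It assumes binary outcomes, no covariates, and that both samples are drawn from the same population; the symbols `q`, `a1`, `a0` and the function name are hypothetical notation introduced here. Under those assumptions, the law of total probability lets the experimental marginals and the observational conditionals recover the counterfactual marginals within each self-selected group.

```python
def stratum_marginals(p1, p0, q, a1, a0):
    """Recover P(Y(d)=1 | D=s) for both treatment-choice groups.

    p1, p0 : P(Y(1)=1), P(Y(0)=1) from the experiment
    q      : P(D=1), share choosing treatment in the observational data
    a1     : P(Y=1 | D=1) = P(Y(1)=1 | D=1) in the observational data
    a0     : P(Y=1 | D=0) = P(Y(0)=1 | D=0) in the observational data
    """
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    # Law of total probability: p1 = q * a1 + (1 - q) * P(Y(1)=1 | D=0)
    y1_given_d0 = (p1 - q * a1) / (1.0 - q)
    # Law of total probability: p0 = q * P(Y(0)=1 | D=1) + (1 - q) * a0
    y0_given_d1 = (p0 - (1.0 - q) * a0) / q
    for prob in (y1_given_d0, y0_given_d1):
        if not -1e-12 <= prob <= 1.0 + 1e-12:
            raise ValueError("inputs are incompatible with a common population")
    # Each entry is (P(Y(1)=1 | D=s), P(Y(0)=1 | D=s)).
    return {"D=1": (a1, y0_given_d1), "D=0": (y1_given_d0, a0)}


# Hypothetical inputs chosen for illustration only.
print(stratum_marginals(p1=0.6, p0=0.5, q=0.5, a1=0.8, a0=0.6))
```

If the recovered group-specific marginals differ from the population marginals, the self-selected choice carries information about potential outcomes that the experiment alone does not reveal.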

If this is right

The identified set for the distribution of individual treatment effects shrinks when the observational sample exhibits self-selection unexplained by observables.
The linear programming procedure permits incorporation of structural assumptions such as positive dependence between potential outcomes or the generalized Roy selection model.
In empirical settings such as negative campaign advertisements, the combined data yield narrower ranges for heterogeneous treatment effects than experimental data alone.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same logic of mixing randomized and self-selected samples could apply to other partial-identification problems that currently rely on one data source.
Researchers could first test whether selection-on-observables holds in the observational data to forecast whether combining it with an experiment will tighten bounds.
Data-collection designs that deliberately include both randomized and self-selected subsamples might become a standard way to obtain tighter distributional estimates for policy.
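
The suggested pre-check can be sketched as a simple comparison of sample means. This is an illustrative diagnostic under strong assumptions introduced here (binary outcomes, no covariates, a common population, and no adjustment for sampling error), so without covariates selection-on-observables reduces to treatment choice being unrelated to potential outcomes. The function name and data layout are hypothetical.

```python
from statistics import mean


def selection_gaps(exp_sample, obs_sample):
    """Compare observational group means with experimental arm means.

    exp_sample : list of (assigned_treatment, outcome) pairs
    obs_sample : list of (chosen_treatment, outcome) pairs
    Returns (gap_treated, gap_untreated). Gaps near zero are what one
    would expect if treatment choice were unrelated to potential outcomes.
    """
    p1 = mean(y for t, y in exp_sample if t == 1)
    p0 = mean(y for t, y in exp_sample if t == 0)
    a1 = mean(y for d, y in obs_sample if d == 1)
    a0 = mean(y for d, y in obs_sample if d == 0)
    return a1 - p1, a0 - p0


# Tiny made-up samples, for illustration only.
experiment = [(1, 1), (1, 1), (1, 0), (1, 0), (0, 1), (0, 0), (0, 0), (0, 0)]
observational = [(1, 1), (1, 1), (1, 1), (1, 0), (0, 0), (0, 0), (0, 1), (0, 0)]
print(selection_gaps(experiment, observational))
```

A real application would need a formal test with standard errors and would condition on covariates; the sketch only shows which quantities are compared.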

Load-bearing premise

The observational data must contain genuine self-selection that is not fully explained by the observables already present in the experimental sample.

What would settle it

Compute the sharp bounds from the experimental sample alone and from the combined data in a setting where selection-on-observables is known to fail; if the combined bounds are not strictly narrower, the identification improvement claim is falsified.
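
The comparison described above can be sketched for a binary-outcome toy case. Everything here is an assumption made for illustration (no covariates, a common population, hypothetical function names and inputs); it is not the paper's general procedure. The combined bounds average Fréchet-Hoeffding bounds computed within each treatment-choice group.

```python
def benefit_bounds(p1, p0):
    """Frechet-Hoeffding bounds on P(Y(1)=1, Y(0)=0) given two marginals."""
    return max(0.0, p1 - p0), min(p1, 1.0 - p0)


def combined_benefit_bounds(p1, p0, q, a1, a0):
    """Average the group-specific bounds over the choice groups D=1 and D=0."""
    y1_d0 = (p1 - q * a1) / (1.0 - q)   # P(Y(1)=1 | D=0), backed out
    y0_d1 = (p0 - (1.0 - q) * a0) / q   # P(Y(0)=1 | D=1), backed out
    lo_t, hi_t = benefit_bounds(a1, y0_d1)   # group choosing treatment
    lo_u, hi_u = benefit_bounds(y1_d0, a0)   # group choosing control
    return q * lo_t + (1.0 - q) * lo_u, q * hi_t + (1.0 - q) * hi_u


def width(bounds):
    return bounds[1] - bounds[0]


experimental_only = benefit_bounds(0.6, 0.5)
# Case 1: observational group means differ from the experimental marginals.
with_selection = combined_benefit_bounds(0.6, 0.5, q=0.5, a1=0.8, a0=0.6)
# Case 2: group means equal the marginals, so choice is uninformative.
no_selection = combined_benefit_bounds(0.6, 0.5, q=0.5, a1=0.6, a0=0.5)

print("experimental only:", experimental_only)
print("with selection   :", with_selection)
print("no selection     :", no_selection)
```

In this toy case the interval narrows only when the group-specific marginals differ from the population marginals, matching the direction of the claim; it does not by itself verify the paper's general result.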

Original abstract

This study investigates the identification power gained by combining experimental data, in which treatment is randomized, with observational data, in which treatment is self-selected, for distributional treatment effect (DTE) parameters. While experimental data identify average treatment effects, many DTE parameters, such as the distribution of individual treatment effects, are only partially identified. We examine whether and how combining these two data sources tightens the identified set for such parameters. For broad classes of DTE parameters, we derive nonparametric sharp bounds under the combined data and clarify the mechanism through which data combination improves identification relative to using experimental data alone. Our analysis highlights that self-selection in observational data is a key source of identification power. We establish necessary and sufficient conditions under which the combined data strictly shrink the identified set, and show that such gains arise generically unless selection-on-observables holds in the observational data. We also propose a linear programming approach to compute sharp bounds that can incorporate additional structural restrictions, such as positive dependence between potential outcomes and the generalized Roy selection model. An empirical application using data on negative campaign advertisements in the 2008 U.S. presidential election illustrates the practical relevance of the proposed approach.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Combining randomized experiments with observational data yields new sharp nonparametric bounds on distributional treatment effects, with gains from self-selection unless selection-on-observables holds. The paper derives these bounds for broad classes of DTE parameters and gives necessary and sufficient conditions for when the pooled data strictly shrinks the identified set relative to the experiment alone. Self-selection in the observational sample supplies the extra identifying power by adding information on how treatment choice relates to potential outcomes. The linear programming setup for computing the bounds is a practical addition that can fold in restrictions like positive dependence or a generalized Roy model.

The central argument holds up. The bounds come from intersecting the moment restrictions of the randomized experiment with those implied by endogenous selection, and the conditions for improvement follow directly from that structure without circularity or post-hoc fitting.

Soft spots are limited. The improvement requires genuine self-selection in the observational data that is not already captured by observables shared with the experiment; if that feature is weak in practice the tightening could be modest, though the paper notes the gains arise generically. The empirical illustration with negative campaign ads shows the idea in action but would need closer checks on support conditions and how much the bounds actually move.

This paper is for econometricians and applied researchers working on partial identification of heterogeneous treatment effects who can access both data types. A reader focused on tightening inference for distributional parameters will find usable tools here. It deserves a serious referee because the identification results are new and the computational approach is straightforward to apply.

Referee Report

2 major / 3 minor

Summary. The paper claims that combining randomized experimental data with observational data under endogenous self-selection yields nonparametric sharp bounds for a broad class of distributional treatment effect (DTE) parameters. It derives the identified set from the intersection of moment restrictions implied by random assignment and by the observational selection process, establishes necessary and sufficient conditions under which the combined data strictly shrink the identified set relative to experimental data alone, shows that such gains arise generically unless selection-on-observables holds in the observational sample, and proposes a linear-programming representation that computes the bounds and accommodates additional restrictions such as positive dependence or the generalized Roy model. An empirical illustration applies the method to negative campaign advertisements in the 2008 U.S. presidential election.

Significance. If the derivations are correct, the paper makes a useful contribution to the partial-identification literature by clarifying how self-selection in observational data supplies identifying power for DTE parameters that remain only partially identified from experimental data alone. The necessary-and-sufficient conditions and the LP formulation are practical strengths that allow researchers to assess the value of data combination and to impose economically motivated restrictions in a transparent way.
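
To illustrate how a linear program can absorb an extra restriction, here is a deliberately small sketch, not the paper's formulation. With binary outcomes and fixed marginals, the joint distribution of (Y(0), Y(1)) has a single free parameter t = P(Y(1)=1, Y(0)=1), so the program reduces to optimizing a linear objective over an interval and can be solved by checking its endpoints. Positive dependence is modeled here, as an assumption, by the covariance condition t >= p1 * p0; the function name is hypothetical.

```python
def lp_benefit_bounds(p1, p0, positive_dependence=False):
    """Bounds on P(Y(1)=1, Y(0)=0) = p1 - t over the feasible values of t.

    t = P(Y(1)=1, Y(0)=1). Nonnegativity of the four cell probabilities
    gives max(0, p1 + p0 - 1) <= t <= min(p1, p0).
    """
    t_min = max(0.0, p1 + p0 - 1.0)
    t_max = min(p1, p0)
    if positive_dependence:
        # Assumed restriction: nonnegative covariance between Y(1) and Y(0).
        t_min = max(t_min, p1 * p0)
    # The objective p1 - t is linear in t, so the optimum sits at an endpoint.
    return p1 - t_max, p1 - t_min


print("unrestricted       :", lp_benefit_bounds(0.6, 0.5))
print("positive dependence:", lp_benefit_bounds(0.6, 0.5, positive_dependence=True))
```

With richer outcome supports the same idea needs a general-purpose LP solver, because the joint distribution then has many free cells and the constraint set is a higher-dimensional polytope.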

major comments (2)

[§4.2] §4.2, the LP formulation: the claim that the linear program computes sharp bounds for the distribution of individual treatment effects rests on the maintained assumption that the support of (Y(0),Y(1)) is the same in both samples; if this common-support condition fails, the feasible set of the LP may exclude some distributions that are consistent with the combined data, so the reported bounds would not be sharp. A brief discussion or robustness check on support overlap would be needed to confirm the central identification result.
[§3.3] §3.3, necessary and sufficient conditions: the proof that gains occur generically unless selection-on-observables holds in the observational data is stated for the case of binary treatment and continuous outcomes; it is not immediately clear whether the same argument extends without modification to the multi-valued or discrete-outcome settings that are also covered by the general DTE class. Clarifying the scope of the generic-gain result would strengthen the main theoretical claim.
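
A rough version of the support-overlap check raised in the first comment could look like the sketch below. It assumes discrete outcomes and uses observed outcome values as a proxy for the support of the potential outcomes, which is only an approximation because (Y(0), Y(1)) is never observed jointly. The function name and the sample values are hypothetical.

```python
def support_overlap(exp_outcomes, obs_outcomes):
    """Compare the sets of outcome values seen in the two samples."""
    exp_support = set(exp_outcomes)
    obs_support = set(obs_outcomes)
    return {
        "common": sorted(exp_support & obs_support),
        "only_experimental": sorted(exp_support - obs_support),
        "only_observational": sorted(obs_support - exp_support),
    }


# Made-up discrete outcomes, for illustration only.
report = support_overlap([0, 1, 2, 2, 3], [1, 2, 3, 4])
print(report)
```

Non-empty `only_experimental` or `only_observational` lists would flag values for which the common-support assumption deserves a closer look before the bounds are interpreted as sharp.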

minor comments (3)

[§2] The notation for the potential-outcome distributions is introduced in §2 but used with slight variations in §4; a single consolidated definition would improve readability.
Figure 1 (empirical bounds) would benefit from an additional panel or table that reports the experimental-only bounds alongside the combined-data bounds so that the identification gain is immediately visible to the reader.
A few references to the partial-identification literature on DTE parameters (e.g., recent work on bounds for the distribution of treatment effects) appear to be missing from the introduction.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive comments on our manuscript. We address each major comment below. Where appropriate, we will revise the paper to incorporate clarifications and additional discussion.

Referee: [§4.2] §4.2, the LP formulation: the claim that the linear program computes sharp bounds for the distribution of individual treatment effects rests on the maintained assumption that the support of (Y(0),Y(1)) is the same in both samples; if this common-support condition fails, the feasible set of the LP may exclude some distributions that are consistent with the combined data, so the reported bounds would not be sharp. A brief discussion or robustness check on support overlap would be needed to confirm the central identification result.

Authors: We agree that the sharpness result for the linear program in Section 4.2 relies on the support of the joint distribution of potential outcomes (Y(0), Y(1)) being identical across the experimental and observational samples. This follows from our maintained assumption that both samples are drawn from the same underlying population, so the support of the potential outcomes is common by construction. Nevertheless, to address the referee's concern explicitly, we will add a short paragraph in Section 4.2 clarifying this assumption and noting that if the supports were to differ (for example, due to sampling from distinct subpopulations), the LP feasible set could be further restricted and the resulting bounds would remain valid but potentially conservative. We will also include a brief robustness discussion suggesting that researchers can restrict the LP to the overlapping support when such differences are suspected. revision: yes
Referee: [§3.3] §3.3, necessary and sufficient conditions: the proof that gains occur generically unless selection-on-observables holds in the observational data is stated for the case of binary treatment and continuous outcomes; it is not immediately clear whether the same argument extends without modification to the multi-valued or discrete-outcome settings that are also covered by the general DTE class. Clarifying the scope of the generic-gain result would strengthen the main theoretical claim.

Authors: The necessary-and-sufficient conditions for strict improvement from data combination are derived within the general framework of Section 3 that applies to the full class of DTE parameters, including multi-valued treatments and discrete outcomes. The generic-gain result is driven by the observation that selection-on-observables constitutes a measure-zero set in the space of admissible selection processes; this geometric argument does not depend on the cardinality of the treatment or the support of the outcome. The detailed proof in the appendix is presented for the binary-continuous case purely for expositional simplicity, but the same logic carries over directly once the appropriate moment restrictions are substituted. In the revision we will add a clarifying remark in Section 3.3 (and a corresponding sentence in the appendix) stating that the generic-gain result holds for the entire DTE class covered by the paper. revision: yes

Circularity Check

0 steps flagged

No significant circularity in the identification derivation

The paper derives nonparametric sharp bounds for broad classes of distributional treatment effect parameters by intersecting the moment restrictions implied by random assignment in the experimental sample with the selection process in the observational sample. These bounds and the necessary and sufficient conditions for strict shrinkage of the identified set are obtained directly from the differing information on the joint distribution of (Y(0), Y(1), D) under randomization versus endogenous selection; the construction does not reduce any target quantity to a fitted parameter or to a self-citation that itself depends on the present result. The linear-programming representation is a computational device for the same set of restrictions and does not introduce circularity. The derivation is therefore self-contained against the stated assumptions.
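
The differing information in the two samples can be written out in a short worked identity. The notation is an assumption introduced here for illustration (binary treatment choice D in the observational sample, covariates suppressed, both samples drawn from one population) and may differ from the paper's own.

```latex
% Experiment: identifies the marginals F_{Y(1)} and F_{Y(0)}.
% Observational sample: identifies P(D=1), F_{Y(1)\mid D=1}, and F_{Y(0)\mid D=0}.
\begin{align*}
F_{Y(1)}(y) &= P(D=1)\,F_{Y(1)\mid D=1}(y) + P(D=0)\,F_{Y(1)\mid D=0}(y), \\
\Longrightarrow\quad
F_{Y(1)\mid D=0}(y) &= \frac{F_{Y(1)}(y) - P(D=1)\,F_{Y(1)\mid D=1}(y)}{P(D=0)}.
\end{align*}
```

The analogous rearrangement for Y(0) gives the marginal among those who chose treatment, and none of these steps uses the target parameter itself, which is consistent with the absence of circularity noted above.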

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard causal-inference assumptions (random assignment in the experiment, common support, and the structure of self-selection) plus the modeling choice that observational treatment is not randomized. No free parameters or invented entities are introduced in the abstract description.

axioms (2)

domain assumption Treatment is randomly assigned in the experimental sample.
Standard maintained assumption for experimental data; invoked when contrasting identification power with observational data.
domain assumption Observational treatment is self-selected and does not satisfy selection-on-observables.
The paper states that identification gains arise generically unless this condition holds; it is the key source of extra identifying power.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

Relation between the paper passage and the cited Recognition theorem.

We derive nonparametric sharp bounds under the combined data... using copula bound analysis... supermodular functions or φ-indicator functions... linear programming approach
IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean absolute_floor_iff_bare_distinguishability unclear

$F^{*}_{p-}(y_1, y_0) := E\left[ M\left( F^{*}_{Y_1 \mid SX}(y_1 \mid S, X),\; F^{*}_{Y_0 \mid SX}(y_0 \mid S, X) \right) \right]$
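
A numerical reading of the quoted expression is sketched below, under the assumption (not stated in the passage) that M denotes the Fréchet-Hoeffding upper-bound copula M(u, v) = min(u, v). The expectation is taken over cells of (S, X); the cell weights and conditional CDF values are made-up inputs and the function name is hypothetical.

```python
def expected_min_copula_bound(cells):
    """Compute E[min(F1, F0)] over cells of (S, X).

    cells : list of (weight, F1, F0), where weight is the cell probability
            and F1, F0 are the conditional CDFs of Y1 and Y0 evaluated at
            fixed points (y1, y0) within that cell.
    """
    total_weight = sum(w for w, _, _ in cells)
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError("cell weights must sum to one")
    return sum(w * min(f1, f0) for w, f1, f0 in cells)


# Two hypothetical cells with equal weight.
cells = [(0.5, 0.2, 0.6), (0.5, 0.6, 0.4)]
stratified = expected_min_copula_bound(cells)

# Pooled counterpart: apply min to the averaged CDF values instead.
pooled_f1 = sum(w * f1 for w, f1, _ in cells)
pooled_f0 = sum(w * f0 for w, _, f0 in cells)
pooled = min(pooled_f1, pooled_f0)

print(stratified, pooled)
```

Because min is concave, averaging it cell by cell can never exceed applying it to the pooled values, so conditioning on cells can only tighten an upper bound of this form.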

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.