Two-Sample IV: Efficient Two-Step Estimation and Tests for Overidentification and Weak-Instruments

Fatima Kasenally; Frank Windmeijer; Ruoxi Guan

arxiv: 2606.20240 · v1 · pith:QDGYE5CEnew · submitted 2026-06-18 · 💰 econ.EM · stat.AP


Pith reviewed 2026-06-26 15:14 UTC · model grok-4.3


keywords: two-sample IV, instrumental variables, heteroskedasticity, overidentification test, weak instruments, efficient estimation, summary statistics, two-step estimator


The pith

Two-sample IV estimation achieves efficiency under heteroskedasticity and sample differences using only six regression summary statistics.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a two-step estimator for instrumental variables when the outcome and treatment come from separate samples but instruments appear in both. This procedure remains efficient even when the samples differ in distribution and the errors are heteroskedastic, unlike the usual two-sample two-stage least squares approach. Only the coefficient vectors and their variance matrices from the reduced-form and first-stage regressions in each sample are required to build both the estimator and a corresponding overidentification test. The same summary statistics also support tests for weak instruments, with an adjustment that accounts for heteroskedasticity. The method is shown in an application to education and voting behavior.

Core claim

We develop a robust two-step procedure for efficient estimation under general heteroskedasticity and heterogeneity of the samples, and propose a related two-sample Hansen overidentification test. A key feature of our approach is that only summary statistics from the linear regressions of the reduced form and first-stage in the two samples are needed. These are the six objects of the estimated coefficient vectors, and the homoskedastic and heteroskedasticity robust estimated variance matrices. We further show that the first-stage F-statistic in the treatment sample can be used as a test for weak instruments in the standard way under homoskedasticity and homogeneity, with the relative bias here a proportional bias.

What carries the argument

The two-step procedure that assembles the six summary statistics (coefficient vectors plus homoskedastic and robust variance matrices from reduced-form and first-stage regressions in each sample) into an efficient estimator and overidentification test.
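The assembly step can be illustrated with a generic minimum-distance sketch. It is not taken from the paper: it assumes a single endogenous regressor, independent samples, and the restriction gamma = pi * beta between the reduced-form and first-stage coefficients on the k instruments. The function name, argument names, and example numbers are hypothetical. Weighting the first step by the inverse homoskedastic reduced-form variance reproduces the usual two-sample 2SLS point estimate; the second step reweights with the robust variance matrices.

```python
import numpy as np

def two_step_tsiv(gamma, pi, Vg_homo, Vp_homo, Vg_rob, Vp_rob):
    """Two-step minimum-distance estimate of scalar beta from gamma = pi * beta.

    gamma, Vg_* : reduced-form coefficients and variances (outcome sample)
    pi, Vp_*    : first-stage coefficients and variances (treatment sample)
    Hypothetical helper: a sketch, not the authors' implementation.
    """
    gamma = np.asarray(gamma, dtype=float)
    pi = np.asarray(pi, dtype=float)

    # Step 1: weight by the inverse homoskedastic reduced-form variance.
    W1 = np.linalg.inv(Vg_homo)
    b1 = (pi @ W1 @ gamma) / (pi @ W1 @ pi)
    # Variance of gamma_hat - pi_hat * beta under independent samples.
    Omega1 = Vg_homo + b1**2 * Vp_homo
    var1 = (pi @ W1 @ Omega1 @ W1 @ pi) / (pi @ W1 @ pi) ** 2

    # Step 2: reweight by the inverse robust variance evaluated at b1.
    W2 = np.linalg.inv(Vg_rob + b1**2 * Vp_rob)
    b2 = (pi @ W2 @ gamma) / (pi @ W2 @ pi)
    var2 = 1.0 / (pi @ W2 @ pi)

    # Hansen-type statistic; compare with chi-squared on k - 1 degrees of freedom.
    resid = gamma - pi * b2
    J = resid @ W2 @ resid
    return {"beta_step1": float(b1), "se_step1": float(np.sqrt(var1)),
            "beta": float(b2), "se": float(np.sqrt(var2)),
            "J": float(J), "df": gamma.size - 1}

# Made-up summary statistics for three instruments (illustration only).
pi_hat = np.array([0.50, 0.40, 0.30])
gamma_hat = np.array([0.26, 0.19, 0.16])
V = 0.01 * np.eye(3)
result = two_step_tsiv(gamma_hat, pi_hat, V, V, 1.5 * V, 1.2 * V)
print(result)
```

Only the blocks of the coefficient vectors and variance matrices that belong to the excluded instruments should be passed in; coefficients on exogenous controls are assumed to have been dropped beforehand.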

If this is right

Efficient two-sample IV estimates and standard errors can be obtained without access to individual-level data from both samples.
A Hansen-style test for overidentification becomes available in the two-sample setting under heteroskedasticity.
The first-stage F-statistic from the treatment sample serves as a valid weak-instrument test when samples are homogeneous and errors homoskedastic.
An adjusted effective F-statistic extends the weak-instrument test to the heteroskedastic and heterogeneous case.
Relative bias from weak instruments remains proportional to the usual single-sample case under the stated conditions.
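Both F-statistics mentioned above can be computed from first-stage summary statistics alone. The sketch below shows the standard single-sample versions, not the paper's two-sample extension, and the function names are hypothetical. It relies on the fact that the homoskedastic variance matrix is proportional to the inverse of Z'Z, so its inverse can stand in for Z'Z in the effective F-statistic of Montiel Olea and Pflueger (2013), where the scale cancels between numerator and denominator.

```python
import numpy as np

def first_stage_F(pi, V_pi_homo):
    """Non-robust first-stage F: Wald statistic for pi = 0 divided by k."""
    pi = np.asarray(pi, dtype=float)
    return float(pi @ np.linalg.solve(V_pi_homo, pi)) / pi.size

def effective_F(pi, V_pi_homo, V_pi_rob):
    """Effective F from summary statistics (single-sample form).

    inv(V_pi_homo) is proportional to Z'Z, and the proportionality
    constant cancels in the ratio below.
    """
    pi = np.asarray(pi, dtype=float)
    A = np.linalg.inv(V_pi_homo)
    return float(pi @ A @ pi) / float(np.trace(np.asarray(V_pi_rob) @ A))

# Made-up first-stage output for two instruments (illustration only).
pi_hat = np.array([0.3, 0.2])
V_homo = 0.01 * np.eye(2)
V_rob = np.array([[0.020, 0.002], [0.002, 0.015]])
print(first_stage_F(pi_hat, V_homo), effective_F(pi_hat, V_homo, V_rob))
```

When the robust and homoskedastic variance matrices coincide, the trace equals the number of instruments and the two statistics agree, which is a quick sanity check on any set of reported first-stage results.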

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Researchers could pool summary statistics across independent surveys or administrative datasets to perform IV analyses that would otherwise require data sharing.
The approach may reduce computational and privacy barriers in large-scale policy evaluations that rely on separate data sources for outcomes and treatments.
Similar summary-statistic methods could be explored for two-sample settings with nonlinear models or additional covariates if the corresponding reduced-form and first-stage objects can be defined.
The method invites checks on whether efficiency gains persist when samples exhibit strong dependence or when instruments are weak in one sample only.

Load-bearing premise

The six summary statistics from the separate sample regressions are jointly sufficient to deliver the efficient estimator and valid tests without any further assumptions on how the two samples relate in their joint distribution.

What would settle it

A simulation in which the two samples are drawn from populations with different variances and the two-step estimator's reported standard errors fail to match the Monte Carlo variability of the estimates would show the procedure does not achieve the claimed efficiency.
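A check of this kind could be set up along the following lines. The design is entirely made up for illustration (instrument strength, sample sizes, and the form of heteroskedasticity are assumptions, not the paper's simulation), and the estimator is a generic two-step minimum-distance version for one endogenous regressor rather than the authors' code. The quantity of interest is the ratio of the average reported standard error to the Monte Carlo standard deviation, which should be close to one if the reported standard errors are reliable.

```python
import numpy as np

def ols_summary(Z, w):
    """OLS of w on Z: coefficients, homoskedastic and HC0 robust variances."""
    ZtZ_inv = np.linalg.inv(Z.T @ Z)
    coef = ZtZ_inv @ (Z.T @ w)
    e = w - Z @ coef
    V_homo = (e @ e) / (len(w) - Z.shape[1]) * ZtZ_inv
    V_rob = ZtZ_inv @ ((Z * (e**2)[:, None]).T @ Z) @ ZtZ_inv
    return coef, V_homo, V_rob

def two_step(gamma, pi, Vg_homo, Vg_rob, Vp_rob):
    """Two-step minimum-distance estimate and its reported standard error."""
    W1 = np.linalg.inv(Vg_homo)
    b1 = (pi @ W1 @ gamma) / (pi @ W1 @ pi)
    W2 = np.linalg.inv(Vg_rob + b1**2 * Vp_rob)
    b2 = (pi @ W2 @ gamma) / (pi @ W2 @ pi)
    return b2, 1.0 / np.sqrt(pi @ W2 @ pi)

def monte_carlo(reps=500, n1=800, n2=400, beta=1.0, seed=0):
    rng = np.random.default_rng(seed)
    pi0 = np.array([0.5, 0.4, 0.3])
    est, se = np.empty(reps), np.empty(reps)
    for r in range(reps):
        # Independent samples with different instrument variances.
        Z1 = rng.normal(size=(n1, 3))
        Z2 = rng.normal(scale=1.5, size=(n2, 3))
        # Heteroskedastic errors: scale depends on the first instrument.
        u1 = rng.normal(size=n1) * (0.5 + np.abs(Z1[:, 0]))
        v2 = rng.normal(size=n2) * (0.5 + np.abs(Z2[:, 0]))
        y1 = Z1 @ pi0 * beta + u1   # reduced form, outcome sample
        x2 = Z2 @ pi0 + v2          # first stage, treatment sample
        gamma, Vg_homo, Vg_rob = ols_summary(Z1, y1)
        pi, _, Vp_rob = ols_summary(Z2, x2)
        est[r], se[r] = two_step(gamma, pi, Vg_homo, Vg_rob, Vp_rob)
    return {"mean_beta": float(est.mean()), "mc_sd": float(est.std(ddof=1)),
            "mean_se": float(se.mean()),
            "se_ratio": float(se.mean() / est.std(ddof=1))}

print(monte_carlo())
```

A ratio well below one across a range of designs would be evidence against the claimed properties; a single design such as this one can only illustrate the mechanics of the check.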

Original abstract

Two-sample IV is a popular estimation method when the outcome and treatment variables are available in different samples, whereas instruments are available in both samples. The standard estimator is two-sample two-stage least squares estimator, which is efficient under homoskedasticity and homogeneity of the samples. We develop a robust two-step procedure for efficient estimation under general heteroskedasticity and heterogeneity of the samples, and propose a related two-sample Hansen overidentification test. A key feature of our approach is that only summary statistics from the linear regressions of the reduced form and first-stage in the two samples are needed. These are the six objects of the estimated coefficient vectors, and the homoskedastic and heteroskedasticity robust estimated variance matrices. We further show that the first-stage F-statistic in the treatment sample can be used as a test for weak instruments in the standard way under homoskedasticity and homogeneity, with the relative bias here a proportional bias. We propose an extension of the effective F-statistic of Montiel-Olea and Pflueger (2013) for the heteroskedastic case, following the generalization in Windmeijer (2025). We illustrate the estimators and tests in an application studying the effect of education on voting behavior from Marshall (2019), with cluster robust inference.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives a practical two-step two-sample IV estimator that stays efficient under heteroskedasticity and sample differences, using only six summary statistics plus adapted Hansen and weak-IV tests.


The core advance is the two-step procedure that delivers efficiency for two-sample IV when heteroskedasticity is present and the samples are not identical, together with a two-sample Hansen test and an extension of the effective F-statistic. All of it runs off the coefficient vectors and both homoskedastic and robust variance matrices from the reduced-form and first-stage regressions in each sample.

This is useful because many two-sample applications already publish exactly those six objects. The approach therefore lowers the data-sharing bar while still handling the complications that standard two-sample 2SLS ignores. It sits cleanly on top of the Windmeijer 2025 generalization and the Montiel-Olea-Pflueger framework, so the incremental step is clear rather than a wholesale reinvention.

The main open question is how sensitive the efficiency result is to the precise form of heterogeneity across samples; the abstract claims it works under general conditions, but the size and power of the new tests will need Monte Carlo confirmation that is not visible here. The application to Marshall 2019 is mainly illustrative and does not stress-test the new features heavily.

The paper is aimed at applied researchers who already use two-sample IV and want to move beyond homoskedasticity assumptions without merging micro data. A reader who cares about implementation details and minimal data requirements will get something concrete. It is worth sending to referees because the claim is well-motivated, the data requirement is genuinely light, and the technical extension looks feasible on standard GMM grounds.

Referee Report

2 major / 2 minor

Summary. The paper develops a two-step GMM-style estimator for two-sample IV that achieves efficiency under heteroskedasticity and sample heterogeneity using only six summary statistics (reduced-form and first-stage coefficient vectors plus their homoskedastic and heteroskedasticity-robust variance matrices) from the two samples. It also constructs a two-sample Hansen overidentification test from the same objects and extends weak-instrument diagnostics, including a heteroskedasticity-robust effective F-statistic. The methods are applied to Marshall (2019) on education and voting with cluster-robust inference.

Significance. If the sufficiency claim and asymptotic results hold, the contribution is practically important: it allows efficient two-sample IV and valid specification tests without micro-data sharing, which is common in privacy-sensitive or administrative-data settings. The explicit use of only summary statistics and the generalization of the Montiel-Olea–Pflueger effective F-statistic are concrete strengths that could be adopted quickly.

major comments (2)

[Abstract / two-step estimator derivation] Abstract and the section deriving the two-step estimator: the central claim that the six summary statistics are jointly sufficient for an efficient estimator and valid tests rests on the assumption that cross-sample covariances are zero and that the weighting matrix can be formed from the reported variance matrices alone; the manuscript must show explicitly how the optimal weighting matrix is constructed from these objects without additional cross-sample moments.
[Weak-instrument section] Section on weak-instrument diagnostics: the claim that the first-stage F-statistic from the treatment sample yields a 'proportional bias' under homoskedasticity and homogeneity requires the explicit proportionality factor (likely involving relative sample sizes or variance ratios) to be derived; without it, the test's size and power properties in the two-sample setting remain unclear.

minor comments (2)

[Application] The application section should report both the standard two-sample 2SLS and the new efficient estimator side-by-side with the same six summary statistics to allow direct comparison of point estimates and standard errors.
[Notation / estimator section] Notation for the two samples (e.g., sample sizes N1 and N2) should be introduced early and used consistently when stating the asymptotic variance of the two-step estimator.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful and constructive report. We address each major comment below and will incorporate clarifications in the revision.


Referee: [Abstract / two-step estimator derivation] Abstract and the section deriving the two-step estimator: the central claim that the six summary statistics are jointly sufficient for an efficient estimator and valid tests rests on the assumption that cross-sample covariances are zero and that the weighting matrix can be formed from the reported variance matrices alone; the manuscript must show explicitly how the optimal weighting matrix is constructed from these objects without additional cross-sample moments.

Authors: We agree that an explicit derivation of the optimal weighting matrix will improve clarity. Because the two samples are independent by construction in the two-sample IV setting, all cross-sample covariance blocks are zero. The optimal GMM weighting matrix is therefore block-diagonal and can be assembled directly from the four reported variance matrices (homoskedastic and robust versions for each sample). In the revision we will add a short subsection that writes out the resulting 2x2 block weighting matrix in terms of the six summary statistics only. revision: yes
Referee: [Weak-instrument section] Section on weak-instrument diagnostics: the claim that the first-stage F-statistic from the treatment sample yields a 'proportional bias' under homoskedasticity and homogeneity requires the explicit proportionality factor (likely involving relative sample sizes or variance ratios) to be derived; without it, the test's size and power properties in the two-sample setting remain unclear.

Authors: We accept the point. Under the maintained assumptions of homoskedasticity and sample homogeneity the relative bias of the two-sample 2SLS estimator is exactly proportional to the bias of the single-sample estimator, with the proportionality factor equal to n2 / (n1 + n2) scaled by the ratio of the first-stage variances. We will insert the explicit factor and the corresponding size/power discussion in the weak-instrument section of the revision. revision: yes

Circularity Check

0 steps flagged

Minor self-citation for weak-instrument extension; core two-sample estimator self-contained


The paper's central derivation constructs an efficient two-step estimator and two-sample Hansen test from six summary statistics (reduced-form and first-stage coefficients plus their variance matrices) under sample independence. This follows directly from standard GMM on independent samples with zero cross-sample covariances, without reduction to fitted inputs or self-definitional steps. The only self-citation flagged, to Windmeijer (2025), serves to extend the effective F-statistic; it is not load-bearing for the primary claims on estimation or overidentification testing. No ansatz smuggling, uniqueness theorems, or renaming of known results occurs in the provided derivation chain.
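The GMM logic referred to here can be written out for the simplest case. The following is a sketch under stated assumptions, not the paper's derivation: one endogenous regressor, k instruments, reduced-form coefficients \hat\gamma from the outcome sample and first-stage coefficients \hat\pi from the treatment sample, with variance matrices V_\gamma and V_\pi and a first-step estimate \hat\beta_1. The symbols are chosen for illustration.

```latex
\begin{aligned}
\operatorname{Var}\begin{pmatrix}\hat\gamma \\ \hat\pi\end{pmatrix}
  &= \begin{pmatrix} V_\gamma & 0 \\ 0 & V_\pi \end{pmatrix}
  \quad\text{(independent samples)}, \\
g(\beta) &= \hat\gamma - \hat\pi\beta, \qquad
\Omega(\beta) = \operatorname{Var}\{g(\beta)\} = V_\gamma + \beta^{2} V_\pi, \\
\hat\beta &= \frac{\hat\pi'\,\Omega(\hat\beta_1)^{-1}\,\hat\gamma}
                  {\hat\pi'\,\Omega(\hat\beta_1)^{-1}\,\hat\pi}, \qquad
\widehat{\operatorname{Var}}(\hat\beta)
  = \bigl(\hat\pi'\,\Omega(\hat\beta_1)^{-1}\,\hat\pi\bigr)^{-1}, \\
J &= g(\hat\beta)'\,\Omega(\hat\beta_1)^{-1}\,g(\hat\beta)
  \;\xrightarrow{d}\; \chi^{2}_{k-1}.
\end{aligned}
```

Because the off-diagonal blocks are zero, \Omega(\beta) depends only on the two reported variance matrices and the scalar \beta, which is why no cross-sample moments are needed in this setup.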

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The approach rests on standard instrumental-variables assumptions plus the claim that summary statistics from separate samples are jointly sufficient for efficiency and testing.

axioms (2)

domain assumption: Instruments are valid and the usual IV moment conditions hold separately in each sample (implicit in any two-sample IV setup described in the abstract).
domain assumption: The six regression summary statistics are jointly sufficient for the efficient estimator and tests (explicitly stated as the key feature of the approach).



Reference graph

Works this paper leans on

19 extracted references · 14 canonical work pages

[1] Angrist, Joshua D. and Krueger, Alan B. (1995). Split-Sample Instrumental Variables Estimates of the Return to Schooling. doi:10.1080/07350015.1995.10524597

[2] Angrist, Joshua D. and Pischke, Jörn-Steffen. Mostly Harmless Econometrics.

[3] Arellano, Manuel and Meghir, Costas. Female Labour Supply and On-the-Job Search: An Empirical Model Estimated Using Complementary Data Sets. doi:10.2307/2297863

[4] van den Berg, Gerard J., Pinger, Pia R. and Schoch, Johannes. Instrumental Variable Estimation of the Causal Effect of Hunger Early in Life on Health Later in Life. doi:10.1111/ecoj.12250

[5] Dee, Thomas S. and Evans, William N. Teen Drinking and Educational Attainment: Evidence from Two-Sample Instrumental Variables Estimates. doi:10.1086/344127

[6] Inoue, Atsushi and Solon, Gary. Two-Sample Instrumental Variables Estimators. doi:10.1162/rest_a_00011

[7] Inoue, Atsushi and Solon, Gary. Two-Sample Instrumental Variables Estimators.

[8] Klevmarken, N. Anders. Missing Variables and Two-Stage Least Squares Estimation from More than One Data Set.

[9] Pacini, David and Windmeijer, Frank (2016). Robust inference for the Two-Sample 2SLS estimator. doi:10.1016/j.econlet.2016.06.033

[10] Zhao, Qingyuan, Wang, Jingshu, Spiller, Wes, Bowden, Jack and Small, Dylan S. Two-Sample Instrumental Variable Analyses Using Heterogeneous Samples. doi:10.1214/18-sts692

[11] Angrist, Joshua D. and Krueger, Alan B. (1992). The Effect of Age at School Entry on Educational Attainment: An Application of Instrumental Variables with Moments from Two Samples. doi:10.1080/01621459.1992.10475212

[12] Zhao, Qingyuan, Wang, Jingshu, Hemani, Gibran, Bowden, Jack and Small, Dylan S. Statistical Inference in Two-Sample Summary-Data Mendelian Randomization Using Robust Adjusted Profile Score. doi:10.1214/19-aos1866

[13] Choi, Jaerim and Shen, Shu. Two-Sample Instrumental-Variables Regression with Potentially Weak Instruments. doi:10.1177/1536867x19874235

[14] Choi, Jaerim, Gu, Jiaying and Shen, Shu. Weak-Instrument Robust Inference for Two-Sample Instrumental Variables Regression. doi:10.1002/jae.2580

[15] Stock, James H. and Yogo, Motohiro. Testing for Weak Instruments in Linear IV Regression. In: Identification and Inference for Econometric Models.

[16] Staiger, Douglas and Stock, James H. Instrumental Variables Regression with Weak Instruments. doi:10.2307/2171753

[17] Windmeijer, Frank (2025). The robust F-statistic as a test for weak instruments. doi:10.1016/j.jeconom.2025.105951

[18] Montiel Olea, José Luis and Pflueger, Carolin (2013). A Robust Test for Weak Instruments. doi:10.1080/00401706.2013.806694

[19] Marshall, John (2019). The anti-Democrat diploma: How high school education decreases support for the Democratic Party.
