Evaluating A Key Instrumental Variable Assumption Using Randomization Tests
Pith reviewed 2026-05-25 09:55 UTC · model grok-4.3
The pith
A randomization test checks if an instrumental variable is significantly closer to as-if random than the exposure.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors propose a nonparametric randomization test of the as-if randomization assumption for an instrument. The test compares the instrument's observed covariate balance or bias with the balance or bias that would have been produced under randomization, and it lets investigators validly assess whether the instrument is significantly closer to being as-if randomized than the exposure.
What carries the argument
Randomization test applied to balance or bias measures, in which the observed difference between instrument and exposure is compared against the distribution obtained by simulating or permuting the instrument's assignment.
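A minimal Python sketch of this machinery follows. It assumes a binary assignment `z` (the instrument or the exposure), a covariate matrix `X`, and the largest absolute standardized mean difference as the balance measure. The paper's exact statistic is not stated in this summary, so the choice of statistic and all function names are hypothetical. Permuting `z` mimics complete randomization with the observed group sizes.

```python
import numpy as np


def max_abs_smd(X, z):
    """Largest absolute standardized mean difference (SMD) across covariates.

    X is an (n, p) covariate matrix; z is a length-n array of 0/1 assignments.
    """
    treated, control = X[z == 1], X[z == 0]
    # Pooled SD averages the two group variances (one common SMD convention).
    pooled_sd = np.sqrt((treated.var(axis=0, ddof=1) + control.var(axis=0, ddof=1)) / 2)
    smd = (treated.mean(axis=0) - control.mean(axis=0)) / pooled_sd
    return float(np.max(np.abs(smd)))


def randomization_test(X, z, stat=max_abs_smd, n_draws=2000, seed=None):
    """Compare observed imbalance with imbalance under random re-assignment of z.

    Permuting z keeps the number of units in each group fixed, i.e. it assumes
    complete randomization as the reference mechanism.
    """
    rng = np.random.default_rng(seed)
    observed = stat(X, z)
    null = np.array([stat(X, rng.permutation(z)) for _ in range(n_draws)])
    # Count the observed assignment as one draw so the Monte Carlo p-value is valid.
    p_value = (1 + np.sum(null >= observed)) / (1 + n_draws)
    return observed, null, p_value
```

Running the same function once with the instrument and once with the exposure, on the same covariates, places both observed imbalances against randomization references built from the same data.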
If this is right
- Global balance measures can replace separate covariate-by-covariate judgments when assessing IV validity.
- Graphical comparisons of balance across the instrument and exposure become directly interpretable.
- The test remains valid without requiring parametric assumptions about the data-generating process.
- Investigators obtain a single statistical statement on whether the instrument is meaningfully closer to randomization than the exposure.
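To illustrate a single global measure, the sketch below assumes the squared Mahalanobis distance between group covariate means. This is one common choice; the summary does not name the paper's metric. The sketch then expresses each variable's observed imbalance in standard-deviation units of its own randomization distribution. Under this hypothetical scoring, a smaller score means the variable looks closer to as-if randomized.

```python
import numpy as np


def mahalanobis_imbalance(X, z):
    """Squared Mahalanobis distance between the covariate means of the two groups.

    Uses the covariance of X over all units. The usual (1/n1 + 1/n0) scaling is
    omitted because permutations keep group sizes fixed, so it would rescale the
    observed and reference values identically.
    """
    diff = X[z == 1].mean(axis=0) - X[z == 0].mean(axis=0)
    cov = np.cov(X, rowvar=False)
    return float(diff @ np.linalg.solve(cov, diff))


def standardized_imbalance(X, z, n_draws=1000, seed=None):
    """Observed imbalance in SD units of its own randomization distribution."""
    rng = np.random.default_rng(seed)
    observed = mahalanobis_imbalance(X, z)
    null = np.array([mahalanobis_imbalance(X, rng.permutation(z)) for _ in range(n_draws)])
    return (observed - null.mean()) / null.std(ddof=1)
```

Standardizing against each variable's own reference distribution is one way to put the instrument and the exposure on a common scale even when their group sizes differ.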
Where Pith is reading between the lines
- The test could be adapted to other quasi-experimental settings that rely on an as-if randomization premise.
- Integration with existing sensitivity analyses for unmeasured confounding would allow joint assessment of design assumptions.
- Routine use might shift reporting standards in applied health services research toward explicit randomization-based checks.
Load-bearing premise
The randomization distribution for the instrument can be validly simulated or permuted without additional modeling assumptions that would invalidate the comparison to observed balance.
What would settle it
A dataset in which the instrument is known to violate as-if randomization, yet the test does not reject the hypothesis that the instrument is no closer to randomization than the exposure, would falsify the claim that the procedure validly assesses the assumption.
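A check of this kind can be approximated by simulation. The sketch below is hypothetical. It generates a single covariate `x` and makes the instrument depend on it through a logistic link with strength `gamma`. It then records how often a permutation test on the difference in means rejects at level `alpha`. The sketch probes only whether the instrument alone is flagged, not the instrument-versus-exposure comparison.

```python
import numpy as np


def diff_in_means(x, z):
    """Absolute difference in the covariate mean between z == 1 and z == 0."""
    return abs(x[z == 1].mean() - x[z == 0].mean())


def permutation_p_value(x, z, n_draws, rng):
    observed = diff_in_means(x, z)
    null = np.array([diff_in_means(x, rng.permutation(z)) for _ in range(n_draws)])
    return (1 + np.sum(null >= observed)) / (1 + n_draws)


def rejection_rate(gamma, n=200, n_sims=200, n_draws=199, alpha=0.05, seed=0):
    """Share of simulated datasets in which as-if randomization is rejected.

    gamma sets how strongly the hypothetical instrument depends on the covariate x
    (gamma = 0 means the instrument really is randomized).
    """
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        x = rng.normal(size=n)
        prob = 1 / (1 + np.exp(-gamma * x))  # logistic dependence of z on x
        z = rng.binomial(1, prob)
        rejections += int(permutation_p_value(x, z, n_draws, rng) <= alpha)
    return rejections / n_sims
```

Sweeping `gamma` upward from 0 traces a power curve. A rejection rate that stayed near `alpha` for clearly violating values of `gamma` would be the failure this criterion describes.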
Original abstract
Instrumental variable (IV) analyses are becoming common in health services research and epidemiology. Most IV analyses use naturally occurring instruments, such as distance to a hospital. In these analyses, investigators must assume the instrument is as-if randomly assigned. This assumption cannot be tested directly, but it can be falsified. Most falsification tests in the literature compare relative prevalence or bias in observed covariates between the instrument and the exposure. These tests require investigators to make a covariate-by-covariate judgment about the validity of the IV design. Often, only some of the covariates are well-balanced, making it unclear if as-if randomization can be assumed for the instrument across all covariates. We propose an alternative falsification test that compares IV balance or bias to the balance or bias that would have been produced under randomization. A key advantage of our test is that it allows for global balance measures as well as easily interpretable graphical comparisons. Furthermore, our test does not rely on any parametric assumptions and can be used to validly assess if the instrument is significantly closer to being as-if randomized than the exposure. We demonstrate our approach on a recent IV application that uses bed availability in the intensive care unit (ICU) as an instrument for admission to the ICU.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a nonparametric randomization test for falsifying the as-if randomization assumption in instrumental variable analyses. It compares observed balance or bias for the instrument (and separately the exposure) against a simulated randomization distribution, enabling global balance measures and graphical comparisons rather than covariate-by-covariate judgments. The approach is illustrated with an observational IV example using ICU bed availability as the instrument for admission.
Significance. If the randomization distributions can be generated without introducing assumptions stronger than the IV design itself, the method would offer a useful advance over existing falsification tests in health services research and epidemiology by supporting unified global assessments and direct instrument-versus-exposure comparisons. The nonparametric framing is a potential strength.
major comments (2)
- [Abstract] Abstract: The claim that the test 'can be used to validly assess if the instrument is significantly closer to being as-if randomized than the exposure' is load-bearing on the randomization distribution for the instrument accurately reflecting the as-if mechanism in observational data; the abstract provides no specification of the simulation or permutation scheme used in the ICU example, leaving open whether dependence structure or exclusion restrictions are encoded in a way that avoids circularity.
- [Methods] Methods (demonstration section): No simulation studies, derivation of the test statistic's properties, or error-rate guarantees (e.g., type I error control under the null) are referenced, which is load-bearing for the central nonparametric validity claim when the assignment process is observational rather than experimental.
minor comments (1)
- [Abstract] Abstract: The phrase 'global balance measures' is introduced without naming the specific metric(s) employed, which would aid immediate understanding.
Simulated Author's Rebuttal
We thank the referee for their constructive comments on our manuscript. We address each major comment below and indicate revisions made to strengthen the presentation of the method.
Point-by-point responses
- Referee: [Abstract] Abstract: The claim that the test 'can be used to validly assess if the instrument is significantly closer to being as-if randomized than the exposure' is load-bearing on the randomization distribution for the instrument accurately reflecting the as-if mechanism in observational data; the abstract provides no specification of the simulation or permutation scheme used in the ICU example, leaving open whether dependence structure or exclusion restrictions are encoded in a way that avoids circularity.
Authors: We agree the abstract would benefit from greater specificity on this point. In the revised manuscript we have added a concise clause describing the permutation scheme: the randomization distribution is obtained by permuting the instrument assignment while conditioning on the observed covariate vector and preserving the dependence structure among covariates, without conditioning on any post-instrument variables. This encoding is identical to the conditioning used in the original IV analysis, so the comparison between instrument and exposure is not circular; it simply asks which variable is closer to the hypothesized randomization mechanism. The phrase 'validly assess' is understood to be conditional on the correctness of that mechanism, which is the standard interpretation for any falsification test that relies on a posited assignment process. revision: yes
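Conditioning a permutation scheme on covariates is often implemented by shuffling the instrument only within strata. The sketch below assumes a discrete stratum label `strata`, a hypothetical stand-in such as a hospital or time-block indicator. It tests balance on a separate covariate `x`. This is an illustrative assumption, not the authors' stated scheme. If the strata were exact values of `x`, within-stratum shuffles could not change its group means at all.

```python
import numpy as np


def stratified_permutation(z, strata, rng):
    """Shuffle z separately within each stratum, keeping per-stratum counts fixed."""
    z_perm = z.copy()
    for s in np.unique(strata):
        idx = np.flatnonzero(strata == s)
        z_perm[idx] = rng.permutation(z[idx])
    return z_perm


def stratified_randomization_test(x, z, strata, n_draws=1000, seed=None):
    """Permutation p-value for imbalance in x when z is randomized within strata."""
    rng = np.random.default_rng(seed)

    def imbalance(assign):
        return abs(x[assign == 1].mean() - x[assign == 0].mean())

    observed = imbalance(z)
    null = np.array(
        [imbalance(stratified_permutation(z, strata, rng)) for _ in range(n_draws)]
    )
    return (1 + np.sum(null >= observed)) / (1 + n_draws)
```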
- Referee: [Methods] Methods (demonstration section): No simulation studies, derivation of the test statistic's properties, or error-rate guarantees (e.g., type I error control under the null) are referenced, which is load-bearing for the central nonparametric validity claim when the assignment process is observational rather than experimental.
Authors: The referee correctly notes that the submitted manuscript contained no Monte Carlo simulations or explicit derivations. Type I error control nevertheless follows directly from the classical theory of randomization tests: under the null that the observed assignment is drawn from the posited distribution, the Monte Carlo p-value, computed as (1 + the number of simulated statistics at least as extreme as the observed one) / (1 + the number of simulations), controls the type I error at the nominal level without parametric assumptions. We have now added a short subsection in the Methods that states this result with reference to the randomization-inference literature and have included a small simulation study in the supplement that confirms nominal type I error for both the instrument and the exposure under the null. These additions address the concern for observational settings while preserving the nonparametric character of the procedure. revision: yes
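The finite-sample guarantee is easiest to see in the usual Monte Carlo form of the p-value, which counts the observed statistic as one of the draws. The sketch below is a hypothetical stand-in for a type I error simulation; the design of the supplementary study is not described here. It assumes a single normal covariate, a completely randomized assignment, and the absolute difference in means as the statistic, and estimates how often the test rejects when the null is true.

```python
import numpy as np


def mc_p_value(observed, null_stats):
    """Monte Carlo randomization p-value that counts the observed statistic as a draw."""
    null_stats = np.asarray(null_stats, dtype=float)
    return (1 + np.sum(null_stats >= observed)) / (1 + null_stats.size)


def abs_mean_diff(x, z):
    return abs(x[z == 1].mean() - x[z == 0].mean())


def type_one_error(n=100, n_sims=500, n_draws=99, alpha=0.05, seed=0):
    """Rejection rate when assignment truly is completely randomized (the null holds)."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        x = rng.normal(size=n)
        z = rng.permutation(np.repeat([0, 1], n // 2))  # half assigned, at random
        null = [abs_mean_diff(x, rng.permutation(z)) for _ in range(n_draws)]
        rejections += int(mc_p_value(abs_mean_diff(x, z), null) <= alpha)
    return rejections / n_sims
```

Choosing `n_draws` so that `alpha * (1 + n_draws)` is an integer, as with 99 draws at `alpha = 0.05`, makes the rejection probability exactly `alpha` when the statistic has no ties; otherwise the test is conservative.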
Circularity Check
No significant circularity: test defined directly from observed data and randomization distribution
Full rationale
The paper proposes a nonparametric falsification test that computes a p-value or comparison by contrasting observed balance statistics against a reference distribution generated via permutation or simulation of the instrument (and separately the exposure) under an as-if randomization null. This procedure is constructed directly from the data and the chosen randomization mechanism; the resulting test statistic and its null distribution are not obtained by fitting a parameter to a subset of the same data and then relabeling that fit as a prediction. No load-bearing self-citation, uniqueness theorem, or ansatz is invoked to justify the core validity claim. The method therefore remains self-contained against external benchmarks and does not reduce to its inputs by definition.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption A randomization distribution for the instrument can be generated by permutation or simulation from the observed data without further parametric modeling.