Complier General Causal Effect in Randomized Controlled Trials with One-Sided Noncompliance

Jiwei Zhao; Yanyuan Ma; Yin Tang

arxiv: 2510.14142 · v2 · submitted 2025-10-15 · 📊 stat.ME

Yin Tang, Yanyuan Ma, Jiwei Zhao

Pith reviewed 2026-05-18 05:43 UTC · model grok-4.3

keywords: causal inference · randomized controlled trials · one-sided noncompliance · complier causal effect · semiparametric efficiency · nuisance estimation · machine learning

The pith

In RCTs with one-sided noncompliance, the complier general causal effect is the identifiable target and can be estimated at the semiparametric efficiency bound when nuisance functions converge only in L2 norm.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper analyzes randomized controlled trials where some participants assigned to treatment do not receive it, under the one-sided noncompliance structure. It shows through likelihood-based identifiability that the complier general causal effect becomes the appropriate primary target for inference. Two estimators follow: a straightforward parametric one and a more efficient version that attains the semiparametric efficiency bound. The key theoretical result is that efficiency holds as long as the nuisance estimators are consistent in L2 norm, without any further rate requirements. This property permits direct use of flexible modern machine learning procedures for the nuisances while preserving asymptotic efficiency.

Core claim

Under a likelihood model for an RCT with one-sided noncompliance, the complier general causal effect is identified as the central estimand; a simple estimator requires no nonparametric components while an efficient estimator reaches the semiparametric efficiency bound provided the nuisance estimators converge in L2 norm, with no restrictions on their convergence rates.

What carries the argument

The complier general causal effect (CGCE), which isolates the causal effect among participants who comply with treatment assignment under the one-sided noncompliance assumption and serves as the identified target in the likelihood analysis.
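
For orientation, the familiar mean-difference version of a complier effect can be written down explicitly. The display below is the classical complier average causal effect under one-sided noncompliance, not the paper's CGCE, whose exact definition is not reproduced in this review. The symbols Z, D, Y and C are notation assumed here for illustration, and the identity additionally relies on randomized assignment and the exclusion restriction (assignment affects the outcome only through treatment received).

```latex
% Assumed notation: Z = assignment, D = treatment received, Y = outcome,
% C = 1 for compliers (units with D = 1 when Z = 1).
% One-sided noncompliance: D = 0 whenever Z = 0.
\[
  E\{Y(1) - Y(0) \mid C = 1\}
  = \frac{E(Y \mid Z = 1) - E(Y \mid Z = 0)}{P(D = 1 \mid Z = 1)} .
\]
```

The numerator is the intent-to-treat effect and the denominator is the compliance rate in the treatment arm; how the CGCE relates to this quantity is something the paper itself would need to be consulted for.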

If this is right

The simple estimator can be implemented without any nonparametric or machine learning steps.
The efficient estimator remains asymptotically efficient even when modern machine learning methods are used for the nuisance functions.
The rate-free property expands the set of applicable nuisance estimators while still guaranteeing efficiency.
Simulation studies and real-data applications demonstrate practical performance relative to existing methods.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same L2-norm-only condition may extend the reach of efficient estimation to other noncompliance patterns beyond the one-sided case.
Policy evaluations of interventions with imperfect uptake could adopt the CGCE directly rather than intent-to-treat effects.
Further work could test whether the approach remains stable when the one-sided noncompliance assumption is mildly violated.

Load-bearing premise

The likelihood model correctly identifies the complier general causal effect under the one-sided noncompliance assumption.

What would settle it

In a Monte Carlo study, generate data from a correctly specified one-sided noncompliance model, fit the efficient CGCE estimator with nuisance functions that converge slowly in L2 norm but are consistent, and check whether the estimator attains the semiparametric efficiency bound.
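
The shape of such a Monte Carlo check can be sketched as follows. This is a minimal harness under stated assumptions, not the paper's experiment: the data-generating process, the function names, and the stand-in estimator (a plain ratio of the intent-to-treat effect to the compliance rate) are all hypothetical. To run the actual test, one would replace `stand_in_estimator` with the efficient CGCE estimator fitted with slowly converging nuisances, and compare the scaled variance with the efficiency bound derived in the paper.

```python
import numpy as np

def simulate(n, rng, effect=1.0):
    """Hypothetical DGP with one-sided noncompliance (controls never treated)."""
    z = rng.binomial(1, 0.5, size=n)          # randomized assignment
    complier = rng.binomial(1, 0.7, size=n)   # latent compliance type
    d = z * complier                          # treatment received; 0 whenever z == 0
    y = rng.normal(size=n) + effect * d       # outcome
    return z, d, y

def stand_in_estimator(z, d, y):
    """Placeholder estimator: ITT effect divided by treatment-arm compliance."""
    return (y[z == 1].mean() - y[z == 0].mean()) / d[z == 1].mean()

def scaled_variance(estimator, n, reps, seed=0):
    """Return (n * Monte Carlo variance, Monte Carlo mean) over `reps` datasets."""
    rng = np.random.default_rng(seed)
    draws = np.array([estimator(*simulate(n, rng)) for _ in range(reps)])
    return n * draws.var(ddof=1), draws.mean()

# If an estimator attains a bound V, n * variance should settle near V as n grows.
results = {n: scaled_variance(stand_in_estimator, n, reps=500) for n in (500, 2000)}
for n, (nvar, mean) in results.items():
    print(n, round(nvar, 2), round(mean, 3))
```

A stable scaled variance across sample sizes is only suggestive; a full check would also vary how slowly the nuisance estimators converge.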

Original abstract

A randomized controlled trial (RCT) is widely regarded as the gold standard for assessing the causal effect of a treatment or intervention, assuming perfect implementation. In practice, however, randomization can be compromised for various reasons, such as one-sided noncompliance. In this paper, we first systematically study the likelihood-based identifiability in an RCT with one-sided noncompliance. This foundational analysis naturally gives rise to the complier general causal effect (CGCE) as the primary estimand. We further develop two estimators for the CGCE: a simple estimator that requires no nonparametric procedures, and an efficient estimator that achieves the semiparametric efficiency bound. Our theoretical analysis shows that, achieving semiparametric efficiency requires only the nuisance estimators to converge in $L_2$-norm, with no restriction on their convergence rates. This rate-free property opens the door to employing many more modern machine learning methods while still guaranteeing efficiency. Comprehensive simulation studies and a real data application are conducted to illustrate the proposed methods and to compare them with existing approaches.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper defines a new CGCE estimand from likelihood identifiability in one-sided noncompliance RCTs and claims rate-free semiparametric efficiency, but the efficiency result looks fragile without cross-fitting or extra structure.

The main thing here is a new estimand, the complier general causal effect, that comes out of a systematic likelihood identifiability study for RCTs with one-sided noncompliance. The authors treat this as the natural target rather than starting from a parameter and working backward. They also give two estimators: a simple one that needs no nonparametric steps, and an efficient one that they say hits the semiparametric bound whenever the nuisance functions converge in L2 norm, with no rate conditions at all.

That last part is the bold claim and the main reason someone might pick up the paper. If it holds, it would let people use more flexible machine learning tools without the usual rate headaches. The simple estimator is a practical plus, and the simulations plus real-data example show how the methods compare to standard approaches. The identifiability analysis itself looks like a clean way to motivate the estimand.

The soft spot is the efficiency claim. In the usual semiparametric expansion the remainder is a product of two L2 errors, and that product only becomes negligible under L2 consistency alone if the estimator is Neyman-orthogonal and uses cross-fitting, or if the one-sided noncompliance structure makes one nuisance parametric. The stress-test note flags exactly this issue. The abstract says the result holds under L2 convergence only, but without the full expansion or proof it is not clear whether they have a special cancellation or are implicitly assuming something stronger. If the derivation skips cross-fitting and does not exploit the structure to control the remainder, the rate-free guarantee probably does not go through for arbitrarily slow nuisance rates.

This paper is aimed at statisticians and applied researchers who work on causal effects in trials or policy settings where noncompliance is one-sided. A reader who wants a fresh estimand and is willing to check the technical details on efficiency would get value from it. It deserves a serious referee because the practical problem is common, the identifiability step is new, and the simple estimator could be useful even if the efficiency bound needs more work.
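
The concern about the remainder can be made concrete with a generic expansion. The display below is a sketch of the form that typically arises for doubly robust functionals, not the paper's own derivation; the symbols (an estimator of a target parameter, an influence function, two nuisances, and the remainder R_n) are notation assumed for illustration, and the empirical-process term is taken to be absorbed into the o_p(1) term, which is where cross-fitting or a Donsker condition usually enters.

```latex
% Assumed generic notation: \hat\theta estimates \theta_0, \varphi is the
% efficient influence function, \eta_1 and \eta_2 are nuisance functions.
\[
  \sqrt{n}\,(\hat\theta - \theta_0)
  = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \varphi(O_i) + \sqrt{n}\, R_n + o_p(1),
  \qquad
  |R_n| \lesssim \|\hat\eta_1 - \eta_1\|_{L_2}\, \|\hat\eta_2 - \eta_2\|_{L_2} .
\]
% Consistency alone gives each norm = o_p(1), so R_n = o_p(1) only.
% Sufficient for R_n = o_p(n^{-1/2}):
%   (a) both norms are o_p(n^{-1/4}); or
%   (b) one norm is O_p(n^{-1/2}) (known or parametric nuisance)
%       and the other is merely o_p(1).
```

Case (b) is the route by which a rate-free result for the remaining nuisance would be plausible, which is why it matters which nuisance, if any, the one-sided structure pins down.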

Referee Report

2 major / 2 minor

Summary. The manuscript studies identifiability in RCTs with one-sided noncompliance via a likelihood-based analysis, which leads to the complier general causal effect (CGCE) as the primary estimand. It proposes a simple estimator requiring no nonparametric methods and an efficient estimator claimed to attain the semiparametric efficiency bound. The central theoretical result is that this efficiency holds whenever nuisance estimators converge in L2 norm, with no further restrictions on convergence rates. The claims are illustrated with simulations and a real-data application.

Significance. If the rate-free efficiency property is rigorously established, the work would offer a useful advance for causal inference under noncompliance by permitting flexible machine-learning nuisances without rate conditions. The systematic likelihood identifiability analysis provides a clear foundation for the CGCE estimand and distinguishes it from standard complier average causal effect approaches.

major comments (2)

The efficiency claim (abstract and theoretical analysis section) states that the efficient CGCE estimator attains the semiparametric bound under L2 convergence of nuisances alone. Standard semiparametric expansions for such functionals produce a remainder that is typically the product of two L2 errors; for this to be o_p(n^{-1/2}) under arbitrarily slow L2 rates, the construction must either employ cross-fitting or exploit the one-sided noncompliance structure to render at least one nuisance component parametric or exactly orthogonal. The manuscript does not appear to detail which of these devices is used, raising a load-bearing concern for the rate-free property.
The identifiability analysis (first paragraph of the abstract and the likelihood model section) treats the CGCE as the natural estimand under the one-sided noncompliance assumption. It is not immediately clear whether this follows directly from the observed-data likelihood without additional parametric restrictions on the outcome or compliance models; a concrete verification that the CGCE is identified solely from the observed likelihood (without implicit restrictions that would make the result circular) would strengthen the foundational claim.
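
The arithmetic behind the first major comment can be illustrated numerically. This is a toy calculation under the assumption that each nuisance error decays at a polynomial rate n^(-a); the function name and the specific exponents are hypothetical and do not model the paper's estimator.

```python
def scaled_remainder(n, rate_a, rate_b):
    """sqrt(n) times a product of two errors of order n**-rate_a and n**-rate_b."""
    return n ** 0.5 * n ** (-rate_a) * n ** (-rate_b)

sample_sizes = [10**3, 10**5, 10**7]

# Both rates faster than n^(-1/4): exponent 0.5 - 0.6 < 0, so the term shrinks.
fast = [scaled_remainder(n, 0.3, 0.3) for n in sample_sizes]

# Both rates slow: exponent 0.5 - 0.2 > 0, so the term grows with n.
slow = [scaled_remainder(n, 0.1, 0.1) for n in sample_sizes]

# One nuisance at the parametric rate n^(-1/2): the term shrinks
# even though the other nuisance converges slowly.
one_parametric = [scaled_remainder(n, 0.1, 0.5) for n in sample_sizes]

for label, values in [("fast", fast), ("slow", slow), ("one parametric", one_parametric)]:
    print(label, [round(v, 3) for v in values])
```

The third case mirrors the device the comment asks the authors to spell out: if the structure makes one component effectively parametric, slow convergence of the other need not break the expansion.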

minor comments (2)

The abstract and introduction would benefit from an explicit statement of the observed-data likelihood and the precise definition of the CGCE (e.g., as an integral or conditional expectation) to make the transition from identifiability to estimation fully transparent.
Simulation results comparing the simple and efficient estimators would be clearer if the tables reported both bias and coverage probabilities alongside the efficiency comparisons.
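
The summary statistics requested in the second minor comment could be computed along these lines. This is a minimal sketch: the function name, the 1.96 critical value for nominal 95% Wald intervals, and the toy replicates are assumptions made for illustration, not results from the paper.

```python
import numpy as np

def summarize(estimates, std_errors, truth, z_crit=1.96):
    """Bias, spread, and Wald-interval coverage across simulation replicates."""
    estimates = np.asarray(estimates, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    lower = estimates - z_crit * std_errors
    upper = estimates + z_crit * std_errors
    return {
        "bias": float(estimates.mean() - truth),
        "empirical_sd": float(estimates.std(ddof=1)),
        "mean_se": float(std_errors.mean()),
        "coverage": float(np.mean((lower <= truth) & (truth <= upper))),
    }

# Toy replicates: unbiased normal estimates with a correctly stated standard error.
rng = np.random.default_rng(1)
toy_estimates = rng.normal(loc=1.0, scale=0.1, size=2000)
toy_ses = np.full(2000, 0.1)
summary = summarize(toy_estimates, toy_ses, truth=1.0)
print(summary)
```

Reporting `empirical_sd` next to `mean_se` also shows whether the estimated standard errors track the actual sampling variability, which is what drives coverage.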

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful and constructive review of our manuscript. The comments highlight important points regarding the clarity of our theoretical claims. We address each major comment below and have revised the manuscript accordingly to improve exposition and rigor.

Referee: The efficiency claim (abstract and theoretical analysis section) states that the efficient CGCE estimator attains the semiparametric bound under L2 convergence of nuisances alone. Standard semiparametric expansions for such functionals produce a remainder that is typically the product of two L2 errors; for this to be o_p(n^{-1/2}) under arbitrarily slow L2 rates, the construction must either employ cross-fitting or exploit the one-sided noncompliance structure to render at least one nuisance component parametric or exactly orthogonal. The manuscript does not appear to detail which of these devices is used, raising a load-bearing concern for the rate-free property.

Authors: We thank the referee for this insightful comment. Our construction exploits the one-sided noncompliance structure, which renders one key nuisance component (the compliance probability under control) exactly identified and free of estimation error in the relevant expansion term. This structural feature ensures the remainder term is controlled by a single L2 error rather than a product, permitting the rate-free property without cross-fitting. We acknowledge that this mechanism was not stated with sufficient explicitness in the original theoretical analysis section. We have revised that section to include a dedicated paragraph detailing the structural orthogonality and the corresponding expansion.
Revision: yes
Referee: The identifiability analysis (first paragraph of the abstract and the likelihood model section) treats the CGCE as the natural estimand under the one-sided noncompliance assumption. It is not immediately clear whether this follows directly from the observed-data likelihood without additional parametric restrictions on the outcome or compliance models; a concrete verification that the CGCE is identified solely from the observed likelihood (without implicit restrictions that would make the result circular) would strengthen the foundational claim.

Authors: We appreciate the referee's request for greater clarity on this foundational point. The CGCE is identified directly from the observed-data likelihood under the one-sided noncompliance assumption alone, without imposing parametric forms on the outcome or compliance models. The likelihood factorization separates the compliance and outcome components in a manner that isolates the CGCE as a functional of the observed distribution. We have added an explicit verification subsection in the likelihood model section that derives the identification result step by step from the observed likelihood, confirming no additional restrictions are required.
Revision: yes

Circularity Check

0 steps flagged

No significant circularity; CGCE derived from identifiability analysis and efficiency bound shown as independent theoretical property

The paper performs a likelihood-based identifiability study under one-sided noncompliance to define the CGCE as the primary estimand, then separately develops estimators and proves the semiparametric efficiency bound holds under L2-norm convergence of nuisances with no rate restrictions. This derivation chain is self-contained against external benchmarks: the identifiability step uses the RCT structure and noncompliance assumption without presupposing the estimator form, and the efficiency result is presented as a general semiparametric property rather than a fit or self-citation reduction. No load-bearing self-citations, fitted inputs renamed as predictions, or ansatzes smuggled via prior work are evident in the provided abstract and claims. The rate-free efficiency claim may warrant separate correctness scrutiny regarding remainder terms, but it does not reduce to circularity by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The analysis rests on standard RCT assumptions plus the one-sided noncompliance structure that enables likelihood identifiability of the CGCE; no free parameters or new entities are explicitly introduced in the abstract.

axioms (1)

Domain assumption. One-sided noncompliance: participants assigned to control never receive treatment while those assigned to treatment may comply or not.
This structural assumption is invoked to establish likelihood-based identifiability of the CGCE as the primary estimand.
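
What this assumption means for the data can be shown with a small simulation. This is an illustrative sketch only: the data-generating process, parameter values, and function names are hypothetical, and the estimator shown is the classical ratio of the intent-to-treat effect to the compliance rate, used as a stand-in rather than either of the paper's CGCE estimators.

```python
import numpy as np

def simulate_one_sided(n, rng, effect=2.0, p_assign=0.5, p_complier=0.6):
    """Simulate an RCT in which the control arm can never access treatment."""
    z = rng.binomial(1, p_assign, size=n)            # randomized assignment
    complier = rng.binomial(1, p_complier, size=n)   # latent compliance type
    d = z * complier                                 # received treatment; 0 whenever z == 0
    y0 = rng.normal(0.0, 1.0, size=n) + 0.5 * complier  # outcome without treatment
    y = y0 + effect * d                              # assignment affects y only through d
    return z, d, y

def wald_estimate(z, d, y):
    """Intent-to-treat effect divided by the compliance rate in the treatment arm."""
    itt = y[z == 1].mean() - y[z == 0].mean()
    compliance = d[z == 1].mean()
    return itt / compliance

rng = np.random.default_rng(0)
z, d, y = simulate_one_sided(20_000, rng)
estimate = wald_estimate(z, d, y)
print(round(estimate, 2))
```

In this setup compliers have a different baseline outcome from non-compliers, so comparing treated with untreated units directly would be confounded, while the ratio above targets the effect among compliers (set to 2.0 here).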

pith-pipeline@v0.9.0 · 5710 in / 1260 out tokens · 48805 ms · 2026-05-18T05:43:26.358347+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

unclear
Relation between the paper passage and the cited Recognition theorem.

achieving semiparametric efficiency requires only the nuisance estimators to converge in L2-norm, with no restriction on their convergence rates

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.