A simple strategy for valid inference in target trial emulations

Mats Julius Stensrud

arxiv: 2604.26471 · v1 · submitted 2026-04-29 · 📊 stat.ME · stat.AP

Pith reviewed 2026-05-07 11:49 UTC · model grok-4.3

keywords: target trial emulation, sample splitting, valid inference, causal inference, observational data, protocol development, comparative effectiveness, coverage guarantees

The pith

Sample splitting preserves standard coverage guarantees in target trial emulations even after data-informed protocol choices.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Target trial emulation improves comparative effectiveness research by making the causal question and assumptions explicit, yet protocols are often revised iteratively after data inspection. This paper proposes a simple sample splitting procedure to restore valid inference. Investigators first explore one data split to select and define a target trial protocol. They then apply the fixed protocol to the second independent split for analysis and inference. The approach mirrors the transition from pilot studies to a phase 3 trial, allowing realistic adaptation while maintaining usual statistical properties.

Core claim

A sample splitting procedure addresses concerns about selective choices and invalid statistical inference in target trial emulations. In the initial split, investigators explore the data to define a target trial protocol. When these choices are made, the target trial protocol is implemented on the second split. Although the investigators made data-informed choices to select the target trial protocol, the inference has the usual coverage guarantees.
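
The splitting step itself can be sketched in a few lines. This is a minimal illustration, not the author's implementation: the function name `split_sample`, the 50/50 default, and the seed are assumptions made here, and the paper's abstract does not prescribe a split ratio.

```python
import random

def split_sample(ids, explore_fraction=0.5, seed=0):
    """Randomly partition unit identifiers into two disjoint splits.

    explore_fraction and seed are illustrative choices, not from the paper.
    """
    rng = random.Random(seed)          # fixed seed so the partition is reproducible
    shuffled = list(ids)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * explore_fraction)
    # First part: protocol exploration. Second part: confirmatory analysis only.
    return shuffled[:cut], shuffled[cut:]

explore_ids, confirm_ids = split_sample(range(1000), explore_fraction=0.5, seed=2026)
```

Splitting on unit identifiers rather than on rows keeps all records of one individual in the same split, which matters for the independence the argument relies on.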

What carries the argument

Sample splitting: data exploration for protocol definition is separated from confirmatory analysis on an independent hold-out split, and that separation carries the validity argument.

If this is right

Observational data can support more flexible, realistic target trial protocols without invalidating the final causal estimates.
The procedure allows investigators to learn which target trials the data can support before committing to analysis.
Standard coverage properties apply directly to the final estimates as long as the protocol remains fixed after the exploration split.
The method aligns with existing practice in clinical trials where pilot data inform but do not contaminate the confirmatory analysis.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same splitting logic could apply to other iterative selection problems in observational research, such as variable selection before final modeling.
Study designs with larger sample sizes would be needed in practice to retain power after splitting while still enabling protocol exploration.
Transparent reporting of the exploration split and chosen protocol could become standard to document the data-informed steps.
Extensions to survival outcomes or time-varying treatments would require only minor adjustments to the splitting step.
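
The point above about retaining power after splitting can be made concrete with a back-of-envelope calculation. The sketch assumes the standard error of the final estimator scales as 1/sqrt(n), which holds for many regular estimators but is an assumption here; the function names are hypothetical.

```python
import math

def se_inflation(confirm_fraction):
    """Factor by which a standard error grows when only `confirm_fraction`
    of the sample is used for the final analysis (assumes SE ~ 1/sqrt(n))."""
    if not 0.0 < confirm_fraction <= 1.0:
        raise ValueError("confirm_fraction must be in (0, 1]")
    return 1.0 / math.sqrt(confirm_fraction)

def total_n_needed(n_unsplit, confirm_fraction):
    """Total sample size whose confirmation split contains `n_unsplit` units."""
    return math.ceil(n_unsplit / confirm_fraction)

half_split_inflation = se_inflation(0.5)   # sqrt(2), about 1.41
n_total = total_n_needed(1000, 0.5)        # 2000
```

Under that assumption, an even split widens confidence intervals by roughly 41 percent, and matching the precision of an unsplit analysis requires doubling the total sample.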

Load-bearing premise

The two data splits are independent, and once the protocol is fixed on the first split, no further data-dependent decisions are made when analyzing the second split.
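
One way to document that the protocol stayed fixed is to store it as an immutable object and record a fingerprint before the second split is opened. This is only a sketch of that idea: the class `TrialProtocol`, its fields, and the example values are hypothetical and do not come from the paper.

```python
import hashlib
import json
from dataclasses import dataclass, asdict

@dataclass(frozen=True)  # frozen: attributes cannot be reassigned after creation
class TrialProtocol:
    eligibility: str
    treatment_strategies: tuple
    outcome: str
    analysis_plan: str

    def fingerprint(self):
        """SHA-256 digest of the protocol contents, for audit purposes."""
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

# Hypothetical protocol chosen on the exploration split.
protocol = TrialProtocol(
    eligibility="adults aged 40-80 with no prior event",
    treatment_strategies=("initiate drug A", "initiate drug B"),
    outcome="event within 5 years",
    analysis_plan="inverse probability weighted risk difference",
)
locked = protocol.fingerprint()  # record this before touching the second split
```

Recomputing the fingerprint at analysis time and comparing it with the recorded value gives a simple check that no element of the protocol changed in between.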

What would settle it

A simulation in which the nominal coverage probability fails to hold for confidence intervals computed on the second split after protocol selection on the first split would falsify the claim.
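
A simulation of this kind might look as follows. It is a toy sketch under stated assumptions, not the paper's analysis: each candidate "protocol" is an independent outcome with true effect zero, the data-informed choice is the candidate with the largest absolute estimate, and the interval is a normal-approximation 95% interval for a mean. All names and parameter values are illustrative.

```python
import math
import random

def mean_and_se(values):
    """Sample mean and its estimated standard error."""
    n = len(values)
    m = sum(values) / n
    var = sum((v - m) ** 2 for v in values) / (n - 1)
    return m, math.sqrt(var / n)

def covers_zero(values, z=1.96):
    """True if the normal-approximation 95% interval contains the true effect 0."""
    m, se = mean_and_se(values)
    return abs(m) <= z * se

def pick_protocol(data_by_protocol):
    """Data-informed choice: index of the candidate with the largest |estimate|."""
    return max(range(len(data_by_protocol)),
               key=lambda k: abs(mean_and_se(data_by_protocol[k])[0]))

def simulate(reps=1000, n=200, n_protocols=5, seed=1):
    rng = random.Random(seed)
    half = n // 2
    naive_hits = split_hits = 0
    for _ in range(reps):
        # The true effect is 0 under every candidate protocol.
        data = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n_protocols)]
        # Naive: select and infer on the same data.
        k = pick_protocol(data)
        naive_hits += covers_zero(data[k])
        # Split: select on the first half, infer on the untouched second half.
        k = pick_protocol([d[:half] for d in data])
        split_hits += covers_zero(data[k][half:])
    return naive_hits / reps, split_hits / reps

naive_coverage, split_coverage = simulate()
```

In this setup the naive coverage should fall well below 95% (roughly 0.95 to the power 5 when the candidates are independent), while the split coverage should stay close to nominal. A split coverage clearly below nominal in a correctly specified version of such a simulation is the kind of evidence that would count against the claim.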

Original abstract

Target trial emulation has improved comparative effectiveness research by making the causal question, assumptions, and analysis plan explicit. However, target trial protocols are usually developed iteratively. After examining the data, investigators revise the protocol to reflect which target trials the observational data can realistically support. While this iterative procedure is part of normal scientific practice, it raises concerns about selective choices and invalid statistical inference. A simple procedure can address these concerns. This procedure is based on sample splitting. In the initial split, investigators explore the data to define a target trial protocol. When these choices are made, the target trial protocol is implemented on the second split. Although the investigators made data-informed choices to select the target trial protocol, the inference has the usual coverage guarantees. The procedure is created to mirror how trialists move from pilot studies to a phase 3 trial. First, they use data from pilots and early-phase trials to learn and decide on a final protocol. Then they implement this protocol and analyze a new set of data in a phase 3 trial.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Sample splitting gives a clean way to keep valid inference after data-driven protocol tweaks in target trial emulations.

The core idea here is straightforward: split your data once, use the first half to explore and settle on a full target trial protocol (eligibility, treatments, outcomes, analysis plan), then run the fixed protocol on the untouched second half. Because the splits are independent and the protocol is locked in before touching the inference sample, the usual coverage properties carry over. This mirrors the pilot-to-phase-3 sequence exactly, and the abstract states the guarantee plainly without circularity or hidden dependence on the same data for both selection and estimation.

The paper does a solid job making the procedure simple and intuitive rather than burying it in extra theory. The analogy to real trial development is helpful for the audience that actually does this work. The main limitation is practical rather than theoretical: you give up half the sample for the final analysis, which matters when data are limited, and it requires discipline to avoid any further tweaks once the second split is in hand. Those are standard costs of splitting and not unique to this setting. The argument holds up under the usual regularity conditions once the protocol is fixed.

This is aimed at epidemiologists and health researchers who emulate trials from observational data and want a defensible way to handle iterative protocol development. A reader already familiar with target trials will see the value immediately. It is worth sending to peer review as a practical methodological note that does not overreach.

Referee Report

2 major / 3 minor

Summary. The paper proposes a sample-splitting procedure for target trial emulations: randomly partition the observational data into two independent splits; use the first split to iteratively explore the data and finalize a complete target trial protocol (eligibility criteria, treatment strategies, outcome definitions, and analysis plan); then apply this fixed protocol without further data-dependent modifications to the second split to obtain point estimates and inference. The central claim is that the resulting inference on the second split retains the usual coverage guarantees (e.g., valid confidence intervals) despite the data-informed protocol choices, by direct analogy to the transition from pilot/phase II studies to a pre-specified phase III trial.

Significance. If the argument holds, the procedure offers a practical, low-overhead solution to a recurring methodological concern in comparative effectiveness research: how to permit the iterative, data-driven refinement of target trial protocols that is standard in practice while preserving frequentist validity. By leveraging only the independence of random splits and pre-commitment to a fixed protocol on the hold-out set, the approach requires no new estimators or assumptions beyond those already used in the chosen analysis. It could be adopted immediately in many observational studies and may help reduce selective reporting bias without sacrificing the flexibility investigators need when observational data cannot support every conceivable target trial.

major comments (2)

The coverage claim rests on the second split being analyzed exactly as if the protocol had been pre-specified, with no further data-dependent decisions. The manuscript should explicitly state (perhaps in a dedicated subsection or appendix) the precise regularity conditions required for this conditional validity, including correct model specification for the chosen estimator and strict independence of the two splits; without this, readers cannot verify whether the guarantee survives common practical complications such as missing data handling or subgroup definitions that might inadvertently depend on the hold-out set.
The paper does not appear to contain a formal derivation or proof of the coverage result. While the logic follows from standard properties of independent samples, a short proof sketch (or reference to the relevant theorem on sample splitting) would make the central claim load-bearing and falsifiable rather than intuitive.

minor comments (3)

The abstract and introduction repeatedly use the phrase 'usual coverage guarantees' without defining what estimator or inferential procedure is assumed on the second split; a single clarifying sentence would remove ambiguity.
Consider adding a small numerical illustration or flowchart showing the sequence of steps (split, protocol finalization, analysis) to aid readers unfamiliar with target trial emulation.
The manuscript would benefit from a brief discussion of how the procedure interacts with existing guidelines (e.g., Hernán & Robins target trial framework) and whether any modifications to those guidelines are implied.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive assessment of our manuscript and for the constructive comments that will strengthen the presentation. We address each major comment below.

Referee: The coverage claim rests on the second split being analyzed exactly as if the protocol had been pre-specified, with no further data-dependent decisions. The manuscript should explicitly state (perhaps in a dedicated subsection or appendix) the precise regularity conditions required for this conditional validity, including correct model specification for the chosen estimator and strict independence of the two splits; without this, readers cannot verify whether the guarantee survives common practical complications such as missing data handling or subgroup definitions that might inadvertently depend on the hold-out set.

Authors: We agree that a dedicated statement of the regularity conditions will improve clarity. In the revision we will add a new subsection (Section 3.3) that lists the precise conditions: (i) the two splits are formed by random partitioning and are therefore independent; (ii) every element of the target-trial protocol—including eligibility criteria, treatment strategies, outcome definitions, the full analysis plan, model specification, missing-data handling, and any subgroup definitions—is finalized on the first split and then applied verbatim to the second split with no further data-dependent modifications; (iii) the estimator chosen for the second split satisfies its standard regularity conditions (correct specification for parametric models, or the usual assumptions for semiparametric or non-parametric estimators). We will also note that practical complications such as missing data are accommodated by pre-specifying the imputation or complete-case procedure inside the protocol on the first split, so that the conditional validity is preserved. revision: yes
Referee: The paper does not appear to contain a formal derivation or proof of the coverage result. While the logic follows from standard properties of independent samples, a short proof sketch (or reference to the relevant theorem on sample splitting) would make the central claim load-bearing and falsifiable rather than intuitive.

Authors: We accept the suggestion. Although the result is a direct consequence of the independence of the splits and the pre-commitment to a fixed protocol, we will insert a short proof sketch in a new appendix. The sketch shows that, conditional on the protocol selected from the first split, the second split constitutes an independent sample to which a non-random protocol is applied; therefore the usual coverage properties of the estimator hold conditionally and hence unconditionally. We will also cite standard results on sample splitting for post-selection inference (e.g., the relevant theorems in the literature on cross-validation and data-driven model selection). revision: yes
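
The conditional-then-unconditional step described in this response can be written out explicitly. The notation below is introduced here for illustration and is not taken from the paper; the display assumes the interval is valid for every fixed protocol and that the two splits are independent.

```latex
% Assumed notation (not from the paper):
%   D_1, D_2   : the exploration and confirmation splits
%   \hat p     : protocol selected from D_1, i.e. \hat p = g(D_1)
%   \psi_p     : estimand defined by protocol p
%   C_p(D_2)   : nominal (1-\alpha) confidence interval for \psi_p computed on D_2
\begin{align*}
\text{Assume: } & \Pr\{\psi_p \in C_p(D_2)\} \ge 1-\alpha
  \ \text{ for every fixed } p, \qquad D_1 \perp D_2. \\
\text{Conditionally: } & \Pr\{\psi_{\hat p} \in C_{\hat p}(D_2) \mid D_1\}
  = \Pr\{\psi_p \in C_p(D_2)\}\Big|_{p=\hat p} \ge 1-\alpha. \\
\text{Unconditionally: } & \Pr\{\psi_{\hat p} \in C_{\hat p}(D_2)\}
  = \mathbb{E}\Big[\Pr\{\psi_{\hat p} \in C_{\hat p}(D_2) \mid D_1\}\Big] \ge 1-\alpha.
\end{align*}
```

The equality in the second line is where independence is used: given the first split, the selected protocol is a fixed quantity and the distribution of the second split is unchanged.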

Circularity Check

0 steps flagged

No significant circularity

The paper's central argument rests on a sample-splitting procedure: the target trial protocol (eligibility, treatment strategies, outcome definitions, and analysis plan) is developed on one data split, then fixed and applied to an independent second split. The claim that inference on the second split retains standard coverage guarantees follows from the independence of the splits and the absence of further data-dependent decisions, which is a direct application of standard properties of independent samples and pre-specified analyses. No equations, fitted parameters, or derivations are presented that reduce to their own inputs by construction; there are no self-citations invoked as load-bearing uniqueness theorems, no ansatzes smuggled via prior work, and no renaming of known results as novel derivations. The procedure is explicitly analogized to the pilot-to-phase-3 trial transition, which is externally justified and does not create internal circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The method rests on standard statistical assumptions about random sampling and independence; no new free parameters, invented entities, or ad-hoc axioms are introduced in the abstract.

axioms (1)

domain assumption: The two data splits are independent and identically distributed samples from the same population.
Required for the second-split analysis to retain nominal coverage after protocol fixation on the first split.
