Estimation of treatment effect in clinical trials of continuous endpoints with retrieved dropouts

Myeongjong Kang; Sangyoon Yi

arxiv: 2604.03863 · v1 · submitted 2026-04-04 · 📊 stat.ME


Pith reviewed 2026-05-13 16:52 UTC · model grok-4.3

classification 📊 stat.ME

keywords clinical trials · estimands · retrieved dropouts · likelihood estimation · ANCOVA · probit model · missing data · treatment policy


The pith

A joint likelihood model with ANCOVA and probit components incorporates retrieved dropout data to estimate treatment effects under both the hypothetical and treatment policy strategies.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a likelihood-based approach to analyzing continuous clinical trial endpoints when some patients discontinue treatment but their outcome data are still collected (retrieved dropouts). It jointly models the outcome using analysis of covariance and the discontinuation process using a probit model, allowing estimation of both hypothetical and treatment policy estimands as defined in ICH E9(R1). This unified framework incorporates data from completers, retrieved dropouts, and lost-to-follow-up patients without relying on separate imputation steps. Numerical comparisons indicate that the method reduces bias and variability relative to standard imputation techniques.

Core claim

We propose a likelihood-based model for continuous endpoints that integrates data from all subject categories, including RDs. The approach combines an analysis of covariance formulation with a probit model for treatment discontinuation, enabling explicit formulation of treatment effects for estimands defined using the hypothetical and TP strategies. Estimation is carried out via a computationally efficient maximum likelihood procedure. Numerical studies demonstrate that the proposed method achieves improved bias and variability properties compared with commonly used imputation-based approaches.
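To make the two estimands concrete, here is a worked parameterization. It is an illustrative simplification, not the paper's own model or notation: it assumes a binary arm indicator X, a discontinuation indicator D, an off-treatment shift of the mean, and an error term independent of D given X. Every symbol below is an assumption introduced for this sketch.

```latex
\begin{aligned}
Y_i &= \beta_0 + \beta_X X_i + D_i\,(\delta_0 + \delta_X X_i) + \varepsilon_i,
      \qquad \varepsilon_i \sim N(0, \sigma^2), \\
\Pr(D_i = 1 \mid X_i) &= \Phi(\gamma_0 + \gamma_X X_i), \\
\Delta^{\mathrm{hyp}} &= \beta_X, \\
\Delta^{\mathrm{TP}} &= E[Y \mid X = 1] - E[Y \mid X = 0]
  = \beta_X + (\delta_0 + \delta_X)\,\Phi(\gamma_0 + \gamma_X) - \delta_0\,\Phi(\gamma_0).
\end{aligned}
```

Under these assumptions the hypothetical effect is the arm contrast had nobody discontinued, while the treatment policy effect averages over the discontinuation that actually occurs in each arm, so it depends on the probit parameters as well as the ANCOVA ones.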

What carries the argument

Joint maximum likelihood estimation of an ANCOVA model for the continuous endpoint and a probit model for treatment discontinuation probability.
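A minimal sketch of what such a joint log-likelihood could look like is given below. It does not reproduce the paper's likelihood. It assumes a simplified model in which the endpoint mean is `beta0 + betaX*x + d*(delta0 + deltaX*x)`, discontinuation follows `Phi(gamma0 + gammaX*x)`, and lost-to-follow-up subjects are discontinuers whose missing endpoint is ignorable given arm and discontinuation status. All parameter names and the toy data are hypothetical.

```python
import math


def norm_cdf(x):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_logpdf(y, mean, sigma):
    """Log density of N(mean, sigma^2) evaluated at y."""
    z = (y - mean) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def joint_loglik(params, data):
    """Joint log-likelihood for the illustrative (assumed) model.

    params: dict with keys beta0, betaX, delta0, deltaX, sigma, gamma0, gammaX
    data:   iterable of (x, d, y); x is the arm (0/1), d flags discontinuation
            (0/1), and y is the endpoint or None when it was never retrieved.
    """
    total = 0.0
    for x, d, y in data:
        # Probit part: every subject contributes, whatever their category.
        p = norm_cdf(params["gamma0"] + params["gammaX"] * x)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total += math.log(p) if d == 1 else math.log(1.0 - p)
        # ANCOVA part: only completers and retrieved dropouts have y.
        if y is not None:
            mean = (params["beta0"] + params["betaX"] * x
                    + d * (params["delta0"] + params["deltaX"] * x))
            total += norm_logpdf(y, mean, params["sigma"])
    return total


# Toy records per arm: completer, retrieved dropout, lost to follow-up.
toy = [(0, 0, 1.0), (0, 1, 0.5), (0, 1, None),
       (1, 0, 3.0), (1, 1, 1.5), (1, 1, None)]
good = dict(beta0=1.0, betaX=2.0, delta0=-0.5, deltaX=-1.0,
            sigma=1.0, gamma0=0.0, gammaX=-0.25)
bad = dict(good, beta0=0.0)  # shifts every fitted mean away from the data
print(joint_loglik(good, toy) > joint_loglik(bad, toy))
```

In this simplified form the two components share no parameters, so maximizing the sum is equivalent to fitting them separately. A model that lets discontinuation and the endpoint depend on each other through shared or correlated terms would not factorize this way.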

Load-bearing premise

The joint modeling assumptions, including the probit model for discontinuation probability and the ANCOVA structure for continuous endpoints, correctly represent the data-generating process for both hypothetical and treatment policy estimands.

What would settle it

A simulation or real trial dataset with a known true discontinuation mechanism different from probit, where the proposed estimator shows higher bias or mean squared error than a correctly calibrated imputation approach.
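A data generator for that kind of check might be sketched as follows. The discontinuation link is passed in as a function, so the same code can produce data under a probit mechanism or a non-probit one such as logit. The value -0.25 for the arm coefficient in the discontinuation model echoes the gamma_X reported in the paper's Table 1 caption; every other number, the 50% loss rate among discontinuers, and the function names are assumptions chosen for illustration.

```python
import math
import random


def probit(eta):
    """Standard normal CDF as a discontinuation link."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))


def logit(eta):
    """Logistic CDF, one possible non-probit mechanism."""
    return 1.0 / (1.0 + math.exp(-eta))


def simulate_trial(n, link, seed=0, p_lost=0.5):
    """Return n records (x, d, y) with y set to None for lost subjects."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.randint(0, 1)                       # 1:1 randomization
        d = 1 if rng.random() < link(-0.5 - 0.25 * x) else 0
        # Endpoint with an off-treatment shift for discontinuers (assumed).
        y = 1.0 + 2.0 * x + d * (-0.5 - 1.0 * x) + rng.gauss(0.0, 1.0)
        if d == 1 and rng.random() < p_lost:
            y = None                                # never retrieved
        rows.append((x, d, y))
    return rows


rows = simulate_trial(500, logit, seed=1)
n_completers = sum(1 for x, d, y in rows if d == 0)
n_retrieved = sum(1 for x, d, y in rows if d == 1 and y is not None)
n_lost = sum(1 for x, d, y in rows if y is None)
print(n_completers, n_retrieved, n_lost)
```

Fitting the probit-based estimator and an imputation competitor to many such datasets, then comparing each to the known true effect, is the comparison the paragraph above describes.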

Original abstract

The estimand framework provides guidance on handling intercurrent events, such as treatment discontinuation, in the analysis of clinical trial responses. Under ICH E9(R1), the treatment policy (TP) strategy incorporates post-discontinuation data to reflect treatment effects in real-world practice. However, many existing approaches focus primarily on imputing missing endpoint values for lost-to-follow-up subjects and do not explicitly model completers, retrieved dropouts (RDs), and lost-to-follow-up subjects within a unified framework. This may obscure the relationship between modeling assumptions and the estimand of interest when RD data are present. We propose a likelihood-based model for continuous endpoints that integrates data from all subject categories, including RDs. The approach combines an analysis of covariance formulation with a probit model for treatment discontinuation, enabling explicit formulation of treatment effects for estimands defined using the hypothetical and TP strategies. Estimation is carried out via a computationally efficient maximum likelihood procedure. Numerical studies demonstrate that the proposed method achieves improved bias and variability properties compared with commonly used imputation-based approaches.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

A unified likelihood for retrieved dropouts that ties directly to ICH E9(R1) estimands, but simulations probably test only correct specification.


The paper's core move is to put completers, retrieved dropouts, and lost-to-follow-up subjects into one likelihood. It pairs a standard ANCOVA for the continuous endpoint with a probit model for discontinuation probability, then writes out the treatment effect for both the hypothetical and treatment-policy strategies in the same framework. Estimation is ordinary maximum likelihood, which keeps things computationally light. That is the actual novelty relative to the usual two-step imputation routines that treat retrieved dropouts as an afterthought. The numerical studies report lower bias and variability than common imputation methods, which follows if the joint model is correct. The pieces are all standard, so the work is easy to reproduce and does not rest on circular or invented quantities.

The soft spot is exactly where the stress-test note points: the simulations appear to generate data from the same probit-plus-ANCOVA structure that is fitted. That shows efficiency gains under correct specification but does not address what happens when the discontinuation mechanism or the endpoint model is misspecified, which is the practical worry under ICH E9(R1). A minor additional gap is the lack of any real-data illustration in the abstract-level description.

The paper is aimed at trial statisticians who need an explicit, single-model way to handle retrieved dropouts rather than separate imputation. It is coherent on its own terms and deserves a serious referee who can ask for misspecification checks and perhaps an applied example. I would bring it to a reading group to discuss the estimand implications. I would not cite it in my own work until the robustness evidence is stronger. Send it to peer review rather than desk reject.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a likelihood-based framework for estimating treatment effects on continuous endpoints in clinical trials with retrieved dropouts. It combines an ANCOVA model for the endpoint with a probit model for discontinuation probability, allowing explicit formulation of hypothetical and treatment-policy estimands under ICH E9(R1). Parameters are estimated by maximum likelihood, and numerical studies are presented to show reduced bias and variability relative to standard imputation approaches.

Significance. If the joint modeling assumptions are appropriate, the unified likelihood approach offers a coherent alternative to separate imputation steps and can improve efficiency for the treatment-policy estimand by directly incorporating retrieved-dropout data. The computational efficiency of the ML procedure is a practical strength. However, the significance for regulatory use hinges on whether the reported gains hold under realistic departures from the probit-ANCOVA specification.
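The bias and variability comparisons referred to above are typically summarized over simulation replicates. The helper below is a sketch of such a summary, using the metric names that appear in the paper's Table 1 caption (bias, RMSE, rejection rate, 95% CI coverage, CI length). The function name, the Wald-type interval, and the toy inputs are assumptions; the paper may construct its intervals differently.

```python
import math


def performance(estimates, std_errors, truth, z=1.96):
    """Summarize Monte Carlo replicates of one estimator.

    estimates, std_errors: one entry per simulated trial
    truth: the true treatment effect used to generate the data
    z: normal quantile for a Wald interval (1.96 for a nominal 95% CI)
    """
    n = len(estimates)
    bias = sum(estimates) / n - truth
    rmse = math.sqrt(sum((e - truth) ** 2 for e in estimates) / n)
    covered = rejected = 0
    total_length = 0.0
    for est, se in zip(estimates, std_errors):
        lower, upper = est - z * se, est + z * se
        covered += lower <= truth <= upper        # CI contains the truth
        rejected += not (lower <= 0.0 <= upper)   # CI excludes zero effect
        total_length += upper - lower
    return {
        "bias": bias,
        "rmse": rmse,
        "rejection_rate": rejected / n,
        "coverage": covered / n,
        "ci_length": total_length / n,
    }


summary = performance([1.0, 2.0, 3.0], [0.5, 0.5, 0.5], truth=2.0)
print(summary)
```

Running the same helper on the proposed estimator and on each imputation baseline, under both correctly specified and misspecified data-generating mechanisms, would give the side-by-side evidence the report asks for.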

major comments (2)

[Numerical Studies] Numerical Studies section: the data-generating mechanisms are described as following the proposed ANCOVA-plus-probit model; consequently the reported reductions in bias and variability demonstrate efficiency gains only under correct specification. To support the claim of improved properties for the treatment-policy estimand, the simulations must also include scenarios that violate the probit discontinuation model or the ANCOVA linearity assumption.
[Section 3] Section 3 (Model and Estimands): the mapping from the joint likelihood to the treatment-policy estimand is derived under the assumption that the probit model correctly captures the dependence between discontinuation and the endpoint. No sensitivity analysis or alternative link functions are presented to quantify how departures from this assumption affect the TP estimand.

minor comments (2)

[Abstract] The abstract states that the method 'integrates data from all subject categories' but does not clarify whether the likelihood contribution for lost-to-follow-up subjects is fully specified or treated as censored.
[Model Specification] Notation for the probit threshold parameters and the ANCOVA regression coefficients should be made consistent between the model equations and the simulation tables.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which highlight important aspects for strengthening the numerical evidence and robustness checks in our manuscript. We address each major comment below and will incorporate revisions to expand the simulations and add sensitivity analyses.


Referee: Numerical Studies section: the data-generating mechanisms are described as following the proposed ANCOVA-plus-probit model; consequently the reported reductions in bias and variability demonstrate efficiency gains only under correct specification. To support the claim of improved properties for the treatment-policy estimand, the simulations must also include scenarios that violate the probit discontinuation model or the ANCOVA linearity assumption.

Authors: We agree that the existing simulations demonstrate performance under correct specification of the ANCOVA-plus-probit model. To support broader claims regarding the treatment-policy estimand, we will revise the Numerical Studies section by adding new scenarios that violate the probit discontinuation model (e.g., logit link or alternative dependence structures) and the ANCOVA linearity assumption (e.g., quadratic terms or interactions). These additions will quantify bias and variability under misspecification.
revision: yes

Referee: Section 3 (Model and Estimands): the mapping from the joint likelihood to the treatment-policy estimand is derived under the assumption that the probit model correctly captures the dependence between discontinuation and the endpoint. No sensitivity analysis or alternative link functions are presented to quantify how departures from this assumption affect the TP estimand.

Authors: We acknowledge that the current derivation and results rely on the probit assumption without explicit sensitivity checks. In revision, we will add a sensitivity analysis to Section 3 (or a dedicated subsection) that evaluates the treatment-policy estimand under alternative link functions, such as logit, and reports the resulting changes in bias and efficiency. This will directly address the impact of departures from the probit specification.
revision: yes
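One simple way to picture link-function sensitivity is sketched below. It assumes an illustrative model, not the paper's, in which the treatment-policy effect equals `beta_x + (delta0 + delta_x) * p1 - delta0 * p0`, where `p0` and `p1` are the discontinuation probabilities in the control and treatment arms. The code pushes the same linear predictor through three links and reports the implied effect. The parameter values are hypothetical, and a real sensitivity analysis would refit the model under each link instead of reusing one set of coefficients.

```python
import math


def probit(eta):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))


def logit(eta):
    """Logistic CDF."""
    return 1.0 / (1.0 + math.exp(-eta))


def cloglog(eta):
    """Inverse complementary log-log link."""
    return 1.0 - math.exp(-math.exp(eta))


def tp_effect(link, beta_x, delta0, delta_x, gamma0, gamma_x):
    """Treatment-policy effect implied by the assumed illustrative model."""
    p0 = link(gamma0)              # discontinuation probability, control
    p1 = link(gamma0 + gamma_x)    # discontinuation probability, treatment
    return beta_x + (delta0 + delta_x) * p1 - delta0 * p0


for name, link in [("probit", probit), ("logit", logit), ("cloglog", cloglog)]:
    effect = tp_effect(link, beta_x=2.0, delta0=-0.5, delta_x=-1.0,
                       gamma0=-0.5, gamma_x=-0.25)
    print(f"{name:8s} {effect:.4f}")
```

The spread across the three printed values gives a rough sense of how much the estimand can move when only the assumed shape of the discontinuation mechanism changes.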

Circularity Check

0 steps flagged

No circularity: standard likelihood model with data-driven ML estimation


The paper proposes a joint likelihood combining standard ANCOVA for the continuous endpoint with a probit model for discontinuation probability. All parameters are estimated directly from the observed data (completers, RDs, and lost-to-follow-up) via maximum likelihood; the resulting treatment-effect estimates for hypothetical and treatment-policy estimands are therefore functions of the data rather than tautological re-expressions of any fitted input. Numerical studies compare finite-sample bias and variance against imputation baselines under the assumed data-generating process, but no step equates a claimed prediction to its own inputs by construction, nor does any load-bearing premise rest on a self-citation chain. The derivation chain is self-contained and externally falsifiable against real trial data.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The approach relies on standard statistical assumptions for likelihood estimation in clinical trials; no new entities are postulated.

free parameters (2)

ANCOVA regression coefficients
Estimated from continuous endpoint data to capture treatment and covariate effects.
probit model parameters
Fitted to model the probability of treatment discontinuation.

axioms (2)

domain assumption: Continuous endpoints follow a normal distribution conditional on covariates and treatment.
Required for the ANCOVA formulation and likelihood construction.
domain assumption: Treatment discontinuation follows a probit model.
Used to jointly model discontinuation with the outcome data.

pith-pipeline@v0.9.0 · 5479 in / 1371 out tokens · 116359 ms · 2026-05-13T16:52:43.880256+00:00 · methodology


Reference graph

Works this paper leans on

26 extracted references · 26 canonical work pages

[1] ICH. E9(R1) Addendum on Estimands and Sensitivity Analysis in Clinical Trials. https://www.ich.org/page/efficacy-guidelines; 2019.

[2] Molenberghs G, Kenward M. Missing data in clinical studies. John Wiley & Sons. 2007.

[3] Little RJ, Rubin DB. Statistical analysis with missing data. 793. John Wiley & Sons. 2019.

[4] Fletcher C, Hefting N, Wright M, et al. Marking 2-years of new thinking in clinical trials: the estimand journey. Therapeutic Innovation & Regulatory Science 2022; 56(4): 637–650.

[5] Wang Y, Tu W, Kim Y, et al. Statistical methods for handling missing data to align with treatment policy strategy. Pharmaceutical Statistics 2023; 22(4): 650–670.

[6] Carpenter JR, Roger JH, Kenward MG. Analysis of longitudinal trials with protocol deviation: A framework for relevant, accessible assumptions, and inference via multiple imputation. Journal of Biopharmaceutical Statistics 2013; 23(6): 1352–1371.

[7] McEvoy BW. Missing data in clinical trials for weight management. Journal of Biopharmaceutical Statistics 2016; 26(1): 30–36.

[8] Wharton S, Astrup A, Endahl L, et al. Estimating and reporting treatment effects in clinical trials for weight management: Using estimands to interpret effects of intercurrent events and missing data. International Journal of Obesity 2021; 45(5): 923–933.

[9] Wang S, Hu H. Impute the missing data using retrieved dropouts. BMC Medical Research Methodology 2022; 22(1): 82.

[10] Bell J, Drury T, Mütze T, et al. Estimation methods for estimands using the treatment policy strategy; a simulation study based on the PIONEER 1 trial. Pharmaceutical Statistics 2025; 24(2): e2472.

[11] Drury T, Bartlett JW, Wright D, Keene ON. The estimand framework and causal inference: Complementary not competing paradigms. Pharmaceutical Statistics 2025; 24(5): e70035.

[12] Drury T, Abellan JJ, Best N, White IR. Estimation of treatment policy estimands for continuous outcomes using off-treatment sequential multiple imputation. Pharmaceutical Statistics 2024; 23(6): 1144–1155.

[13] Lipkovich I, Ratitch B, Mallinckrodt CH. Causal inference and estimands in clinical trials. Statistics in Biopharmaceutical Research 2020; 12(1): 54–67.

[14] Montgomery DC. Design and analysis of experiments. John Wiley & Sons. 2017.

[15] International Council for Harmonisation. ICH Harmonised Guideline E6 (R2): Guideline for good clinical practice. https://database.ich.org/sites/default/files/E6_R2_Addendum.pdf; 2016.

[16] Lipkovich I, Svensson D, Ratitch B, Dmitrienko A. Modern approaches for evaluating treatment effect heterogeneity from clinical trials and observational data. Statistics in Medicine 2024; 43(22): 4388–4436.

[17] Hill J, Reiter JP. Interval estimation for treatment effects using propensity score matching. Statistics in Medicine 2006; 25(13): 2230–2256.

[18] Davison AC, Hinkley DV. Bootstrap methods and their application. Cambridge University Press. 1997.

[19] Hesterberg TC. What teachers should know about the bootstrap: Resampling in the undergraduate statistics curriculum. The American Statistician 2015; 69(4): 371–386.

[20] Qu Y, Dai B. Return-to-baseline multiple imputation for missing values in clinical trials. Pharmaceutical Statistics 2022; 21(3): 641–653.

[21] Rubin DB. Multiple imputation after 18+ years. Journal of the American Statistical Association 1996; 91(434): 473–489.

[22] Mallinckrodt C, Roger J, Chuang-Stein C, et al. Recent developments in the prevention and treatment of missing data. Therapeutic Innovation & Regulatory Science 2014; 48(1): 68–80.

[23] Detke MJ, Wiltse CG, Mallinckrodt CH, McNamara RK, Demitrack MA, Bitter I. Duloxetine in the acute and long-term treatment of major depressive disorder: A placebo- and paroxetine-controlled trial. European Neuropsychopharmacology 2004; 14(6): 457–470.

[24] Goldstein DJ, Lu Y, Detke MJ, Wiltse C, Mallinckrodt C, Demitrack MA. Duloxetine in the treatment of depression: A double-blind placebo-controlled comparison with paroxetine. Journal of Clinical Psychopharmacology 2004; 24(4): 389–399.

[25] Jin M, Liu G. Estimand framework: Delineating what to be estimated with clinical questions of interest in clinical trials. Contemporary Clinical Trials 2020; 96: 106093.

[26] Davidson R, MacKinnon J. Estimation and inference in econometrics. Oxford University Press. 1993.

Table 1 (caption fragment carried over from the paper): Performance of each method under three scenarios with higher treatment discontinuation rate in the placebo arm (γ_X = −0.25) and sample size N = 200. Columns: Method, β_X^TP, Bias, RMSE, Rejection Rate, 95% CI Coverage (%), Length. Scenario 1: Efficacious dru...
