Improving the Efficiency of Subgroup Analysis in Randomized Controlled Trials with TMLE

Jens Tarp; Mark van der Laan; Maya Petersen; Nerissa Nance; Rachael Phillips; Sky Qiu

arxiv: 2605.15483 · v1 · pith:COEEWI3Cnew · submitted 2026-05-14 · 📊 stat.ME · stat.ML

Improving the Efficiency of Subgroup Analysis in Randomized Controlled Trials with TMLE

Sky Qiu , Nerissa Nance , Rachael Phillips , Jens Tarp , Maya Petersen , Mark van der Laan This is my paper

Pith reviewed 2026-05-19 14:19 UTC · model grok-4.3

classification 📊 stat.ME stat.ML

keywords TMLEsubgroup analysisrandomized controlled trialsinformation borrowingtargeted maximum likelihood estimationdouble robustnessefficiency improvement

0 comments

The pith

Targeted maximum likelihood estimators borrow information from non-subgroup trial participants to improve precision of subgroup-specific treatment effect estimates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes two variants of targeted maximum likelihood estimation that use data from participants outside a defined subgroup to augment estimates within the subgroup. This approach stays entirely within the randomized controlled trial, preserving its bias protections from randomization and consistent data collection. If successful, it addresses the common problem of underpowered subgroup analyses in trials, where small groups like those defined by race or ethnicity often yield imprecise results. The methods avoid the need for external real-world evidence, which can introduce new biases. In the LEADER trial example, they produce narrower confidence intervals for treatment effects on major adverse cardiac events in Black and Asian participants.

Core claim

The authors introduce TMLE with pooled regression (TMLE-PR) and Adaptive TMLE (A-TMLE) that share information across subgroup and non-subgroup participants in an RCT to estimate subgroup-specific treatment effects more efficiently. These estimators capitalize on the trial's randomization to avoid bias while increasing statistical power for small subgroups.

What carries the argument

Targeted Maximum Likelihood Estimation with pooled or adaptive regression that incorporates data from the full trial population to target subgroup-specific parameters.

If this is right

Subgroup-specific absolute risk reductions can be estimated with greater precision, as shown in the LEADER trial where estimates for Black and Asian groups had confidence intervals excluding the null.
The approach supports regulatory goals for equitable labeling and access by providing more reliable subgroup inferences.
Information borrowing occurs without external data, maintaining the internal validity of the randomized trial.
Double robustness protects against bias if either the outcome or propensity score model is correct.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

These methods could extend to time-to-event outcomes or other trial designs where subgroup sizes are limited.
Applying A-TMLE might systematically reduce the sample size needed for detecting effects in minority subgroups.
Future work could compare these to other borrowing methods like Bayesian hierarchical models within the same trial framework.

Load-bearing premise

The nuisance parameter estimators for the outcome regression or propensity score must be correctly specified, or the double robustness property of TMLE must hold to prevent bias from the borrowing.

What would settle it

A simulation experiment in which the nuisance estimators are deliberately misspecified and the resulting subgroup estimates show bias compared to the known true effects.

Figures

Figures reproduced from arXiv: 2605.15483 by Jens Tarp, Mark van der Laan, Maya Petersen, Nerissa Nance, Rachael Phillips, Sky Qiu.

**Figure 2.** Figure 2: Estimated risk differences comparing liraglutide versus placebo for the primary endpoint in the [PITH_FULL_IMAGE:figures/full_fig_p029_2.png] view at source ↗

read the original abstract

Subgroup analyses within randomized controlled trials are often underpowered due to limited sample sizes. We address this challenge by leveraging trial participants outside the subgroup of interest to augment estimation within the subgroup. Specifically, we study two Targeted Maximum Likelihood Estimators (TMLEs) that borrow information from non-subgroup participants within the same trial: a TMLE with pooled regression (TMLE-PR) and an Adaptive Targeted Maximum Likelihood Estimator (A-TMLE). Both estimators enable information sharing without relying on any external real-world data, thereby capitalizing on key strengths of the trial: most importantly, the protection against bias afforded by the randomized treatment, but also harmonized data collection, and consistent treatment and outcome definitions. The general strategy proposed here directly advances the priorities of key regulatory agencies, including the FDA, by improving the precision of subgroup-specific treatment effect estimates without introducing external sources of bias, thereby facilitating rigorous inference to support equitable labeling, access, and post-market evaluation. In a case study based on analysis of data from a cardiovascular outcome trial (LEADER, NCT01179048), we estimate the risk reduction of major adverse cardiac events (MACE) under liraglutide treatment among Black and Asian subgroups -- each comprising less than 10\% of the trial population -- using the proposed estimators that borrow information from the remainder of the trial. Using A-TMLE, in particular, we find estimated absolute MACE risk reductions of 1.6, 1.5, and 1.5 percentage points among Asian participants and 2.1, 2.0, and 2.1 percentage points among Black participants at 365, 540, and 730 days, respectively, with 95\% confidence intervals excluding the null at each time point.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper adapts TMLE to borrow within-RCT data for subgroup estimates, but double robustness may not hold cleanly under pooled heterogeneous outcomes.

read the letter

The main takeaway is that the authors develop two TMLE-based estimators to improve precision in subgroup analyses by borrowing information from other trial participants, all while relying only on the randomized trial data. This targets a real issue in RCTs where subgroups like racial minorities often have small samples, and the LEADER example illustrates it with estimates that exclude the null. They introduce TMLE-PR using pooled regression and A-TMLE with adaptive targeting. Both are meant to share information without external data, preserving the bias protection from randomization and consistent definitions within the trial. This aligns with regulatory interests in better subgroup inference for labeling and access. The work does well in framing the problem practically and providing a concrete application to the LEADER trial data for MACE outcomes in Black and Asian groups, each under 10% of the sample. The reported absolute risk reductions of about 1.5 to 2.1 percentage points at different time points show the method yielding usable numbers. The potential issue is whether double robustness still holds when nuisance models are estimated on pooled data that includes heterogeneous outcome distributions. If the regression model misses the differences between subgroups, the targeting step might bias the subgroup-specific estimate away from the truth. The paper does not appear to include a dedicated proof or simulation for this scenario, which leaves the central claim a bit exposed. Details on exact model choices and any robustness checks are not prominent in the abstract, making it difficult to assess fully from what's given. The citation pattern seems standard for TMLE work, but more on prior subgroup methods would help context. This is the kind of paper that methodologists in biostatistics or causal inference for trials would want to see. It could be valuable for readers dealing with similar precision problems in subgroup effects. I would send it to peer review to get expert input on the theoretical properties and to suggest expansions like simulations under misspecification.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes two Targeted Maximum Likelihood Estimators (TMLE-PR with pooled regression and A-TMLE) to improve efficiency of subgroup-specific treatment effect estimation in RCTs by borrowing information from non-subgroup participants within the same trial. The approach capitalizes on randomization for bias protection and is illustrated via a LEADER trial case study estimating absolute MACE risk reductions for Black and Asian subgroups (each <10% of the sample), with A-TMLE yielding estimates of 1.6–2.1 percentage points and 95% CIs excluding the null at multiple time points.

Significance. If the double-robustness and consistency claims hold under pooled nuisance estimation, the work provides a practical internal-data strategy for increasing precision in underpowered subgroup analyses, directly supporting FDA priorities on equitable labeling without external data sources. The concrete LEADER estimates and emphasis on harmonized trial data are strengths that could influence regulatory and trial-design practices.

major comments (2)

[§3.2] §3.2 (A-TMLE definition) and the double-robustness argument: the targeting step that incorporates the full-sample outcome regression does not explicitly verify that the influence function remains unbiased for the subgroup-specific parameter when the working model cannot capture heterogeneity between subgroup and non-subgroup conditional outcome distributions; this is load-bearing for the central claim that borrowing does not introduce bias.
[§5.1] §5.1 (Simulation design): the reported efficiency gains and coverage probabilities are shown only under correctly specified or mildly misspecified pooled regressions; no simulation isolates the case where the outcome model is misspecified precisely because of subgroup heterogeneity, leaving the skeptic's concern about violation of the usual TMLE guarantee untested.

minor comments (2)

[Table 1] Table 1: the column headers for the three time points should explicitly label the follow-up days (365, 540, 730) to match the text description of the LEADER results.
[§2] §2 (Notation): the definition of the subgroup indicator S should be introduced earlier and used consistently when defining the target parameter ψ_S to avoid ambiguity in the adaptive weighting step.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful and constructive review of our manuscript. We address each major comment below and indicate the revisions we will make to strengthen the presentation.

read point-by-point responses

Referee: [§3.2] §3.2 (A-TMLE definition) and the double-robustness argument: the targeting step that incorporates the full-sample outcome regression does not explicitly verify that the influence function remains unbiased for the subgroup-specific parameter when the working model cannot capture heterogeneity between subgroup and non-subgroup conditional outcome distributions; this is load-bearing for the central claim that borrowing does not introduce bias.

Authors: We appreciate the referee drawing attention to this point. The manuscript establishes double robustness for the subgroup-specific parameter under the randomization assumption, which ensures that the efficient influence function for the target parameter remains unbiased even if the initial pooled outcome regression is misspecified. However, we agree that an explicit verification for the heterogeneity case would improve clarity. In the revised manuscript we have expanded the derivation in Section 3.2 and added a short proof in Appendix A showing that the influence function for the subgroup parameter is unbiased under randomization whenever the treatment mechanism is known (as it is by design), regardless of whether the pooled outcome regression captures subgroup heterogeneity. revision: yes
Referee: [§5.1] §5.1 (Simulation design): the reported efficiency gains and coverage probabilities are shown only under correctly specified or mildly misspecified pooled regressions; no simulation isolates the case where the outcome model is misspecified precisely because of subgroup heterogeneity, leaving the skeptic's concern about violation of the usual TMLE guarantee untested.

Authors: The referee correctly notes that the existing simulations do not isolate misspecification arising specifically from subgroup heterogeneity. We have added a new simulation scenario in the revised Section 5.1 in which the data-generating process includes conditional outcome distributions that differ systematically between the subgroup and the remainder of the sample, rendering any pooled regression misspecified for the subgroup. In this setting the A-TMLE continues to exhibit nominal coverage and efficiency gains relative to the subgroup-only estimator, consistent with the theoretical results. We have also added a brief discussion of these findings. revision: yes

Circularity Check

0 steps flagged

Estimators defined on full RCT data; no reduction of subgroup effects to fitted inputs by construction

full rationale

The paper introduces TMLE-PR and A-TMLE as extensions that pool the full trial sample under known randomization to improve precision for subgroup-specific effects. These constructions are explicit design choices that use the entire data by definition, rather than deriving a new result that collapses back to the inputs. No equations or steps are shown to rename a fitted parameter as a prediction, import a uniqueness theorem from the authors' prior work as an external fact, or smuggle an ansatz via self-citation. The double-robustness claim rests on standard TMLE properties in RCTs, which are treated as established rather than proven anew here. This yields a minor self-citation score only because van der Laan is an author and TMLE originator, but the load-bearing argument does not reduce to an unverified self-reference. The derivation is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The approach rests on standard RCT randomization and TMLE double-robustness properties without new free parameters or invented entities.

axioms (2)

domain assumption Randomization protects against confounding between treatment assignment and outcome.
Invoked to justify borrowing information from outside the subgroup without bias.
standard math TMLE is doubly robust: consistent if either the outcome model or the treatment mechanism is correctly specified.
Standard TMLE property used to support unbiased subgroup estimation.

pith-pipeline@v0.9.0 · 5870 in / 1206 out tokens · 52780 ms · 2026-05-19T14:19:23.086334+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

15 extracted references · 15 canonical work pages · 2 internal anchors

[1]

R package version 1.4.2

URL https://doi.org/10.5281/zenodo.1342293. R package version 1.4.2. Jeremy R Coyle, Nima S Hejazi, Rachael V Phillips, Lars WP van der Laan, and Mark J van der Laan. hal9001: The scalable highly adaptive lasso,

work page doi:10.5281/zenodo.1342293
[2]

Jerome H Friedman

doi: 10.18637/jss.v033.i01. Jerome H Friedman. Multivariate adaptive regression splines.The annals of statistics, 19(1):1–67,

work page doi:10.18637/jss.v033.i01
[3]

URLhttps://doi.org/10

doi: 10.21105/joss.02526. URLhttps://doi.org/10. 21105/joss.02526. Nicholas C Henderson, Thomas A Louis, Chenguang Wang, and Ravi Varadhan. Bayesian analysis of heterogeneous treatment effects for patient-centered outcomes research.Health Services and Outcomes Research Methodology, 16(4):213–233,

work page doi:10.21105/joss.02526
[4]

David M Kent, Peter M Rothwell, John PA Ioannidis, Doug G Altman, and Rodney A Hayward

doi: 10.18637/jss.v070.i04. David M Kent, Peter M Rothwell, John PA Ioannidis, Doug G Altman, and Rodney A Hayward. Assessing and reporting heterogeneity in treatment effects in clinical trials: a proposal.Trials, 11(1):85,

work page doi:10.18637/jss.v070.i04
[5]

32 Xi Lin, Jens Magelund Tarp, and Robin J Evans

doi: 10.18637/jss.v110.i06. 32 Xi Lin, Jens Magelund Tarp, and Robin J Evans. Data fusion for efficiency gain in ate estimation: a practical review with simulations.arXiv preprint arXiv:2407.01186,

work page doi:10.18637/jss.v110.i06
[6]

Highly adaptive empirical risk minimization with principal components.arXiv preprint arXiv:2603.18204,

Carlos Garc´ ıa Meixide, Mingxun Wang, Alejandro Schuler, and Mark J van der Laan. Highly adaptive empirical risk minimization with principal components.arXiv preprint arXiv:2603.18204,

work page arXiv
[7]

On a general class of orthogonal learners for the estimation of heterogeneous treat- ment effects.arXiv preprint arXiv:2303.12687, 2023

Pawel Morzywolek, Johan Decruyenaere, and Stijn Vansteelandt. On a general class of orthogonal learners for the estimation of heterogeneous treatment effects.arXiv preprint arXiv:2303.12687,

work page arXiv
[8]

An estimator-robust design for augmenting randomized controlled trial with external real-world data.arXiv preprint arXiv:2501.17835,

Sky Qiu, Jens Tarp, Andrew Mertens, and Mark van der Laan. An estimator-robust design for augmenting randomized controlled trial with external real-world data.arXiv preprint arXiv:2501.17835,

work page arXiv
[9]

Lassoed tree boosting.arXiv preprint arXiv:2205.10697, 2022a

Alejandro Schuler, Yi Li, and Mark van der Laan. Lassoed tree boosting.arXiv preprint arXiv:2205.10697, 2022a. Alejandro Schuler, David Walsh, Diana Hall, Jon Walsh, Charles Fisher, Critical Path for Alzheimer’s Dis- ease, Alzheimer’s Disease Neuroimaging Initiative, and Alzheimer’s Disease Cooperative Study. Increasing the efficiency of randomized trial ...

work page arXiv
[10]

Adaptive debiased machine learning using data-driven model selection techniques.arXiv preprint arXiv:2307.12544,

Lars van der Laan, Marco Carone, Alex Luedtke, and Mark van der Laan. Adaptive debiased machine learning using data-driven model selection techniques.arXiv preprint arXiv:2307.12544,

work page arXiv
[11]

Highly Adaptive Principal Component Regression

Mingxun Wang, Alejandro Schuler, Mark van der Laan, and Carlos Garc´ ıa Meixide. Highly adaptive principal component regression.arXiv preprint arXiv:2602.10613,

work page internal anchor Pith review Pith/arXiv arXiv
[12]

Unified implementation and comparison of bayesian shrinkage methods for treatment effect estimation in subgroups.arXiv preprint arXiv:2603.21967,

34 Marcel Wolbers, Miriam Pedrera G´ omez, Alex Ocampo, and Isaac Gravestock. Unified implementation and comparison of bayesian shrinkage methods for treatment effect estimation in subgroups.arXiv preprint arXiv:2603.21967,

work page arXiv
[13]

Investigating Targeting Strategies and Truncation in TMLE for the Average Treatment Effect under Practical Positivity Violations

doi: 10.18637/jss.v077.i01. Yichen Xu, Susan Gruber, and Mark J van der Laan. Investigating targeting strategies and truncation in tmle for the average treatment effect under practical positivity violations.arXiv preprint arXiv:2604.20059,

work page internal anchor Pith review Pith/arXiv arXiv doi:10.18637/jss.v077.i01
[14]

Also, E0 S{mP (W)−m 0(W)} =E 0 S{QP (1, W,1)−Q 0(1, W,1)} −E 0 S{QP (1, W,0)−Q 0(1, W,0)}

gP (0|1, W) Y−Q P (1, W, A) =E 0 S g0(1|1, W) gP (1|1, W) Q0(1, W,1)−Q P (1, W,1) −E 0 S g0(0|1, W) gP (0|1, W) Q0(1, W,0)−Q P (1, W,0) . Also, E0 S{mP (W)−m 0(W)} =E 0 S{QP (1, W,1)−Q 0(1, W,1)} −E 0 S{QP (1, W,0)−Q 0(1, W,0)} . Combining the two gives RNΨ(P, P0) =E 0 S 1− g0(1|1, W) gP (1|1, W) QP (1, W,1)−Q 0(1, W,1) −E 0 S 1− g0(0|1, W) gP (0|1, W) QP...

work page 2025
[15]

andsl3(Coyle et al., 2021)) with a 3-fold cross-validation scheme. The Super Learner library includes generalized linear models (Nelder and Wedderburn, 1972), multivariate adaptive regression splines (Friedman, 1991), and generalized additive models (Hastie and Tibshirani, 1986; Hastie, 2017). The final learner is selected based on cross-validated log- li...

work page 2021

[1] [1]

R package version 1.4.2

URL https://doi.org/10.5281/zenodo.1342293. R package version 1.4.2. Jeremy R Coyle, Nima S Hejazi, Rachael V Phillips, Lars WP van der Laan, and Mark J van der Laan. hal9001: The scalable highly adaptive lasso,

work page doi:10.5281/zenodo.1342293

[2] [2]

Jerome H Friedman

doi: 10.18637/jss.v033.i01. Jerome H Friedman. Multivariate adaptive regression splines.The annals of statistics, 19(1):1–67,

work page doi:10.18637/jss.v033.i01

[3] [3]

URLhttps://doi.org/10

doi: 10.21105/joss.02526. URLhttps://doi.org/10. 21105/joss.02526. Nicholas C Henderson, Thomas A Louis, Chenguang Wang, and Ravi Varadhan. Bayesian analysis of heterogeneous treatment effects for patient-centered outcomes research.Health Services and Outcomes Research Methodology, 16(4):213–233,

work page doi:10.21105/joss.02526

[4] [4]

David M Kent, Peter M Rothwell, John PA Ioannidis, Doug G Altman, and Rodney A Hayward

doi: 10.18637/jss.v070.i04. David M Kent, Peter M Rothwell, John PA Ioannidis, Doug G Altman, and Rodney A Hayward. Assessing and reporting heterogeneity in treatment effects in clinical trials: a proposal.Trials, 11(1):85,

work page doi:10.18637/jss.v070.i04

[5] [5]

32 Xi Lin, Jens Magelund Tarp, and Robin J Evans

doi: 10.18637/jss.v110.i06. 32 Xi Lin, Jens Magelund Tarp, and Robin J Evans. Data fusion for efficiency gain in ate estimation: a practical review with simulations.arXiv preprint arXiv:2407.01186,

work page doi:10.18637/jss.v110.i06

[6] [6]

Highly adaptive empirical risk minimization with principal components.arXiv preprint arXiv:2603.18204,

Carlos Garc´ ıa Meixide, Mingxun Wang, Alejandro Schuler, and Mark J van der Laan. Highly adaptive empirical risk minimization with principal components.arXiv preprint arXiv:2603.18204,

work page arXiv

[7] [7]

On a general class of orthogonal learners for the estimation of heterogeneous treat- ment effects.arXiv preprint arXiv:2303.12687, 2023

Pawel Morzywolek, Johan Decruyenaere, and Stijn Vansteelandt. On a general class of orthogonal learners for the estimation of heterogeneous treatment effects.arXiv preprint arXiv:2303.12687,

work page arXiv

[8] [8]

An estimator-robust design for augmenting randomized controlled trial with external real-world data.arXiv preprint arXiv:2501.17835,

Sky Qiu, Jens Tarp, Andrew Mertens, and Mark van der Laan. An estimator-robust design for augmenting randomized controlled trial with external real-world data.arXiv preprint arXiv:2501.17835,

work page arXiv

[9] [9]

Lassoed tree boosting.arXiv preprint arXiv:2205.10697, 2022a

Alejandro Schuler, Yi Li, and Mark van der Laan. Lassoed tree boosting.arXiv preprint arXiv:2205.10697, 2022a. Alejandro Schuler, David Walsh, Diana Hall, Jon Walsh, Charles Fisher, Critical Path for Alzheimer’s Dis- ease, Alzheimer’s Disease Neuroimaging Initiative, and Alzheimer’s Disease Cooperative Study. Increasing the efficiency of randomized trial ...

work page arXiv

[10] [10]

Adaptive debiased machine learning using data-driven model selection techniques.arXiv preprint arXiv:2307.12544,

Lars van der Laan, Marco Carone, Alex Luedtke, and Mark van der Laan. Adaptive debiased machine learning using data-driven model selection techniques.arXiv preprint arXiv:2307.12544,

work page arXiv

[11] [11]

Highly Adaptive Principal Component Regression

Mingxun Wang, Alejandro Schuler, Mark van der Laan, and Carlos Garc´ ıa Meixide. Highly adaptive principal component regression.arXiv preprint arXiv:2602.10613,

work page internal anchor Pith review Pith/arXiv arXiv

[12] [12]

Unified implementation and comparison of bayesian shrinkage methods for treatment effect estimation in subgroups.arXiv preprint arXiv:2603.21967,

34 Marcel Wolbers, Miriam Pedrera G´ omez, Alex Ocampo, and Isaac Gravestock. Unified implementation and comparison of bayesian shrinkage methods for treatment effect estimation in subgroups.arXiv preprint arXiv:2603.21967,

work page arXiv

[13] [13]

Investigating Targeting Strategies and Truncation in TMLE for the Average Treatment Effect under Practical Positivity Violations

doi: 10.18637/jss.v077.i01. Yichen Xu, Susan Gruber, and Mark J van der Laan. Investigating targeting strategies and truncation in tmle for the average treatment effect under practical positivity violations.arXiv preprint arXiv:2604.20059,

work page internal anchor Pith review Pith/arXiv arXiv doi:10.18637/jss.v077.i01

[14] [14]

Also, E0 S{mP (W)−m 0(W)} =E 0 S{QP (1, W,1)−Q 0(1, W,1)} −E 0 S{QP (1, W,0)−Q 0(1, W,0)}

gP (0|1, W) Y−Q P (1, W, A) =E 0 S g0(1|1, W) gP (1|1, W) Q0(1, W,1)−Q P (1, W,1) −E 0 S g0(0|1, W) gP (0|1, W) Q0(1, W,0)−Q P (1, W,0) . Also, E0 S{mP (W)−m 0(W)} =E 0 S{QP (1, W,1)−Q 0(1, W,1)} −E 0 S{QP (1, W,0)−Q 0(1, W,0)} . Combining the two gives RNΨ(P, P0) =E 0 S 1− g0(1|1, W) gP (1|1, W) QP (1, W,1)−Q 0(1, W,1) −E 0 S 1− g0(0|1, W) gP (0|1, W) QP...

work page 2025

[15] [15]

andsl3(Coyle et al., 2021)) with a 3-fold cross-validation scheme. The Super Learner library includes generalized linear models (Nelder and Wedderburn, 1972), multivariate adaptive regression splines (Friedman, 1991), and generalized additive models (Hastie and Tibshirani, 1986; Hastie, 2017). The final learner is selected based on cross-validated log- li...

work page 2021