Efficient estimation of cumulative incidence curves via data fusion with surrogates: application to integrated analysis of vaccine trial and immunobridging data
Pith reviewed 2026-05-10 14:09 UTC · model grok-4.3
The pith
Data from historical vaccine trials fused with immunobridging studies estimates counterfactual cumulative incidence curves for variant vaccines.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We develop methods of inference for the counterfactual cumulative incidence curve using participant-level data from both a historical vaccine efficacy trial and an immunobridging study. We further extend these methods to pathogens with multiple serotypes by estimating cause-specific cumulative incidence curves. We describe the identification assumptions, propose efficient and multiply robust estimators, and assess their finite-sample performance through simulation studies.
What carries the argument
The efficient and multiply robust estimators that fuse participant-level data from the historical efficacy trial and the immunobridging study under the identification assumptions that link the two sources.
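To show roughly how the fusion works, the Python sketch below computes a simple plug-in (g-formula) estimate of a counterfactual cumulative incidence curve. It is not the paper's efficient, multiply robust estimator. It assumes a binary covariate X, a binary surrogate S (for example, high versus low antibody titer), no censoring, and no controlled direct effect, so that P(T ≤ t | X, S, A) in the trial does not depend on the arm A. Under those assumptions, the curve for the new regimen is estimated by taking the trial risk P(T ≤ t | X, S) at each bridging participant's observed (X, S) and averaging. The trial supplies the outcome model and the immunobridging study supplies the surrogate distribution. Using the bridging population as the target is also an assumption, and the data-generating process and function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2024)

def hazard(x, s):
    # Hypothetical event rate: a high-titer surrogate (s=1) and the covariate x=1 both lower risk.
    return 0.10 * np.where(s == 1, 0.3, 1.0) * np.where(x == 1, 0.5, 1.0)

# Historical efficacy trial: covariate X, arm A, surrogate S, and event time T are all observed.
n_trial = 50_000
x_tr = rng.binomial(1, 0.5, n_trial)
a_tr = rng.binomial(1, 0.5, n_trial)
s_tr = rng.binomial(1, 0.2 + 0.5 * a_tr + 0.2 * x_tr)
t_tr = rng.exponential(1.0 / hazard(x_tr, s_tr))  # rate ignores A: no controlled direct effect

# Immunobridging study of the new regimen: only X and S are observed, no clinical outcomes.
n_bridge = 50_000
x_br = rng.binomial(1, 0.5, n_bridge)
s_br = rng.binomial(1, 0.85 + 0.1 * x_br)

def fused_cumulative_incidence(t):
    """Plug-in estimate of P(T(new) <= t): trial outcome model averaged over bridging (X, S)."""
    risk = np.zeros((2, 2))
    for x in (0, 1):
        for s in (0, 1):
            cell = (x_tr == x) & (s_tr == s)
            risk[x, s] = np.mean(t_tr[cell] <= t)  # nonparametric cell mean of I{T <= t}
    return risk[x_br, s_br].mean()

def true_cumulative_incidence(t):
    """Closed-form value of the estimand under the simulation model above."""
    total = 0.0
    for x in (0, 1):
        p_high = 0.85 + 0.1 * x
        for s, p_s in ((0, 1 - p_high), (1, p_high)):
            total += 0.5 * p_s * (1 - np.exp(-hazard(x, s) * t))
    return total

grid = [1.0, 3.0, 6.0]
curve = [fused_cumulative_incidence(t) for t in grid]
truth = [true_cumulative_incidence(t) for t in grid]
for t, est, tru in zip(grid, curve, truth):
    print(f"t={t:>4}: fused={est:.4f}  truth={tru:.4f}")
```

If the simulated trial hazard is made to depend on A, the trial cell means no longer describe the new regimen and the fused curve drifts away from the truth. This is the failure mode described under the load-bearing premise.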
If this is right
- The methods yield estimates of the hypothetical cumulative incidence curve for a bivalent mRNA booster, using data from the COVAIL trial.
- The methods provide a way to test the assumption of no controlled direct effects of the vaccine beyond the surrogate.
- The extension produces cause-specific cumulative incidence curves for multi-serotype pathogens such as dengue or influenza.
- Simulation studies indicate good finite-sample performance of the proposed estimators when the stated assumptions hold.
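In a single sample with right censoring and no covariates, cause-specific cumulative incidence curves of the kind used in the multi-serotype extension are classically estimated with the Aalen–Johansen estimator, F_j(t) = Σ_{t_k ≤ t} Ŝ(t_k−) d_jk / n_k. Here Ŝ is the Kaplan–Meier probability of remaining free of events of any type, d_jk is the number of type-j events at t_k, and n_k is the number at risk. The Python sketch below implements that textbook estimator, not the paper's fused, covariate-adjusted version. The cause coding (0 for censored, 1..J for serotypes) is an assumed convention.

```python
import numpy as np

def aalen_johansen(times, causes, n_causes):
    """Cause-specific cumulative incidence at each distinct observed time.

    times:  observed follow-up times
    causes: 0 = censored, j = 1..n_causes = event of type j (e.g., serotype j)
    Returns (grid, cif) where cif[k, j-1] is F_j evaluated at grid[k].
    """
    times = np.asarray(times, dtype=float)
    causes = np.asarray(causes, dtype=int)
    grid = np.unique(times)
    surv = 1.0                      # Kaplan-Meier P(no event of any type before the current time)
    cum = np.zeros(n_causes)        # running cumulative incidence for each cause
    cif = np.zeros((len(grid), n_causes))
    for k, t in enumerate(grid):
        at_risk = np.sum(times >= t)
        at_t = times == t
        d = np.array([np.sum(at_t & (causes == j)) for j in range(1, n_causes + 1)])
        cum += surv * d / at_risk   # cause-j hazard at t times P(event-free just before t)
        surv *= 1.0 - d.sum() / at_risk
        cif[k] = cum
    return grid, cif

# Five participants, two serotypes; the fourth participant is censored at t = 4.
grid, cif = aalen_johansen([1, 2, 3, 4, 5], [1, 2, 1, 0, 2], n_causes=2)
print(cif[-1])  # cumulative incidence of serotypes 1 and 2 by t = 5
```

For this toy input, the two curves end at approximately 0.4 and 0.6. Together with Ŝ(5) = 0, they sum to one, which is a useful sanity check on any cause-specific implementation.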
Where Pith is reading between the lines
- The approach could shorten the timeline for updating vaccines against new variants by reducing the need for repeated large efficacy trials.
- Similar data-fusion strategies might apply to other medical settings where surrogate endpoints are used to approve regimen changes.
- If the no-direct-effects assumption holds across variants, the same historical trial data could support repeated immunobridging updates.
Load-bearing premise
The vaccine affects disease risk only through the measured surrogate immune marker, with no remaining direct effects on the clinical endpoint.
What would settle it
A direct randomized comparison of the updated vaccine versus the original vaccine that produces disease incidence rates different from those predicted by the fused estimators.
Abstract
Refined vaccine regimens containing variant-matched inserts are often authorized based on historical phase 3 efficacy trials together with immunobridging studies. Phase 3 trials are essential for establishing immune biomarkers that reliably predict disease risk or vaccine efficacy against clinical endpoints. Once such immune correlates are identified, updated vaccine regimens can be approved through immunobridging designs that compare the immunogenicity of the updated regimen to that of an already-approved vaccine. We develop methods of inference for the counterfactual cumulative incidence curve using participant-level data from both a historical vaccine efficacy trial and an immunobridging study. We further extend these methods to pathogens with multiple serotypes -- such as dengue virus and influenza -- by estimating cause-specific cumulative incidence curves. We describe the identification assumptions, propose efficient and multiply robust estimators, and assess their finite-sample performance through simulation studies. We then apply the proposed methods to (1) estimating the hypothetical cumulative incidence curve for a bivalent mRNA booster and (2) testing a key assumption of no controlled direct effects, using data from the COVID-19 Variant Immunologic Landscape (COVAIL) Trial, a multistage randomized clinical study evaluating the safety and immunogenicity of a second COVID-19 booster dose.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper develops methods for estimating counterfactual cumulative incidence curves by fusing participant-level data from a historical vaccine efficacy trial (where both vaccine assignment and clinical outcomes are observed) with an immunobridging study (where only the surrogate immune marker is observed under new regimens). It extends the framework to pathogens with multiple serotypes via cause-specific cumulative incidence curves, states the required identification assumptions (including no controlled direct effects of the vaccine beyond the surrogate), proposes efficient and multiply robust estimators, evaluates finite-sample performance in simulations, and applies the methods to COVAIL trial data to estimate curves for a bivalent mRNA booster while testing the no controlled direct effects assumption.
Significance. If the identification assumptions hold, the fused estimators would enable more efficient inference for counterfactual vaccine efficacy curves without requiring new large-scale efficacy trials for each updated regimen, which is relevant for regulatory immunobridging. The multiply robust property and the cause-specific extension for multi-serotype pathogens are notable strengths. The simulation studies and the COVAIL application (including assumption testing) provide concrete evidence of applicability, though the central claims rest on the validity of the no controlled direct effects assumption.
major comments (2)
- [Identification assumptions and simulation studies] The identification strategy (described in the methods section) relies on the no controlled direct effects assumption to link the historical trial and immunobridging data. While the paper tests this assumption in the COVAIL application, the simulations evaluate performance only under the assumption holding and do not include sensitivity analyses quantifying bias under plausible violations (e.g., unmeasured pathways or serotype-specific effects); this is load-bearing for the validity of all counterfactual curve estimates.
- [Estimator derivation] The multiply robust estimators are proposed for the fused data setting. The manuscript should explicitly verify (perhaps via the influence function or asymptotic expansion) whether the multiple robustness property is preserved when the two data sources have different sampling mechanisms and missingness patterns, or whether additional conditions on the nuisance estimators are required.
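One simple form for the sensitivity analysis requested in the first major comment is the following hypothetical device, which does not come from the paper. Suppose the new regimen also has a residual direct effect that acts as a constant hazard ratio HR on top of the surrogate pathway. If the surrogate-only (fused) model assigns bridging participant i a risk r_i(t) of an event by time t, proportional hazards implies a survival of (1 − r_i(t))^HR. The curve under the violation is then the mean of 1 − (1 − r_i(t))^HR. HR = 1 recovers the fused estimate, and the gap as HR moves away from 1 shows the bias from ignoring the direct effect. The function name and input risks below are illustrative.

```python
import numpy as np

def incidence_under_direct_effect(predicted_risk, hazard_ratio):
    """Cumulative incidence at a fixed t if the new regimen also acts through a direct effect.

    predicted_risk: per-participant risks r_i(t) implied by the surrogate-only (fused) model
    hazard_ratio:   hypothesised direct-effect hazard ratio (1.0 = no controlled direct effect)
    Assumes the direct effect multiplies the hazard proportionally, so survival becomes
    (1 - r_i(t)) ** hazard_ratio.
    """
    r = np.asarray(predicted_risk, dtype=float)
    return float(np.mean(1.0 - (1.0 - r) ** hazard_ratio))

# Hypothetical fused predictions for four bridging participants at a fixed time t.
risks = [0.05, 0.10, 0.20, 0.02]
fused = incidence_under_direct_effect(risks, 1.0)
for hr in (0.8, 1.0, 1.25, 1.5):
    shifted = incidence_under_direct_effect(risks, hr)
    print(f"HR={hr:4.2f}  incidence={shifted:.4f}  bias of fused estimate={fused - shifted:+.4f}")
```

Reporting the curve over a grid of plausible hazard ratios gives a tipping-point style summary: how large a direct effect would have to be before conclusions about the updated vaccine change.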
minor comments (3)
- [COVAIL application] In the real-data application, clarify how the error bars or confidence intervals for the estimated counterfactual curves account for the uncertainty from both data sources and the surrogate modeling.
- [Extension to multiple serotypes] The notation for cause-specific cumulative incidence functions could be made more consistent across the single-serotype and multi-serotype sections to improve readability.
- [Introduction] Add a brief discussion of how the methods compare to existing approaches for surrogate endpoint analysis in vaccine trials (e.g., principal stratification or mediation methods) to better situate the contribution.
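For the first minor comment, one generic way for an interval to reflect sampling variability in both sources is a two-sample nonparametric bootstrap. It resamples the efficacy trial and the immunobridging study independently and reruns the whole pipeline, including the surrogate model, in each replicate. This approach is assumed here, not taken from the paper. The sketch uses a binary-surrogate plug-in estimator on hypothetical simulated data. Influence-function-based standard errors, such as those the paper's efficient estimators would provide, are the analytic alternative.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical data: the trial has (X, S, Y = I{T <= t}); the bridging study has (X, S) only.
n_tr, n_br = 20_000, 5_000
x_tr = rng.binomial(1, 0.5, n_tr)
s_tr = rng.binomial(1, 0.3 + 0.4 * x_tr)
y_tr = rng.binomial(1, 0.08 - 0.05 * s_tr - 0.02 * x_tr)
x_br = rng.binomial(1, 0.5, n_br)
s_br = rng.binomial(1, 0.8 + 0.1 * x_br)

def plug_in(x_tr, s_tr, y_tr, x_br, s_br):
    """Average the trial's cell-specific risk P(Y=1 | X, S) over the bridging (X, S) values."""
    risk = np.zeros((2, 2))
    for x in (0, 1):
        for s in (0, 1):
            risk[x, s] = y_tr[(x_tr == x) & (s_tr == s)].mean()
    return risk[x_br, s_br].mean()

estimate = plug_in(x_tr, s_tr, y_tr, x_br, s_br)

boot = []
for _ in range(500):
    i = rng.integers(0, n_tr, n_tr)   # resample trial participants with replacement
    j = rng.integers(0, n_br, n_br)   # independently resample bridging participants
    boot.append(plug_in(x_tr[i], s_tr[i], y_tr[i], x_br[j], s_br[j]))
lower, upper = np.percentile(boot, [2.5, 97.5])
print(f"estimate={estimate:.4f}  95% percentile interval=({lower:.4f}, {upper:.4f})")
```

Resampling the two sources separately keeps their sample sizes fixed, mirroring the fact that they were collected independently. Resampling the pooled data would mix the sources.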
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which have helped clarify key aspects of our work. We address each major comment below and indicate the revisions we will make.
- Referee: [Identification assumptions and simulation studies] The identification strategy (described in the methods section) relies on the no controlled direct effects assumption to link the historical trial and immunobridging data. While the paper tests this assumption in the COVAIL application, the simulations evaluate performance only under the assumption holding and do not include sensitivity analyses quantifying bias under plausible violations (e.g., unmeasured pathways or serotype-specific effects); this is load-bearing for the validity of all counterfactual curve estimates.
Authors: We agree that sensitivity analyses under violations of the no controlled direct effects assumption would strengthen the manuscript. The current simulations are designed to evaluate consistency and efficiency when the assumption holds, which is the standard approach for establishing the method's properties under correct identification. In the revision, we will add a dedicated simulation study that introduces controlled direct effects (including serotype-specific pathways) and reports the resulting bias, variance, and coverage of the estimators. This will provide a more complete picture of the assumption's practical importance. revision: yes
- Referee: [Estimator derivation] The multiply robust estimators are proposed for the fused data setting. The manuscript should explicitly verify (perhaps via the influence function or asymptotic expansion) whether the multiple robustness property is preserved when the two data sources have different sampling mechanisms and missingness patterns, or whether additional conditions on the nuisance estimators are required.
Authors: The multiple robustness property is preserved under the heterogeneous sampling and missingness patterns because the efficient influence function is derived by treating the two data sources as distinct strata with known sampling probabilities. The outcome regression and propensity score estimators are fitted separately within each source. The efficient influence function supplies the required (Neyman) orthogonality, and cross-fitting removes the need for Donsker-type conditions on the nuisance estimators. No further conditions on the nuisance estimators are needed beyond those already stated for consistency. We will add an explicit verification, including a sketch of the asymptotic expansion, to the methods section in the revision. revision: yes
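The two ingredients in this response, an influence-function (augmentation) correction and cross-fitting, can be shown on a deliberately simpler fused functional than the paper's: ψ = E{μ(X, S) | bridging}, where μ(x, s) = E(Y | X = x, S = s) is learned in the trial. The sketch below uses a standard doubly robust, transport-type estimator, not the paper's multiply robust one:

ψ̂ = n₁⁻¹ Σ_bridging μ̂(X, S) + n₁⁻¹ Σ_trial ŵ(X, S){Y − μ̂(X, S)}

Here ŵ is the estimated odds of bridging membership given (X, S). Each fold's nuisance models are fitted on the other fold, which is what cross-fitting means. In the hypothetical simulation, the outcome model is deliberately misspecified (linear in S, while the truth is quadratic) and the membership model is correctly specified. Under that setup, only the augmented estimator should recover the true value ψ = 3.25.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 100_000  # participants per source

# Hypothetical data. Trial (g=0): S | X ~ N(0.5 X, 1). Bridging (g=1): S | X ~ N(0.5 X + 1, 1).
x = rng.normal(size=2 * n)
g = np.repeat([0, 1], n)
s = 0.5 * x + g + rng.normal(size=2 * n)
y = 1.0 + x + s**2 + rng.normal(size=2 * n)
y[g == 1] = np.nan  # the bridging study has no clinical outcome

def design(x, s):
    return np.column_stack([np.ones_like(x), x, s])

def fit_ols(Z, y):
    return np.linalg.lstsq(Z, y, rcond=None)[0]

def fit_logistic(Z, g, iters=25):
    """Newton-Raphson for P(g = 1 | Z), with log odds linear in Z."""
    beta = np.zeros(Z.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-Z @ beta))
        hess = Z.T @ (Z * (p * (1.0 - p))[:, None])
        beta += np.linalg.solve(hess, Z.T @ (g - p))
    return beta

folds = rng.integers(0, 2, 2 * n)
plug_in_sum = 0.0
correction_sum = 0.0
for k in (0, 1):
    train, test = folds != k, folds == k
    Z_train, Z_test = design(x[train], s[train]), design(x[test], s[test])
    trial_train = g[train] == 0
    mu_coef = fit_ols(Z_train[trial_train], y[train][trial_train])  # misspecified outcome model
    odds_coef = fit_logistic(Z_train, g[train])                     # correctly specified here
    mu_hat = Z_test @ mu_coef
    odds_hat = np.exp(Z_test @ odds_coef)  # estimated odds of bridging membership
    g_test, y_test = g[test], y[test]
    plug_in_sum += mu_hat[g_test == 1].sum()
    correction_sum += (odds_hat * (y_test - mu_hat))[g_test == 0].sum()

psi_plug_in = plug_in_sum / n
psi_dr = (plug_in_sum + correction_sum) / n
print(f"plug-in={psi_plug_in:.3f}  cross-fitted DR={psi_dr:.3f}  truth=3.25")
```

The plug-in estimate inherits the outcome model's misspecification (it sits near 2.25 in this design). The augmentation term reweights trial residuals toward the bridging population and removes that bias when the membership model is right. Swapping which model is misspecified illustrates the other half of double robustness.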
Circularity Check
No circularity: derivation grounded in explicit identification assumptions and independent simulations
The paper states identification assumptions (including no controlled direct effects of vaccine beyond the surrogate), proposes multiply robust estimators for counterfactual cumulative incidence curves via data fusion, evaluates finite-sample performance in separate simulation studies, and applies the methods to COVAIL trial data while testing the key assumption. No equations or steps reduce a claimed prediction or result to a fitted input by construction, and no load-bearing premise collapses to a self-citation chain or ansatz smuggled from prior author work. The central contribution remains independent of its outputs.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: the identification assumptions linking the historical trial and the immunobridging data for the counterfactual cumulative incidence curve.
Reference graph
Works this paper leans on
[1] Issa J Dahabreh, Sarah E Robertson, and Miguel A Hernán. Generalizing and transporting inferences about the effects of treatment assignment subject to non-adherence. arXiv preprint arXiv:2211.04876.
[2] Shuo Feng, Daniel J Phillips, Thomas White, Homesh Sayal, Parvinder K Aley, Sagida Bibi, Christina Dold, Michelle Fuskova, Sarah C Gilbert, Ian Hirsch, et al. Correlates of protection against symptomatic and asymptomatic SARS-CoV-2 infection. Nature Medicine, 27(11):2032–2040.
[3] Youyi Fong, Adrian B McDermott, David Benkeser, Sanne Roels, Daniel J Stieh, An Vandebosch, Mathieu Le Gars, Griet A Van Roey, Christopher R Houchens, Karen Martins, et al. Immune correlates analysis of the ENSEMBLE single Ad26.COV2.S dose vaccine efficacy clinical trial. Nature Microbiology, 7(12):1996–2010.
[4] Brian Gilbert, Ivan Díaz, Kara E Rudolph, and Tat-Thang Vo. A novel decomposition to explain heterogeneity in observational and randomized studies of causality. arXiv preprint arXiv:2208.05543, 2022a.
Peter B Gilbert and Ying Huang. Predicting overall vaccine efficacy in a new setting by re-calibrating baseline covariate and intermediate response endpoint...
[5] Ellen Graham, Marco Carone, and Andrea Rotnitzky. Towards a unified theory for semiparametric data fusion with individual-level data. arXiv preprint arXiv:2409.09973.
[6] Michael L Jackson, Jessie R Chung, Lisa A Jackson, C Hallie Phillips, Joyce Benoit, Arnold S Monto, Emily T Martin, Edward A Belongia, Huong Q McLean, Manjusha Gaglani, et al. Influenza vaccine effectiveness in the United States during the 2015–2016 season. New England Journal of Medicine, 377(6):534–543.
[7] Edward H. Kennedy, Sivaraman Balakrishnan, and Max G’Sell. Sharp instruments for classifying compliers and generalizing causal effects. Annals of Statistics, 48(4):2008–2030.
[8] Supplemental Materials to “Efficient estimation of cumulative incidence curves via data fusion with surrogates: application to integrated analysis of vaccine trial and immunobridging data” by Pan Zhao, Peter B. Gilbert, Oliver Dukes, and Bo Zhang. A Proofs. A.1 Proofs of Propositions 1 and 2 and Theorem 4. We prove the first identification result via the “me...
[9] ... $y(T) = \Psi(P)$, which completes the proof of multiple robustness. A.5 Proof of Proposition 5. We first state a useful lemma from Kennedy et al. [2020]. Lemma 1. Let $\hat f(o)$ be a function estimated from a sample $O^N = (O_{n+1}, \ldots, O_N)$, and let $\mathbb{P}_n$ denote the empirical measure over $(O_1, \ldots, O_n)$, which is independent of $O^N$. Then $(\mathbb{P}_n - P)(\hat f - f) = O_P\left(\frac{\|\hat f - f\|}{n^{1/2}}\right)$...
[10] ... $-\frac{1}{\kappa}\big)\{\hat\mu(X, a, s) - \mu(X, a, s)\}\, f(s \mid X, A=a, \Gamma=1)\, ds = o_P(n^{-1/2})$, which imply $Z_{\hat P, P} = o_P(n^{-1/2})$, by Cauchy-Schwarz. Summarizing the above results, we have $\hat\Psi - \Psi = (\mathbb{P}_n - P)\{\phi^*_a(P)\} + o_P(n^{-1/2})$, which completes the proof. A.6 Proof of Theorems 2 and 7. Proof. When the outcome is subject to ignorable right censoring, the efficient influence function $\phi^{C*}_{a,t}$ ca...
[11] ... $G_T(t \mid X, a, S) - \int_{s \in \mathcal{S}} G_T(t \mid X, a, s)\, f(s \mid X, A=a, \Gamma=1)\, ds - \frac{\Gamma}{\kappa} \int_{s \in \mathcal{S}} G_T(t \mid X, a, s)\, f(s \mid X, A=a, \Gamma=1)\, ds - R(a, t; \Gamma=1)$. The extension to censored competing risks data follows straightforwardly when we set $y(T) = I\{T \le t, \Delta = j\}$, $j = 1, \ldots, J$ [Rytgaard and van der Laan, 2024]. A.7 Proof of Theorem 3. Proof. Denote by $f^*$, $G_T^*$ and $G_C^*$ the probability limits of the ...