A Causal Framework for Evaluating Jointly Longitudinal Outcomes and Surrogate Markers: A State-Space Approach
Pith reviewed 2026-05-10 14:28 UTC · model grok-4.3
The pith
A causal definition quantifies the proportion of treatment effect on a longitudinal primary outcome that is explained by the surrogate.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Within the potential outcomes framework, we propose a formal causal definition of the proportion of the treatment effect on the longitudinal primary outcome that is explained by the treatment effect on the longitudinal surrogate. For estimation, we leverage state-space models, together with the Kalman filter and smoother, enabling efficient estimation of treatment effects under realistic conditions of temporal evolution and patient-level variability. We introduce a nonparametric bootstrap strategy for state-space models, a temporal homogeneity test, and demonstrate the finite-sample performance of our proposed methods via a simulation study and application to a diabetes clinical trial.
What carries the argument
A state-space representation of the joint longitudinal processes, fitted with the Kalman filter and smoother, from which the causal proportion of the treatment effect explained by the surrogate is estimated.
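To make the filtering and smoothing step concrete, here is a minimal sketch for a univariate local-level model (random-walk state plus observation noise). This is a deliberate simplification: the paper's model is joint over surrogate and primary outcome, and its exact parameterization is not reproduced here. The function name `kalman_smooth_local_level` and the variances `q` and `r` are hypothetical.

```python
import numpy as np

def kalman_smooth_local_level(y, q, r, m0=0.0, p0=1e6):
    """Kalman filter and RTS smoother for the assumed toy model
    x_t = x_{t-1} + w_t, w_t ~ N(0, q);  y_t = x_t + v_t, v_t ~ N(0, r)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    m_pred, p_pred = np.empty(n), np.empty(n)
    m_filt, p_filt = np.empty(n), np.empty(n)
    m_prev, p_prev = m0, p0  # diffuse-ish prior on the initial state
    for t in range(n):
        # Predict: a random-walk state keeps its mean, variance grows by q.
        m_pred[t] = m_prev
        p_pred[t] = p_prev + q
        # Update: blend prediction and observation via the Kalman gain.
        gain = p_pred[t] / (p_pred[t] + r)
        m_filt[t] = m_pred[t] + gain * (y[t] - m_pred[t])
        p_filt[t] = (1.0 - gain) * p_pred[t]
        m_prev, p_prev = m_filt[t], p_filt[t]
    # Backward Rauch-Tung-Striebel pass: revise each state using later data.
    m_smooth, p_smooth = m_filt.copy(), p_filt.copy()
    for t in range(n - 2, -1, -1):
        g = p_filt[t] / p_pred[t + 1]
        m_smooth[t] = m_filt[t] + g * (m_smooth[t + 1] - m_pred[t + 1])
        p_smooth[t] = p_filt[t] + g**2 * (p_smooth[t + 1] - p_pred[t + 1])
    return m_filt, p_filt, m_smooth, p_smooth

# Usage on simulated data (all values are illustrative).
rng = np.random.default_rng(0)
state = np.cumsum(rng.normal(scale=0.3, size=50))
obs = state + rng.normal(scale=1.0, size=50)
m_f, p_f, m_s, p_s = kalman_smooth_local_level(obs, q=0.09, r=1.0)
```

The smoother never has larger variance than the filter at the same time point, which is why smoothed estimates are the natural input for trajectory-level summaries.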
If this is right
- Surrogate validity can now be assessed across the full time trajectory instead of at isolated measurement times.
- The Kalman smoother provides efficient estimates of time-varying treatment effects while accounting for within-patient correlation.
- A bootstrap method supplies uncertainty intervals without requiring parametric assumptions on the error distributions.
- A homogeneity test can detect whether the surrogate-primary relationship remains stable over the study duration.
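The bootstrap bullet above can be illustrated with a percentile interval obtained by resampling whole patients with replacement. Resampling at the patient level is an assumption made here to preserve within-patient correlation; the paper's bootstrap strategy for state-space models may differ in its details. The function name `patient_bootstrap_ci` and the `statistic` callback are hypothetical.

```python
import numpy as np

def patient_bootstrap_ci(trajectories, statistic, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval, resampling patients (rows) with replacement.

    trajectories: array of shape (n_patients, n_times)
    statistic:    function mapping such an array to a scalar or 1-D array
    """
    rng = np.random.default_rng(seed)
    trajectories = np.asarray(trajectories, dtype=float)
    n = trajectories.shape[0]
    reps = np.empty((n_boot,) + np.shape(statistic(trajectories)))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # draw patient indices with replacement
        reps[b] = statistic(trajectories[idx])
    lower = np.quantile(reps, alpha / 2, axis=0)
    upper = np.quantile(reps, 1 - alpha / 2, axis=0)
    return lower, upper

# Usage: pointwise 95% intervals for the mean trajectory of simulated data.
rng = np.random.default_rng(42)
data = rng.normal(loc=1.0, scale=1.0, size=(200, 5))
lower, upper = patient_bootstrap_ci(data, lambda d: d.mean(axis=0), n_boot=300, seed=3)
```

In a real analysis, `statistic` would refit the model on each resample and return the estimated proportion; the intervals here are pointwise, not simultaneous.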
Where Pith is reading between the lines
- The same state-space structure could be adapted to settings with irregular observation times or missing data common in long-term follow-up studies.
- If the proportion remains high across multiple trials, trial designers might shorten follow-up by focusing resources on surrogate collection.
- Application to observational cohorts would require additional checks for the no-unmeasured-confounding assumption at each time step.
Load-bearing premise
The state-space model correctly captures the temporal evolution and patient-level variability of both surrogate and primary outcome, with no unmeasured confounding in the causal relationships over time.
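For orientation, a generic linear Gaussian state-space model has the form below. The symbols are assumptions chosen for illustration and are not taken from the paper: $x_t$ is the latent state, $y_t$ stacks the observed surrogate and primary outcome at time $t$, and $a$ is the treatment indicator.

```latex
\begin{aligned}
x_t &= F_t\, x_{t-1} + B_t\, a + w_t, & w_t &\sim \mathcal{N}(0, Q_t),\\
y_t &= H_t\, x_t + v_t,               & v_t &\sim \mathcal{N}(0, R_t).
\end{aligned}
```

The premise above amounts to saying that this structure, with correctly specified $F_t$, $B_t$, $H_t$, $Q_t$, and $R_t$, describes both arms, and that no omitted variable drives both components of $y_t$ beyond what the state captures.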
What would settle it
In a simulation study where an unmodeled time-varying confounder is added to the data-generating process, the estimated proportion of treatment effect explained by the surrogate would systematically differ from the known true value.
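A toy version of such a check is sketched below. It swaps in a much simpler technique than the paper's: a pooled regression-based proportion (one minus the residual treatment effect given the surrogate, divided by the total effect) instead of the state-space estimator and causal definition. The data-generating process, parameter values, and function names are all hypothetical.

```python
import numpy as np

def simulate_trial(n=4000, n_times=6, a=1.0, b=0.8, c=0.2, rho=0.7, seed=1):
    """Assumed linear process with a time-varying confounder U of S and Y."""
    rng = np.random.default_rng(seed)
    treat = rng.integers(0, 2, size=n).astype(float)  # randomized arm
    u = np.empty((n, n_times))
    u[:, 0] = rng.normal(size=n)
    for t in range(1, n_times):
        # Stationary AR(1) confounder with unit marginal variance.
        u[:, t] = rho * u[:, t - 1] + np.sqrt(1 - rho**2) * rng.normal(size=n)
    s = a * treat[:, None] + u + rng.normal(size=(n, n_times))
    y = b * s + c * treat[:, None] + u + rng.normal(size=(n, n_times))
    return treat, u, s, y

def regression_proportion(treat, s, y, u=None):
    """1 - (treatment effect adjusted for S) / (total effect), pooled over time."""
    arm = np.repeat(treat, s.shape[1])  # matches row-major ravel of (n, n_times)
    ones = np.ones_like(arm)
    y_long = y.ravel()
    total = np.linalg.lstsq(np.column_stack([ones, arm]), y_long, rcond=None)[0][1]
    cols = [ones, arm, s.ravel()]
    if u is not None:
        cols.append(u.ravel())  # adjust for the confounder when it is observed
    residual = np.linalg.lstsq(np.column_stack(cols), y_long, rcond=None)[0][1]
    return 1.0 - residual / total

treat, u, s, y = simulate_trial()
truth = 1.0 * 0.8 / (1.0 * 0.8 + 0.2)             # a*b / (a*b + c) = 0.8
naive = regression_proportion(treat, s, y)         # U omitted from the fit
oracle = regression_proportion(treat, s, y, u=u)   # U included in the fit
```

Under these assumed parameters, the estimate that omits U drifts well above the true value of 0.8, while the fit that adjusts for U stays close to it. This is the pattern of systematic deviation described above, shown in a setting far simpler than the paper's.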
Original abstract
Surrogate markers offer the potential to reduce the burden of data collection by replacing costly or invasive primary outcomes with more accessible measurements, provided that they can faithfully indicate the effectiveness of a treatment. However, appropriate evaluation of a surrogate is particularly complex in longitudinal studies, where both outcomes and surrogates can evolve dynamically over time and interest lies not only in the treatment effect at one time, but rather treatment effects that may vary along the entire trajectory. In this paper, we develop a statistical framework for surrogate evaluation when both the surrogate and primary outcome are measured over time. Specifically, within the potential outcomes framework, we propose a formal causal definition of the proportion of the treatment effect on the longitudinal primary outcome that is explained by the treatment effect on the longitudinal surrogate. For estimation, we leverage state-space models, together with the Kalman filter and smoother, enabling efficient estimation of treatment effects under realistic conditions of temporal evolution and patient-level variability. We introduce a nonparametric bootstrap strategy for state-space models, a temporal homogeneity test, and demonstrate the finite-sample performance of our proposed methods via a simulation study and application to a diabetes clinical trial.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a causal definition, within the potential outcomes framework, of the proportion of the treatment effect on a longitudinal primary outcome that is explained by the treatment effect on a longitudinal surrogate marker. Estimation proceeds by fitting a state-space model to the joint trajectories, using the Kalman filter and smoother to recover time-varying treatment effects, followed by a nonparametric bootstrap for inference, a test for temporal homogeneity of the proportion, and validation through simulation and a diabetes clinical trial application.
Significance. If the central identification result holds, the framework supplies a principled, time-resolved surrogate evaluation tool for longitudinal trials where both outcomes evolve dynamically; the state-space representation and Kalman smoother are standard, efficient tools that naturally accommodate patient-level heterogeneity and serial dependence. The bootstrap procedure and homogeneity test are practical additions that strengthen usability.
major comments (3)
- [§3] §3 (causal definition and identification): the proportion is defined as a functional of the joint potential-outcome trajectories, yet the manuscript provides no explicit identification theorem showing that this functional is recoverable from the observed data distribution under the stated state-space assumptions (linear Gaussian transitions, no unmeasured time-varying confounding). Without such a result or a sensitivity analysis, it is unclear whether the Kalman-smoothed estimates equal the intended causal quantity or are biased by model misspecification.
- [Section 5] Simulation study (Section 5): all reported scenarios assume the data-generating process exactly matches the fitted state-space model; no experiments examine bias or coverage when unmeasured time-varying factors affect both surrogate and primary outcome, which is the load-bearing assumption highlighted in the skeptic note. This leaves the finite-sample performance claim incomplete for realistic longitudinal settings.
- [Section 6] Application (Section 6): the diabetes-trial analysis reports a point estimate and bootstrap interval for the proportion but does not include a formal check (e.g., via the proposed homogeneity test or residual diagnostics) that the state-space model adequately captures the joint dynamics; if the model is misspecified, the reported proportion cannot be interpreted causally.
minor comments (3)
- [§3] Notation for the state vector and transition matrices is introduced without a consolidated table; readers would benefit from an explicit listing of all parameters and their interpretations early in §3.
- [Introduction] The abstract and introduction cite prior surrogate literature only briefly; a short paragraph contrasting the new longitudinal proportion with existing single-time-point or cross-sectional definitions would improve context.
- [Figures 2-4] Figure captions for the simulation and application plots should state the exact sample size, number of bootstrap replicates, and whether the displayed intervals are pointwise or simultaneous.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive comments. We address each major point below and describe the revisions we will incorporate to strengthen the manuscript.
- Referee: [§3] §3 (causal definition and identification): the proportion is defined as a functional of the joint potential-outcome trajectories, yet the manuscript provides no explicit identification theorem showing that this functional is recoverable from the observed data distribution under the stated state-space assumptions (linear Gaussian transitions, no unmeasured time-varying confounding). Without such a result or a sensitivity analysis, it is unclear whether the Kalman-smoothed estimates equal the intended causal quantity or are biased by model misspecification.
  Authors: We agree that an explicit identification result would improve clarity. Under the linear Gaussian state-space assumptions together with the no unmeasured time-varying confounding condition required for the potential-outcomes interpretation, the observed-data distribution identifies the causal proportion via the Kalman smoother. In the revised manuscript we will add a formal identification theorem (new Theorem 1) in Section 3 establishing this equivalence and include a short discussion of sensitivity to violations of the no-unmeasured-confounding assumption.
  Revision: yes
- Referee: [Section 5] Simulation study (Section 5): all reported scenarios assume the data-generating process exactly matches the fitted state-space model; no experiments examine bias or coverage when unmeasured time-varying factors affect both surrogate and primary outcome, which is the load-bearing assumption highlighted in the skeptic note. This leaves the finite-sample performance claim incomplete for realistic longitudinal settings.
  Authors: The referee is correct that the existing simulations assume correct model specification. To address this gap we will add a new simulation scenario in Section 5 that introduces unmeasured time-varying confounding (via an omitted common factor). The results will illustrate the bias that arises when the identifying assumption is violated, thereby clarifying the conditions under which the estimator retains its causal interpretation.
  Revision: yes
- Referee: [Section 6] Application (Section 6): the diabetes-trial analysis reports a point estimate and bootstrap interval for the proportion but does not include a formal check (e.g., via the proposed homogeneity test or residual diagnostics) that the state-space model adequately captures the joint dynamics; if the model is misspecified, the reported proportion cannot be interpreted causally.
  Authors: We thank the referee for this observation. In the revised Section 6 we will report the p-value from the temporal homogeneity test and include residual diagnostics (innovation autocorrelation and Q-Q plots) for the fitted state-space model on the diabetes data. These checks will support the adequacy of the model and the causal interpretation of the reported proportion.
  Revision: yes
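The innovation-autocorrelation diagnostic mentioned in this response can be sketched as follows. This is a generic check on a residual series, not the authors' code: the function names are hypothetical, and the Ljung-Box statistic is used here as one common way to summarize autocorrelation across lags.

```python
import numpy as np

def sample_autocorr(x, max_lag=10):
    """Sample autocorrelations of x at lags 1..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / denom for k in range(1, max_lag + 1)])

def ljung_box_q(x, max_lag=10):
    """Ljung-Box statistic Q = n (n + 2) * sum_k r_k^2 / (n - k)."""
    n = len(x)
    r = sample_autocorr(x, max_lag)
    lags = np.arange(1, max_lag + 1)
    return n * (n + 2) * np.sum(r**2 / (n - lags))

# Usage: white-noise residuals versus serially correlated residuals.
rng = np.random.default_rng(7)
white = rng.normal(size=5000)
ar = np.empty(5000)
ar[0] = rng.normal()
for t in range(1, 5000):
    ar[t] = 0.8 * ar[t - 1] + rng.normal()  # AR(1) with coefficient 0.8

q_white = ljung_box_q(white)
q_ar = ljung_box_q(ar)
```

If the fitted model captures the dynamics, standardized one-step-ahead innovations should behave like white noise, so their autocorrelations should be near zero and Q should be unremarkable relative to a chi-squared reference with `max_lag` degrees of freedom. Large values, as in the AR(1) series here, point to unmodeled serial structure.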
Circularity Check
Causal definition of longitudinal surrogate proportion precedes and is independent of state-space estimation
The paper defines the target causal quantity (proportion of treatment effect on longitudinal primary outcome explained by surrogate) first, using the potential outcomes framework. Estimation then applies standard state-space models, Kalman filter, and smoother as tools to recover this pre-defined quantity under stated assumptions. No equation or step reduces the definition to a fitted parameter, renames a known result, or relies on a self-citation chain for its validity. The state-space model encodes temporal dynamics for identification and estimation but does not construct the causal proportion itself. This is the most common non-circular case: a formal definition followed by a separate estimation procedure.