A Causal Framework for Evaluating Jointly Longitudinal Outcomes and Surrogate Markers: A State-Space Approach

Layla Parast; Silvaneo V. dos Santos Jr.

arxiv: 2604.12882 · v1 · submitted 2026-04-14 · 📊 stat.ME

A Causal Framework for Evaluating Jointly Longitudinal Outcomes and Surrogate Markers: A State-Space Approach

Silvaneo V. dos Santos Jr. , Layla Parast This is my paper

Pith reviewed 2026-05-10 14:28 UTC · model grok-4.3

classification 📊 stat.ME

keywords causal inferencesurrogate markerslongitudinal datastate-space modelspotential outcomesKalman filterclinical trialstreatment effect decomposition

0 comments

The pith

A causal definition quantifies the proportion of treatment effect on a longitudinal primary outcome that is explained by the surrogate.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a causal framework to assess how well a surrogate marker captures the full effect of a treatment when both the surrogate and the main outcome are measured repeatedly over time. It defines this explanatory proportion inside the potential outcomes framework so that researchers can evaluate surrogates along entire trajectories rather than at single time points. This matters for trials where collecting the primary outcome is costly or invasive, because a valid longitudinal surrogate could reduce data burden while still indicating treatment success across the study period. Estimation relies on state-space models fitted with the Kalman filter and smoother to handle the natural evolution and individual differences in the data. The methods include a bootstrap procedure and a test for temporal homogeneity, and they are illustrated on simulated data plus a diabetes trial.

Core claim

Within the potential outcomes framework, we propose a formal causal definition of the proportion of the treatment effect on the longitudinal primary outcome that is explained by the treatment effect on the longitudinal surrogate. For estimation, we leverage state-space models, together with the Kalman filter and smoother, enabling efficient estimation of treatment effects under realistic conditions of temporal evolution and patient-level variability. We introduce a nonparametric bootstrap strategy for state-space models, a temporal homogeneity test, and demonstrate the finite-sample performance of our proposed methods via a simulation study and application to a diabetes clinical trial.

What carries the argument

The state-space representation of the joint longitudinal processes, estimated via the Kalman filter and smoother, that delivers the causal proportion of treatment effect explained by the surrogate.

If this is right

Surrogate validity can now be assessed across the full time trajectory instead of at isolated measurement times.
The Kalman smoother provides efficient estimates of time-varying treatment effects while accounting for within-patient correlation.
A bootstrap method supplies uncertainty intervals without requiring parametric assumptions on the error distributions.
A homogeneity test can detect whether the surrogate-primary relationship remains stable over the study duration.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same state-space structure could be adapted to settings with irregular observation times or missing data common in long-term follow-up studies.
If the proportion remains high across multiple trials, trial designers might shorten follow-up by focusing resources on surrogate collection.
Application to observational cohorts would require additional checks for the no-unmeasured-confounding assumption at each time step.

Load-bearing premise

The state-space model correctly captures the temporal evolution and patient-level variability of both surrogate and primary outcome, with no unmeasured confounding in the causal relationships over time.

What would settle it

In a simulation study where an unmodeled time-varying confounder is added to the data-generating process, the estimated proportion of treatment effect explained by the surrogate would systematically differ from the known true value.

Figures

Figures reproduced from arXiv: 2604.12882 by Layla Parast, Silvaneo V. dos Santos Jr..

**Figure 1.** Figure 1: Simulation results examining the average bias of the proposed SSM estimator (red), compared to estimation using GEE (green), LMM (blue), and OLS (purple), across 15 simulations which vary with respect to the treatment effect trajectory (monotone, parabole, or random walk) and the PTE (0.25, 0.5,0.75,0.9, or 1.0), and 6 sample sizes; a horizontal black dashed line is shown at a width of 0 for reference. set… view at source ↗

**Figure 2.** Figure 2: Simulation results examining the proportion of rejections of the null hypothesis that the PTE is ≤ 0.75 using the proposed SSM estimator (red), compared to estimation using GEE (green), LMM (blue), and OLS (purple), across 15 simulations which vary with respect to the treatment effect trajectory (monotone, parabole, or random walk) and the PTE (0.25, 0.5,0.75,0.9, or 1.0), and 6 sample sizes; note that whe… view at source ↗

**Figure 3.** Figure 3: Simulation results examining the proportion of rejections of the null hypothesis that the PTE is constant over time i.e., temporal homogeneity, using the proposed MSD test compared to a Wald-based test at an α-level of 0.05; note that the null hypothesis is true in Scenario 1 and is false in Scenarios 2, 3, 4, and 5; a horizontal black dashed line is shown at 0.05 for reference. 27 [PITH_FULL_IMAGE:figure… view at source ↗

**Figure 4.** Figure 4: Diabetes clinical trial results examining change in hemoglobin A1c as a surrogate for change in albumin excretion rate: the solid black line is the estimated cumulative proportion of treatment effect explained (CPTE) using our proposed methods over time using 28 lags (a) versus using 0 lags (b); the shaded region corresponds to estimated 90% pointwise confidence intervals; the dashed black line is shown a… view at source ↗

read the original abstract

Surrogate markers offer the potential to reduce the burden of data collection by replacing costly or invasive primary outcomes with more accessible measurements, provided that they can faithfully indicate the effectiveness of a treatment. However, appropriate evaluation of a surrogate is particularly complex in longitudinal studies, where both outcomes and surrogates can evolve dynamically over time and interest lies not only in the treatment effect at one time, but rather treatment effects that may vary along the entire trajectory. In this paper, we develop a statistical framework for surrogate evaluation when both the surrogate and primary outcome are measured over time. Specifically, within the potential outcomes framework, we propose a formal causal definition of the proportion of the treatment effect on the longitudinal primary outcome that is explained by the treatment effect on the longitudinal surrogate. For estimation, we leverage state-space models, together with the Kalman filter and smoother, enabling efficient estimation of treatment effects under realistic conditions of temporal evolution and patient-level variability. We introduce a nonparametric bootstrap strategy for state-space models, a temporal homogeneity test, and demonstrate the finite-sample performance of our proposed methods via a simulation study and application to a diabetes clinical trial.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper defines a causal proportion for how much a longitudinal surrogate explains treatment effects on a longitudinal outcome and estimates it with state-space models, but the claim stands or falls on whether that model captures every relevant dynamic.

read the letter

The main contribution is a potential-outcomes definition of the explained proportion that applies when both surrogate and outcome are measured repeatedly over time. They pair it with a state-space representation and Kalman smoother to get the treatment-effect trajectories, then add a nonparametric bootstrap and a test for temporal homogeneity. The simulation study and diabetes-trial application show the procedure can be run on realistic data sizes and produces numbers that behave as expected under the model they assume.

Referee Report

3 major / 3 minor

Summary. The paper proposes a causal definition, within the potential outcomes framework, of the proportion of the treatment effect on a longitudinal primary outcome that is explained by the treatment effect on a longitudinal surrogate marker. Estimation proceeds by fitting a state-space model to the joint trajectories, using the Kalman filter and smoother to recover time-varying treatment effects, followed by a nonparametric bootstrap for inference, a test for temporal homogeneity of the proportion, and validation through simulation and a diabetes clinical trial application.

Significance. If the central identification result holds, the framework supplies a principled, time-resolved surrogate evaluation tool for longitudinal trials where both outcomes evolve dynamically; the state-space representation and Kalman smoother are standard, efficient tools that naturally accommodate patient-level heterogeneity and serial dependence. The bootstrap procedure and homogeneity test are practical additions that strengthen usability.

major comments (3)

[§3] §3 (causal definition and identification): the proportion is defined as a functional of the joint potential-outcome trajectories, yet the manuscript provides no explicit identification theorem showing that this functional is recoverable from the observed data distribution under the stated state-space assumptions (linear Gaussian transitions, no unmeasured time-varying confounding). Without such a result or a sensitivity analysis, it is unclear whether the Kalman-smoothed estimates equal the intended causal quantity or are biased by model misspecification.
[Section 5] Simulation study (Section 5): all reported scenarios assume the data-generating process exactly matches the fitted state-space model; no experiments examine bias or coverage when unmeasured time-varying factors affect both surrogate and primary outcome, which is the load-bearing assumption highlighted in the skeptic note. This leaves the finite-sample performance claim incomplete for realistic longitudinal settings.
[Section 6] Application (Section 6): the diabetes-trial analysis reports a point estimate and bootstrap interval for the proportion but does not include a formal check (e.g., via the proposed homogeneity test or residual diagnostics) that the state-space model adequately captures the joint dynamics; if the model is misspecified, the reported proportion cannot be interpreted causally.

minor comments (3)

[§3] Notation for the state vector and transition matrices is introduced without a consolidated table; readers would benefit from an explicit listing of all parameters and their interpretations early in §3.
[Introduction] The abstract and introduction cite prior surrogate literature only briefly; a short paragraph contrasting the new longitudinal proportion with existing single-time-point or cross-sectional definitions would improve context.
[Figures 2-4] Figure captions for the simulation and application plots should state the exact sample size, number of bootstrap replicates, and whether the displayed intervals are pointwise or simultaneous.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thoughtful and constructive comments. We address each major point below and describe the revisions we will incorporate to strengthen the manuscript.

read point-by-point responses

Referee: [§3] §3 (causal definition and identification): the proportion is defined as a functional of the joint potential-outcome trajectories, yet the manuscript provides no explicit identification theorem showing that this functional is recoverable from the observed data distribution under the stated state-space assumptions (linear Gaussian transitions, no unmeasured time-varying confounding). Without such a result or a sensitivity analysis, it is unclear whether the Kalman-smoothed estimates equal the intended causal quantity or are biased by model misspecification.

Authors: We agree that an explicit identification result would improve clarity. Under the linear Gaussian state-space assumptions together with the no unmeasured time-varying confounding condition required for the potential-outcomes interpretation, the observed-data distribution identifies the causal proportion via the Kalman smoother. In the revised manuscript we will add a formal identification theorem (new Theorem 1) in Section 3 establishing this equivalence and include a short discussion of sensitivity to violations of the no-unmeasured-confounding assumption. revision: yes
Referee: [Section 5] Simulation study (Section 5): all reported scenarios assume the data-generating process exactly matches the fitted state-space model; no experiments examine bias or coverage when unmeasured time-varying factors affect both surrogate and primary outcome, which is the load-bearing assumption highlighted in the skeptic note. This leaves the finite-sample performance claim incomplete for realistic longitudinal settings.

Authors: The referee is correct that the existing simulations assume correct model specification. To address this gap we will add a new simulation scenario in Section 5 that introduces unmeasured time-varying confounding (via an omitted common factor). The results will illustrate the bias that arises when the identifying assumption is violated, thereby clarifying the conditions under which the estimator retains its causal interpretation. revision: yes
Referee: [Section 6] Application (Section 6): the diabetes-trial analysis reports a point estimate and bootstrap interval for the proportion but does not include a formal check (e.g., via the proposed homogeneity test or residual diagnostics) that the state-space model adequately captures the joint dynamics; if the model is misspecified, the reported proportion cannot be interpreted causally.

Authors: We thank the referee for this observation. In the revised Section 6 we will report the p-value from the temporal homogeneity test and include residual diagnostics (innovation autocorrelation and Q-Q plots) for the fitted state-space model on the diabetes data. These checks will support the adequacy of the model and the causal interpretation of the reported proportion. revision: yes

Circularity Check

0 steps flagged

Causal definition of longitudinal surrogate proportion precedes and is independent of state-space estimation

full rationale

The paper defines the target causal quantity (proportion of treatment effect on longitudinal primary outcome explained by surrogate) first, using the potential outcomes framework. Estimation then applies standard state-space models, Kalman filter, and smoother as tools to recover this pre-defined quantity under stated assumptions. No equation or step reduces the definition to a fitted parameter, renames a known result, or relies on a self-citation chain for its validity. The state-space model encodes temporal dynamics for identification and estimation but does not construct the causal proportion itself. This is the most common non-circular case: a formal definition followed by a separate estimation procedure.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit list of fitted parameters, axioms, or new entities; the framework implicitly relies on standard state-space assumptions and causal identifiability conditions that are not detailed here.

pith-pipeline@v0.9.0 · 5501 in / 1091 out tokens · 25914 ms · 2026-05-10T14:28:56.835295+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

6 extracted references · 6 canonical work pages

[1]

doi: 10.1111/biom.13310

ISSN 0006-341X. doi: 10.1111/biom.13310. Denis Agniel and Layla Parast. Robust evaluation of longitudinal surrogate markers with censored data.Journal of the Royal Statistical Society Series B: Statistical Methodology, 87(3):891–907, 12

work page doi:10.1111/biom.13310
[2]

S., Maskell, S., Gordon, N., & Clapp, T

doi: 10.1109/78.978374. Susan Athey, Raj Chetty, Guido W Imbens, and Hyunseung Kang. The surrogate index: Combining short-term proxies to estimate long-term treatment effects more rapidly and precisely. Working Paper 26463, National Bureau of Economic Research, November

work page doi:10.1109/78.978374
[3]

Michael R Elliott

doi: doi:10.1515/jci-2022-0077. Michael R Elliott. Surrogate endpoints in clinical trials.Annual Review of Statistics and its Application, 10:75–96,

work page doi:10.1515/jci-2022-0077 2022
[4]

Avoiding the surrogate paradox: an empirical framework for assessing assumptions.Journal of Nonparametric Statistics, pages 1–22, 2025a

Emily Hsiao, Lu Tian, and Layla Parast. Avoiding the surrogate paradox: an empirical framework for assessing assumptions.Journal of Nonparametric Statistics, pages 1–22, 2025a. Emily Hsiao, Lu Tian, and Layla Parast. Resilience measures for the surrogate paradox. arXiv preprint arXiv:2506.12194, 2025b. 59 Rafael Izbicki, Ann Lee, and Chad Schafer. High-di...

work page arXiv
[5]

doi: 10.1093/aje/kwac128

ISSN 0002-9262. doi: 10.1093/aje/kwac128. Masashi Sugiyama, Taiji Suzuki, Shinichi Nakajima, Hisashi Kashima, Paul von B¨ unau, and Motoaki Kawanabe. Direct importance estimation for covariate shift adaptation. Annals of the Institute of Statistical Mathematics, 60(4):699–746,

work page doi:10.1093/aje/kwac128
[6]

Proceedings of the National Academy of Sciences , author =

doi: 10.1073/pnas.1614732113. Yue Wang and Jeremy MG Taylor. A measure of the proportion of treatment effect explained by a surrogate marker.Biometrics, 58(4):803–812,

work page doi:10.1073/pnas.1614732113

[1] [1]

doi: 10.1111/biom.13310

ISSN 0006-341X. doi: 10.1111/biom.13310. Denis Agniel and Layla Parast. Robust evaluation of longitudinal surrogate markers with censored data.Journal of the Royal Statistical Society Series B: Statistical Methodology, 87(3):891–907, 12

work page doi:10.1111/biom.13310

[2] [2]

S., Maskell, S., Gordon, N., & Clapp, T

doi: 10.1109/78.978374. Susan Athey, Raj Chetty, Guido W Imbens, and Hyunseung Kang. The surrogate index: Combining short-term proxies to estimate long-term treatment effects more rapidly and precisely. Working Paper 26463, National Bureau of Economic Research, November

work page doi:10.1109/78.978374

[3] [3]

Michael R Elliott

doi: doi:10.1515/jci-2022-0077. Michael R Elliott. Surrogate endpoints in clinical trials.Annual Review of Statistics and its Application, 10:75–96,

work page doi:10.1515/jci-2022-0077 2022

[4] [4]

Avoiding the surrogate paradox: an empirical framework for assessing assumptions.Journal of Nonparametric Statistics, pages 1–22, 2025a

Emily Hsiao, Lu Tian, and Layla Parast. Avoiding the surrogate paradox: an empirical framework for assessing assumptions.Journal of Nonparametric Statistics, pages 1–22, 2025a. Emily Hsiao, Lu Tian, and Layla Parast. Resilience measures for the surrogate paradox. arXiv preprint arXiv:2506.12194, 2025b. 59 Rafael Izbicki, Ann Lee, and Chad Schafer. High-di...

work page arXiv

[5] [5]

doi: 10.1093/aje/kwac128

ISSN 0002-9262. doi: 10.1093/aje/kwac128. Masashi Sugiyama, Taiji Suzuki, Shinichi Nakajima, Hisashi Kashima, Paul von B¨ unau, and Motoaki Kawanabe. Direct importance estimation for covariate shift adaptation. Annals of the Institute of Statistical Mathematics, 60(4):699–746,

work page doi:10.1093/aje/kwac128

[6] [6]

Proceedings of the National Academy of Sciences , author =

doi: 10.1073/pnas.1614732113. Yue Wang and Jeremy MG Taylor. A measure of the proportion of treatment effect explained by a surrogate marker.Biometrics, 58(4):803–812,

work page doi:10.1073/pnas.1614732113