pith. the verified trust layer for science.

arxiv: 2604.24360 · v1 · submitted 2026-04-27 · 📊 stat.ME

A Milestone-Based Framework for Characterizing Time-Varying Treatment Effects in Immunotherapy Trials

Pith reviewed 2026-05-08 02:20 UTC · model grok-4.3

classification 📊 stat.ME
keywords milestone-based framework · time-varying treatment effects · tau-based summary · immunotherapy trials · nonproportional hazards · hazard reversal · survival probabilities · oncology

The pith

A milestone-based framework characterizes time-varying treatment effects by separating long-term survival from short-term ordering in immunotherapy trials.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper proposes a milestone-based framework for summarizing treatment differences in immunotherapy trials that exhibit heterogeneous and time-varying effects. The approach uses milestone survival probabilities to capture long-term outcomes and a tau-based summary to describe short-term treatment ordering among patients who do not reach the milestone. Illustrations from three phase III trials demonstrate its ability to identify hazard reversals and patterns that are difficult to summarize with conventional measures. Readers who work with nonproportional hazards would care because it offers a practical way to evaluate when and how treatment benefits emerge.

Core claim

The milestone-based framework separates long-term survival beyond a clinically meaningful time point from earlier outcomes and provides a practical way to characterize patient heterogeneity in treatment response. The framework summarizes treatment differences through milestone survival probabilities and, among patients who do not reach the milestone, characterizes short-term treatment ordering over time using a tau-based summary that helps identify hazard reversal. It is illustrated using reconstructed individual-level data from three landmark phase III trials: CheckMate 067, CheckMate 227, and CLEAR.
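As a minimal sketch of the first component, the milestone survival probability S(t*) in each arm can be estimated with the Kaplan-Meier product-limit estimator and the arms compared by their difference. The function name `km_survival_at`, the toy data, and the choice t* = 6 are illustrative assumptions, not taken from the paper.

```python
def km_survival_at(times, events, t_star):
    """Kaplan-Meier estimate of S(t_star) from right-censored data.

    times  : observed follow-up times (event or censoring)
    events : 1 if the event was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    i = 0
    while i < len(data) and data[i][0] <= t_star:
        t = data[i][0]
        deaths = removed = 0
        # Collect everyone whose follow-up ends at this time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
        at_risk -= removed
    return surv


# Hypothetical toy arms (months); not trial data.
trt_times, trt_events = [2, 3, 5, 7, 8, 10], [1, 0, 1, 1, 0, 0]
ctl_times, ctl_events = [1, 2, 4, 6, 9, 10], [1, 1, 1, 1, 0, 0]

t_star = 6  # assumed milestone
s_trt = km_survival_at(trt_times, trt_events, t_star)
s_ctl = km_survival_at(ctl_times, ctl_events, t_star)
milestone_diff = s_trt - s_ctl  # positive values favour treatment at t*
```

Censored patients leave the risk set without contributing an event, which is why the estimate differs from a raw proportion whenever censoring occurs before t*.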

What carries the argument

Milestone survival probabilities combined with a tau-based summary for short-term treatment ordering in the pre-milestone population.
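The exact tau-based summary is not defined on this page, so the following is only a sketch of one plausible reading: a Kendall-type contrast, P(T_trt > T_ctl) - P(T_trt < T_ctl), taken over cross-arm pairs of patients whose event occurred before a cutoff, and evaluated at several cutoffs up to the milestone to trace how the ordering changes over time. The function names, the toy data, and the decision to drop patients censored before the cutoff are all assumptions; the paper's estimator may handle censoring differently.

```python
def events_before(times, events, cutoff):
    """Event times observed strictly before the cutoff.

    Censored patients are dropped, which is a simplification of this sketch.
    """
    return [t for t, d in zip(times, events) if d == 1 and t < cutoff]


def pairwise_tau(trt, ctl):
    """Mean of sign(T_trt - T_ctl) over all cross-arm pairs; positive favours treatment."""
    if not trt or not ctl:
        return None  # undefined without at least one event per arm
    score = sum((x > y) - (x < y) for x in trt for y in ctl)
    return score / (len(trt) * len(ctl))


# Hypothetical toy arms (months): treatment has excess early events, then later ones.
trt_times, trt_events = [1, 2, 9, 11, 13, 16], [1, 1, 1, 1, 1, 0]
ctl_times, ctl_events = [3, 4, 5, 6, 7], [1, 1, 1, 1, 1]

t_star = 14  # assumed milestone
tau_path = {
    cutoff: pairwise_tau(
        events_before(trt_times, trt_events, cutoff),
        events_before(ctl_times, ctl_events, cutoff),
    )
    for cutoff in (4, 8, t_star)
}
```

In this toy example the contrast is negative at the early cutoffs and positive at the milestone; a sign change of this kind is the sort of pattern that might flag a reversal in short-term ordering.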

Load-bearing premise

That a single clinically meaningful milestone time point can be chosen in advance and that the tau-based summary on the pre-milestone population adequately captures short-term ordering without further assumptions on the underlying hazard functions or censoring mechanisms.

What would settle it

A re-analysis of one of the example trials in which the framework indicates no hazard reversal even though the Kaplan-Meier curves visibly cross, or in which the conclusions change materially when the milestone time is varied.

Figures

Figures reproduced from arXiv: 2604.24360 by Jedd D. Wolchok, Martin T. Wells, Weijing Wang, Yi-Cheng Tai.

Figure 1: Comprehensive analysis from the 6.5-year follow-up of the CheckMate 067 trial, presenting metrics ...
Figure 2: Comprehensive analysis from the final 10-year report of the CheckMate 067 trial, presenting ...
Figure 3: Comprehensive analysis from the 5-year CheckMate 227 study, based on digitized OS Kaplan-Meier ...
Figure 4: Comprehensive analysis from the CLEAR trial, presenting metrics based on OS and PFS, along ...
Original abstract

Immune checkpoint inhibitor-based therapies often produce heterogeneous survival responses, including early risk, delayed treatment benefit, and durable long-term survival in a subset of patients. In these settings, conventional summary measures such as the hazard ratio may not adequately describe how treatment effects evolve over follow-up. We propose a milestone-based framework that separates long-term survival beyond a clinically meaningful time point from earlier outcomes and provides a practical way to characterize patient heterogeneity in treatment response. The framework summarizes treatment differences through milestone survival probabilities and, among patients who do not reach the milestone, characterizes short-term treatment ordering over time using a tau-based summary that helps identify hazard reversal. We illustrate the approach using reconstructed individual-level data from three landmark phase III trials: CheckMate 067, CheckMate 227, and CLEAR. Across these examples, the framework captures patterns that are difficult to summarize with conventional measures, including settings in which early disadvantage coexists with later durable benefit. It also helps clarify when treatment benefit begins to emerge and how short-term and long-term effects differ within the same trial. This approach provides a clinically interpretable and statistically principled way to evaluate heterogeneous and time-varying treatment effects in oncology trials with nonproportional hazards.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

4 major / 2 minor

Summary. The manuscript proposes a milestone-based framework for characterizing time-varying treatment effects in immunotherapy trials. It separates long-term survival beyond a pre-specified clinically meaningful milestone time from short-term outcomes using milestone survival probabilities. For patients not reaching the milestone, it employs a tau-based summary to order treatments over time and identify potential hazard reversals. The framework is demonstrated on reconstructed individual-level data from three phase III trials: CheckMate 067, CheckMate 227, and CLEAR, highlighting patterns such as early risk with later durable benefit that are not captured by conventional hazard ratios.

Significance. If the technical details of the tau-based summary are rigorously justified and the method proves robust to censoring and choice of milestone, this framework could provide a valuable, interpretable alternative to standard summary measures for non-proportional hazards in oncology, aiding clinical interpretation of heterogeneous responses in immunotherapy.

major comments (4)
  1. [Methods] Methods section (description of tau-based summary): The definition, derivation, and statistical properties of the tau-based summary for short-term treatment ordering on the pre-milestone subpopulation are not provided; it is unclear whether this functional correctly orders treatments or identifies hazard reversal without additional assumptions on the underlying hazard processes or independent censoring.
  2. [Methods] Methods and Application sections (milestone time point): The framework relies on pre-specifying a single clinically meaningful milestone t* such that P(T > t*) cleanly separates long-term from short-term effects, but no justification, sensitivity analysis to modest shifts in t*, or discussion of information loss is given; this choice is load-bearing for the central separation claim.
  3. [Results] Results section: No simulation studies are included to evaluate the framework's performance under controlled scenarios with known time-varying hazards, varying censoring rates, or competing risks, leaving the reliability of the milestone survival probabilities and tau-summary unverified despite the use of reconstructed KM data from the three trials.
  4. [Application] Application section (data handling): The analysis uses reconstructed individual-level data from published Kaplan-Meier curves in CheckMate 067, 227, and CLEAR without explicit discussion of how right-censoring or potential reconstruction inaccuracies are incorporated into the tau-based calculations or milestone probabilities.
minor comments (2)
  1. [Abstract] Abstract: The phrase 'tau-based summary' is introduced without a brief inline definition or pointer to its exact formula in the main text.
  2. [Figures] Figure captions: Ensure all figures explicitly label the specific milestone times t* used in each trial example for clarity.
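One way to set up the kind of controlled scenario requested in major comment 3 is a piecewise-exponential model in which the treatment hazard starts above the control hazard and later drops below it. The sketch below computes the true survival curves analytically and draws event times by inverse-transform sampling, so estimates can be checked against known values at several candidate milestones t*. All hazard rates, the change point, and the function names are hypothetical choices, not values from the paper or the trials.

```python
import math
import random


def pw_survival(t, h_early, h_late, change):
    """True S(t) when the hazard is h_early before `change` and h_late afterwards."""
    if t <= change:
        return math.exp(-h_early * t)
    return math.exp(-h_early * change - h_late * (t - change))


def pw_sample(rng, h_early, h_late, change):
    """Draw one event time by inverting the cumulative hazard at an Exp(1) variate."""
    e = rng.expovariate(1.0)
    if e <= h_early * change:
        return e / h_early
    return change + (e - h_early * change) / h_late


# Hypothetical monthly hazards: treatment is worse before month 6, better after.
TRT = dict(h_early=0.10, h_late=0.01, change=6.0)
CTL = dict(h_early=0.06, h_late=0.06, change=6.0)

# True milestone differences at several candidate milestones t*.
true_diff = {t: pw_survival(t, **TRT) - pw_survival(t, **CTL) for t in (3, 12, 24, 36)}

# Monte Carlo check of the 24-month milestone (no censoring in this sketch).
rng = random.Random(0)
n = 20000
trt = [pw_sample(rng, **TRT) for _ in range(n)]
ctl = [pw_sample(rng, **CTL) for _ in range(n)]
est_diff_24 = sum(t > 24 for t in trt) / n - sum(t > 24 for t in ctl) / n
```

Under these assumed rates the true milestone difference is negative at month 3 and positive from month 12 onward, so the sign and size of the summary depend on where t* is placed. A fuller study would add a censoring mechanism and repeat the draw many times.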

Simulated Authors' Rebuttal

4 responses · 0 unresolved

We thank the referee for their insightful comments, which have helped us improve the clarity and rigor of our manuscript. We address each major comment below and indicate the revisions we will make.

Point-by-point responses
  1. Referee: [Methods] Methods section (description of tau-based summary): The definition, derivation, and statistical properties of the tau-based summary for short-term treatment ordering on the pre-milestone subpopulation are not provided; it is unclear whether this functional correctly orders treatments or identifies hazard reversal without additional assumptions on the underlying hazard processes or independent censoring.

    Authors: We appreciate this observation. The tau-based summary is intended as a rank-based measure (analogous to Kendall's tau) that compares the ordering of survival times between treatment arms within the pre-milestone group to detect crossings or reversals in the cumulative incidence. In the revised manuscript, we will add a formal definition, derivation from the joint survival distribution, and discussion of its properties, including the assumption of independent censoring and how it identifies hazard reversals when the integrated difference in hazards changes sign. This will clarify its use without requiring proportional hazards. revision: yes

  2. Referee: [Methods] Methods and Application sections (milestone time point): The framework relies on pre-specifying a single clinically meaningful milestone t* such that P(T > t*) cleanly separates long-term from short-term effects, but no justification, sensitivity analysis to modest shifts in t*, or discussion of information loss is given; this choice is load-bearing for the central separation claim.

    Authors: We agree that the choice of t* is critical. In the revision, we will justify the choices of t* for each trial based on clinical literature and trial characteristics (e.g., 12 or 24 months for immunotherapy benefit emergence). We will also include sensitivity analyses showing how the milestone survival probabilities and tau summaries change with small perturbations in t* (such as ±2 or ±3 months) and discuss the trade-off in information loss versus interpretability. These additions will be made to the Methods and Results sections. revision: yes

  3. Referee: [Results] Results section: No simulation studies are included to evaluate the framework's performance under controlled scenarios with known time-varying hazards, varying censoring rates, or competing risks, leaving the reliability of the milestone survival probabilities and tau-summary unverified despite the use of reconstructed KM data from the three trials.

    Authors: The primary aim of the manuscript is to introduce the framework and illustrate its application to real trial data to demonstrate clinical interpretability. However, we recognize the importance of simulation-based validation. In the revised version, we will incorporate a simulation study section that assesses the framework under scenarios with non-proportional hazards, different censoring mechanisms, and sample sizes, to verify the accuracy of the estimates and the tau-summary's ability to detect reversals. revision: yes

  4. Referee: [Application] Application section (data handling): The analysis uses reconstructed individual-level data from published Kaplan-Meier curves in CheckMate 067, 227, and CLEAR without explicit discussion of how right-censoring or potential reconstruction inaccuracies are incorporated into the tau-based calculations or milestone probabilities.

    Authors: We will revise the Application section to include a detailed description of the data reconstruction method (using the algorithm of Guyot et al. or similar), how the reconstructed data preserve the original censoring information from the KM curves, and the potential limitations due to reconstruction error. We will also explain that the milestone probabilities are directly estimated from the KM curves, and the tau-summary is computed on the reconstructed event times, with bootstrap for inference to account for variability. revision: yes
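The bootstrap step mentioned in response 4 could look like the following percentile-interval sketch, which resamples patients within each arm and recomputes a two-arm statistic. The helper names, the toy data, and the simple statistic used here (the difference in the proportion of patients with follow-up beyond t*, valid only when no one is censored before t*) are assumptions for illustration; in practice a Kaplan-Meier milestone difference or the tau-based summary would be passed in as `statistic`.

```python
import random


def bootstrap_ci(arm_a, arm_b, statistic, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for statistic(arm_a, arm_b), resampling within arms."""
    rng = random.Random(seed)
    reps = []
    for _ in range(n_boot):
        a = rng.choices(arm_a, k=len(arm_a))  # resample patients with replacement
        b = rng.choices(arm_b, k=len(arm_b))
        reps.append(statistic(a, b))
    reps.sort()
    lo = reps[int((alpha / 2) * n_boot)]
    hi = reps[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


def milestone_diff(arm_a, arm_b, t_star=12):
    """Difference in proportions surviving past t_star (assumes no censoring before t_star)."""
    def frac(arm):
        return sum(t > t_star for t in arm) / len(arm)
    return frac(arm_a) - frac(arm_b)


# Hypothetical follow-up times in months; not trial data.
trt = [2, 3, 5, 14, 18, 20, 22, 25, 30, 36]
ctl = [4, 6, 7, 8, 9, 10, 11, 13, 15, 16]

point = milestone_diff(trt, ctl)
lo, hi = bootstrap_ci(trt, ctl, milestone_diff)
```

Resampling within arms preserves the randomized group sizes. This sketch does not propagate error from the Kaplan-Meier reconstruction itself, which would need separate treatment.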

Circularity Check

0 steps flagged

Milestone framework is a purely descriptive summary tool with no self-referential derivation

Full rationale

The paper presents a milestone-based framework as a descriptive statistical tool that separates long-term survival probabilities from short-term tau-based ordering on the pre-milestone subpopulation. No equations, predictions, or first-principles results are claimed that reduce by construction to fitted parameters, self-citations, or ansatzes imported from prior work. The approach is illustrated on reconstructed trial data without any load-bearing steps that equate outputs to inputs. This is a standard non-circular descriptive method.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The framework relies on the existence of a pre-specified clinically meaningful milestone time and on the tau measure being a valid ordering statistic for censored survival data; no new entities are postulated.

free parameters (2)
  • milestone time point
    Chosen by investigators as clinically meaningful; directly affects which patients are classified as long-term survivors.
  • tau threshold or definition
    The exact form of the tau-based summary is not specified in the abstract but functions as a tunable ordering metric.
axioms (2)
  • [domain assumption] Standard right-censoring assumptions hold and do not distort the pre-milestone ordering.
    Implicit in any survival analysis using reconstructed trial data.
  • [ad hoc to paper] A single milestone time can separate short-term from long-term effects without loss of important information.
    Central modeling choice introduced by the framework.

pith-pipeline@v0.9.0 · 5524 in / 1408 out tokens · 37573 ms · 2026-05-08T02:20:55.055969+00:00 · methodology

