One-sample survival tests in the presence of non-proportional hazards in oncology clinical trials
Pith reviewed 2026-05-19 07:55 UTC · model grok-4.3
The pith
The max-Combo test outperforms the one-sample log-rank test across non-proportional hazards patterns in single-arm oncology trials.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By constructing score tests under piecewise exponential and accelerated hazards models and combining them with a restricted mean survival time statistic into a max-Combo procedure, the resulting test is more powerful than the one-sample log-rank test for single-arm trials under any examined non-proportional hazards pattern.
What carries the argument
The max-Combo test, which takes the maximum of adjusted statistics from several component score tests each matched to a different non-proportional hazards pattern.
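The combination step can be sketched in Python. This is a minimal illustration rather than the authors' implementation. It assumes that each component (the score tests and the RMST test) has already been reduced to a one-sided Z statistic, with large values favouring the experimental arm. It also assumes that under the null these statistics are jointly standard normal with a correlation matrix `corr`, treated as known here; in practice it must be estimated. The function name `max_combo_pvalue` is hypothetical.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm


def max_combo_pvalue(z, corr):
    """One-sided p-value of max(z) for jointly standard-normal components.

    z    : observed component Z statistics (one per score/RMST test)
    corr : their K x K null correlation matrix (assumed known in this sketch)
    """
    z = np.asarray(z, dtype=float)
    corr = np.asarray(corr, dtype=float)
    z_max = z.max()
    if z.size == 1:
        # A single component reduces to the usual one-sided normal p-value
        return float(norm.sf(z_max))
    # P(max Z_k >= z_max) = 1 - P(Z_1 < z_max, ..., Z_K < z_max)
    upper = np.full(z.size, z_max)
    prob_all_below = multivariate_normal(mean=np.zeros(z.size), cov=corr).cdf(upper)
    return float(1.0 - prob_all_below)


# Hypothetical example: three components with an assumed exchangeable correlation of 0.6
corr = np.array([[1.0, 0.6, 0.6], [0.6, 1.0, 0.6], [0.6, 0.6, 1.0]])
p_combo = max_combo_pvalue([1.2, 2.1, 1.7], corr)
```

The components typically target overlapping hazard patterns, so they tend to be positively correlated. In that case the penalty for taking the maximum is smaller than a Bonferroni correction would impose.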
If this is right
- Single-arm trials can now be powered against a wider range of treatment-effect shapes including delayed and crossing hazards.
- Trial designers gain a menu of score tests that can be chosen or combined according to the expected pattern of benefit.
- The same framework supplies a direct way to incorporate restricted mean survival time into the comparison with historical controls.
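The sketch below illustrates one way RMST can enter a one-sample comparison. It contrasts the Kaplan-Meier RMST of the experimental arm up to a horizon `tau` with the control RMST (the integral of S0(t) from 0 to `tau`). It uses the usual Greenwood-type variance for the Kaplan-Meier RMST and treats the control curve as known. The exponential control with rate `lam0` and both function names are assumptions made for the example; the paper's adapted RMST test may differ in its details.

```python
import math


def km_rmst(times, events, tau):
    """Kaplan-Meier RMST up to tau and its Greenwood-type variance."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    steps = []  # (event time, S just after it, deaths d, number at risk n)
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = ties = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            deaths += data[i][1]
            ties += 1
            i += 1
        if deaths > 0 and t <= tau:
            surv *= 1.0 - deaths / at_risk
            steps.append((t, surv, deaths, at_risk))
        at_risk -= ties
    # Area under the KM step function on [0, tau]
    knots = [0.0] + [s[0] for s in steps] + [tau]
    levels = [1.0] + [s[1] for s in steps]
    areas = [lvl * (knots[k + 1] - knots[k]) for k, lvl in enumerate(levels)]
    rmst = sum(areas)
    # Var = sum_j A_j^2 * d_j / (n_j * (n_j - d_j)), A_j = area from t_j to tau
    var = 0.0
    for j, (_, _, d, n) in enumerate(steps):
        if n > d:  # when n == d, S drops to 0 and A_j is 0
            tail_area = sum(areas[j + 1:])
            var += tail_area ** 2 * d / (n * (n - d))
    return rmst, var


def one_sample_rmst_test(times, events, tau, lam0):
    """Z = (RMST_exp - RMST_0) / SE with a known exponential control (assumption)."""
    rmst, var = km_rmst(times, events, tau)
    rmst0 = (1.0 - math.exp(-lam0 * tau)) / lam0  # integral of exp(-lam0 * t) on [0, tau]
    return (rmst - rmst0) / math.sqrt(var)
```

A zero variance (no deaths before `tau`) would leave the statistic undefined; a fuller implementation would guard against that case.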
Where Pith is reading between the lines
- The method could be adapted for other time-to-event endpoints such as progression-free survival if similar historical data exist.
- Routine sensitivity analyses that vary the historical curve within its estimation uncertainty would strengthen claims based on these tests.
- The combination approach suggests a template for constructing robust tests in other single-sample settings outside oncology.
Load-bearing premise
The survival curve of the external control group is known accurately from historical data with little uncertainty or model error.
What would settle it
A simulation or real-data re-analysis in which the external control survival curve is deliberately misspecified to check whether type I error inflates or power collapses for the max-Combo procedure.
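A toy version of that check is sketched below for the OSLRT alone, not the max-Combo procedure. Experimental data are generated under the null, with the same exponential hazard as the true control. The test is then run once with the correct control rate and once with a deliberately overstated one. All rates, the sample size of 60 and the 24-month administrative censoring are hypothetical settings.

```python
import math
import random


def oslrt_z(times, events, lam0):
    """OSLRT Z = (E - O) / sqrt(E) for an exponential control: Lambda_0(t) = lam0 * t."""
    expected = lam0 * sum(times)
    return (expected - sum(events)) / math.sqrt(expected)


def rejection_rate(true_rate, assumed_rate, n=60, follow_up=24.0,
                   reps=2000, z_crit=1.959964, seed=1):
    """Share of null trials in which the one-sided OSLRT rejects at the 2.5% level."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        times, events = [], []
        for _ in range(n):
            t = rng.expovariate(true_rate)      # experimental arm = true control (H0)
            times.append(min(t, follow_up))     # administrative censoring
            events.append(1 if t <= follow_up else 0)
        if oslrt_z(times, events, assumed_rate) > z_crit:
            rejections += 1
    return rejections / reps


correct = rejection_rate(true_rate=0.1, assumed_rate=0.1)
overstated = rejection_rate(true_rate=0.1, assumed_rate=0.15)
```

Overstating the control hazard inflates E and therefore the false-positive rate. Understating it would instead drain power, which is the other failure mode the check is meant to expose.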
Original abstract
In oncology, conduct well-powered time-to-event randomized clinical trials may be challenging due to limited patietns number. Many designs for single-arm trials (SATs) have recently emerged as an alternative to overcome this issue. They rely on the (modified) one-sample log-rank test (OSLRT) under the proportional hazards to compare the survival curves of an experimental and an external control group. We extend Finkelstein's formulation of OSLRT as a score test by using a piecewise exponential model for early, middle and delayed treatment effects and an accelerated hazards model for crossing hazards. We adapt the restricted mean survival time based test and construct a combination test procedure (max-Combo) to SATs. The performance of the developed are evaluated through a simulation study. The score tests are as conservative as the OSLRT and have the highest power when the data generation matches the model underlying score tests. The max-Combo test is more powerful than the OSLRT whatever the scenarios and is thus an interesting approach as compared to a score test. Uncertainty on the survival curve estimated of the external control group and its model misspecification may have a significant impact on performance. For illustration, we apply the developed tests on real data examples.
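The abstract's starting point is Finkelstein's one-sample log-rank test. It compares the observed number of deaths O in the experimental arm with the number E = Σ Λ0(t_i) that would be expected if each patient, followed to time t_i, had experienced the external control's cumulative hazard Λ0. Under the null, O − E is approximately normal with mean 0 and variance E. The sketch below is minimal. It assumes Λ0 is fully known (here a hypothetical exponential control with rate `lam0`) and uses the sign convention Z = (E − O)/√E, so that positive values favour the experimental arm.

```python
import math


def one_sample_logrank(times, events, cum_hazard0):
    """Finkelstein-style OSLRT against a known control cumulative hazard.

    times       : follow-up time of each experimental patient
    events      : 1 if death observed, 0 if censored
    cum_hazard0 : function t -> Lambda_0(t) for the external control
    Returns (O, E, Z, one-sided p) with Z = (E - O) / sqrt(E).
    """
    observed = sum(events)
    expected = sum(cum_hazard0(t) for t in times)
    z = (expected - observed) / math.sqrt(expected)
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2.0))  # P(N(0, 1) >= z)
    return observed, expected, z, p_one_sided


# Hypothetical exponential external control: Lambda_0(t) = lam0 * t
lam0 = 0.1
times = [2.0, 5.0, 8.0, 12.0, 12.0, 3.0]
events = [1, 1, 0, 0, 0, 1]
O, E, Z, p = one_sample_logrank(times, events, lambda t: lam0 * t)
```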
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper extends Finkelstein's one-sample log-rank test (OSLRT) for single-arm oncology trials to non-proportional hazards settings by deriving score tests under piecewise exponential models (early, middle, delayed effects) and an accelerated hazards model (crossing hazards). It further adapts a restricted mean survival time test and constructs a max-Combo combination procedure. Performance is evaluated in simulations across these scenarios, with the central claim that the max-Combo test is more powerful than the OSLRT in all cases while score tests match OSLRT conservatism but gain power under model match; uncertainty from estimating the external control curve is flagged as potentially impactful. Real-data illustrations are provided.
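The following sketch shows how a score test can be aimed at a time window. Suppose, as an illustrative assumption that may not match the paper's exact parameterisation, that the experimental hazard is h0(t)·exp(β·g(t)), where g(t) = 1 on a window [a, b) and 0 elsewhere. The log-likelihood is then ℓ(β) = Σ d_i[log h0(t_i) + β g(t_i)] − Σ ∫ from 0 to t_i of h0(s)·exp(β g(s)) ds. At β = 0 the score is O_w − E_w (deaths observed inside the window minus control-expected deaths accrued inside it), and the information is E_w. Early, middle or delayed effects correspond to different windows, and a = 0, b = ∞ recovers the OSLRT. The sketch assumes an exponential control with known rate `lam0`; the window bounds are hypothetical inputs.

```python
import math


def window_score_test(times, events, lam0, a, b):
    """Score test of beta = 0 in h(t) = lam0 * exp(beta * 1{a <= t < b}).

    Assumes an exponential external control with known rate lam0.
    Returns (O_window, E_window, Z) with Z = (E - O) / sqrt(E).
    """
    observed = sum(1 for t, d in zip(times, events) if d == 1 and a <= t < b)
    # Follow-up time each patient spends inside [a, b), times the control hazard
    exposure = sum(max(0.0, min(t, b) - a) for t in times)
    expected = lam0 * exposure
    if expected == 0:
        raise ValueError("no follow-up time falls inside the window")
    z = (expected - observed) / math.sqrt(expected)
    return observed, expected, z


times = [2.0, 5.0, 8.0, 12.0, 12.0, 3.0]
events = [1, 1, 0, 0, 0, 1]
delayed = window_score_test(times, events, lam0=0.1, a=6.0, b=math.inf)  # delayed-effect window
overall = window_score_test(times, events, lam0=0.1, a=0.0, b=math.inf)  # recovers the OSLRT
```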
Significance. If the reported power advantages of max-Combo hold after propagating estimation uncertainty from the external control (a common feature of SATs), the work would supply practical, more robust alternatives to standard OSLRT for small-sample oncology trials with non-PH patterns. The explicit model-based extensions and combination test are technically straightforward and directly address a recognized limitation of PH-based one-sample tests.
major comments (1)
- [Abstract / Simulation study] Abstract and simulation study: The claim that 'the max-Combo test is more powerful than the OSLRT whatever the scenarios' rests on simulations that treat the external control survival curve as known and fixed when generating data and computing statistics. The abstract itself states that 'Uncertainty on the survival curve estimated of the external control group and its model misspecification may have a significant impact on performance,' yet the reported results do not appear to incorporate finite-sample estimation error (e.g., via bootstrap resampling of historical data or sampling from an estimated control distribution). This omission is load-bearing for the superiority claim, as the advantage may not persist in the realistic SAT setting where the control curve must itself be estimated from limited historical data.
minor comments (2)
- [Abstract] Abstract: Typo 'patietns' should be 'patients'; grammar 'The performance of the developed are evaluated' should be revised for subject-verb agreement.
- The manuscript would benefit from explicit statements of the exact simulation parameters (sample sizes, censoring rates, number of replications, and how the external control curve is generated or fixed) to allow full reproducibility of the power comparisons.
Simulated Author's Rebuttal
We thank the referee for their thoughtful and constructive review of our manuscript on extending one-sample survival tests to non-proportional hazards settings. The major comment raises an important point about the realism of our simulation design. We address it directly below and will revise the manuscript to incorporate additional analyses that propagate estimation uncertainty from the external control curve.
Point-by-point responses
Referee: Abstract and simulation study: The claim that 'the max-Combo test is more powerful than the OSLRT whatever the scenarios' rests on simulations that treat the external control survival curve as known and fixed when generating data and computing statistics. The abstract itself states that 'Uncertainty on the survival curve estimated of the external control group and its model misspecification may have a significant impact on performance,' yet the reported results do not appear to incorporate finite-sample estimation error (e.g., via bootstrap resampling of historical data or sampling from an estimated control distribution). This omission is load-bearing for the superiority claim, as the advantage may not persist in the realistic SAT setting where the control curve must itself be estimated from limited historical data.
Authors: We agree that this is a valid and substantive concern. Our current simulations were intentionally constructed under the assumption of a known external control curve to isolate and evaluate the operating characteristics of the proposed score tests, restricted mean survival time test, and max-Combo procedure across the targeted non-proportional hazards patterns (early, middle, delayed, and crossing effects). This design choice follows the standard approach in many methodological papers on one-sample tests to first establish performance under idealized conditions before layering in additional sources of variability. The abstract does flag the potential impact of estimation uncertainty and model misspecification, but we acknowledge that the superiority claim for max-Combo would be strengthened by explicit quantification of this effect. We will therefore revise the simulation study to include scenarios in which the control survival curve is estimated from finite historical data (e.g., via bootstrap resampling or parametric fitting with sampling from the estimated distribution in each replicate). These new results will be reported alongside the existing ones, with appropriate discussion of how the relative power of max-Combo versus OSLRT changes under realistic estimation error. We believe this addition will directly address the referee's point without altering the core methodological contributions. revision: yes
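The parametric option described in this response can be sketched as follows. In each replicate, a hypothetical historical cohort of size `m` is drawn and the exponential control rate is re-estimated by maximum likelihood (deaths divided by total follow-up). The OSLRT is then computed against that estimate as if it were the true rate. Sample sizes, rate and follow-up are illustrative assumptions, and only the OSLRT is shown.

```python
import math
import random


def simulate_cohort(rng, rate, n, follow_up):
    """Exponential survival times with administrative censoring at follow_up."""
    times, events = [], []
    for _ in range(n):
        t = rng.expovariate(rate)
        times.append(min(t, follow_up))
        events.append(1 if t <= follow_up else 0)
    return times, events


def type1_with_estimated_control(rate=0.1, n=40, m=40, follow_up=24.0,
                                 reps=2000, z_crit=1.959964, seed=7):
    """Null rejection rate of the one-sided OSLRT when lam0 is estimated, not known."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        hist_t, hist_d = simulate_cohort(rng, rate, m, follow_up)
        lam_hat = sum(hist_d) / sum(hist_t)  # exponential MLE from the historical cohort
        exp_t, exp_d = simulate_cohort(rng, rate, n, follow_up)  # H0: same true rate
        expected = lam_hat * sum(exp_t)
        z = (expected - sum(exp_d)) / math.sqrt(expected)
        if z > z_crit:
            rejections += 1
    return rejections / reps


inflated = type1_with_estimated_control()
```

Treating an estimated control curve as known ignores the sampling variability of the estimate, so the null rejection rate can drift above the nominal 2.5%. Bootstrap resampling of historical data, the other option mentioned, would replace the refit line.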
Circularity Check
No circularity: explicit model-based test constructions evaluated on independent simulations
Full rationale
The paper explicitly extends Finkelstein's OSLRT formulation into score tests under piecewise exponential and accelerated hazards models, adapts the RMST test, and defines the max-Combo combination procedure from these components. Power comparisons are obtained from simulation studies that generate data under specified scenarios (early/middle/delayed effects, crossing hazards) and treat these as external benchmarks. No derivation step reduces a claimed result to a fitted parameter from the same dataset, nor does any central claim rest on a self-citation chain or ansatz smuggled from prior author work. The noted uncertainty in external control curve estimation affects simulation realism but does not create a definitional loop within the paper's own equations or procedures.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: External control survival curve can be estimated without substantial bias from historical data
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel (tag: unclear)
The relation between the paper passage and the cited Recognition theorem is unclear.
We extend Finkelstein’s formulation of OSLRT as a score test under PH by using a piecewise exponential model with change-points (CPs) for early, middle and delayed treatment effects and an accelerated hazards model for crossing hazards. … The max-Combo test is more powerful than the OSLRT whatever the scenarios
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction (tag: unclear)
The relation between the paper passage and the cited Recognition theorem is unclear.
The developed score tests are as conservative as the OSLRT … Uncertainty on the survival curve estimate of the external control group and model misspecification may have a significant impact on performance.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.