Re-examining and calibrating weighted survival analysis for causal inference

Tobias Gerhard; Wenfu Xu; Yi Zhang; Zhiqiang Tan

arxiv: 2605.15702 · v1 · pith:5EFIEVCOnew · submitted 2026-05-15 · 📊 stat.ME

Re-examining and calibrating weighted survival analysis for causal inference

Wenfu Xu , Yi Zhang , Tobias Gerhard , Zhiqiang Tan This is my paper

Pith reviewed 2026-05-20 16:32 UTC · model grok-4.3

classification 📊 stat.ME

keywords causal inferencesurvival analysisweighted Kaplan-Meiercalibrated estimationinverse probability weightingaugmented estimationhigh-dimensional datatime-to-event outcomes

0 comments

The pith

Linking weighted Kaplan-Meier to augmented IPW and adding calibrated estimation improves point and variance inference for causal survival outcomes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper re-examines the weighted Kaplan-Meier estimator used in causal studies with time-to-event data. It formally embeds this estimator inside the augmented inverse probability weighted framework so that both the survival probability estimate and its variance follow from the same general theory. To fix problems with existing weighted approaches such as poor coverage or instability when many covariates are present, the authors introduce calibrated estimation procedures that adjust the weights in both low- and high-dimensional regimes. These calibrated estimators come with supporting theory and are shown in simulations to produce coverage rates closer to the nominal level and shorter confidence intervals than standard weighted methods. An application to adjunctive psychotropic treatment for schizophrenia illustrates the same gains on real data.

Core claim

The weighted Kaplan-Meier estimator is shown to be a member of the augmented inverse probability weighted class, allowing consistent point estimation and valid variance estimation under the usual causal assumptions; calibrated estimation then modifies the weights to satisfy additional balancing conditions, yielding estimators for survival probabilities and hazard ratios that retain asymptotic validity while improving finite-sample behavior in both low- and high-dimensional covariate settings.

What carries the argument

Calibrated estimation, the procedure that solves for weights satisfying moment conditions derived from the propensity-score model while staying close to the original inverse-probability weights.

If this is right

Coverage proportions in finite samples move closer to the target nominal level.
Confidence intervals become shorter while remaining valid.
The same calibration approach extends to high-dimensional covariate settings with theoretical support.
Hazard-ratio estimates obtained via weighted Breslow-Peto methods can be improved by the same calibration.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar calibration could be applied to other survival estimators such as the Aalen-Johansen estimator for competing risks.
The approach may reduce sensitivity to mild misspecification of the propensity-score model in observational studies.
In practice the shorter intervals could change conclusions about treatment effectiveness in large electronic-health-record analyses.

Load-bearing premise

The calibration step succeeds in correcting bias without adding new instability, even when the number of covariates is large.

What would settle it

A simulation in which the calibrated estimators show coverage rates farther from the nominal 95 percent or produce wider intervals than the uncalibrated weighted Kaplan-Meier estimator.

read the original abstract

Causal inference with time-to-event outcomes is fundamental in various scientific studies. In a static setup with fitted propensity scores, weighted Kaplan-Meier estimation for survival probabilities and weighted Breslow-Peto estimation for hazard ratios have been widely used, but their statistical properties have been overlooked or studied only to a limited extent. We re-examine the weighted Kaplan-Meier method by formally linking it with the general framework of augmented inverse probability weighted estimation including both point and variance estimation. Furthermore, to address limitations of existing weighted methods for survival analysis, we develop new methods and associated theory through calibrated estimation in both low-dimensional and high-dimensional settings. We present a simulation study and an empirical application on the effectiveness of adjunctive psychotropic treatments for patients with schizophrenia. The calibrated methods yield coverage proportions closer to target ones in the simulation study, and produce shorter confidence intervals in both simulation and empirical studies.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript re-examines weighted Kaplan-Meier estimation for survival probabilities and weighted Breslow-Peto estimation for hazard ratios in a static causal inference setup with fitted propensity scores. It formally links these to the augmented inverse probability weighted (AIPW) framework for both point and variance estimation, then develops new calibrated estimation procedures and associated theory in low-dimensional and high-dimensional settings to address limitations of existing weighted survival methods. Simulation results are reported to show coverage proportions closer to nominal levels and shorter confidence intervals, with an empirical application to adjunctive psychotropic treatments for schizophrenia patients.

Significance. If the calibrated estimators preserve valid inference while improving finite-sample performance, the work could strengthen causal survival analysis in observational studies, especially those with high-dimensional covariates. The explicit AIPW linkage and provision of variance formulas represent a methodological strength; the simulation and real-data results add practical value if the high-dimensional calibration remains stable under realistic misspecification.

major comments (2)

[High-dimensional calibrated estimation] High-dimensional calibrated estimation section: the central claim that calibration yields valid inference and shorter intervals rests on the assumption that the calibration step does not inject additional bias or instability when nuisance models (propensity or censoring) are estimated under sparsity or regularization; the manuscript should supply explicit conditions on the calibration weights or additional simulations with p/n non-negligible to address this.
[Simulation study] Simulation study: coverage and interval-length improvements are reported, yet without the explicit variance formulas, data exclusion rules, or sensitivity checks to nuisance-model misspecification, the support for the claim that calibrated methods outperform standard weighted KM/Breslow cannot be fully verified.

minor comments (2)

[Abstract] The abstract could more explicitly list the concrete limitations of existing weighted methods (e.g., poor coverage or overly wide intervals) that the calibrated procedures are designed to remedy.
[Notation and definitions] Notation for the calibrated weights and the AIPW augmentation term should be introduced with a single consistent symbol set to improve readability across the low- and high-dimensional sections.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their insightful comments, which have helped us improve the clarity and robustness of our manuscript. Below we address each major comment in turn.

read point-by-point responses

Referee: High-dimensional calibrated estimation section: the central claim that calibration yields valid inference and shorter intervals rests on the assumption that the calibration step does not inject additional bias or instability when nuisance models (propensity or censoring) are estimated under sparsity or regularization; the manuscript should supply explicit conditions on the calibration weights or additional simulations with p/n non-negligible to address this.

Authors: Our theoretical development in the high-dimensional setting establishes that the calibration procedure preserves the asymptotic properties of the AIPW estimator under standard sparsity assumptions on the nuisance functions. The calibration weights are chosen to minimize variance subject to unbiasedness constraints, which by construction does not introduce bias. We will revise the manuscript to make these conditions more explicit and to include additional simulation results with p/n ratios that are non-negligible (e.g., p/n = 0.05 and 0.1) to empirically verify stability under regularization. revision: yes
Referee: Simulation study: coverage and interval-length improvements are reported, yet without the explicit variance formulas, data exclusion rules, or sensitivity checks to nuisance-model misspecification, the support for the claim that calibrated methods outperform standard weighted KM/Breslow cannot be fully verified.

Authors: We note that the variance formulas are explicitly provided in the theoretical results (Theorems 2 and 4) and implemented in the simulations as described in Section 5.1. Data exclusion is applied for propensity scores outside [0.05, 0.95] to prevent extreme weights, as is common in the literature. However, we acknowledge that sensitivity checks for nuisance model misspecification are not included in the current version. We will add such analyses in the revision to strengthen the empirical evidence. revision: partial

Circularity Check

0 steps flagged

Independent linking to AIPW framework and separate development of calibrated estimators show no reduction by construction.

full rationale

The paper's core steps consist of formally linking the weighted Kaplan-Meier estimator to the augmented inverse probability weighted (AIPW) framework for both point and variance estimation, followed by the development of new calibrated estimation procedures in low- and high-dimensional settings. These are presented as distinct contributions that address limitations of existing weighted methods rather than deriving directly from fitted inputs or self-citations. No equations or claims reduce the new estimators to prior fitted quantities by construction, and the simulation study plus empirical application provide external checks. This yields only minor circularity at most, consistent with a score of 2 for a self-contained derivation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The work rests on standard causal inference assumptions such as correct propensity score specification and no unmeasured confounding; no free parameters, new entities, or ad-hoc axioms are mentioned in the abstract.

axioms (1)

domain assumption Standard causal assumptions including no unmeasured confounding and correct specification of propensity scores.
Required for validity of weighted and IPW-based estimators in observational survival data.

pith-pipeline@v0.9.0 · 5678 in / 1202 out tokens · 55593 ms · 2026-05-20T16:32:53.033733+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We develop new methods and associated theory through calibrated estimation in both low-dimensional and high-dimensional settings... calibrated augmented IPW estimators ˆS1k,CAL and ˆθCAL
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

The calibrated methods yield coverage proportions closer to target ones... shorter confidence intervals

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

3 extracted references · 3 canonical work pages

[1]

and Tan, Z

Ghosh, S. and Tan, Z. (2022) Doubly robust semiparametric in ference using regularized calibrated estimation with high-dimensional data, Bernoulli, 28, 1675–1703

work page 2022
[2]

(2020) Model-assisted inference for treatment eﬀect s using regularized calibrated estima- tion with high-dimensional data, Annals of Statistics , 48, 811–837

Tan, Z. (2020) Model-assisted inference for treatment eﬀect s using regularized calibrated estima- tion with high-dimensional data, Annals of Statistics , 48, 811–837

work page 2020
[3]

(2023) Consistent and robust inference in hazard pro bability and odds models with discrete-time survival data, Lifetime Data Analysis , 29, 555–584

Tan, Z. (2023) Consistent and robust inference in hazard pro bability and odds models with discrete-time survival data, Lifetime Data Analysis , 29, 555–584. 27

work page 2023

[1] [1]

and Tan, Z

Ghosh, S. and Tan, Z. (2022) Doubly robust semiparametric in ference using regularized calibrated estimation with high-dimensional data, Bernoulli, 28, 1675–1703

work page 2022

[2] [2]

(2020) Model-assisted inference for treatment eﬀect s using regularized calibrated estima- tion with high-dimensional data, Annals of Statistics , 48, 811–837

Tan, Z. (2020) Model-assisted inference for treatment eﬀect s using regularized calibrated estima- tion with high-dimensional data, Annals of Statistics , 48, 811–837

work page 2020

[3] [3]

(2023) Consistent and robust inference in hazard pro bability and odds models with discrete-time survival data, Lifetime Data Analysis , 29, 555–584

Tan, Z. (2023) Consistent and robust inference in hazard pro bability and odds models with discrete-time survival data, Lifetime Data Analysis , 29, 555–584. 27

work page 2023