On the Conservativeness of Robust Variance Estimators in Propensity Score Weighted Cox Models
Pith reviewed 2026-05-10 10:40 UTC · model grok-4.3
The pith
Robust variance estimators are not necessarily conservative in propensity score weighted Cox models when using non-ATE weights.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Under non-ATE weighting schemes in propensity score weighted Cox models, the robust variance estimator, which ignores the variability from estimating the propensity scores, is not necessarily larger than the variance estimator that accounts for it. Analytical comparisons, simulations, and real data examples show cases where the robust variance is smaller, which can lead to undercoverage of confidence intervals.
What carries the argument
The asymptotic comparison between the robust sandwich variance (omitting the weight-estimation term) and the full variance estimator that includes the derivative of the estimating equations with respect to the propensity score parameters, evaluated under different weight functions in the partial likelihood.
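The comparison described here can be written out with a generic stacked estimating equation argument. The block below is a sketch in assumed notation, not the paper's own derivation: $S_i(\alpha)$ stands for the score of a parametric propensity score model, $U_i(\beta,\alpha)$ for an i.i.d. representation of the weighted partial-likelihood score contribution, and $A$, $D$, $I$ are symbols introduced only for this illustration.

```latex
% Generic M-estimation sketch; all symbols are assumptions for illustration.
\begin{align*}
A &= -E\!\left[\partial U_i / \partial \beta^{\top}\right], \qquad
D = E\!\left[\partial U_i / \partial \alpha^{\top}\right], \qquad
I = -E\!\left[\partial S_i / \partial \alpha^{\top}\right] = E\!\left[S_i S_i^{\top}\right], \\
V_{\mathrm{robust}} &= A^{-1}\, E\!\left[U_i U_i^{\top}\right] A^{-\top}, \\
V_{\mathrm{full}} &= A^{-1}\, E\!\left[\left(U_i + D I^{-1} S_i\right)\left(U_i + D I^{-1} S_i\right)^{\top}\right] A^{-\top}, \\
V_{\mathrm{full}} - V_{\mathrm{robust}} &= A^{-1}\left\{ D I^{-1} E\!\left[S_i U_i^{\top}\right]
  + E\!\left[U_i S_i^{\top}\right] I^{-1} D^{\top} + D I^{-1} D^{\top} \right\} A^{-\top}.
\end{align*}
```

In this sketch the term $D I^{-1} D^{\top}$ is positive semidefinite, while the two cross terms can take either sign, so the ordering of the two variances is not fixed in general. In the special case $E[U_i S_i^{\top}] = -D$ the braced term reduces to $-D I^{-1} D^{\top}$, and the robust variance is then at least as large as the full one.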
If this is right
- The robust variance remains conservative when ATE weights are used.
- For ATT, ATC, and similar non-ATE weights the robust variance can be smaller than the variance that accounts for weight estimation.
- Variance estimators that incorporate uncertainty from propensity score estimation are required to maintain nominal coverage when non-ATE weights are applied.
- These patterns appear under the standard regularity conditions for the Cox model and propensity score estimation.
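For readers less familiar with the weighting schemes named in the list above, the following minimal sketch writes them as functions of a treatment indicator and a propensity score. The formulas are the commonly used textbook forms and are an assumption here; the source does not spell out the exact definitions the paper uses. The function name `ps_weight` is hypothetical.

```python
def ps_weight(z: int, e: float, scheme: str = "ATE") -> float:
    """Weight for one subject with treatment z (0 or 1) and propensity score e.

    Assumed textbook forms; the paper's exact definitions may differ.
    """
    if z not in (0, 1):
        raise ValueError("z must be 0 or 1")
    if not 0.0 < e < 1.0:
        raise ValueError("propensity score must lie strictly between 0 and 1")
    if scheme == "ATE":
        # Inverse probability of the treatment actually received.
        return z / e + (1 - z) / (1 - e)
    if scheme == "ATT":
        # Treated subjects keep weight 1; controls are weighted by the odds e/(1-e).
        return z + (1 - z) * e / (1 - e)
    if scheme == "ATC":
        # Controls keep weight 1; treated subjects are weighted by (1-e)/e.
        return z * (1 - e) / e + (1 - z)
    if scheme == "overlap":
        # Each subject is weighted by the probability of the opposite treatment.
        return z * (1 - e) + (1 - z) * e
    raise ValueError(f"unknown scheme: {scheme}")


# Example: a control subject with propensity score 0.25 under each scheme.
example = {s: ps_weight(0, 0.25, s) for s in ("ATE", "ATT", "ATC", "overlap")}
```

Only the ATE form weights every subject by the inverse probability of the treatment actually received; the other schemes rescale one or both arms by further functions of the propensity score.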
Where Pith is reading between the lines
- Software packages that default to the robust variance for weighted Cox models may need to switch their default when users select non-ATE schemes.
- The same conservativeness question can be examined in other survival models such as the accelerated failure time model or discrete-time hazard models.
- Identifying the precise features of the weight function that determine whether the robust variance exceeds the full variance would allow analysts to decide a priori which estimator to use.
Load-bearing premise
The comparison and simulations rely on the propensity score model being correctly specified and on the examined non-ATE weighting schemes being representative of common practice.
What would settle it
A large-scale simulation that draws repeated samples from a known population, computes the Monte Carlo variance of the treatment coefficient under a fixed non-ATE weight, and checks whether the average robust variance lies below that Monte Carlo variance.
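The check described above can be sketched as a small harness. This is an illustration under stated assumptions: `conservativeness_ratio`, `fit_once`, and `toy_fit` are hypothetical names, and `toy_fit` is only a stand-in (a sample mean with its usual variance estimate) so that the block runs on its own. In a real study, `fit_once` would simulate one dataset, estimate the propensity scores, fit the weighted Cox model, and return the treatment coefficient with its robust variance.

```python
import random
import statistics
from typing import Callable, Tuple


def conservativeness_ratio(
    fit_once: Callable[[random.Random], Tuple[float, float]],
    n_reps: int = 2000,
    seed: int = 1,
) -> float:
    """Return mean(estimated variance) / Monte Carlo variance of the estimates.

    A ratio below 1 suggests the variance estimator is anti-conservative.
    """
    rng = random.Random(seed)
    estimates, var_hats = [], []
    for _ in range(n_reps):
        est, var_hat = fit_once(rng)
        estimates.append(est)
        var_hats.append(var_hat)
    mc_var = statistics.variance(estimates)  # empirical variance across replicates
    return statistics.fmean(var_hats) / mc_var


def toy_fit(rng: random.Random, n: int = 50) -> Tuple[float, float]:
    # Stand-in estimator (NOT a Cox model): sample mean and its variance estimate.
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return statistics.fmean(x), statistics.variance(x) / n


ratio = conservativeness_ratio(toy_fit)
```

For the stand-in estimator the ratio should be close to 1, because its variance estimate is unbiased. A robust variance behaving as the paper describes for non-ATE weights would show up as a ratio below 1.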
Original abstract
In propensity score weighted analysis, robust variance that does not account for weight estimation is commonly used. In propensity score weighted Cox models (CoxPSW), the robust variance is known to be conservative when weights for the average treatment effect (ATE) are used, but it remains unclear whether this conservativeness also holds for other weighting schemes. This study evaluated the performance of the robust variance in CoxPSW when weights other than ATE are applied. We conducted an asymptotic comparison between the robust variance and a variance estimator that accounts for weight estimation under non-ATE weights. Their performance was further evaluated through simulation studies and real data analysis. The analytical results, simulations, and real data analysis indicated that the robust variance is not necessarily conservative in CoxPSW when weights other than ATE are used. These findings suggest that variance estimators that account for weight estimation should be used when applying non-ATE weights in CoxPSW.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper examines whether the robust variance estimator (ignoring propensity score weight estimation) remains conservative in Cox proportional hazards models weighted by propensity scores when using non-ATE weighting schemes such as ATT, ATC, or overlap weights. It performs an asymptotic comparison of this robust variance against a fuller estimator that accounts for weight estimation, supplements the comparison with simulation studies, and illustrates the findings with real data analysis. The central conclusion is that the robust variance is not necessarily conservative under non-ATE weights, so variance estimators incorporating weight estimation are recommended.
Significance. If the asymptotic result and supporting evidence hold, the finding has direct implications for causal inference practice with time-to-event outcomes, as many applied analyses rely on robust variances in PS-weighted Cox models. Demonstrating that conservativeness fails for common non-ATE estimands would justify routine use of more complete variance estimators. The paper's combination of analytic derivation, Monte Carlo evaluation, and empirical example is a methodological strength.
major comments (2)
- [Asymptotic comparison section] The asymptotic comparison (detailed in the methods section on variance derivations) shows that the robust variance is not necessarily larger than the weight-adjusted estimator under non-ATE weights. However, this comparison implicitly relies on regularity conditions (Donsker classes, Lipschitz continuity of the weight map, and standard Cox partial-likelihood regularity) that are not explicitly verified or stated for the specific non-ATE schemes examined (ATT, overlap weights, etc.). Any violation could reverse the reported inequality.
- [Simulation studies section] Simulation results are invoked to support the claim that the robust variance is not conservative, yet the manuscript provides no tabulated coverage rates, bias, or variance ratios for the non-ATE scenarios that would allow direct assessment of whether the asymptotic finding translates to finite samples under the same regularity conditions.
minor comments (2)
- [Abstract] The abstract would benefit from naming the exact non-ATE weighting functions studied and briefly indicating the simulation design (sample sizes, censoring rates, propensity score model).
- [Methods] Notation for the weight functions and the two variance estimators should be introduced consistently in the methods section to facilitate comparison with the ATE case already in the literature.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed review. The comments highlight important aspects of rigor and presentation that we address below. We provide point-by-point responses to the major comments.
- Referee: [Asymptotic comparison section] The asymptotic comparison (detailed in the methods section on variance derivations) shows that the robust variance is not necessarily larger than the weight-adjusted estimator under non-ATE weights. However, this comparison implicitly relies on regularity conditions (Donsker classes, Lipschitz continuity of the weight map, and standard Cox partial-likelihood regularity) that are not explicitly verified or stated for the specific non-ATE schemes examined (ATT, overlap weights, etc.). Any violation could reverse the reported inequality.
Authors: We agree that the regularity conditions underlying the asymptotic comparison merit explicit discussion. In the revised manuscript we will add a short paragraph immediately following the variance derivations that confirms these conditions hold for the non-ATE schemes. Specifically, the ATT, ATC, and overlap weights are bounded and Lipschitz continuous functions of the propensity score (itself estimated under standard parametric or nonparametric assumptions), placing the weighted score functions in a Donsker class. The remaining Cox partial-likelihood regularity conditions are the same as those already invoked for the ATE case and are satisfied under the paper’s maintained assumptions of correct specification and positivity. With these conditions stated, the reported asymptotic inequality is preserved and no reversal occurs. revision: yes
- Referee: [Simulation studies section] Simulation results are invoked to support the claim that the robust variance is not conservative, yet the manuscript provides no tabulated coverage rates, bias, or variance ratios for the non-ATE scenarios that would allow direct assessment of whether the asymptotic finding translates to finite samples under the same regularity conditions.
Authors: We acknowledge that the simulation section would be strengthened by explicit numerical summaries. In the revision we will insert a new table (or expand the existing simulation table) that reports, for each non-ATE weighting scheme, the empirical coverage of nominal 95% intervals, the bias of the hazard-ratio estimator, and the ratio of the robust variance to the full variance estimator across all simulated scenarios. These quantities will be presented alongside the existing figures so that readers can directly verify that the lack of conservativeness observed in the asymptotics is also evident in finite samples. revision: yes
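The quantities promised for the new table (empirical coverage, bias, and the ratio of the robust to the full variance) could be computed from replicate-level simulation output along the following lines. This is a minimal sketch: the function name `summarize_replicates` and its argument layout are assumptions, and the inputs are taken to be on whatever scale the analyst uses for inference (for example the log hazard ratio).

```python
import statistics
from typing import Dict, Sequence


def summarize_replicates(
    estimates: Sequence[float],
    robust_vars: Sequence[float],
    full_vars: Sequence[float],
    truth: float,
    z: float = 1.96,
) -> Dict[str, float]:
    """Summarize one simulation scenario for one weighting scheme."""

    def coverage(variances: Sequence[float]) -> float:
        # Share of replicates whose Wald interval contains the true value.
        hits = sum(
            abs(b - truth) <= z * v ** 0.5 for b, v in zip(estimates, variances)
        )
        return hits / len(estimates)

    return {
        "bias": statistics.fmean(estimates) - truth,
        "coverage_robust": coverage(robust_vars),
        "coverage_full": coverage(full_vars),
        # Values below 1 mean the robust variance is, on average, the smaller one.
        "variance_ratio": statistics.fmean(robust_vars) / statistics.fmean(full_vars),
    }


# Tiny hand-made example (not simulation output from the paper).
row = summarize_replicates(
    estimates=[0.2, 0.5, 0.8],
    robust_vars=[0.01, 0.01, 0.01],
    full_vars=[0.04, 0.04, 0.04],
    truth=0.5,
)
```

In the hand-made example the robust intervals are too narrow and cover the truth in only one of three replicates, while the wider full-variance intervals cover it in all three.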
Circularity Check
No significant circularity; asymptotic comparison and simulations are independent of inputs
The paper's central claim rests on an asymptotic comparison of the robust variance estimator against one that accounts for weight estimation, under non-ATE weighting schemes, together with simulation studies and real-data checks. No derivation step reduces the result to a fitted quantity by construction, nor does any load-bearing premise collapse to a self-citation whose content is itself unverified. The comparison is presented as a direct mathematical evaluation under stated regularity conditions for the Cox model and propensity-score estimation; these conditions are external to the target inequality and do not presuppose the conservativeness result. Simulations and data analysis supply separate empirical corroboration. The derivation chain is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- standard math: Standard regularity conditions for asymptotic normality of Cox model estimators and propensity score weights
Reference graph
Works this paper leans on
- [1] Shu D, Young JG, Toh S, Wang R. Variance estimation in inverse probability weighted Cox models. Biometrics. 2021;77(3):1101-1117.