On the Conservativeness of Robust Variance Estimators in Propensity Score Weighted Cox Models

Fumitaka Shimizu; Hiroya Morita; Masataka Taguri; Shunichiro Orihara

arxiv: 2604.15104 · v1 · submitted 2026-04-16 · 📊 stat.ME

Pith reviewed 2026-05-10 10:40 UTC · model grok-4.3

keywords: propensity score weighting · Cox model · robust variance · conservative estimator · weighted survival analysis · treatment effect estimation · sandwich variance

The pith

Robust variance estimators are not necessarily conservative in propensity score weighted Cox models when using non-ATE weights.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper asks whether the standard robust variance estimator, which treats estimated propensity score weights as fixed, remains conservative once the weighting scheme moves away from the average treatment effect (ATE). An asymptotic comparison of the robust sandwich variance against the full variance, which includes the influence function for the propensity score parameters, shows that the inequality need not hold for other common weights. Monte Carlo simulations and a real-data re-analysis confirm that the robust variance can fall below the correct variance, producing confidence intervals that are too narrow.

Core claim

Under non-ATE weighting schemes in propensity score weighted Cox models, the robust variance estimator that ignores the variability from estimating the propensity scores is not necessarily larger than the variance estimator that accounts for it; analytical comparisons, simulations, and real data examples show cases where the robust variance is smaller, leading to potential undercoverage of confidence intervals.

What carries the argument

The asymptotic comparison between the robust sandwich variance (omitting the weight-estimation term) and the full variance estimator that includes the derivative of the estimating equations with respect to the propensity score parameters, evaluated under different weight functions in the partial likelihood.
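
The structure of that comparison can be sketched with generic two-step estimating-equation algebra. This is an illustration under assumed notation, not the paper's derivation: the symbols A, B, I, M, U_i, and S_i are introduced here for exposition, and U_i is assumed to be an asymptotically i.i.d. representation of the weighted partial-likelihood score contribution.

```latex
% Assumed notation (not taken from the paper):
%   \beta : treatment log hazard ratio;  \alpha : propensity score parameters
%   U_i   : i-th contribution to the weighted partial-likelihood score (i.i.d. representation)
%   S_i   : i-th contribution to the propensity score estimating equation
\begin{align*}
A &= -E\!\left[\partial U_i / \partial \beta^{\top}\right], \quad
B = E\!\left[\partial U_i / \partial \alpha^{\top}\right], \quad
I = -E\!\left[\partial S_i / \partial \alpha^{\top}\right], \quad
M = B I^{-1}, \\
\sqrt{n}\,(\hat\alpha - \alpha) &\approx I^{-1} n^{-1/2} \sum_{i} S_i, \\
\sqrt{n}\,(\hat\beta - \beta) &\approx A^{-1} n^{-1/2} \sum_{i} \left( U_i + M S_i \right), \\
V_{\mathrm{rob}} &= A^{-1} \operatorname{Var}(U_i)\, A^{-\top}, \\
V_{\mathrm{full}} &= A^{-1} \operatorname{Var}(U_i + M S_i)\, A^{-\top}, \\
V_{\mathrm{full}} - V_{\mathrm{rob}} &= A^{-1} \left[ M \operatorname{Var}(S_i) M^{\top}
  + M \operatorname{Cov}(S_i, U_i) + \operatorname{Cov}(U_i, S_i) M^{\top} \right] A^{-\top}.
\end{align*}
```

In this sketch the robust variance is conservative exactly when the bracketed matrix is negative semidefinite. Its first term is always positive semidefinite, so the sign is decided by the cross-covariance terms, which depend on how the weight function enters U_i.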

If this is right

The robust variance remains conservative when ATE weights are used.
For ATT, ATC, and similar non-ATE weights the robust variance can be smaller than the variance that accounts for weight estimation.
Variance estimators that incorporate uncertainty from propensity score estimation are required to maintain nominal coverage when non-ATE weights are applied.
These patterns appear under the standard regularity conditions for the Cox model and propensity score estimation.
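
For reference, the weighting schemes named above are usually written as simple functions of the treatment indicator and the propensity score. The sketch below uses the common textbook forms; the function name `balancing_weight` and the scheme labels are hypothetical, and the paper's exact parametrization (for example, stabilized versions) is not shown on this page.

```python
def balancing_weight(z, e, scheme):
    """Return the propensity score weight for one subject.

    z      : treatment indicator (1 = treated, 0 = control)
    e      : propensity score P(Z = 1 | X), strictly between 0 and 1
    scheme : one of "ATE", "ATT", "ATC", "overlap" (labels are illustrative)
    """
    if z not in (0, 1):
        raise ValueError("z must be 0 or 1")
    if not 0.0 < e < 1.0:
        raise ValueError("propensity score must lie strictly in (0, 1)")

    if scheme == "ATE":      # inverse probability of the treatment received
        return z / e + (1 - z) / (1 - e)
    if scheme == "ATT":      # treated keep weight 1, controls get the odds e/(1-e)
        return z + (1 - z) * e / (1 - e)
    if scheme == "ATC":      # controls keep weight 1, treated get (1-e)/e
        return z * (1 - e) / e + (1 - z)
    if scheme == "overlap":  # probability of the opposite treatment
        return z * (1 - e) + (1 - z) * e
    raise ValueError(f"unknown scheme: {scheme}")
```

Only the ATE weight leaves the target population independent of the propensity score; in the other schemes the propensity score also defines the population being targeted.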

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Software packages that default to the robust variance for weighted Cox models may need to switch their default when users select non-ATE schemes.
The same conservativeness question can be examined in other survival models such as the accelerated failure time model or discrete-time hazard models.
Identifying the precise features of the weight function that determine whether the robust variance exceeds the full variance would allow analysts to decide a priori which estimator to use.

Load-bearing premise

The comparison and simulations rely on the propensity score model being correctly specified and on the examined non-ATE weighting schemes being representative of common practice.

What would settle it

A large-scale simulation that draws repeated samples from a known population, computes the Monte Carlo variance of the treatment coefficient under a fixed non-ATE weight, and checks whether the average robust variance lies below that Monte Carlo variance.
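
The final check in such a simulation reduces to a single ratio. The helper below is a minimal sketch of that step only: the name `robust_to_monte_carlo_ratio` is hypothetical, and it assumes the per-replicate coefficient estimates and robust variance estimates have already been produced by whatever weighted Cox fitting routine is in use.

```python
from statistics import mean, variance


def robust_to_monte_carlo_ratio(estimates, robust_variances):
    """Ratio of the average robust variance to the Monte Carlo variance.

    estimates        : treatment coefficient estimate from each replicate
    robust_variances : robust variance estimate from the same replicates
    A ratio below 1 means the robust variance is, on average, too small.
    """
    if len(estimates) != len(robust_variances):
        raise ValueError("inputs must have one entry per replicate")
    if len(estimates) < 2:
        raise ValueError("at least two replicates are required")

    # Sample variance (n - 1 denominator) of the estimates across replicates
    monte_carlo_variance = variance(estimates)
    if monte_carlo_variance == 0:
        raise ValueError("estimates do not vary across replicates")

    return mean(robust_variances) / monte_carlo_variance
```

A ratio that stays clearly below 1 as the number of replicates grows would indicate anti-conservativeness under the chosen weight; a ratio at or above 1 would not.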

Original abstract

In propensity score weighted analysis, robust variance that does not account for weight estimation is commonly used. In propensity score weighted Cox models (CoxPSW), the robust variance is known to be conservative when weights for the average treatment effect (ATE) are used, but it remains unclear whether this conservativeness also holds for other weighting schemes. This study evaluated the performance of the robust variance in CoxPSW when weights other than ATE are applied. We conducted an asymptotic comparison between the robust variance and a variance estimator that accounts for weight estimation under non-ATE weights. Their performance was further evaluated through simulation studies and real data analysis. The analytical results, simulations, and real data analysis indicated that the robust variance is not necessarily conservative in CoxPSW when weights other than ATE are used. These findings suggest that variance estimators that account for weight estimation should be used when applying non-ATE weights in CoxPSW.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper shows robust variance in CoxPSW is not necessarily conservative for non-ATE weights, with a clean asymptotic comparison and supporting simulations.

This paper's main message is that the robust variance estimator, which ignores propensity score estimation, is conservative for ATE weights in weighted Cox models but loses that guarantee for other schemes such as ATT, ATC, or overlap weights. The authors resolve the open question by deriving an asymptotic comparison showing that the robust variance can fall below the fuller estimator under non-ATE weights, then check the pattern in simulations and one real dataset. The extension is straightforward and useful because many applied papers default to the robust variance without checking the weight type.

The analytical step is the clearest part: it directly contrasts the two variance expressions and identifies where the inequality reverses. The simulations appear to cover multiple weighting schemes and sample sizes, which gives the claim some practical weight. The real-data example is thin on details but at least illustrates that the difference can appear in actual numbers.

The soft spot is the reliance on standard regularity conditions for the Cox partial likelihood and the propensity score estimator when the weight map changes; if those conditions are not verified or discussed for the specific non-ATE functions, the comparison could be narrower than stated. The paper does not appear to overclaim or hide fitting issues.

This is the sort of targeted methodological note that matters to people doing causal survival analysis in epidemiology or clinical research. It does not open new territory but fills a gap that affects routine variance reporting. I would send it to peer review because the question is well-defined, the approach is direct, and the result has immediate implications for how analysts choose variance estimators in weighted Cox models.

Referee Report

2 major / 2 minor

Summary. The paper examines whether the robust variance estimator (ignoring propensity score weight estimation) remains conservative in Cox proportional hazards models weighted by propensity scores when using non-ATE weighting schemes such as ATT, ATC, or overlap weights. It performs an asymptotic comparison of this robust variance against a fuller estimator that accounts for weight estimation, supplements the comparison with simulation studies, and illustrates the findings with real data analysis. The central conclusion is that the robust variance is not necessarily conservative under non-ATE weights, so variance estimators incorporating weight estimation are recommended.

Significance. If the asymptotic result and supporting evidence hold, the finding has direct implications for causal inference practice with time-to-event outcomes, as many applied analyses rely on robust variances in PS-weighted Cox models. Demonstrating that conservativeness fails for common non-ATE estimands would justify routine use of more complete variance estimators. The paper's combination of analytic derivation, Monte Carlo evaluation, and empirical example is a methodological strength.

major comments (2)

[Asymptotic comparison section] The asymptotic comparison (detailed in the methods section on variance derivations) shows that the robust variance is not necessarily larger than the weight-adjusted estimator under non-ATE weights. However, this comparison implicitly relies on regularity conditions (Donsker classes, Lipschitz continuity of the weight map, and standard Cox partial-likelihood regularity) that are not explicitly verified or stated for the specific non-ATE schemes examined (ATT, overlap weights, etc.). Any violation could reverse the reported inequality.
[Simulation studies section] Simulation results are invoked to support the claim that the robust variance is not conservative, yet the manuscript provides no tabulated coverage rates, bias, or variance ratios for the non-ATE scenarios that would allow direct assessment of whether the asymptotic finding translates to finite samples under the same regularity conditions.

minor comments (2)

[Abstract] The abstract would benefit from naming the exact non-ATE weighting functions studied and briefly indicating the simulation design (sample sizes, censoring rates, propensity score model).
[Methods] Notation for the weight functions and the two variance estimators should be introduced consistently in the methods section to facilitate comparison with the ATE case already in the literature.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed review. The comments highlight important aspects of rigor and presentation that we address below. We provide point-by-point responses to the major comments.

Referee: [Asymptotic comparison section] The asymptotic comparison (detailed in the methods section on variance derivations) shows that the robust variance is not necessarily larger than the weight-adjusted estimator under non-ATE weights. However, this comparison implicitly relies on regularity conditions (Donsker classes, Lipschitz continuity of the weight map, and standard Cox partial-likelihood regularity) that are not explicitly verified or stated for the specific non-ATE schemes examined (ATT, overlap weights, etc.). Any violation could reverse the reported inequality.

Authors: We agree that the regularity conditions underlying the asymptotic comparison merit explicit discussion. In the revised manuscript we will add a short paragraph immediately following the variance derivations that confirms these conditions hold for the non-ATE schemes. Specifically, the ATT, ATC, and overlap weights are bounded and Lipschitz continuous functions of the propensity score (itself estimated under standard parametric or nonparametric assumptions), placing the weighted score functions in a Donsker class. The remaining Cox partial-likelihood regularity conditions are the same as those already invoked for the ATE case and are satisfied under the paper’s maintained assumptions of correct specification and positivity. With these conditions stated, the reported asymptotic inequality is preserved and no reversal occurs.
Revision: yes

Referee: [Simulation studies section] Simulation results are invoked to support the claim that the robust variance is not conservative, yet the manuscript provides no tabulated coverage rates, bias, or variance ratios for the non-ATE scenarios that would allow direct assessment of whether the asymptotic finding translates to finite samples under the same regularity conditions.

Authors: We acknowledge that the simulation section would be strengthened by explicit numerical summaries. In the revision we will insert a new table (or expand the existing simulation table) that reports, for each non-ATE weighting scheme, the empirical coverage of nominal 95% intervals, the bias of the hazard-ratio estimator, and the ratio of the robust variance to the full variance estimator across all simulated scenarios. These quantities will be presented alongside the existing figures so that readers can directly verify that the lack of conservativeness observed in the asymptotics is also evident in finite samples.
Revision: yes

Circularity Check

0 steps flagged

No significant circularity; asymptotic comparison and simulations are independent of inputs

The paper's central claim rests on an asymptotic comparison of the robust variance estimator against one that accounts for weight estimation, under non-ATE weighting schemes, together with simulation studies and real-data checks. No derivation step reduces the result to a fitted quantity by construction, nor does any load-bearing premise collapse to a self-citation whose content is itself unverified. The comparison is presented as a direct mathematical evaluation under stated regularity conditions for the Cox model and propensity-score estimation; these conditions are external to the target inequality and do not presuppose the conservativeness result. Simulations and data analysis supply separate empirical corroboration. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on standard asymptotic regularity conditions for Cox models and propensity score estimation that are not detailed in the abstract; no free parameters or invented entities are mentioned.

axioms (1)

[standard math] Standard regularity conditions for asymptotic normality of Cox model estimators and propensity score weights.
Invoked implicitly for the asymptotic comparison between robust and weight-adjusted variance estimators.

Reference graph

Works this paper leans on

1 extracted references · 1 canonical work pages

[1] Shu D, Young JG, Toh S, Wang R. Variance estimation in inverse probability weighted Cox models. Biometrics. 2021;77(3):1101-1117.