pith.

arxiv: 2605.10088 · v2 · pith:IBWGQQ37 · submitted 2026-05-11 · 📊 stat.ME

Sample size and power calculations for causal inference with time-to-event outcomes

Pith reviewed 2026-05-20 22:46 UTC · model grok-4.3

classification 📊 stat.ME
keywords: sample size calculation · power analysis · causal inference · time-to-event data · marginal hazard ratio · inverse probability weighting · Cox proportional hazards model · propensity score

The pith

A new analytical sample size formula for marginal hazard ratios in causal survival studies uses the asymptotic variance of the inverse probability weighted Cox estimator.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper derives an analytical sample size and power formula for estimating the marginal hazard ratio from a marginal structural Cox model in causal inference with time-to-event data. It first extends robust sandwich variance theory to obtain the closed-form asymptotic variance of the inverse probability weighted partial likelihood estimator. This variance expression then yields a sample size formula that holds for any prespecified effect size. The formula applies directly to randomized trials using only treatment proportion, effect size, and event rate, while observational studies require one extra input: an overlap coefficient that summarizes covariate balance between groups. It also supplies a general variance inflation method for any propensity score balancing weights and corrects inaccuracies in traditional log-rank sample size approaches.
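
For orientation, the classic log-rank/Cox calculation that uses the same three randomized-trial inputs (treatment proportion, effect size, event rate) is Schoenfeld's (1983) formula, sketched below. The paper argues that classic log-rank-based formulas of this kind are mischaracterized; its corrected formula is not reproduced here, and the function names are our own, not the PSpower API.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def schoenfeld_events(hr, p_treat, alpha=0.05, power=0.8):
    """Events needed for a two-sided level-alpha test of log(hr) = 0 (Schoenfeld, 1983)."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return (z_alpha + z_power) ** 2 / (p_treat * (1 - p_treat) * math.log(hr) ** 2)


def schoenfeld_sample_size(hr, p_treat, event_rate, alpha=0.05, power=0.8):
    """Total patients: required events divided by the overall probability of an event."""
    return math.ceil(schoenfeld_events(hr, p_treat, alpha, power) / event_rate)


def schoenfeld_power(n, hr, p_treat, event_rate, alpha=0.05):
    """Approximate power with n patients (the opposite rejection tail is ignored)."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    expected_events = n * event_rate
    drift = math.sqrt(expected_events * p_treat * (1 - p_treat)) * abs(math.log(hr))
    return _STD_NORMAL.cdf(drift - z_alpha)


if __name__ == "__main__":
    # HR 0.7, 1:1 allocation, 60% of patients expected to have an event
    print(round(schoenfeld_events(0.7, 0.5), 1))   # about 246.8 events
    print(schoenfeld_sample_size(0.7, 0.5, 0.6))    # 412 patients
```

Power and sample size are inverses of each other here: plugging the unrounded sample size back into `schoenfeld_power` returns exactly the target power.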

Core claim

By extending the robust sandwich variance to the inverse probability weighted partial likelihood estimator under the marginal structural Cox proportional hazards model, the paper obtains an explicit asymptotic variance formula that supports a closed-form sample size calculation valid at any chosen marginal hazard ratio for both randomized trials and observational studies.

What carries the argument

The asymptotic variance of the inverse probability weighted partial likelihood estimator for the marginal structural Cox proportional hazards model
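
In generic form, the robust sandwich argument and the Wald-test sample size it implies look as follows. This is a sketch of the standard M-estimation reasoning, not the paper's closed-form result: here U_n is the weighted partial-likelihood score, A its (probability-limit) negative scaled derivative, η_i subject i's influence-function contribution, and π the target power.

```latex
\begin{aligned}
&\sqrt{n}\,(\hat\beta - \beta_0) \xrightarrow{d} \mathcal{N}(0,\sigma^2), \qquad \sigma^2 = A^{-1} B A^{-1},\\
&A = \lim_{n\to\infty} \Big\{-\tfrac{1}{n}\,\partial U_n(\beta_0)/\partial\beta\Big\}, \qquad B = \mathbb{E}\big[\eta_i(\beta_0)^2\big],\\
&n \ge \frac{(z_{1-\alpha/2} + z_{\pi})^2\,\sigma^2}{\beta_0^2}, \qquad \beta_0 = \log \mathrm{HR}.
\end{aligned}
```

With unweighted data from a randomized trial and the usual local-alternative approximation, σ² ≈ 1/{d p(1 − p)}, where d is the probability of an event and p the treatment proportion, which recovers Schoenfeld's formula n = (z_{1−α/2} + z_π)² / {d p(1 − p)(log HR)²}.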

If this is right

  • Randomized trial sample size calculations need only treatment proportion, effect size, and event rate as inputs.
  • Observational study calculations require one additional overlap coefficient that captures covariate similarity between groups.
  • The same baseline variance supports a general inflation adjustment for any choice of propensity score balancing weights.
  • The formula corrects systematic misstatements that appear in classic log-rank-based sample size methods when applied to causal estimands.
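
The paper's variance inflation for general balancing weights is anchored to its corrected baseline variance, and its exact form is not given on this page. As a rough stand-in, the sketch below uses Kish's classical design effect, n Σw² / (Σw)², computed per arm from propensity scores. The helper names and the two-arm pooling rule are our assumptions, not the paper's method or the PSpower API; note also that overlap weights target a different population, so the factor only describes precision loss, not the change of estimand.

```python
import numpy as np


def balancing_weights(ps, z, kind="ipw"):
    """Propensity score balancing weights for a binary treatment z (hypothetical helper)."""
    ps = np.asarray(ps, dtype=float)
    z = np.asarray(z, dtype=int)
    if kind == "ipw":      # inverse probability of treatment weights
        return np.where(z == 1, 1.0 / ps, 1.0 / (1.0 - ps))
    if kind == "overlap":  # overlap weights
        return np.where(z == 1, 1.0 - ps, ps)
    raise ValueError(f"unknown weight type: {kind}")


def kish_deff(w):
    """Kish design effect n * sum(w^2) / (sum w)^2; equals 1 for constant weights."""
    w = np.asarray(w, dtype=float)
    return w.size * np.sum(w ** 2) / np.sum(w) ** 2


def inflated_sample_size(n_baseline, ps, z, kind="ipw"):
    """Assumed rule: scale a baseline n by a pooled two-arm design effect."""
    ps = np.asarray(ps, dtype=float)
    z = np.asarray(z, dtype=int)
    w = balancing_weights(ps, z, kind)
    n1, n0 = np.sum(z == 1), np.sum(z == 0)
    d1, d0 = kish_deff(w[z == 1]), kish_deff(w[z == 0])
    # A two-arm contrast has variance roughly proportional to d1/n1 + d0/n0,
    # compared with 1/n1 + 1/n0 when all weights are equal.
    factor = (d1 / n1 + d0 / n0) / (1.0 / n1 + 1.0 / n0)
    return int(np.ceil(n_baseline * factor)), factor
```

In a randomized trial with a constant propensity score every design effect is 1, so the baseline sample size is returned unchanged; any variation in the weights can only increase it.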

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The overlap coefficient could be estimated directly from observed data to guide whether an observational study is feasible before full data collection.
  • The variance formula might be adapted to other survival models such as accelerated failure time or additive hazards once their sandwich forms are derived.
  • Routine use of the online calculator could reduce the frequency of underpowered observational survival studies by making covariate overlap an explicit design input.
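
One way to estimate an overlap summary from data is the Bhattacharyya coefficient between the propensity score distributions of the treated and control groups (1 means identical, 0 means disjoint). Whether this matches the paper's overlap coefficient, and how such a value would be entered into the calculator, is an assumption; the histogram estimator and the toy logistic design below are illustrative only.

```python
import numpy as np


def bhattacharyya_overlap(ps, z, bins=20):
    """Histogram estimate of the Bhattacharyya coefficient between the propensity
    score distributions of treated (z = 1) and control (z = 0) units."""
    ps = np.asarray(ps, dtype=float)
    z = np.asarray(z, dtype=int)
    edges = np.linspace(0.0, 1.0, bins + 1)
    h1, _ = np.histogram(ps[z == 1], bins=edges)
    h0, _ = np.histogram(ps[z == 0], bins=edges)
    p1 = h1 / h1.sum()
    p0 = h0 / h0.sum()
    return float(np.sum(np.sqrt(p1 * p0)))


def simulate_ps(n, gamma, seed=0):
    """Toy design: one covariate x ~ N(0, 1) and propensity score expit(gamma * x)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    ps = 1.0 / (1.0 + np.exp(-gamma * x))
    z = rng.binomial(1, ps)
    return ps, z


if __name__ == "__main__":
    # Stronger confounding (larger gamma) should give smaller overlap.
    for gamma in (0.0, 0.5, 1.5, 3.0):
        ps, z = simulate_ps(5000, gamma)
        print(gamma, round(bhattacharyya_overlap(ps, z), 3))
```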

Load-bearing premise

The marginal structural Cox proportional hazards model must be correctly specified and the propensity score model must produce valid weights so that the inverse probability weighted estimator remains consistent for the marginal hazard ratio.

What would settle it

A Monte Carlo simulation in which the empirical power achieved at the formula's sample size matches the target power when the marginal structural Cox model and propensity score weights are correctly specified, but falls short when either is misspecified.
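
A minimal sketch of such a check, restricted to the simplest case: a randomized trial (so no propensity model), exponential survival with control hazard 1, administrative censoring chosen to hit a target event rate, and an unweighted Cox Wald test fitted by Newton-Raphson. All design choices and function names are ours; the observational version would additionally need covariates, a propensity model, weights and a robust variance, which this sketch does not attempt.

```python
import numpy as np
from statistics import NormalDist


def cox_wald_binary(time, event, z, max_iter=50, tol=1e-10):
    """Wald z-statistic for log HR in a Cox model with one binary covariate.
    Assumes continuous event times (no tied events); ties among censored times are harmless."""
    order = np.argsort(time, kind="stable")
    event = np.asarray(event, dtype=bool)[order]
    z = np.asarray(z, dtype=float)[order]
    # Numbers at risk at each sorted time: everyone whose time is >= the current time.
    n1 = np.cumsum(z[::-1])[::-1]
    n0 = np.cumsum((1.0 - z)[::-1])[::-1]
    n1, n0, z_ev = n1[event], n0[event], z[event]
    beta = 0.0
    for _ in range(max_iter):
        p = n1 * np.exp(beta) / (n0 + n1 * np.exp(beta))  # P(treated | risk set)
        step = np.sum(z_ev - p) / np.sum(p * (1.0 - p))     # score / information
        beta += step
        if abs(step) < tol:
            break
    p = n1 * np.exp(beta) / (n0 + n1 * np.exp(beta))
    return beta * np.sqrt(np.sum(p * (1.0 - p)))           # beta / standard error


def admin_censor_time(hr, p_treat, event_rate, lo=1e-9, hi=1e3):
    """Follow-up time at which the pooled event probability equals event_rate (bisection)."""
    def pooled(tau):
        return p_treat * (1 - np.exp(-hr * tau)) + (1 - p_treat) * (1 - np.exp(-tau))
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if pooled(mid) < event_rate:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def empirical_power(n, hr, p_treat, event_rate, alpha=0.05, reps=500, seed=2026):
    """Share of simulated trials in which the two-sided Wald test rejects log HR = 0."""
    rng = np.random.default_rng(seed)
    tau = admin_censor_time(hr, p_treat, event_rate)
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(reps):
        z = rng.binomial(1, p_treat, size=n)
        hazard = np.where(z == 1, hr, 1.0)
        t = rng.exponential(1.0 / hazard)  # numpy's exponential takes scale = 1 / rate
        event = t <= tau
        time = np.minimum(t, tau)
        rejections += abs(cox_wald_binary(time, event, z)) > crit
    return rejections / reps
```

For HR 0.7, 1:1 allocation and a 60% event rate, Schoenfeld's formula gives n = 412; `empirical_power(412, 0.7, 0.5, 0.6)` should then land near 0.80 up to Monte Carlo error, and `empirical_power(412, 1.0, 0.5, 0.6)` near the nominal 0.05.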

Original abstract

This paper develops power and sample size formulas for causal inference with time-to-event outcomes. The target estimand is the marginal hazard ratio: the coefficient of a marginal structural Cox proportional hazards model with treatment as the only predictor. We extend the robust sandwich variance theory and derive the analytical form of the asymptotic variance for the inverse probability weighted partial likelihood estimator. Building on this, we derive a new analytical sample size formula valid at any prespecified effect size, applicable to both randomized trials and observational studies. For randomized trials, the formula requires only the canonical inputs of treatment proportion, effect size, and event rate. The new formula corrects the mischaracterization of classic log-rank-based formulas. For observational studies, one additional input suffices: an overlap coefficient summarizing covariate similarity between comparison groups. We further develop a variance inflation approach applicable to any propensity score balancing weights, anchored to the corrected baseline variance. We provide an online calculator and an R package 'PSpower' to implement the method.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript develops analytical power and sample size formulas for estimating the marginal hazard ratio in time-to-event studies under causal inference. It extends robust sandwich variance theory to obtain the asymptotic variance of the inverse-probability-weighted partial-likelihood estimator in a marginal structural Cox model, then derives a closed-form sample-size expression that applies to both randomized trials (using only treatment proportion, effect size, and event rate) and observational studies (adding a single overlap coefficient). A variance-inflation approach for general propensity-score balancing weights is also presented, together with an R package 'PSpower' and online calculator.

Significance. If the central derivations are correct, the work supplies a practical, analytically grounded tool for study planning that corrects known limitations of log-rank-based formulas and extends them to observational settings with minimal additional inputs. The explicit provision of reproducible software (R package and calculator) is a clear strength that supports immediate usability and verification.

major comments (1)
  1. §3.2 (Asymptotic variance of the IPW partial-likelihood estimator): The derived variance expression for observational data relies on a marginal overlap coefficient but does not incorporate the influence-function contribution arising from estimation of the propensity-score parameters. Standard semiparametric results for IPW estimators require that the full score for the PS model be included in the sandwich; omitting it produces an understated variance that is load-bearing for the subsequent sample-size formula.
minor comments (2)
  1. Abstract: The claim that the new formula 'corrects the mischaracterization of classic log-rank-based formulas' would be clearer if a specific prior formula (with equation reference) were cited as an example of the error being fixed.
  2. Software section: No numerical validation (e.g., simulation comparing analytic variance to Monte-Carlo variance under estimated PS) is described; adding a small table of such checks would improve credibility without altering the central claim.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their careful reading and constructive feedback on our manuscript. We address the single major comment below and indicate the revisions we will make to improve the presentation of our variance derivation.

Point-by-point responses
  1. Referee: §3.2 (Asymptotic variance of the IPW partial-likelihood estimator): The derived variance expression for observational data relies on a marginal overlap coefficient but does not incorporate the influence-function contribution arising from estimation of the propensity-score parameters. Standard semiparametric results for IPW estimators require that the full score for the PS model be included in the sandwich; omitting it produces an understated variance that is load-bearing for the subsequent sample-size formula.

    Authors: We appreciate the referee's observation on the semiparametric influence function. Our derivation in §3.2 obtains the asymptotic variance of the IPW partial-likelihood estimator for the marginal structural Cox model by extending the robust sandwich formula and summarizing the effect of the weights through a single marginal overlap coefficient. This choice yields a closed-form expression that depends only on quantities available at the design stage (treatment proportion, event rate, effect size, and overlap). We acknowledge that when the propensity score is estimated, the influence function of the IPW estimator includes an additional term from the score of the propensity-score model, and that omitting this term yields the variance under known rather than estimated propensity scores. Because the sample-size formula is intended for use before any propensity-score model has been selected, incorporating a specific PS score would require additional parametric assumptions that would undermine the generality and simplicity of the method. We will revise the manuscript to state this modeling assumption explicitly, to note that the reported variance corresponds to the case of known propensity scores, and to discuss that the resulting sample-size recommendation is therefore slightly conservative in practice. We view this as a partial revision that preserves the practical utility of the formula while addressing the referee's concern. revision: partial

Circularity Check

0 steps flagged

Derivation of asymptotic variance for IPW partial likelihood and sample size formula builds directly on established robust sandwich theory without reducing to self-definition or fitted inputs by construction.

Full rationale

The paper states that it extends robust sandwich variance theory to obtain an explicit asymptotic variance for the IPW-weighted partial likelihood estimator of the marginal HR, and then derives the sample size formula from that variance. No equation or step in the provided description reduces the target formula to a fitted quantity, a self-citation chain, or an ansatz imported from the authors' own prior work. The central claim remains an analytical derivation from standard semiparametric variance results, applicable to both RCTs and observational studies via an overlap coefficient. It is self-contained against external benchmarks; any incidental self-citation that is not load-bearing would contribute at most a minor circularity score.

Axiom & Free-Parameter Ledger

1 free parameters · 2 axioms · 0 invented entities

The approach rests on standard survival analysis and causal inference assumptions plus one user-supplied summary measure; no new entities are postulated.

free parameters (1)
  • overlap coefficient
    User-provided scalar summarizing covariate similarity between treatment groups in observational studies; required as an additional input for the sample size formula.
axioms (2)
  • domain assumption: The marginal structural Cox proportional hazards model is correctly specified for the target estimand.
    The paper defines the target as the coefficient of this model with treatment as the sole predictor.
  • domain assumption: The propensity score model yields weights that produce consistent IPW estimation.
    Required for the asymptotic variance derivation to apply to the estimator.

pith-pipeline@v0.9.0 · 5696 in / 1429 out tokens · 54452 ms · 2026-05-20T22:46:41.861618+00:00 · methodology

