Robust estimation of occupation probabilities for coarsened multistate processes
Pith reviewed 2026-06-29 03:05 UTC · model grok-4.3
The pith
Augmented inverse probability weighted estimators yield robust occupation probability estimates for coarsened multistate processes.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors derive augmented inverse probability weighted estimators for occupation probabilities under coarsening at random. These estimators are doubly robust to misspecification of either the coarsening mechanism or the conditional expectations of the state indicators, and they are efficient when both are correct. The approach identifies the target parameters from observed data without requiring the multistate process to satisfy the Markov property.
What carries the argument
Augmented inverse probability weighted estimators, which weight observed state indicators by the inverse probability of coarsening and augment with predictions from an outcome model.
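The weighting-plus-augmentation structure can be sketched in a deliberately simplified setting: a single time point, one state indicator, and a binary "observed" flag. This is not the paper's estimator for multistate processes. The function name `aipw_occupation_probability` and the arguments `pi_hat` and `m_hat` are hypothetical, and both nuisance estimates are assumed to be supplied by the caller.

```python
import numpy as np

def aipw_occupation_probability(y, observed, pi_hat, m_hat):
    """AIPW estimate of P(Y = 1) when the state indicator Y is seen only if observed == 1.

    y        : state indicator; values where observed == 0 are ignored
    observed : 1 if the indicator is observed (not coarsened), 0 otherwise
    pi_hat   : estimated probability of being observed given covariates (assumed > 0)
    m_hat    : estimated conditional expectation of Y given covariates
    """
    y = np.asarray(y, dtype=float)
    observed = np.asarray(observed, dtype=float)
    pi_hat = np.asarray(pi_hat, dtype=float)
    m_hat = np.asarray(m_hat, dtype=float)

    # Zero out unobserved indicators so placeholder values (e.g. NaN) cannot leak in.
    y = np.where(observed == 1, y, 0.0)

    # Observed indicators weighted by the inverse probability of being observed.
    ipw_term = observed * y / pi_hat
    # Augmentation: outcome-model prediction with a mean-zero correction weight.
    augmentation = (1.0 - observed / pi_hat) * m_hat
    return float(np.mean(ipw_term + augmentation))
```

When every unit is observed and `pi_hat` equals 1, the augmentation term vanishes and the function returns the plain sample mean of `y`.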
If this is right
- Consistency holds if at least one of the coarsening or outcome models is correctly specified.
- Efficiency is achieved when both models are correct.
- The estimators apply to processes with time-varying confounders.
- No Markov assumption is needed for identification or estimation.
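The double-robustness consequence listed above can be illustrated with a small simulation in the same simplified single-time-point setting. The data-generating process, the seed, and the "wrong" nuisance models (constants that ignore the covariate) are all assumptions made for this sketch and are not taken from the paper. True nuisance functions stand in for correctly specified models, so no model fitting is involved.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def aipw(y, r, pi, m):
    # Inverse-probability-weighted term plus augmentation from the outcome model.
    return float(np.mean(r * y / pi + (1.0 - r / pi) * m))

rng = np.random.default_rng(0)
n = 200_000
cov = rng.normal(size=n)            # observed covariate
m_true = expit(cov)                 # true E[Y | cov]; E[Y] = 0.5 by symmetry
pi_true = expit(0.5 + cov)          # true P(R = 1 | cov), so coarsening is at random
y = rng.binomial(1, m_true)         # state indicator
r = rng.binomial(1, pi_true)        # 1 if the indicator is observed

# Misspecified nuisance models: constants that ignore the covariate.
pi_wrong = np.full(n, r.mean())
m_wrong = np.full(n, y[r == 1].mean())

estimates = {
    "both correct": aipw(y, r, pi_true, m_true),
    "coarsening model wrong": aipw(y, r, pi_wrong, m_true),
    "outcome model wrong": aipw(y, r, pi_true, m_wrong),
    "both wrong": aipw(y, r, pi_wrong, m_wrong),
}
for name, value in estimates.items():
    print(f"{name:>24s}: {value:.3f}")
```

In this design the first three estimates should land near the true value 0.5, while the last collapses to the complete-case mean and is biased upward because units with larger covariate values are both more likely to be observed and more likely to occupy the state.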
Where Pith is reading between the lines
- These estimators could be adapted for use in longitudinal studies with similar coarsening patterns.
- Violations of coarsening at random due to unmeasured confounding would require additional methods like instrumental variables for correction.
- Combining the approach with flexible machine learning models for the nuisance parameters may enhance applicability in complex data settings.
Load-bearing premise
Coarsening at random must hold, so that whether data is observed depends only on what has been observed so far.
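In a simplified single-time-point version of the problem, the premise and the resulting identification can be written out as follows. The notation is an assumption of this sketch, not the paper's: $Y$ is a state indicator, $L$ an observed covariate, $R$ the indicator that $Y$ is observed, $\pi(L)$ the true observation probability (assumed positive), and $m(L)$ any fixed function of $L$.

```latex
% Coarsening at random in the simplified setting:
% the observation probability may depend on L but not on Y itself.
\[
  P(R = 1 \mid Y, L) = P(R = 1 \mid L) =: \pi(L) > 0 .
\]
% Under this condition the target is identified from observed data,
% because the augmentation term has conditional mean zero given (Y, L):
\[
  E\!\left[ \frac{R\,Y}{\pi(L)} + \Bigl(1 - \frac{R}{\pi(L)}\Bigr) m(L) \right]
  = E[Y] + E\!\left[ \Bigl(1 - \frac{E[R \mid Y, L]}{\pi(L)}\Bigr) m(L) \right]
  = E[Y] .
\]
```

If $P(R = 1 \mid Y, L)$ depended on $Y$, the ratio $E[R \mid Y, L] / \pi(L)$ would no longer equal one and the second equality would fail, which is why the premise is load-bearing.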
What would settle it
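Generate data from a multistate process in which the coarsening probability depends on a state that is not observed. If the claim is right, the proposed estimators should show bias in that setting that does not shrink as the sample size grows, and should be approximately unbiased when coarsening depends only on observed data and at least one nuisance model is correctly specified.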
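A minimal version of such a check, reduced to one time point and one binary covariate, might look like the sketch below. Everything in it is an assumption made for illustration: the probabilities, the seed, and the helper `aipw_saturated`, which estimates both nuisance functions as stratum means so that any remaining bias comes from the coarsening mechanism rather than from model misspecification.

```python
import numpy as np

def aipw_saturated(y, r, l):
    """AIPW estimate of P(Y = 1) with nuisances estimated as stratum means of binary l."""
    pi_hat = np.empty(len(l))
    m_hat = np.empty(len(l))
    for level in (0, 1):
        stratum = l == level
        pi_hat[stratum] = r[stratum].mean()            # P(R = 1 | L = level)
        m_hat[stratum] = y[stratum & (r == 1)].mean()  # E[Y | R = 1, L = level]
    return float(np.mean(r * y / pi_hat + (1.0 - r / pi_hat) * m_hat))

rng = np.random.default_rng(1)
n = 100_000
l = rng.binomial(1, 0.5, size=n)       # observed binary covariate
y = rng.binomial(1, 0.3 + 0.4 * l)     # state indicator
truth = 0.5                            # 0.5 * 0.3 + 0.5 * 0.7

# Coarsening at random: observation depends only on the observed covariate.
r_car = rng.binomial(1, 0.4 + 0.4 * l)
# Violation: observation depends on the state itself, which is unseen when coarsened.
r_not_car = rng.binomial(1, 0.3 + 0.5 * y)

bias_car = aipw_saturated(y, r_car, l) - truth
bias_not_car = aipw_saturated(y, r_not_car, l) - truth
print(f"bias under coarsening at random: {bias_car:+.3f}")
print(f"bias when coarsening depends on the state: {bias_not_car:+.3f}")
```

Repeating the run at several sample sizes would show whether the second bias persists while the first shrinks toward zero.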
Original abstract
We derive augmented inverse probability weighted estimators for occupation probabilities of multistate models under two levels of coarsening: right-censoring and baseline exposure. The key exchangeability assumption for identification is coarsening at random, while allowing for time-varying confounders, but not requiring Markov properties. Using existing techniques from causal inference and missing data literature, the derived estimators have highly desirable robustness and efficiency properties. These properties are demonstrated through both theoretical results and a simulation study.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper derives augmented inverse probability weighted estimators for occupation probabilities of multistate models under two levels of coarsening (right-censoring and baseline exposure). Identification uses the coarsening-at-random assumption, permitting time-varying confounders but not requiring Markov properties. The estimators are claimed to achieve double robustness and efficiency by applying standard techniques from causal inference and missing-data literature. These properties are supported by theoretical results and a simulation study.
Significance. If the derivations are correct, the work offers a useful extension of robust estimation methods to multistate processes with coarsened data, avoiding Markov assumptions while handling time-varying confounders. Credit is due for explicitly leveraging existing AIPW techniques to obtain the stated robustness and efficiency properties rather than deriving ad-hoc estimators. This is relevant for applications in epidemiology and survival analysis where occupation probabilities are of interest under incomplete observation.
minor comments (2)
- Abstract: the phrase 'highly desirable robustness and efficiency properties' is vague; replace with a precise statement of the double-robustness and efficiency results (e.g., consistency under correct specification of either the coarsening or outcome model).
- The simulation study section should report the exact data-generating processes, sample sizes, and performance metrics (bias, variance, coverage) for each estimator to allow direct comparison with the theoretical claims.
Simulated Author's Rebuttal
We thank the referee for their positive assessment and recommendation of minor revision. The referee summary correctly captures the paper's focus on augmented IPW estimators for occupation probabilities under coarsening at random in multistate processes, allowing time-varying confounders without Markov assumptions. No major comments were provided in the report.
Circularity Check
No significant circularity identified
full rationale
The paper derives augmented IPW estimators for occupation probabilities by invoking standard identification results and double-robustness techniques from the existing causal inference and missing-data literature (coarsening at random, time-varying confounders, no Markov assumption required). No load-bearing step reduces by construction to a quantity defined only inside the paper, to a fitted parameter renamed as a prediction, or to a self-citation chain whose content is itself unverified. The central claims remain externally grounded and are additionally checked by theory and simulation.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: coarsening at random holds and permits identification.