Federated Learning with Incomplete Data: When to Use Complete Cases and When to Weight

Chad Hochberg; Elizabeth A. Stuart; Jason Akulian; Jesus E. Vazquez; Jiayi Tong; Theodore J. Iwashyna; Yicheng Shen

arXiv: 2605.20125 · v1 · pith:KF5H3S3Y · submitted 2026-05-19 · 📊 stat.ME · math.ST · stat.TH


Jesus E. Vazquez, Yicheng Shen, Jason Akulian, Chad Hochberg, Theodore J. Iwashyna, Elizabeth A. Stuart, Jiayi Tong

Pith reviewed 2026-05-20 03:26 UTC · model grok-4.3

classification: 📊 stat.ME · math.ST · stat.TH

keywords: federated learning, missing data, complete case analysis, inverse probability weighting, multi-site studies, calibrated estimation, sandwich variance


The pith

In federated learning with missing data, complete-case analysis is preferred over inverse-probability weighting when stated site-level conditions hold; when it is not valid, a calibrated method that combines weighting models across sites stays consistent if at least one model is correctly specified.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a framework for analyzing incomplete data across multiple sites without sharing individual records. It identifies concrete conditions under which the simple complete-case estimator outperforms inverse-probability weighting, and supplies a new calibrated weighting procedure that pools candidate models from different sites while remaining consistent provided any one of those models is correctly specified. Consistency requirements are imposed only locally at each site, so the overall federated estimator inherits validity from the participating locations. A sandwich variance estimator is derived that incorporates uncertainty from the weight calibration step. The approach is demonstrated on risk-factor analysis for mortality in patients with pleural infections.
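To make "without sharing individual records" concrete, here is a minimal sketch of one common federated pattern, assuming a logistic outcome model fit by Newton-Raphson in which each site shares only its score vector and Hessian. The function names `site_summaries` and `federated_newton` are hypothetical, and this is not presented as the paper's algorithm. Setting all weights to one gives a federated complete-case fit; passing 1/pi_hat for the complete records gives an IPW fit.

```python
import numpy as np

def site_summaries(beta, X, y, r, w):
    """Weighted logistic score and Hessian from one site's complete cases.

    X: (n, p) covariates including an intercept column; y: (n,) 0/1 outcome;
    r: (n,) 1 if the record is complete (incomplete rows may hold NaN);
    w: (n,) analysis weights (ones for complete-case analysis, 1 / pi_hat for IPW).
    Only a p-vector and a p x p matrix leave the site.
    """
    keep = r == 1
    Xc, yc, wc = X[keep], y[keep], w[keep]
    mu = 1.0 / (1.0 + np.exp(-(Xc @ beta)))
    score = Xc.T @ (wc * (yc - mu))
    hessian = -(Xc * (wc * mu * (1.0 - mu))[:, None]).T @ Xc
    return score, hessian

def federated_newton(sites, p, n_iter=50, tol=1e-10):
    """Coordinator: Newton-Raphson on the sum of site-level scores and Hessians."""
    beta = np.zeros(p)
    for _ in range(n_iter):
        score = np.zeros(p)
        hessian = np.zeros((p, p))
        for X, y, r, w in sites:
            s, h = site_summaries(beta, X, y, r, w)
            score += s
            hessian += h
        step = np.linalg.solve(hessian, score)   # Hessian is negative definite
        beta = beta - step
        if np.max(np.abs(step)) < tol:
            break
    return beta
```

Because the log-likelihood score and Hessian are sums over records, the federated iterates match those of a pooled fit up to rounding; validity then depends only on whether the weighted estimating equation is unbiased at each site.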

Core claim

The central claim is that, under stated site-level consistency conditions, the complete-case estimator is preferred to the inverse-probability-weighted estimator in federated settings; when complete-case analysis is invalid, a calibrated estimator that aggregates candidate weighting models across sites remains consistent whenever at least one candidate model is correctly specified, with validity inherited from the local properties and with a sandwich variance that accounts for weight-estimation uncertainty.
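A small simulation can illustrate one standard condition from the missing-data literature under which complete cases suffice: the outcome is missing with a probability that depends only on a covariate already in the regression model. This is a sketch under that assumption (a linear model, with the true response probabilities used as IPW weights), not the site-level conditions the paper derives; `wls_slope` and `simulate` are hypothetical helpers.

```python
import numpy as np

def wls_slope(x, y, w):
    """Slope from a weighted least-squares fit of y on (1, x)."""
    X = np.column_stack([np.ones_like(x), x])
    XtW = X.T * w                      # scales column i of X.T by w[i]
    coef = np.linalg.solve(XtW @ X, XtW @ y)
    return coef[1]

def simulate(n=1000, reps=200, seed=1):
    """Complete-case vs IPW slopes when Y is missing with probability depending on x only."""
    rng = np.random.default_rng(seed)
    cc, ipw = [], []
    for _ in range(reps):
        x = rng.normal(size=n)
        y = 1.0 + 2.0 * x + rng.normal(size=n)      # true slope is 2
        pi = 1.0 / (1.0 + np.exp(-(0.5 + x)))       # P(Y observed | x), free of y
        r = rng.uniform(size=n) < pi
        cc.append(wls_slope(x[r], y[r], np.ones(r.sum())))
        ipw.append(wls_slope(x[r], y[r], 1.0 / pi[r]))  # weights from the true pi
    return np.array(cc), np.array(ipw)

cc, ipw = simulate()
print(f"CC : mean {cc.mean():.3f}, sd {cc.std():.3f}")
print(f"IPW: mean {ipw.mean():.3f}, sd {ipw.std():.3f}")
```

Both slope estimates centre on the true value 2 in this setting. The IPW slope is typically the more variable of the two: with homoscedastic errors and missingness independent of y given x, unweighted least squares on the complete cases is already efficient, so the 1/pi weights only add noise.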

What carries the argument

Calibrated weight estimation that combines candidate weighting models across sites while remaining consistent if at least one is correctly specified.
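One established way to combine several candidate response models so that only one has to be correct is empirical-likelihood calibration, as in the multiply robust estimators of Han and Wang (2013): complete cases receive weights w_i = 1/(m(1 + rho'g_i)), where g_i stacks each candidate's fitted probability minus its average over all records, and rho solves a concave dual problem. The sketch below assumes that construction together with a federated split in which sites share only sums (column sums of fitted probabilities, plus the dual objective, gradient and Hessian). The function names are hypothetical, and this is not presented as the paper's aggregation formula.

```python
import numpy as np

def site_dual_terms(rho, g):
    """Summaries one site would share for the calibration dual problem.

    g: (m_s, J) rows g_i = pi_hat_i - theta_hat for the site's complete cases.
    Returns (sum of log(1 + rho'g_i), gradient, Hessian), or (-inf, None, None)
    if some implied weight would be non-positive.
    """
    denom = 1.0 + g @ rho
    if denom.min() <= 0:
        return -np.inf, None, None
    obj = np.log(denom).sum()
    grad = (g / denom[:, None]).sum(axis=0)
    hess = -(g / denom[:, None] ** 2).T @ g
    return obj, grad, hess

def pooled_dual(rho, site_g):
    """Coordinator step: add up the site summaries (no row-level data)."""
    parts = [site_dual_terms(rho, g) for g in site_g]
    if any(p[1] is None for p in parts):
        return -np.inf, None, None
    return tuple(sum(p[k] for p in parts) for k in range(3))

def calibrated_weights(site_pi, site_r, n_iter=100, tol=1e-9):
    """site_pi[s]: (n_s, J) fitted probabilities from J candidate models at site s.
    site_r[s]: (n_s,) 0/1 completeness indicators.
    Returns, per site, empirical-likelihood weights for the complete cases."""
    n = sum(len(r) for r in site_r)
    theta = sum(p.sum(axis=0) for p in site_pi) / n      # from per-site column sums
    site_g = [p[r == 1] - theta for p, r in zip(site_pi, site_r)]
    m = sum(len(g) for g in site_g)
    rho = np.zeros(theta.shape[0])
    for _ in range(n_iter):
        obj, grad, hess = pooled_dual(rho, site_g)
        if np.max(np.abs(grad)) < tol:
            break
        direction = -np.linalg.solve(hess, grad)           # Newton ascent direction
        t = 1.0
        for _ in range(60):                                 # backtracking line search
            if pooled_dual(rho + t * direction, site_g)[0] >= obj + 1e-4 * t * grad @ direction:
                break
            t /= 2.0
        rho = rho + t * direction
    return [1.0 / (m * (1.0 + g @ rho)) for g in site_g]
```

Every quantity the coordinator needs is a sum over records, so these weights coincide with those of a pooled calibration. The damped Newton ascent keeps each implied weight positive. With at least one correctly specified candidate among the columns, a weighted mean over complete cases is consistent; nothing is claimed when every candidate is wrong.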

If this is right

The federated estimator is consistent whenever the local complete-case or local weighting estimators satisfy the stated site-level conditions.
A sandwich variance formula correctly accounts for the extra variability introduced by estimating the weights.
The method can be applied directly to multi-site medical studies that must respect privacy constraints while handling missing covariates or outcomes.
When complete-case analysis is biased, the calibrated weighting procedure recovers consistent estimates without requiring a single correctly specified model at every site.
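The site-level inheritance in the first item can be written out as a short M-estimation argument. This is a sketch in assumed notation (sites s = 1, ..., S with record sets S_s, completeness indicator R, analysis weight w, estimating function U, and a parameter θ0 common to all sites), not the paper's own derivation.

```latex
\begin{aligned}
&\hat\theta \ \text{solves}\ \sum_{s=1}^{S}\sum_{i\in\mathcal{S}_s} R_i\,w_i\,U(\theta;Z_i)=0,
\qquad w_i=\begin{cases}1 & \text{complete case}\\[2pt] 1/\pi_s(X_i) & \text{IPW}\end{cases}\\[6pt]
&\frac{1}{n}\sum_{s=1}^{S}\sum_{i\in\mathcal{S}_s} R_i\,w_i\,U(\theta_0;Z_i)
\ \xrightarrow{p}\ \sum_{s=1}^{S}\lambda_s\,E_s\!\left[R\,w\,U(\theta_0;Z)\right],
\qquad \lambda_s=\lim_{n\to\infty} n_s/n\\[6pt]
&\text{CC: } R\perp Y\mid X \text{ at site } s \ \text{and}\ E_s[U(\theta_0;Z)\mid X]=0
\ \Rightarrow\ E_s[R\,U(\theta_0;Z)]=E_s\{P_s(R=1\mid X)\,E_s[U(\theta_0;Z)\mid X]\}=0\\[6pt]
&\text{IPW: } \pi_s(X)=P_s(R=1\mid X,Y)
\ \Rightarrow\ E_s\!\left[\tfrac{R}{\pi_s(X)}\,U(\theta_0;Z)\right]=E_s[U(\theta_0;Z)]=0
\end{aligned}
```

Each site-level expectation vanishes at the same θ0, so the pooled limit is zero for any site proportions λ_s; under the usual identifiability and regularity conditions this yields consistency of the federated estimator. The condition is sufficient and site-by-site: a single site that violates it (for example, through outcome-dependent missingness) generally breaks the sum.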

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same site-level inheritance logic could be tested in other privacy-preserving settings such as differential privacy or secure multi-party computation.
Extending the framework to time-to-event or longitudinal outcomes would require only replacing the local estimating equations while preserving the aggregation and calibration steps.
Empirical checks could compare the calibrated estimator against oracle pooled analysis on de-identified benchmark datasets to quantify efficiency loss from federation.

Load-bearing premise

The federated estimator inherits validity from site-level consistency conditions, so that local properties determine the overall result when data are aggregated without sharing.

What would settle it

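A simulation or real multi-site dataset in which at least one candidate weighting model is correctly specified, yet the federated calibrated estimator fails to converge to the true parameter as site sample sizes grow, would falsify the consistency claim. Convergence when every candidate is misspecified would not falsify it, because the claim states a sufficient condition, not a necessary one.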

Figures

Figures reproduced from arXiv: 2605.20125 by Chad Hochberg, Elizabeth A. Stuart, Jason Akulian, Jesus E. Vazquez, Jiayi Tong, Theodore J. Iwashyna, Yicheng Shen.

**Figure 1.** Oracle denotes the case of no missingness. IPW (oracle) denotes the case when the true probability of a complete observation is used (no estimation). IPW (pooled) denotes weights estimated using pooled data. IPW (site-specific) denotes weights estimated using only site-specific data. IPW (calibrated) denotes weights calibrated using external weighting models. … estimator exhibited bias regardles…

Abstract

Privacy constraints have driven the rise of federated learning (FL), which enables multi-site analyses without sharing individual participant data. We develop a framework for FL with missing data, identifying conditions under which the complete case (CC) estimator is preferred over the inverse probability weighting (IPW) estimator. For settings where the CC estimator fails, we introduce a calibrated weight estimation approach that combines candidate weighting models across sites and remains consistent if at least one is correctly specified. Consistency conditions are stated at the site level, ensuring that the federated estimator inherits validity from local properties. We derive a sandwich variance estimator that accounts for uncertainty in weight estimation, and illustrate the framework by evaluating risk factors for 90-day mortality among patients with pleural infections treated with intrapleural enzyme therapy.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper's main contribution is a calibrated cross-site weighting method for federated missing data that claims consistency if at least one local model is correct, plus practical rules for choosing complete cases versus IPW.


The one or two things to know: this paper gives conditions under which complete-case analysis beats inverse probability weighting in federated learning with missing data, and it introduces a calibrated way to combine weighting models across sites that stays consistent if at least one model is correctly specified. The authors lay out the framework clearly, derive a sandwich variance estimator that accounts for weight-estimation uncertainty, and apply the method to a real multi-site dataset on pleural infections and 90-day mortality. That example helps show the practical differences between the estimators. What stands out is how they state the consistency conditions at the site level so that the federated estimator can inherit them. This seems like a reasonable extension for privacy-preserving analyses.

The soft spot is the aggregation step in the calibrated weights. If combining models from multiple sites does not fully preserve the property that only one model needs to be correct, especially under the site-level consistency conditions, the global estimator could be inconsistent. The stress test flags this, and I would want to see the explicit math, or simulation checks, for the case where only a fraction of sites have correct models.

This paper is for methodologists in statistics and epidemiology who work with distributed data and missingness, and a reader working on similar problems would find the guidance useful. It deserves a serious referee because the application is timely and the claims are specific enough to evaluate. I recommend sending it to peer review.

Referee Report

2 major / 2 minor

Summary. The manuscript develops a framework for federated learning with missing data. It identifies conditions under which the complete-case (CC) estimator is preferred over inverse-probability weighting (IPW). For settings where CC fails, it introduces a calibrated weight estimation procedure that pools candidate propensity models across sites and claims consistency provided at least one model is correctly specified. Consistency conditions are asserted at the site level so that the federated estimator inherits local validity. A sandwich variance estimator that accounts for weight-estimation uncertainty is derived, and the approach is illustrated on a multi-site analysis of 90-day mortality risk factors among patients with pleural infections treated by intrapleural enzyme therapy.

Significance. If the central consistency and variance results hold, the work would supply a practical, privacy-preserving method for handling missing data in distributed medical studies. The site-level consistency framing and the sandwich estimator are potentially useful contributions, and the empirical illustration demonstrates applicability. The calibrated aggregation step, however, is the load-bearing component whose validity must be verified before the framework can be recommended for general use.

major comments (2)

§3 (calibrated weight estimation): The manuscript states that consistency conditions are given at the site level and that the federated estimator therefore inherits validity from local properties. The calibrated procedure combines candidate weighting models across sites while claiming consistency if at least one is correctly specified. No explicit aggregation formula or proof sketch is supplied showing that the union-model consistency property is preserved when only a subset of sites contains a correct propensity model. This step is load-bearing for the central claim; a counter-example or detailed proof of the transfer is required.
§4 (sandwich variance): The sandwich variance is asserted to account for uncertainty in weight estimation. It is not shown how the estimator incorporates the additional variability induced by the cross-site calibration of the weights. Without this accounting, the reported standard errors may be invalid and the coverage properties of the resulting confidence intervals cannot be guaranteed.
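For reference, the standard way to make a sandwich variance reflect estimated weights is to stack the weighted outcome estimating function with the equations that define the weights, then linearize. The sketch uses assumed notation: ψ_θ is the weighted outcome estimating function and ψ_α collects the weight-model equations (for calibrated weights, these would also include the calibration constraints and the dual equation for the multiplier). It is the generic M-estimation result, not the manuscript's formula; all terms are evaluated at (θ0, α0).

```latex
\begin{aligned}
&\psi(Z;\theta,\alpha)=\begin{pmatrix}\psi_\theta(Z;\theta,\alpha)\\ \psi_\alpha(Z;\alpha)\end{pmatrix},
\qquad \sum_{i=1}^{n}\psi(Z_i;\hat\theta,\hat\alpha)=0,\\[6pt]
&A_{\theta\theta}=-E\!\left[\frac{\partial\psi_\theta}{\partial\theta^{\top}}\right],\quad
A_{\theta\alpha}=-E\!\left[\frac{\partial\psi_\theta}{\partial\alpha^{\top}}\right],\quad
A_{\alpha\alpha}=-E\!\left[\frac{\partial\psi_\alpha}{\partial\alpha^{\top}}\right],\\[6pt]
&\sqrt{n}\,(\hat\theta-\theta_0)=A_{\theta\theta}^{-1}\,\frac{1}{\sqrt{n}}\sum_{i=1}^{n}
\underbrace{\left\{\psi_\theta(Z_i)-A_{\theta\alpha}A_{\alpha\alpha}^{-1}\psi_\alpha(Z_i)\right\}}_{\tilde\psi_\theta(Z_i)}+o_p(1),\\[6pt]
&\operatorname{Var}(\hat\theta)\approx\frac{1}{n}\,A_{\theta\theta}^{-1}\,E\!\left[\tilde\psi_\theta\tilde\psi_\theta^{\top}\right]A_{\theta\theta}^{-\top}.
\end{aligned}
```

The term A_θα A_αα^{-1} ψ_α is the weight-estimation correction; dropping it treats the weights as known. In a federated setting each expectation is an average over all records, so the A blocks can be assembled from site-level sums of derivatives, and the middle term from site-level sums of outer products of the corrected functions once the coordinator broadcasts the estimated A_θα A_αα^{-1}.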

minor comments (2)

Abstract: The conditions under which the CC estimator is preferred to IPW are mentioned but not stated explicitly; a one-sentence summary of those conditions would improve readability.
Empirical illustration: The illustration would benefit from a brief description of the number of participating sites, the observed missingness rate, and the candidate propensity models that were combined.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive comments. We address each major comment below and will revise the manuscript to supply the requested details on the aggregation formula, consistency proof, and variance derivation.


Referee: §3 (calibrated weight estimation).

Authors: We thank the referee for highlighting this gap. The current manuscript asserts site-level consistency but does not supply an explicit aggregation formula or proof sketch for the federated case when only a subset of sites contains a correct propensity model. In the revision we will add the aggregation formula for the calibrated weights and a detailed proof that the union-model consistency property transfers to the federated estimator under the stated conditions. We will also include a brief illustrative example clarifying the role of the subset of correct sites. (Revision: yes.)
Referee: §4 (sandwich variance).

Authors: We agree that the current derivation does not explicitly show how the sandwich variance accounts for variability induced by the cross-site calibration step. In the revised manuscript we will expand the variance section to derive the additional terms arising from the calibration procedure and update the sandwich formula accordingly, ensuring all sources of weight-estimation uncertainty are incorporated. (Revision: yes.)

Circularity Check

0 steps flagged

No significant circularity; derivation relies on standard estimators with independent consistency claims


The paper introduces a framework combining complete-case and inverse-probability-weighted estimators under federated constraints, then proposes a calibrated multi-model weighting procedure whose consistency is asserted to hold if at least one candidate model is correct. These claims rest on site-level consistency assumptions that are stated as external conditions rather than derived from the federated aggregation itself. No equation or step reduces a target quantity to a fitted parameter or self-citation by construction; the sandwich variance and inheritance statements are presented as derived consequences of the local properties rather than tautological re-labelings. The analysis therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claims rest on standard missing-data and federated-learning background assumptions plus the key domain assumption that site-level consistency transfers to the federated estimator; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (1)

Domain assumption: consistency conditions stated at the site level ensure the federated estimator inherits validity from local properties.
Directly invoked in the abstract to justify the overall framework validity.

pith-pipeline@v0.9.0 · 5684 in / 1442 out tokens · 53945 ms · 2026-05-20T03:26:35.959443+00:00 · methodology



[1] [1]

Why” behind including “Y

The “Why” behind including “Y” in your imputation model , author=. Statistical Methods in Medical Research , volume=. 2024 , publisher=

work page 2024

[2] [2]

The American Statistician , volume=

Understanding the implications of a complete case analysis for regression models with a right-censored covariate , author=. The American Statistician , volume=. 2024 , publisher=

work page 2024

[3] [3]

Multiple imputation for multilevel data with continuous and binary variables , author=

work page

[4] [4]

and Carpenter, James R

Bartlett, Jonathan W. and Carpenter, James R. and Tilling, Kate and Vansteelandt, Stijn , journal=. Improving upon the efficiency of complete case analysis when covariates are. 2014 , publisher=

work page 2014

[5] [5]

, author=

The moderator--mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. , author=. Journal of personality and social psychology , volume=. 1986 , publisher=

work page 1986

[6] [6]

Sociological Methods & Research , volume=

Using inverse probability weighting to address post-outcome collider bias , author=. Sociological Methods & Research , volume=. 2024 , publisher=

work page 2024

[7] [7]

Cancer Discovery , volume=

Federated deep learning enables cancer subtyping by proteomics , author=. Cancer Discovery , volume=. 2025 , publisher=

work page 2025

[8] [8]

Nature Communications , volume =

Multiple imputation for analysis of incomplete data in distributed health data networks , author =. Nature Communications , volume =. 2020 , doi =

work page 2020

[9] [9]

Statistica Sinica , pages=

A split-and-conquer approach for analysis of extraordinarily large data , author=. Statistica Sinica , pages=. 2014 , publisher=

work page 2014

[10] [10]

Computational statistics & data analysis , volume=

A unified framework of multiply robust estimation approaches for handling incomplete data , author=. Computational statistics & data analysis , volume=. 2023 , publisher=

work page 2023

[11] [11]

Journal of Causal Inference , volume=

Causal effect on a target population: A sensitivity analysis to handle missing covariates , author=. Journal of Causal Inference , volume=. 2022 , publisher=

work page 2022

[12] [12]

Statistics in medicine , volume=

Developing more generalizable prediction models from pooled studies and large clustered data sets , author=. Statistics in medicine , volume=. 2021 , publisher=

work page 2021

[13] [13]

Leverage real-world longitudinal data in large clinical research networks for

Duan, Rui and Chen, Zhaoyi and Tong, Jiayi and Luo, Chongliang and Lyu, Tianchen and Tao, Cui and Maraganore, Demetrius and Bian, Jiang and Chen, Yong , booktitle=. Leverage real-world longitudinal data in large clinical research networks for

work page

[14] [14]

Heckman imputation models for binary or continuous

Galimard, Jacques-Emmanuel and Chevret, Sylvie and Curis, Emmanuel and Resche-Rigon, Matthieu , journal=. Heckman imputation models for binary or continuous. 2018 , publisher=

work page 2018

[15] [15]

Journal of the American Statistical Association , volume=

Multiply robust estimation in regression analysis with missing data , author=. Journal of the American Statistical Association , volume=. 2014 , publisher=

work page 2014

[16] [16]

Danish Medical Journal , volume=

Validation of the RAPID score in a Danish population with pleural infection , author=. Danish Medical Journal , volume=. 2024 , publisher=

work page 2024

[17] [17]

A general framework for imputation in surveys , journal =

Haziza, David and Beaumont, Jean-Fran. A general framework for imputation in surveys , journal =. 2017 , volume =

work page 2017

[18] Ignorability and coarse data. The Annals of Statistics, 1991.

[19] Accounting for missing data in statistical analyses: multiple imputation is not always the answer. International Journal of Epidemiology, 2019.

[20] Imputation of systematically missing predictors in an individual participant data meta-analysis: a generalized approach using MICE. Statistics in Medicine, 2015.

[21] Hierarchical imputation of categorical variables in the presence of systematically and sporadically missing data. Research Synthesis Methods, 2025.

[22] Communication-efficient distributed statistical inference. Journal of the American Statistical Association.

[23] Digital twins for health: a scoping review. npj Digital Medicine, 2024.

[24] Advances and open problems in federated learning. Foundations and Trends …, 2021.

[25] Jae Kwang Kim, Wayne A. Fuller. Biometrika.

[26] Ritoban Kundu, Xu Shi, Michael Kleinsasser, Lars G. Fritsche, Maxwell Salvatore, Bhramar Mukherjee. A doubly robust framework for addressing outcome-dependent selection bias in multi-cohort … 2026.

[27] Communication-efficient sparse regression. Journal of Machine Learning Research.

[28] Demystifying a class of multiply robust estimators. Biometrika, 2020.

[29] FedScore: A privacy-preserving framework for federated scoring system development. Journal of Biomedical Informatics, 2023.

[30] Federated and distributed learning applications for electronic health records and structured medical data: a scoping review. Journal of the American Medical Informatics Association, 2023.

[31] Centralized and federated models for the analysis of clinical data. Annual Review of Biomedical Data Science, 2024.

[32] FedIMPUTE: Privacy-preserving missing value imputation for multi-site heterogeneous electronic health records. Journal of Biomedical Informatics, 2025.

[33] Federated multiple imputation for variables that are missing not at random in distributed electronic health records. AMIA Annual Symposium Proceedings.

[34] D3MI: an efficient and powerful federated imputation method for bias reduction in the analysis of distributed incomplete data by accounting for within-site correlation and between-site heterogeneity. medRxiv, 2025.

[35] Pattern-mixture models for multivariate incomplete data. Journal of the American Statistical Association, 1993.

[36] Developing federated time-to-event scores using heterogeneous real-world survival data. arXiv preprint arXiv:2403.05229.

[37] Regression with missing X's: a review. Journal of the American Statistical Association, 1992.

[38] A comparison of three popular methods for handling missing data: complete-case analysis, inverse probability weighting, and multiple imputation. Sociological Methods & Research, 2024.

[39] Technical and legal aspects of federated learning in bioinformatics: applications, challenges and opportunities. Frontiers in Digital Health, 2025.

[40] Every missingness not at random model has a missingness at random counterpart with equal fit. Journal of the Royal Statistical Society Series B: Statistical Methodology, 2008.

[41] Canonical causal diagrams to guide the treatment of missing data in epidemiologic studies. American Journal of Epidemiology, 2018.

[42] Graphical models for processing missing data. Journal of the American Statistical Association, 2021.

[43] Multiple imputation by chained equations for systematically and sporadically missing multilevel data. Statistical Methods in Medical Research, 2018.

[44] Research Data Assistance Center.

[45] Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association, 1994.

[46] James M. Robins, Andrea Rotnitzky, Lue Ping Zhao. Journal of the American Statistical Association.

[47] Review of inverse probability weighting for dealing with missing data. Statistical Methods in Medical Research, 2013.

[48] Digital twins and the future of precision mental health. Frontiers in Psychiatry, 2023.

[49] MissForest—non-parametric missing value imputation for mixed-type data. Bioinformatics, 2012.

[50] Semiparametric theory and missing data. 2006.

[51] Review of Simulation Studies Evaluating Imputation Methods in High-Dimensional Datasets. Wiley Interdisciplinary Reviews: Computational Statistics, 2025.

[52] Establishing the Parallels and Differences Between Right-Censored and Missing Covariates. arXiv preprint arXiv:2409.04684.

[53] Heath D. White, Christopher Henry, Eileen M. Stock, Alejandro C. Arroliga, Shekhar Ghamande. Predicting long-term outcomes in pleural infections. 2015.

[54] Qiong Wu, Jenna M. Reps, Lu Li, Bingyu Zhang, Yiwen Lu, Jiayi Tong, Dazheng Zhang, Thomas Lumley, Milou T. Brand, Mui Van Zandt, et al. 2025.

[55] Unlocking efficiency in real-world collaborative studies: a multi-site international study with one-shot lossless GLMM algorithm. npj Digital Medicine, 2025.

[56] Managing re-identification risks while providing access to the All of Us research program. Journal of the American Medical Informatics Association, 2023.

[57] Federated learning for healthcare informatics. Journal of Healthcare Informatics Research, 2021.

[58] A privacy-preserving and computation-efficient federated algorithm for generalized linear mixed models to analyze correlated electronic health records data. PLoS ONE, 2023.

[59] TCDiff: Triplex Cascaded Diffusion for High-fidelity Multimodal EHRs Generation with Incomplete Clinical Data. arXiv preprint arXiv:2508.01615.

[60] Federated inverse probability treatment weighting for individual treatment effect estimation. ACM Transactions on Intelligent Systems and Technology.

[61] Federated conditional generative adversarial nets imputation method for air quality missing data. Knowledge-Based Systems, 2021.