Federated Survival Analysis in Healthcare: A Multi-Model Evaluation on Cross-Institutional Heterogeneous Breast Cancer Data
Pith reviewed 2026-06-26 08:46 UTC · model grok-4.3
The pith
Federated learning lets survival models for breast cancer approach, and occasionally exceed, centralized performance while keeping patient data local.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
On a cross-institutional breast cancer cohort with naturally heterogeneous clients, federated learning consistently outperforms local training for Cox, DeepSurv, and RSF models and approaches, and occasionally exceeds, centralized performance. RSF provides the best balance of discrimination, calibration, and robustness. Performance depends on client diversity, and among the federated optimizers assessed for the gradient-based models, FedAvg and FedProx are stronger and more stable than FedAdam.
What carries the argument
The systematic comparison of Cox Proportional Hazards, DeepSurv, and Random Survival Forest models across centralized, local, and federated (FedAvg, FedProx, FedAdam) training paradigms on heterogeneous distributed clients.
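To make the federated paradigms concrete, here is a minimal sketch of the two simplest optimizers named above. It assumes each client's model is a flat list of floats; the function names `fedavg` and `fedprox_gradient` and the parameter `mu` are illustrative and not taken from the paper. FedAvg averages client parameters weighted by local sample count, and FedProx adds a proximal term to the local gradient that pulls each client back toward the current global model.

```python
from typing import List, Sequence


def fedavg(client_weights: Sequence[Sequence[float]],
           client_sizes: Sequence[int]) -> List[float]:
    """Sample-size-weighted average of client parameter vectors (FedAvg)."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(dim)
    ]


def fedprox_gradient(local_grad: Sequence[float],
                     local_w: Sequence[float],
                     global_w: Sequence[float],
                     mu: float) -> List[float]:
    """Local gradient plus the FedProx proximal term mu * (w - w_global).

    This is the gradient of F_k(w) + (mu / 2) * ||w - w_global||^2,
    where F_k is the client's own loss (assumed already differentiated).
    """
    return [g + mu * (w - wg) for g, w, wg in zip(local_grad, local_w, global_w)]


if __name__ == "__main__":
    # Two hypothetical clients with 1 and 3 samples respectively.
    print(fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
    print(fedprox_gradient([0.5], [2.0], [1.0], mu=0.1))
```

With `mu = 0` the FedProx gradient reduces to the plain local gradient, so FedAvg can be read as the special case without the proximal penalty.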
If this is right
- Random Survival Forest offers the best overall balance when robustness across heterogeneous clients matters most.
- FedAvg and FedProx are more stable and effective federated optimizers than FedAdam for the gradient-based survival models.
- Model and training-paradigm selection should be guided by measured client diversity, privacy requirements, and the need for interpretability.
- Federated survival analysis can produce usable clinical predictions from distributed data that would otherwise remain siloed.
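For readers comparing the optimizers in the list above, the following is a rough sketch of one FedAdam server round in the style of adaptive federated optimization: the server treats the average client update as a pseudo-gradient and applies an Adam-like step to it. The uniform (unweighted) client average, the function name `fedadam_server_step`, and all default hyperparameter values are assumptions for illustration, not settings reported by the paper.

```python
import math
from typing import List, Sequence, Tuple


def fedadam_server_step(
    global_w: Sequence[float],
    client_ws: Sequence[Sequence[float]],
    m: Sequence[float],
    v: Sequence[float],
    lr: float = 0.1,      # server learning rate (illustrative value)
    beta1: float = 0.9,   # first-moment decay (illustrative value)
    beta2: float = 0.99,  # second-moment decay (illustrative value)
    tau: float = 1e-3,    # adaptivity constant guarding the division
) -> Tuple[List[float], List[float], List[float]]:
    """One server update: Adam-style step on the mean client delta."""
    k = len(client_ws)
    # Pseudo-gradient: mean of (client model - global model), unweighted here.
    delta = [
        sum(cw[j] - global_w[j] for cw in client_ws) / k
        for j in range(len(global_w))
    ]
    new_m = [beta1 * mj + (1 - beta1) * dj for mj, dj in zip(m, delta)]
    new_v = [beta2 * vj + (1 - beta2) * dj * dj for vj, dj in zip(v, delta)]
    new_w = [
        wj + lr * mj / (math.sqrt(vj) + tau)
        for wj, mj, vj in zip(global_w, new_m, new_v)
    ]
    return new_w, new_m, new_v


if __name__ == "__main__":
    w, m, v = fedadam_server_step([0.0], [[1.0], [1.0]], [0.0], [0.0])
    print(w, m, v)
```

Compared with plain averaging, this step carries server-side state (`m`, `v`) and extra hyperparameters (`lr`, `tau`), which is one place where additional tuning sensitivity could enter.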
Where Pith is reading between the lines
- The same evaluation design could be applied to other time-to-event outcomes such as disease progression in different cancer types to check whether RSF and FedAvg retain their advantage.
- Hospitals facing high client heterogeneity might begin with local RSF and add federated steps only when performance gains justify the added coordination.
- The observed dependence on client diversity suggests that future federated survival work may need client-specific weighting or clustering to maintain performance in more extreme distribution shifts.
- The decision-oriented guidelines could serve as a starting point for institutional policies on when to adopt federated versus centralized survival modeling.
Load-bearing premise
The cross-institutional breast cancer cohort possesses naturally heterogeneous distributed clients whose variation is representative of real-world clinical data distributions.
What would settle it
Replication on another set of institutions where federated training fails to outperform local training across the same three models would falsify the central performance claim.
Original abstract
Survival analysis is central to clinical decision-making, yet reliable time-to-event models require large, diverse cohorts that are rarely available at a single institution, while privacy regulations restrict the centralization of patient data. Federated learning (FL) offers a privacy-preserving alternative by training shared models without exchanging raw data, but its effectiveness for survival modeling under realistic, heterogeneous conditions remains insufficiently understood. This paper presents a systematic, multi-model evaluation of federated survival analysis on a cross-institutional breast cancer cohort with naturally heterogeneous distributed clients. Three representative survival models, the Cox Proportional Hazards model, DeepSurv, and Random Survival Forest (RSF), are compared across centralized, local, and federated training, and three federated optimization strategies (FedAvg, FedProx, and FedAdam) are assessed for the gradient-based models. Results show that FL consistently outperforms local training and approaches, and occasionally exceeds, centralized performance, while RSF offers the best overall balance of discrimination, calibration, and robustness across heterogeneous clients. We further find that performance depends on the diversity of client distributions, and that FedAvg and FedProx are stronger and more stable than FedAdam. Based on these findings, we derive practical, decision-oriented guidelines mapping data, privacy, interpretability, and resource constraints to recommended model and training-paradigm choices for federated survival modeling in healthcare.
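Discrimination in survival models is commonly summarized with a concordance index. As a minimal sketch of what that metric measures, the function below computes a Harrell-style C-index in plain Python: among comparable pairs (the earlier time is an observed event), it counts how often the model assigns the higher risk to the subject who fails first. The function name and the simplified handling of tied times (such pairs are skipped) are assumptions made for illustration; the paper's exact evaluation code is not shown in the source.

```python
from typing import Sequence


def concordance_index(times: Sequence[float],
                      events: Sequence[int],
                      risks: Sequence[float]) -> float:
    """Harrell-style C-index for right-censored data.

    times  : observed follow-up times
    events : 1 if the event was observed, 0 if censored
    risks  : predicted risk scores (higher means earlier expected event)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:  # i failed strictly before j's time
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    print(concordance_index([1, 2, 3], [1, 1, 0], [3.0, 2.0, 1.0]))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which C-index comparisons between local, federated, and centralized models should be read.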
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper evaluates three survival models (Cox PH, DeepSurv, RSF) under centralized, local, and federated training (FedAvg, FedProx, FedAdam) on a cross-institutional breast cancer cohort. It claims FL consistently outperforms local training and approaches or exceeds centralized performance, RSF provides the best balance of discrimination/calibration/robustness, performance depends on client diversity, FedAvg/FedProx are more stable than FedAdam, and derives practical guidelines mapping constraints to model/training choices.
Significance. If the empirical comparisons hold under representative heterogeneity, the work supplies actionable evidence for privacy-preserving survival modeling in healthcare, clarifying when federated approaches are viable and identifying RSF as robust. It fills a gap in multi-model FL survival evaluations on real distributed clinical data.
Major comments (1)
- [Abstract and §3 (Data/Methods)] The headline claims that FL outperforms local training, approaches/exceeds centralized performance, and that RSF is most robust rest on the assertion of 'naturally heterogeneous distributed clients' representative of real-world clinical variation. No quantitative client-level diagnostics (per-client covariate means/variances, censoring fractions, event-time distributions, or inter-client divergence metrics) are reported to substantiate this representativeness. Without them, the stability findings may be specific to this split rather than general.
Minor comments (1)
- [§4 (Results)] Table captions and §4 could explicitly list the number of clients, total samples, and per-client event rates to allow readers to assess heterogeneity scale directly.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on substantiating client heterogeneity. We address the major comment below and will incorporate the suggested diagnostics to strengthen the manuscript.
- Referee: [Abstract and §3 (Data/Methods)] The headline claims that FL outperforms local training, approaches/exceeds centralized performance, and that RSF is most robust rest on the assertion of 'naturally heterogeneous distributed clients' representative of real-world clinical variation. No quantitative client-level diagnostics (per-client covariate means/variances, censoring fractions, event-time distributions, or inter-client divergence metrics) are reported to substantiate this representativeness. Without them, the stability findings may be specific to this split rather than general.
Authors: We agree that explicit quantitative client-level diagnostics are needed to support claims of representativeness and to allow readers to evaluate generalizability. The current manuscript describes the data sources and notes natural heterogeneity arising from distinct institutions but does not include the requested per-client statistics. In the revision we will add a dedicated subsection to §3 reporting: (i) per-client means and standard deviations for key covariates, (ii) censoring fractions, (iii) summary statistics of observed event times, and (iv) pairwise inter-client divergence measures (e.g., Wasserstein distance on normalized feature distributions and KL divergence on binned survival times). These additions will directly address the concern that stability results may be split-specific.
Revision: yes
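Two of the diagnostics promised in the rebuttal, censoring fractions and KL divergence on binned survival times, can be sketched in a few lines. The example below is a hypothetical illustration, not the authors' code: the function names, the bin edges, and the small smoothing constant `eps` (added so empty bins do not produce infinite divergence) are all assumptions.

```python
import math
from typing import List, Sequence


def censoring_fraction(events: Sequence[int]) -> float:
    """Share of subjects whose event was not observed (event flag == 0)."""
    return 1.0 - sum(1 for e in events if e) / len(events)


def binned_distribution(times: Sequence[float],
                        edges: Sequence[float],
                        eps: float = 1e-6) -> List[float]:
    """Histogram of times over [edges[b], edges[b+1]) bins, smoothed by eps.

    The last bin also includes its right edge. Times outside the edges
    are ignored.
    """
    counts = [0] * (len(edges) - 1)
    for t in times:
        for b in range(len(counts)):
            is_last = b == len(counts) - 1
            if edges[b] <= t < edges[b + 1] or (is_last and t == edges[-1]):
                counts[b] += 1
                break
    smoothed = [c + eps for c in counts]
    total = sum(smoothed)
    return [s / total for s in smoothed]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two discrete distributions on the same bins."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


if __name__ == "__main__":
    edges = [0, 12, 24, 60]           # hypothetical bins in months
    client_a = [3, 8, 15, 20, 40]     # made-up follow-up times
    client_b = [30, 35, 45, 50, 58]
    p = binned_distribution(client_a, edges)
    q = binned_distribution(client_b, edges)
    print(censoring_fraction([1, 0, 0, 1]), kl_divergence(p, q))
```

KL divergence is asymmetric, so a pairwise report would need either both directions or a symmetrized variant; which one the revision uses is not stated in the source.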
Circularity Check
No circularity: empirical model comparison with no derivations
The paper performs an empirical evaluation of existing survival models (Cox PH, DeepSurv, RSF) and federated optimization strategies (FedAvg, FedProx, FedAdam) on a cross-institutional dataset. No mathematical derivations, predictions from fitted parameters, or uniqueness theorems are claimed. All reported results are direct experimental outcomes (discrimination, calibration, robustness metrics) rather than quantities defined by construction from the paper's own inputs. Self-citations, if present, are not load-bearing for any central claim. The work is self-contained as a standard benchmark study.