Improving post-operative discharge destination prediction of geriatric patients with generative data augmentation
Pith reviewed 2026-05-10 05:39 UTC · model grok-4.3
The pith
Generative data augmentation with adversarial random forests boosts logistic regression accuracy for predicting geriatric post-operative discharge destinations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Augmenting the training data with synthetic records generated by adversarial random forests (ARF), fitted to data from the SURGE-Ahead project alone or combined with the German geriatric trauma register (ATZ), improves multinomial logistic regression for post-operative discharge destination prediction, raising accuracy from 0.70 to 0.81 and ROC AUC from 0.85 to 0.92. Random forest and TabPFN reach approximately 0.84 accuracy and 0.94 AUC and are largely unaffected by the added data.
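The logistic regression gain can be restated as a relative reduction. The figures below are derived here from the reported point estimates and are not reported by the authors. They amount to roughly a one-third cut in the error rate and close to half of the gap between the AUC and 1:

```latex
\[
\frac{(1-0.70)-(1-0.81)}{1-0.70} = \frac{0.30-0.19}{0.30} \approx 0.37,
\qquad
\frac{(1-0.85)-(1-0.92)}{1-0.85} = \frac{0.15-0.08}{0.15} \approx 0.47
\]
```

These derived figures inherit whatever uncertainty the underlying single point estimates carry.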
What carries the argument
Adversarial random forests that produce synthetic patient records to augment the original limited clinical dataset before training discharge prediction models.
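The sketch below makes the generative step concrete. It is a minimal Python sketch of how a fitted ARF draws a synthetic record, following the published description of the method (Watson et al. [15]). A leaf is picked with probability proportional to its coverage. Every feature is then sampled independently from that leaf's fitted distribution: a truncated normal for continuous features and a categorical distribution for discrete ones. The `Leaf` structure, its field names and the parameter values are hypothetical. The adversarial training loop and the per-leaf density estimation are omitted, and this is not the API of the arf implementation the authors used.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Leaf:
    """Hypothetical fitted ARF leaf (illustrative only, not a library type)."""
    coverage: float  # share of real training rows that fall into this leaf
    continuous: dict = field(default_factory=dict)   # name -> (mean, sd, lower, upper)
    categorical: dict = field(default_factory=dict)  # name -> {level: probability}


def truncated_normal(rng, mean, sd, lower, upper, max_tries=1000):
    """Draw from N(mean, sd) restricted to the leaf's box [lower, upper] by rejection."""
    for _ in range(max_tries):
        x = rng.gauss(mean, sd)
        if lower <= x <= upper:
            return x
    return min(max(mean, lower), upper)  # fallback if rejection keeps failing


def sample_rows(leaves, n, seed=0):
    """Generate n synthetic rows: pick a leaf by coverage, then draw features independently."""
    rng = random.Random(seed)
    # Leaves are pooled across trees. Each tree's coverages sum to 1, so weighting
    # by coverage is equivalent to picking a tree uniformly and then a leaf within it.
    chosen = rng.choices(leaves, weights=[leaf.coverage for leaf in leaves], k=n)
    rows = []
    for leaf in chosen:
        row = {name: truncated_normal(rng, *params) for name, params in leaf.continuous.items()}
        for name, probs in leaf.categorical.items():
            levels = list(probs)
            row[name] = rng.choices(levels, weights=[probs[lv] for lv in levels], k=1)[0]
        rows.append(row)
    return rows


# Hypothetical leaves of a single tree (coverages sum to 1).
leaves = [
    Leaf(0.6, {"age": (84.0, 5.0, 80.0, 100.0)},
         {"destination": {"home": 0.3, "rehab": 0.5, "nursing_home": 0.2}}),
    Leaf(0.4, {"age": (74.0, 4.0, 65.0, 80.0)},
         {"destination": {"home": 0.7, "rehab": 0.3}}),
]
synthetic = sample_rows(leaves, n=5, seed=1)
```

Features are independent only within a leaf. Dependencies between variables, such as age and discharge destination, are therefore carried entirely by which leaf is chosen. This is why the quality of the fitted forest bears directly on the load-bearing premise.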
Load-bearing premise
The synthetic data accurately mirrors the statistical properties and relationships present in the real geriatric patient records without introducing biases or artifacts.
What would settle it
Evaluating the models on an independent real-world cohort of geriatric patients collected after the study period and finding that augmented-data training yields no accuracy or AUC gain over real-data-only training would disprove the reported benefit.
Original abstract
Data scarcity challenges the development and implementation of innovative healthcare solutions. In geriatrics, fall-related injuries are a major cause of hospitalization, functional decline, and mortality in older adults. Optimizing post-operative discharge planning can mitigate these outcomes, but limited data hinders predictive model development. Here, we explored generative machine learning approaches to augment data from the SURGE-Ahead project (Supporting SURgery with Geriatric Co-Management and AI), an initiative addressing geriatric perioperative care. Data from the German geriatric trauma register (AltersTraumaZentrum; ATZ) were incorporated using two strategies: (i) combining SURGE-Ahead and ATZ register data with imputation (ComImp) and (ii) generating synthetic data from SURGE-Ahead alone or combined SURGE-Ahead and the ATZ register datasets with Adversarial random forests (ARF). Predictive models, including multinomial logistic regression, random forest, and a prior-fitted transformer (TabPFN), were trained and evaluated using standard performance metrics: accuracy, area under the receiver operating characteristic curve (ROC AUC), Brier score, and the logistic loss. Random forest and TabPFN performed well (accuracy around 0.84 and AUC around 0.94) and were largely unaffected by augmentation. Logistic regression benefited from augmented data, with predictive performance improving from 0.70 to 0.81 for accuracy and 0.85 to 0.92 for AUC. These results highlight generative data augmentation as a viable approach to enhance simpler predictive models in geriatric care and emphasize the importance of method selection when addressing data scarcity in heterogeneous clinical populations.
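The four evaluation metrics named in the abstract can be computed from predicted class probabilities in a few lines of Python. This minimal sketch rests on three assumptions. The Brier score uses the sum-over-classes multiclass definition (some libraries divide by the number of classes). ROC AUC is the unweighted macro average of one-vs-rest AUCs. The class labels and probabilities are made up. The abstract does not state which metric variants the authors used.

```python
import math


def accuracy(y_true, probs, classes):
    """Share of rows whose highest-probability class equals the true class."""
    preds = [classes[p.index(max(p))] for p in probs]
    return sum(t == p for t, p in zip(y_true, preds)) / len(y_true)


def brier_multiclass(y_true, probs, classes):
    """Mean over rows of the summed squared error against the one-hot label."""
    total = 0.0
    for t, p in zip(y_true, probs):
        total += sum((p[k] - (1.0 if c == t else 0.0)) ** 2 for k, c in enumerate(classes))
    return total / len(y_true)


def log_loss(y_true, probs, classes, eps=1e-15):
    """Mean negative log-probability of the true class (clipped to avoid log(0))."""
    idx = {c: k for k, c in enumerate(classes)}
    return -sum(
        math.log(min(max(p[idx[t]], eps), 1 - eps)) for t, p in zip(y_true, probs)
    ) / len(y_true)


def binary_auc(labels, scores):
    """Probability that a positive row outscores a negative one (ties count 1/2)."""
    pos = [s for lab, s in zip(labels, scores) if lab]
    neg = [s for lab, s in zip(labels, scores) if not lab]
    wins = sum((sp > sn) + 0.5 * (sp == sn) for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(y_true, probs, classes):
    """Unweighted mean of one-vs-rest AUCs, one per class."""
    aucs = [
        binary_auc([t == c for t in y_true], [p[k] for p in probs])
        for k, c in enumerate(classes)
    ]
    return sum(aucs) / len(aucs)


# Made-up predictions for three hypothetical discharge classes.
classes = ["home", "rehab", "nursing_home"]
y_true = ["home", "rehab", "nursing_home", "home"]
probs = [[0.7, 0.2, 0.1], [0.2, 0.6, 0.2], [0.1, 0.3, 0.6], [0.4, 0.5, 0.1]]
scores = {
    "accuracy": accuracy(y_true, probs, classes),
    "brier": brier_multiclass(y_true, probs, classes),
    "log_loss": log_loss(y_true, probs, classes),
    "roc_auc": macro_ovr_auc(y_true, probs, classes),
}
```

On these four toy rows the sketch gives accuracy 0.75, a Brier score of about 0.315, a log loss of about 0.574 and a macro AUC of 1.0. The last row is misclassified, yet every class still ranks its true members above the rest. This is why accuracy and AUC can move differently, and why the calibration-sensitive Brier score and log loss add information. The pairwise AUC loop is quadratic and meant only for illustration.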
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript investigates the use of generative data augmentation with Adversarial Random Forests (ARF) to address data scarcity in predicting post-operative discharge destinations for geriatric patients using data from the SURGE-Ahead project and the German geriatric trauma register (ATZ). It compares combining real datasets with imputation against generating synthetic data, and evaluates multinomial logistic regression, random forest, and TabPFN models using accuracy, ROC AUC, Brier score, and logistic loss. The key finding is that augmentation substantially improves logistic regression (accuracy 0.70 to 0.81, AUC 0.85 to 0.92) while having minimal effect on the stronger baseline models.
Significance. If the synthetic data generation is properly controlled and does not introduce artifacts, this work demonstrates a practical approach to enhancing simpler, interpretable models in data-limited clinical domains such as geriatric perioperative care. The differential benefit to logistic regression versus already-strong tree-based and transformer models is a noteworthy empirical observation that could inform model selection under data scarcity.
major comments (2)
- [Methods] The hyperparameters of the Adversarial Random Forests (ARF) used to generate the synthetic data are not specified. The manuscript also describes no procedure for fitting and tuning the ARF, or for validating that the synthetic samples match the real data distribution (e.g., via Kolmogorov-Smirnov tests or propensity score checks). This is load-bearing for the central claim, because any spurious correlations introduced by ARF could be disproportionately exploited by logistic regression.
- [Results] Performance metrics are reported as single point estimates (e.g., logistic regression accuracy rising from 0.70 to 0.81) without standard errors, confidence intervals, or results across multiple data splits or random seeds. In small geriatric datasets, this omission makes it impossible to judge whether the reported gains are reliable or merely sensitive to a particular train/test partition.
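The marginal check the referee asks for is straightforward to sketch. Below is a minimal, dependency-free Python version of the two-sample Kolmogorov-Smirnov statistic, applied column by column. The statistic is the largest vertical gap between the empirical CDFs of a real and a synthetic column. The column name and values are hypothetical. For p-values one would normally use an established routine such as `scipy.stats.ks_2samp`. The statistic suits continuous variables and says nothing about joint structure, which is the part of the concern that bears on logistic regression.

```python
from bisect import bisect_right


def ks_statistic(real, synthetic):
    """Two-sample KS statistic D = max_x |F_real(x) - F_synth(x)|."""
    a, b = sorted(real), sorted(synthetic)
    d = 0.0
    # Both empirical CDFs are step functions that only jump at observed values,
    # so checking every observed value finds the supremum.
    for x in set(a) | set(b):
        f_a = bisect_right(a, x) / len(a)  # share of real values <= x
        f_b = bisect_right(b, x) / len(b)  # share of synthetic values <= x
        d = max(d, abs(f_a - f_b))
    return d


def ks_by_column(real_rows, synth_rows, columns):
    """KS statistic for each (continuous) column of two lists of row dicts."""
    return {
        c: ks_statistic([r[c] for r in real_rows], [s[c] for s in synth_rows])
        for c in columns
    }


# Hypothetical real and synthetic ages.
real = [{"age": v} for v in (71, 78, 83, 86, 90)]
synth = [{"age": v} for v in (72, 79, 81, 88, 95)]
result = ks_by_column(real, synth, ["age"])
```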
minor comments (2)
- [Abstract] While accuracy and AUC improvements are highlighted, the abstract does not report the corresponding Brier score or logistic loss values for the augmented and baseline settings, even though these metrics are stated to have been computed.
- [Introduction] The manuscript uses the abbreviation 'ComImp' for the combined real-data imputation strategy without an explicit expansion on first use, which could reduce immediate clarity for readers unfamiliar with the project.
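The general idea behind the combined-data strategy (ComImp) can be sketched as follows. As summarized in the abstract, SURGE-Ahead and ATZ register records are stacked, and the variables one source did not collect are imputed. This is a minimal Python illustration with hypothetical column names. The mean/mode fill is a deliberately simple placeholder, not the imputation method used in the paper; the paper's reference list points to dedicated approaches such as missForest, mice and MissARF.

```python
from collections import Counter


def combine_with_imputation(datasets):
    """Stack row dicts from several sources over the union of their columns and
    fill cells a source never recorded (placeholder mean/mode imputation)."""
    columns = sorted({c for rows in datasets for row in rows for c in row})
    stacked = [{c: row.get(c) for c in columns} for rows in datasets for row in rows]
    for c in columns:
        observed = [r[c] for r in stacked if r[c] is not None]
        if not observed:
            continue  # nothing to impute from
        if all(isinstance(v, (int, float)) for v in observed):
            fill = sum(observed) / len(observed)  # numeric column: mean
        else:
            fill = Counter(observed).most_common(1)[0][0]  # categorical column: mode
        for r in stacked:
            if r[c] is None:
                r[c] = fill
    return stacked


# Hypothetical records: the second, register-style source lacks a "frailty" score.
surge = [
    {"age": 82, "frailty": 4, "destination": "rehab"},
    {"age": 88, "frailty": 6, "destination": "nursing_home"},
]
atz = [{"age": 79, "destination": "home"}]
combined = combine_with_imputation([surge, atz])
```

A mean fill ignores relationships between variables, which are exactly what a downstream classifier exploits. A realistic pipeline would use a multivariate imputer fitted on training rows only.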
Simulated Author's Rebuttal
We thank the referee for the constructive and insightful comments, which help strengthen the methodological transparency and statistical rigor of our work. We address each major comment below and will revise the manuscript accordingly.
Point-by-point responses
-
Referee: [Methods] The hyperparameters of the Adversarial Random Forests (ARF) used to generate the synthetic data are not specified. The manuscript also describes no procedure for fitting and tuning the ARF, or for validating that the synthetic samples match the real data distribution (e.g., via Kolmogorov-Smirnov tests or propensity score checks). This is load-bearing for the central claim, because any spurious correlations introduced by ARF could be disproportionately exploited by logistic regression.
Authors: We agree that explicit documentation of the ARF configuration and validation is necessary to support the central claim. In the revised manuscript, we will expand the Methods section to specify all hyperparameters (e.g., number of trees, maximum depth, and other settings from the arf implementation), detail the fitting and synthetic data generation procedure, and add validation steps including Kolmogorov-Smirnov tests on marginal distributions as well as checks for introduced correlations (such as propensity score overlap or pairwise dependency comparisons). These additions will confirm that the augmentation does not introduce artifacts disproportionately benefiting logistic regression. revision: yes
-
Referee: [Results] Performance metrics are reported as single point estimates (e.g., logistic regression accuracy rising from 0.70 to 0.81) without standard errors, confidence intervals, or results across multiple data splits or random seeds. In small geriatric datasets, this omission makes it impossible to judge whether the reported gains are reliable or merely sensitive to a particular train/test partition.
Authors: We acknowledge that single-point estimates limit evaluation of robustness in small-sample settings. In the revised Results, we will report performance metrics (accuracy, AUC, Brier score, and logistic loss) averaged across multiple random seeds and repeated train-test splits (e.g., 10 repetitions of stratified 80/20 splits), accompanied by standard errors and 95% confidence intervals. This will demonstrate the stability of the observed gains for logistic regression while confirming the limited effect on random forest and TabPFN. revision: yes
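The planned summary over repeated splits can be sketched in a few lines of Python. The sketch assumes per-split scores, or better, per-split paired differences between augmented and baseline training, from J = 10 stratified 80/20 splits. It reports two standard errors. The naive one treats the splits as independent. The corrected resampled one, from Nadeau and Bengio [31] in the paper's reference list, accounts for the overlap between training sets across repeats. The example differences, the sample sizes and the hard-coded t quantile (2.262, for 9 degrees of freedom) are illustrative assumptions.

```python
import math
from statistics import mean, variance


def summarize_repeats(scores, n_train, n_test, t_crit=2.262):
    """Mean, naive SE, corrected resampled SE and a 95% CI over J repeated splits.

    t_crit is the 0.975 quantile of Student's t with J - 1 degrees of freedom;
    the default 2.262 is only correct for J = 10 repeats.
    """
    j = len(scores)
    m = mean(scores)
    s2 = variance(scores)  # sample variance across splits (denominator J - 1)
    se_naive = math.sqrt(s2 / j)  # treats splits as independent (too optimistic)
    se_corrected = math.sqrt((1 / j + n_test / n_train) * s2)  # Nadeau-Bengio correction
    half_width = t_crit * se_corrected
    return {
        "mean": m,
        "se_naive": se_naive,
        "se_corrected": se_corrected,
        "ci95": (m - half_width, m + half_width),
    }


# Hypothetical per-split accuracy gains of augmented over baseline logistic regression.
gains = [0.10, 0.12, 0.08, 0.11, 0.09, 0.10, 0.13, 0.07, 0.10, 0.10]
summary = summarize_repeats(gains, n_train=800, n_test=200)
```

With these made-up numbers, the corrected standard error is about 1.9 times the naive one (a factor of the square root of 3.5). On small cohorts, this is the widening that determines whether a gain survives.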
Circularity Check
No significant circularity
The paper's core result is an empirical comparison: models trained on real data versus real+ARF-augmented data are evaluated with standard held-out accuracy, AUC, Brier score and log-loss on real test cases. No equation or claim reduces a reported performance gain to a fitted parameter by construction, nor does any load-bearing premise rest solely on a self-citation whose content is itself unverified. The augmentation step (ARF fitted to training rows) is independent of the downstream test-set metrics, satisfying the criteria for a non-circular empirical study.
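The protocol that keeps this comparison non-circular can be sketched in Python: split first, fit the generator on training rows only, and evaluate on untouched real test rows. This is a minimal illustration. The stratified split is a generic implementation. `placeholder_generator` is a hypothetical stand-in that merely resamples training rows; it is not ARF and not the paper's code. Model fitting and scoring are left out.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_frac=0.2, seed=0):
    """Return (train_idx, test_idx), splitting each class about 80/20."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test += idx[:n_test]
        train += idx[n_test:]
    return sorted(train), sorted(test)


def placeholder_generator(train_rows, n, seed=0):
    """Hypothetical stand-in for ARF that just resamples training rows (NOT ARF)."""
    rng = random.Random(seed)
    return [dict(rng.choice(train_rows)) for _ in range(n)]


def augmented_run(rows, label_key, n_synthetic, seed=0):
    """Split first, fit/sample the generator on training rows only, keep test real."""
    train_idx, test_idx = stratified_split([r[label_key] for r in rows], seed=seed)
    train = [rows[i] for i in train_idx]
    test = [rows[i] for i in test_idx]  # real rows, never shown to the generator
    synthetic = placeholder_generator(train, n_synthetic, seed=seed)
    return train + synthetic, test


# Hypothetical cohort of 40 rows with an imbalanced two-class outcome.
rows = [{"id": i, "y": "rehab" if i % 4 == 0 else "home"} for i in range(40)]
train_aug, test = augmented_run(rows, "y", n_synthetic=20)
```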
Axiom & Free-Parameter Ledger
free parameters (1)
- ARF hyperparameters
axioms (1)
- domain assumption: Synthetic data from ARF preserves the joint distribution of the real clinical variables well enough to improve downstream prediction
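One simple way to probe this assumption is to compare pairwise correlations in the real and synthetic data. This follows the spirit of the pairwise dependency comparisons mentioned in the rebuttal. The Python sketch below reports the absolute difference in Pearson correlation for every pair of numeric columns. The column names and values are hypothetical. Pearson correlation captures only linear dependence between numeric variables; categorical variables such as the discharge destination would need a different association measure.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences (non-constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_gaps(real_rows, synth_rows, columns):
    """|r_real - r_synth| for every pair of numeric columns."""
    gaps = {}
    for a, b in combinations(columns, 2):
        r_real = pearson([r[a] for r in real_rows], [r[b] for r in real_rows])
        r_synth = pearson([s[a] for s in synth_rows], [s[b] for s in synth_rows])
        gaps[(a, b)] = abs(r_real - r_synth)
    return gaps


# Hypothetical example: synthetic rows that reverse a real positive association
# between age and length of stay ("los").
real = [{"age": a, "los": l} for a, l in [(70, 2), (75, 4), (80, 6), (85, 8)]]
synth = [{"age": a, "los": l} for a, l in [(70, 8), (75, 6), (80, 4), (85, 2)]]
gaps = correlation_gaps(real, synth, ["age", "los"])
```

Small gaps for every pair are necessary but not sufficient. The propensity score check raised by the referee, which trains a classifier to tell real from synthetic rows, also probes higher-order structure.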
Reference graph
Works this paper leans on
-
[1]
Health data poverty: an assailable barrier to equitable digital health care
Ibrahim H, Liu X, Zariffa N, Morris AD, Denniston AK. Health data poverty: an assailable barrier to equitable digital health care. Lancet Digit Health 2021 Apr;3(4):e260–e265. doi: 10.1016/S2589-7500(20)30317-4
2021
-
[2]
World Report on Ageing and Health
World Health Organization. World Report on Ageing and Health. World Health Organization; 2015
2015
-
[3]
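Global, regional, and national burden of falls among older adults: findings from the Global Burden of Disease Study 2021 and Projections to 2040
Chen Y, Dai F, Huang S, Qi D, Peng C, Zhang A, Wang Y, Gu Y, Guo J. Global, regional, and national burden of falls among older adults: findings from the Global Burden of Disease Study 2021 and Projections to 2040. Npj Aging 2025 Oct 9;11(1):85. doi: 10.1038/s41514-025-00275-4
2025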
-
[4]
Physical Performance and Falling Risk Are Associated with Five-Year Mortality in Older Adults: An Observational Cohort Study
Salis F, Mandas A. Physical Performance and Falling Risk Are Associated with Five-Year Mortality in Older Adults: An Observational Cohort Study. Med Kaunas Lith 2023 May 17;59(5):964. PMID:37241196
2023
-
[5]
Discharge planning from hospital
Gonçalves-Bradley DC, Lannin NA, Clemson L, Cameron ID, Shepperd S. Discharge planning from hospital. Cochrane Database Syst Rev 2022 Feb 24;2(2):CD000313. PMID:35199849
2022
-
[6]
Association between continuity of care (COC), healthcare use and costs: what can we learn from claims data? A rapid review
Nicolet A, Al-Gobari M, Perraudin C, Wagner J, Peytremann-Bridevaux I, Marti J. Association between continuity of care (COC), healthcare use and costs: what can we learn from claims data? A rapid review. BMC Health Serv Res 2022 May 16;22(1):658. PMID:35578226
2022
-
[7]
Supporting SURgery with GEriatric Co-Management and AI (SURGE-Ahead): A study protocol for the development of a digital geriatrician
Leinert C, Fotteler M, Kocar TD, Dallmeier D, Kestler HA, Wolf D, Gebhard F, Uihlein A, Steger F, Kilian R, Mueller-Stierlin AS, Michalski CW, Mihaljevic A, Bolenz C, Zengerling F, Leinert E, Schütze S, Hoffmann TK, Onder G, Andersen-Ranberg K, O’Neill D, Wehling M, Schobel J, Swoboda W, Denkinger M, SURGE-Ahead Study Group. Supporting SURgery with GEriat...
2023
-
[8]
SURGE-ahead postoperative delirium prediction: external validation and open-source library
Kocar TD, Wolf P, Leinert C, Brefka S, Fotteler ML, Uihlein A, Wezel F, Wehling M, Rahbari N, Kestler H, Gebhard F, Dallmeier D, Denkinger M. SURGE-ahead postoperative delirium prediction: external validation and open-source library. Eur Geriatr Med 2025 Mar 10; doi: 10.1007/s41999-025-01180-5
2025
-
[9]
Combining datasets to improve model fitting
Nguyen T, Khadka R, Phan N, Yazidi A, Halvorsen P, Riegler MA. Combining datasets to improve model fitting. 2023 Int Jt Conf Neural Netw IJCNN IEEE; 2023. p. 1–9
2023
-
[10]
Handbook of Missing Data Methodology
Molenberghs G, Fitzmaurice G, Kenward MG, Tsiatis A, Verbeke G. Handbook of Missing Data Methodology. CRC Press; 2015
2015
-
[11]
Flexible Imputation of Missing Data
Van Buuren S. Flexible Imputation of Missing Data. CRC Press; 2018
2018
-
[12]
MissForest—non-parametric missing value imputation for mixed-type data
Stekhoven DJ, Bühlmann P. MissForest—non-parametric missing value imputation for mixed-type data. Bioinformatics 2012;28(1):112–118
2012
-
[13]
Improving classification accuracy using data augmentation on small data sets
Moreno-Barea FJ, Jerez JM, Franco L. Improving classification accuracy using data augmentation on small data sets. Expert Syst Appl 2020;161:113696
2020
-
[14]
Synthetic data generation methods in healthcare: A review on open-source tools and methods
Pezoulas VC, Zaridis DI, Mylona E, Androutsos C, Apostolidis K, Tachos NS, Fotiadis DI. Synthetic data generation methods in healthcare: A review on open-source tools and methods. Comput Struct Biotechnol J 2024;23:2892–2910
2024
-
[15]
Adversarial random forests for density estimation and generative modeling
Watson DS, Blesch K, Kapar J, Wright MN. Adversarial random forests for density estimation and generative modeling. Proc 26th Int Conf Artif Intell Stat PMLR; 2023. p. 5357–5375
2023
-
[16]
synthpop: Bespoke creation of synthetic data in R
Nowok B, Raab GM, Dibben C. synthpop: Bespoke creation of synthetic data in R. J Stat Softw 2016;74:1–26
2016
-
[17]
Standard-of-Care vs. Expert-Recommended Discharge Destinations for Geriatric Surgical Inpatients: A Prospective Observational Cohort Study
Leinert C, Brefka S, Fotteler ML, Müller-Stierlin AS, Gebhard F, Rahbari N, Bolenz C, Kestler H, Dallmeier D, Denkinger M, Kocar TD. Standard-of-Care vs. Expert-Recommended Discharge Destinations for Geriatric Surgical Inpatients: A Prospective Observational Cohort Study. Eur Geriatr Med 2025; in press
2025
-
[18]
Generative adversarial nets
Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y. Generative adversarial nets. Adv Neural Inf Process Syst 2014
2014
-
[19]
Unsupervised learning with random forest predictors
Shi T, Horvath S. Unsupervised learning with random forest predictors. J Comput Graph Stat 2006;15(1):118–138
2006
-
[20]
CountARFactuals–generating plausible model-agnostic counterfactual explanations with adversarial random forests
Dandl S, Blesch K, Freiesleben T, König G, Kapar J, Bischl B, Wright MN. CountARFactuals–generating plausible model-agnostic counterfactual explanations with adversarial random forests. World Conf Explain Artif Intell Springer; 2024. p. 85–107
2024
-
[21]
Conditional feature importance with generative modeling using adversarial random forests
Blesch K, Koenen N, Kapar J, Golchian P, Burk L, Loecher M, Wright MN. Conditional feature importance with generative modeling using adversarial random forests. Proc AAAI Conf Artif Intell 2025. p. 15596–15604
2025
-
[22]
Missing Value Imputation With Adversarial Random Forests—MissARF
Golchian P, Kapar J, Watson DS, Wright MN. Missing Value Imputation With Adversarial Random Forests—MissARF. Stat Med 2026;45(3–5):e70379. doi: 10.1002/sim.70379
2026
-
[23]
Random forests
Breiman L. Random forests. Mach Learn 2001;45(1):5–32
2001
-
[24]
TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second
Hollmann N, Müller S, Eggensperger K, Hutter F. TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second. NeurIPS 2022 First Table Represent Workshop 2022
2022
-
[25]
All Models are Wrong, but Many are Useful: Learning a Variable’s Importance by Studying an Entire Class of Prediction Models Simultaneously
Fisher A, Rudin C, Dominici F. All Models are Wrong, but Many are Useful: Learning a Variable’s Importance by Studying an Entire Class of Prediction Models Simultaneously. J Mach Learn Res 2019;20(177):1–81
2019
-
[26]
Handling imbalanced medical datasets: review of a decade of research
Salmi M, Atif D, Oliva D, Abraham A, Ventura S. Handling imbalanced medical datasets: review of a decade of research. Artif Intell Rev 2024;57(10):273
2024
-
[27]
Imbalanced data problem in machine learning: A review
Altalhan M, Algarni A, Alouane MT-H. Imbalanced data problem in machine learning: A review. IEEE Access 2025
2025
-
[28]
Interpretable Machine Learning: A Guide for Making Black Box Models Explainable
Molnar C. Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. 3rd ed. 2025. Available from: https://christophm.github.io/interpretable-ml-book
2025
-
[29]
Testing conditional independence in supervised learning algorithms
Watson DS, Wright MN. Testing conditional independence in supervised learning algorithms. Mach Learn 2021;110(8):2107–2129
2021
-
[30]
Identifying key predictors of appropriate discharge destinations for older inpatients in acute care: A scoping review
Leinert C, Fotteler ML, Kocar TD, Wolf J, Beissel L, Grummich K, Dallmeier D, Denkinger M. Identifying key predictors of appropriate discharge destinations for older inpatients in acute care: A scoping review. Interact J Med Res 2025;14(e76582). doi: 10.2196/76582
2025
-
[31]
Inference for the Generalization Error
Nadeau C, Bengio Y. Inference for the Generalization Error. Mach Learn 2003;52:239–281
2003
-
[32]
Relating the partial dependence plot and permutation feature importance to the data generating process
Molnar C, Freiesleben T, König G, Herbinger J, Reisinger T, Casalicchio G, Wright MN, Bischl B. Relating the partial dependence plot and permutation feature importance to the data generating process. World Conf Explain Artif Intell Springer; 2023. p. 456–479
2023
-
[33]
Explainability of Machine Learning Models under Missing Data
Vo TL, Nguyen T, Hammer HL, Riegler MA, Halvorsen P. Explainability of Machine Learning Models under Missing Data. 2024. Available from: arxiv.org/abs/2407.00411v2
2024
-
[34]
The Impact of Missing Data Imputation on Model Performance and Explainability
Erez IB, Flokstra J, Poel M, van Keulen M. The Impact of Missing Data Imputation on Model Performance and Explainability. BNAICBeNeLearn 2024 Jt Int Sci Conf AI Mach Learn 2024
2024
-
[35]
Imputation Uncertainty in Interpretable Machine Learning Methods
Golchian P, Wright MN. Imputation Uncertainty in Interpretable Machine Learning Methods. 2025. Available from: arxiv.org/abs/2512.17689v1
2025
-
[36]
mice: Multivariate imputation by chained equations in R
Buuren S van, Groothuis-Oudshoorn K. mice: Multivariate imputation by chained equations in R. J Stat Softw 2011;45(3):1–67
2011
-
[37]
A Value for n-Person Games
Shapley LS. A Value for n-Person Games. In: Kuhn HW, Tucker AW, editors. Contrib Theory Games Princeton: Princeton University Press; 1953. p. 307–317
1953
-
[38]
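Uncovering Harmonization Potential in Health Care Data Through Iterative Refinement of Fast Healthcare Interoperability Resources Profiles Based on Retrospective Discrepancy Analysis: Case Study
Rosenau L, Behrend P, Wiedekopf J, Gruendner J, Ingenerf J. Uncovering Harmonization Potential in Health Care Data Through Iterative Refinement of Fast Healthcare Interoperability Resources Profiles Based on Retrospective Discrepancy Analysis: Case Study. JMIR Med Inform 2024 July 23;12:e57005. PMID:39042420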
2024
-
[39]
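A Proposal for the Retrospective Identification and Categorization of Older People With Functional Impairments in Scientific Studie...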
Brefka S, Dallmeier D, Mühlbauer V, von Arnim CAF, Bollig C, Onder G, Petrovic M, Schönfeldt-Lecuona C, Seibert M, Torbahn G, Voigt-Radloff S, Haefeli WE, Bauer JM, Denkinger MD, Medication and Quality of Life Research Group. A Proposal for the Retrospective Identification and Categorization of Older People With Functional Impairments in Scientific Studie...
2019