Improving post-operative discharge destination prediction of geriatric patients with generative data augmentation

arxiv: 2604.17250 · v1 · submitted 2026-04-19 · 📊 stat.AP

Improving post-operative discharge destination prediction of geriatric patients with generative data augmentation

Pegah Golchian , Pauline Maier , Thomas Kocar , Marvin N. Wright This is my paper

Pith reviewed 2026-05-10 05:39 UTC · model grok-4.3

classification 📊 stat.AP

keywords generative data augmentationgeriatric caredischarge destinationadversarial random forestslogistic regressionsynthetic clinical datapost-operative prediction

0 comments p. Extension

The pith

Generative data augmentation with adversarial random forests boosts logistic regression accuracy for predicting geriatric post-operative discharge destinations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines how to overcome limited clinical data when building models to predict where older adults will be discharged after surgery for fall-related injuries. It tests two ways to expand the SURGE-Ahead dataset: merging it with a trauma register plus imputation, and creating synthetic records via adversarial random forests. Logistic regression shows clear gains in accuracy and AUC after augmentation, while random forest and TabPFN already perform strongly and change little. The work targets a practical gap in geriatric perioperative care where better discharge forecasts could support planning and reduce complications.

Core claim

Using synthetic data generated by adversarial random forests from the SURGE-Ahead project and German geriatric trauma register datasets improves multinomial logistic regression performance on post-operative discharge destination prediction, raising accuracy from 0.70 to 0.81 and ROC AUC from 0.85 to 0.92, while random forest and TabPFN reach approximately 0.84 accuracy and 0.94 AUC with minimal effect from the added data.

What carries the argument

Adversarial random forests that produce synthetic patient records to augment the original limited clinical dataset before training discharge prediction models.

Load-bearing premise

The synthetic data accurately mirrors the statistical properties and relationships present in the real geriatric patient records without introducing biases or artifacts.

What would settle it

Evaluating the models on an independent real-world cohort of geriatric patients collected after the study period and finding that augmented-data training yields no accuracy or AUC gain over real-data-only training would disprove the reported benefit.

read the original abstract

Data scarcity challenges the development and implementation of innovative healthcare solutions. In geriatrics, fall-related injuries are a major cause of hospitalization, functional decline, and mortality in older adults. Optimizing post-operative discharge planning can mitigate these outcomes, but limited data hinders predictive model development. Here, we explored generative machine learning approaches to augment data from the SURGE-Ahead project (Supporting SURgery with Geriatric Co-Management and AI), an initiative addressing geriatric perioperative care. Data from the German geriatric trauma register (AltersTraumaZentrum; ATZ) were incorporated using two strategies: (i) combining SURGE-Ahead and ATZ register data with imputation (ComImp) and (ii) generating synthetic data from SURGE-Ahead alone or combined SURGE-Ahead and the ATZ register datasets with Adversarial random forests (ARF). Predictive models, including multinomial logistic regression, random forest, and a prior-fitted transformer (TabPFN), were trained and evaluated using standard performance metrics: accuracy, area under the receiver operating characteristic curve (ROC AUC), Brier score, and the logistic loss. Random forest and TabPFN performed well (accuracy around 0.84 and AUC around 0.94) and were largely unaffected by augmentation. Logistic regression benefited from augmented data, with predictive performance improving from 0.70 to 0.81 for accuracy and 0.85 to 0.92 for AUC. These results highlight generative data augmentation as a viable approach to enhance simpler predictive models in geriatric care and emphasize the importance of method selection when addressing data scarcity in heterogeneous clinical populations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

ARF augmentation lifts logistic regression on geriatric discharge prediction from 0.70/0.85 to 0.81/0.92 accuracy/AUC but leaves random forest and TabPFN unchanged.

read the letter

This paper applies adversarial random forests to generate synthetic rows from the SURGE-Ahead and ATZ geriatric datasets and shows that the added data improves multinomial logistic regression for post-operative discharge destination while stronger models stay flat. They train on real-plus-synthetic mixes, evaluate on held-out real data, and report accuracy, AUC, Brier score, and logistic loss across the three models. The logistic regression gains are concrete and the setup is a direct test of whether augmentation helps when data are scarce in a clinical population. That is the useful part: a practical, side-by-side comparison on real register data rather than another abstract claim about generative methods. The work is narrow but honest about its scope. The main limitation is the lack of detail on data splits, ARF hyperparameter choices, imputation steps, and any checks that the synthetic samples do not create spurious linear signals that only logistic regression exploits. Without those controls it is hard to know how robust the reported lift really is. The fact that only the weakest model improves also caps how far the result generalizes. This is the kind of applied methods paper that belongs in a reading group focused on small-sample medical prediction or data augmentation in healthcare. It deserves peer review because the empirical comparison is clear enough to be worth referee scrutiny even if the authors need to add more methodological transparency and code.

Referee Report

2 major / 2 minor

Summary. The manuscript investigates the use of generative data augmentation with Adversarial Random Forests (ARF) to address data scarcity in predicting post-operative discharge destinations for geriatric patients using data from the SURGE-Ahead project and the German geriatric trauma register (ATZ). It compares combining real datasets with imputation against generating synthetic data, and evaluates multinomial logistic regression, random forest, and TabPFN models using accuracy, ROC AUC, Brier score, and logistic loss. The key finding is that augmentation substantially improves logistic regression (accuracy 0.70 to 0.81, AUC 0.85 to 0.92) while having minimal effect on the stronger baseline models.

Significance. If the synthetic data generation is properly controlled and does not introduce artifacts, this work demonstrates a practical approach to enhancing simpler, interpretable models in data-limited clinical domains such as geriatric perioperative care. The differential benefit to logistic regression versus already-strong tree-based and transformer models is a noteworthy empirical observation that could inform model selection under data scarcity.

major comments (2)

[Methods] Methods: The hyperparameters for the Adversarial Random Forests (ARF) used to generate synthetic data are not specified, nor is any procedure described for fitting them, selecting them, or validating that the synthetic samples match the real data distribution (e.g., via Kolmogorov-Smirnov tests or propensity score checks). This is load-bearing for the central claim, as any spurious correlations introduced by ARF could be disproportionately exploited by logistic regression.
[Results] Results: Performance metrics are reported as single point estimates (e.g., logistic regression accuracy rising from 0.70 to 0.81) without standard errors, confidence intervals, or results across multiple data splits or random seeds. In small-sample geriatric datasets this omission prevents assessment of whether the reported gains are reliable or sensitive to particular train/test partitions.

minor comments (2)

[Abstract] Abstract: While accuracy and AUC improvements are highlighted, the abstract does not report the corresponding Brier score or logistic loss values for the augmented versus baseline settings, even though these metrics are stated to have been computed.
[Introduction] The manuscript uses the abbreviation 'ComImp' for the combined real-data imputation strategy without an explicit expansion on first use, which could reduce immediate clarity for readers unfamiliar with the project.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and insightful comments, which help strengthen the methodological transparency and statistical rigor of our work. We address each major comment below and will revise the manuscript accordingly.

read point-by-point responses

Referee: [Methods] Methods: The hyperparameters for the Adversarial Random Forests (ARF) used to generate synthetic data are not specified, nor is any procedure described for fitting them, selecting them, or validating that the synthetic samples match the real data distribution (e.g., via Kolmogorov-Smirnov tests or propensity score checks). This is load-bearing for the central claim, as any spurious correlations introduced by ARF could be disproportionately exploited by logistic regression.

Authors: We agree that explicit documentation of the ARF configuration and validation is necessary to support the central claim. In the revised manuscript, we will expand the Methods section to specify all hyperparameters (e.g., number of trees, maximum depth, and other settings from the arf implementation), detail the fitting and synthetic data generation procedure, and add validation steps including Kolmogorov-Smirnov tests on marginal distributions as well as checks for introduced correlations (such as propensity score overlap or pairwise dependency comparisons). These additions will confirm that the augmentation does not introduce artifacts disproportionately benefiting logistic regression. revision: yes
Referee: [Results] Results: Performance metrics are reported as single point estimates (e.g., logistic regression accuracy rising from 0.70 to 0.81) without standard errors, confidence intervals, or results across multiple data splits or random seeds. In small-sample geriatric datasets this omission prevents assessment of whether the reported gains are reliable or sensitive to particular train/test partitions.

Authors: We acknowledge that single-point estimates limit evaluation of robustness in small-sample settings. In the revised Results, we will report performance metrics (accuracy, AUC, Brier score, and logistic loss) averaged across multiple random seeds and repeated train-test splits (e.g., 10 repetitions of stratified 80/20 splits), accompanied by standard errors and 95% confidence intervals. This will demonstrate the stability of the observed gains for logistic regression while confirming the limited effect on random forest and TabPFN. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper's core result is an empirical comparison: models trained on real data versus real+ARF-augmented data are evaluated with standard held-out accuracy, AUC, Brier score and log-loss on real test cases. No equation or claim reduces a reported performance gain to a fitted parameter by construction, nor does any load-bearing premise rest solely on a self-citation whose content is itself unverified. The augmentation step (ARF fitted to training rows) is independent of the downstream test-set metrics, satisfying the criteria for a non-circular empirical study.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central claim rests on the assumption that generative models can safely augment clinical data; limited information from the abstract prevents a full ledger.

free parameters (1)

ARF hyperparameters
Adversarial random forest training involves tunable parameters that control synthetic data quality and are chosen or fitted during generation.

axioms (1)

domain assumption Synthetic data from ARF preserves the joint distribution of real clinical variables sufficiently for downstream prediction improvement
This assumption is required to interpret the reported gains as genuine rather than artifacts of the generator.

pith-pipeline@v0.9.0 · 5600 in / 1334 out tokens · 58286 ms · 2026-05-10T05:39:48.190813+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

39 extracted references · 7 canonical work pages

[1]

Health data poverty: an assailable barrier to equitable digital health care

Ibrahim H, Liu X, Zariffa N, Morris AD, Denniston AK. Health data poverty: an assailable barrier to equitable digital health care. Lancet Digit Health 2021 Apr;3(4):e260–e265. doi: 10.1016/S2589-7500(20)30317-4

work page doi:10.1016/s2589-7500(20)30317-4 2021
[2]

World Report on Ageing and Health

World Health Organization. World Report on Ageing and Health. World Health Organization; 2015

2015
[3]

Global, regional, and national burden of falls among older adults: findings from the Global Burden of Disease Study 2021 and Projections to 2040

Chen Y, Dai F, Huang S, Qi D, Peng C, Zhang A, Wang Y, Gu Y, Guo J. Global, regional, and national burden of falls among older adults: findings from the Global Burden of Disease Study 2021 and Projections to 2040. Npj Aging 2025 Oct 9;11(1):85. doi: 10.1038/s41514-025-00275-4

work page doi:10.1038/s41514-025-00275-4 2021
[4]

Physical Performance and Falling Risk Are Associated with Five-Year Mortality in Older Adults: An Observational Cohort Study

Salis F, Mandas A. Physical Performance and Falling Risk Are Associated with Five-Year Mortality in Older Adults: An Observational Cohort Study. Med Kaunas Lith 2023 May 17;59(5):964. PMID:37241196

2023
[5]

Discharge planning from hospital

Gonçalves-Bradley DC, Lannin NA, Clemson L, Cameron ID, Shepperd S. Discharge planning from hospital. Cochrane Database Syst Rev 2022 Feb 24;2(2):CD000313. PMID:35199849

2022
[6]

Association between continuity of care (COC), healthcare use and costs: what can we learn from claims data? A rapid review

Nicolet A, Al-Gobari M, Perraudin C, Wagner J, Peytremann-Bridevaux I, Marti J. Association between continuity of care (COC), healthcare use and costs: what can we learn from claims data? A rapid review. BMC Health Serv Res 2022 May 16;22(1):658. PMID:35578226

2022
[7]

Supporting SURgery with GEriatric Co-Management and AI (SURGE-Ahead): A study protocol for the development of a digital geriatrician

Leinert C, Fotteler M, Kocar TD, Dallmeier D, Kestler HA, Wolf D, Gebhard F, Uihlein A, Steger F, Kilian R, Mueller-Stierlin AS, Michalski CW, Mihaljevic A, Bolenz C, Zengerling F, Leinert E, Schütze S, Hoffmann TK, Onder G, Andersen-Ranberg K, O’Neill D, Wehling M, Schobel J, Swoboda W, Denkinger M, SURGE-Ahead Study Group. Supporting SURgery with GEriat...

2023
[8]

SURGE-ahead postoperative delirium prediction: external validation and open-source library

Kocar TD, Wolf P, Leinert C, Brefka S, Fotteler ML, Uihlein A, Wezel F, Wehling M, Rahbari N, Kestler H, Gebhard F, Dallmeier D, Denkinger M. SURGE-ahead postoperative delirium prediction: external validation and open-source library. Eur Geriatr Med 2025 Mar 10; doi: 10.1007/s41999-025-01180-5

work page doi:10.1007/s41999-025-01180-5 2025
[9]

Combining datasets to improve model fitting

Nguyen T, Khadka R, Phan N, Yazidi A, Halvorsen P, Riegler MA. Combining datasets to improve model fitting. 2023 Int Jt Conf Neural Netw IJCNN IEEE; 2023. p. 1–9

2023
[10]

Handbook of Missing Data Methodology

Molenberghs G, Fitzmaurice G, Kenward MG, Tsiatis A, Verbeke G. Handbook of Missing Data Methodology. CRC Press; 2015

2015
[11]

Flexible Imputation of Missing Data

Van Buuren S. Flexible Imputation of Missing Data. CRC Press; 2018

2018
[12]

MissForest—non-parametric missing value imputation for mixed-type data

Stekhoven DJ, Bühlmann P. MissForest—non-parametric missing value imputation for mixed-type data. Bioinformatics 2012;28(1):112–118

2012
[13]

Improving classification accuracy using data augmentation on small data sets

Moreno-Barea FJ, Jerez JM, Franco L. Improving classification accuracy using data augmentation on small data sets. Expert Syst Appl 2020;161:113696

2020
[14]

Synthetic data generation methods in healthcare: A review on open-source tools and methods

Pezoulas VC, Zaridis DI, Mylona E, Androutsos C, Apostolidis K, Tachos NS, Fotiadis DI. Synthetic data generation methods in healthcare: A review on open-source tools and methods. Comput Struct Biotechnol J 2024;23:2892–2910

2024
[15]

Adversarial random forests for density estimation and generative modeling

Watson DS, Blesch K, Kapar J, Wright MN. Adversarial random forests for density estimation and generative modeling. Proc 26th Int Conf Artif Intell Stat PMLR; 2023. p. 5357–5375

2023
[16]

synthpop: Bespoke creation of synthetic data in R

Nowok B, Raab GM, Dibben C. synthpop: Bespoke creation of synthetic data in R. J Stat Softw 2016;74:1–26

2016
[17]

Standard-of-Care vs

Leinert C, Brefka S, Fotteler ML, Müller-Stierlin AS, Gebhard F, Rahbari N, Bolenz C, Kestler H, Dallmeier D, Denkinger M, Kocar TD. Standard-of-Care vs. Expert- Recommended Discharge Destinations for Geriatric Surgical Inpatients: A Prospective Observational Cohort Study. Eur Geriatr Med 2025;in press

2025
[18]

Generative adversarial nets

Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y. Generative adversarial nets. Adv Neural Inf Process Syst 2014

2014
[19]

Unsupervised learning with random forest predictors

Shi T, Horvath S. Unsupervised learning with random forest predictors. J Comput Graph Stat 2006;15(1):118–138

2006
[20]

CountARFactuals–generating plausible model-agnostic counterfactual explanations with adversarial random forests

Dandl S, Blesch K, Freiesleben T, König G, Kapar J, Bischl B, Wright MN. CountARFactuals–generating plausible model-agnostic counterfactual explanations with adversarial random forests. World Conf Explain Artif Intell Springer; 2024. p. 85– 107

2024
[21]

Conditional feature importance with generative modeling using adversarial random forests

Blesch K, Koenen N, Kapar J, Golchian P, Burk L, Loecher M, Wright MN. Conditional feature importance with generative modeling using adversarial random forests. Proc AAAI Conf Artif Intell 2025. p. 15596–15604

2025
[22]

Missing Value Imputation With Adversarial Random Forests—MissARF

Golchian P, Kapar J, Watson DS, Wright MN. Missing Value Imputation With Adversarial Random Forests—MissARF. Stat Med 2026;45(3–5):e70379. doi: 10.1002/sim.70379

work page doi:10.1002/sim.70379 2026
[23]

Random forests

Breiman L. Random forests. Mach Learn 2001;45(1):5–32

2001
[24]

TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second

Hollmann N, Müller S, Eggensperger K, Hutter F. TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second. NeurIPS 2022 First Table Represent Workshop 2022

2022
[25]

All Models are Wrong, but Many are Useful: Learning a Variable’s Importance by Studying an Entire Class of Prediction Models Simultaneously

Fisher A, Rudin C, Dominici F. All Models are Wrong, but Many are Useful: Learning a Variable’s Importance by Studying an Entire Class of Prediction Models Simultaneously. J Mach Learn Res 2019;20(177):1–81

2019
[26]

Handling imbalanced medical datasets: review of a decade of research

Salmi M, Atif D, Oliva D, Abraham A, Ventura S. Handling imbalanced medical datasets: review of a decade of research. Artif Intell Rev 2024;57(10):273

2024
[27]

Imbalanced data problem in machine learning: A review

Altalhan M, Algarni A, Alouane MT-H. Imbalanced data problem in machine learning: A review. IEEE Access 2025

2025
[28]

Interpretable Machine Learning: A Guide for Making Black Box Models Explainable

Molnar C. Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. 3rd ed. 2025. Available from: https://christophm.github.io/interpretable- ml-book

2025
[29]

Testing conditional independence in supervised learning algorithms

Watson DS, Wright MN. Testing conditional independence in supervised learning algorithms. Mach Learn 2021;110(8):2107–2129

2021
[30]

Identifying key predictors of appropriate discharge destinations for older inpatients in acute care: A scoping review

Leinert C, Fotteler ML, Kocar TD, Wolf J, Beissel L, Grummich K, Dallmeier D, Denkinger M. Identifying key predictors of appropriate discharge destinations for older inpatients in acute care: A scoping review. Interact J Med Res 2025;14(e76582). doi: 10.2196/76582

work page doi:10.2196/76582 2025
[31]

Inference for the Generalization Error

Nadeau C, Bengio Y. Inference for the Generalization Error. Mach Learn 2003;52:239– 281

2003
[32]

Relating the partial dependence plot and permutation feature importance to the data generating process

Molnar C, Freiesleben T, König G, Herbinger J, Reisinger T, Casalicchio G, Wright MN, Bischl B. Relating the partial dependence plot and permutation feature importance to the data generating process. World Conf Explain Artif Intell Springer; 2023. p. 456–479

2023
[33]

Vo, Thu Nguyen, Hugo Lewi Hammer, Michael A

Vo TL, Nguyen T, Hammer HL, Riegler MA, Halvorsen P. Explainability of Machine Learning Models under Missing Data. 2024. Available from: arxiv.org/abs/2407.00411v2

work page arXiv 2024
[34]

The Impact of Missing Data Imputation on Model Performance and Explainability

Erez IB, Flokstra J, Poel M, van Keulen M. The Impact of Missing Data Imputation on Model Performance and Explainability. BNAICBeNeLearn 2024 Jt Int Sci Conf AI Mach Learn 2024

2024
[35]

Imputation Uncertainty in Interpretable Machine Learning Methods

Golchian P, Wright MN. Imputation Uncertainty in Interpretable Machine Learning Methods. 2025. Available from: arxiv.org/abs/2512.17689v1

work page arXiv 2025
[36]

mice: Multivariate imputation by chained equations in R

Buuren S van, Groothuis-Oudshoorn K. mice: Multivariate imputation by chained equations in R. J Stat Softw 2011;45(3):1–67

2011
[37]

A Value for n-Person Games

Shapley LS. A Value for n-Person Games. In: Kuhn HW, Tucker AW, editors. Contrib Theory Games Princeton: Princeton University Press; 1953. p. 307–317

1953
[38]

Rosenau L, Behrend P, Wiedekopf J, Gruendner J, Ingenerf J. Uncovering Harmonization Potential in Health Care Data Through Iterative Refinement of Fast Healthcare Interoperability Resources Profiles Based on Retrospective Discrepancy Analysis: Case Study. JMIR Med Inform 2024 July 23;12:e57005. PMID:39042420

2024
[39]

transfers

Brefka S, Dallmeier D, Mühlbauer V, von Arnim CAF, Bollig C, Onder G, Petrovic M, Schönfeldt-Lecuona C, Seibert M, Torbahn G, Voigt-Radloff S, Haefeli WE, Bauer JM, Denkinger MD, Medication and Quality of Life Research Group. A Proposal for the Retrospective Identification and Categorization of Older People With Functional Impairments in Scientific Studie...

2019

[1] [1]

Health data poverty: an assailable barrier to equitable digital health care

Ibrahim H, Liu X, Zariffa N, Morris AD, Denniston AK. Health data poverty: an assailable barrier to equitable digital health care. Lancet Digit Health 2021 Apr;3(4):e260–e265. doi: 10.1016/S2589-7500(20)30317-4

work page doi:10.1016/s2589-7500(20)30317-4 2021

[2] [2]

World Report on Ageing and Health

World Health Organization. World Report on Ageing and Health. World Health Organization; 2015

2015

[3] [3]

Global, regional, and national burden of falls among older adults: findings from the Global Burden of Disease Study 2021 and Projections to 2040

Chen Y, Dai F, Huang S, Qi D, Peng C, Zhang A, Wang Y, Gu Y, Guo J. Global, regional, and national burden of falls among older adults: findings from the Global Burden of Disease Study 2021 and Projections to 2040. Npj Aging 2025 Oct 9;11(1):85. doi: 10.1038/s41514-025-00275-4

work page doi:10.1038/s41514-025-00275-4 2021

[4] [4]

Physical Performance and Falling Risk Are Associated with Five-Year Mortality in Older Adults: An Observational Cohort Study

Salis F, Mandas A. Physical Performance and Falling Risk Are Associated with Five-Year Mortality in Older Adults: An Observational Cohort Study. Med Kaunas Lith 2023 May 17;59(5):964. PMID:37241196

2023

[5] [5]

Discharge planning from hospital

Gonçalves-Bradley DC, Lannin NA, Clemson L, Cameron ID, Shepperd S. Discharge planning from hospital. Cochrane Database Syst Rev 2022 Feb 24;2(2):CD000313. PMID:35199849

2022

[6] [6]

Association between continuity of care (COC), healthcare use and costs: what can we learn from claims data? A rapid review

Nicolet A, Al-Gobari M, Perraudin C, Wagner J, Peytremann-Bridevaux I, Marti J. Association between continuity of care (COC), healthcare use and costs: what can we learn from claims data? A rapid review. BMC Health Serv Res 2022 May 16;22(1):658. PMID:35578226

2022

[7] [7]

Supporting SURgery with GEriatric Co-Management and AI (SURGE-Ahead): A study protocol for the development of a digital geriatrician

Leinert C, Fotteler M, Kocar TD, Dallmeier D, Kestler HA, Wolf D, Gebhard F, Uihlein A, Steger F, Kilian R, Mueller-Stierlin AS, Michalski CW, Mihaljevic A, Bolenz C, Zengerling F, Leinert E, Schütze S, Hoffmann TK, Onder G, Andersen-Ranberg K, O’Neill D, Wehling M, Schobel J, Swoboda W, Denkinger M, SURGE-Ahead Study Group. Supporting SURgery with GEriat...

2023

[8] [8]

SURGE-ahead postoperative delirium prediction: external validation and open-source library

Kocar TD, Wolf P, Leinert C, Brefka S, Fotteler ML, Uihlein A, Wezel F, Wehling M, Rahbari N, Kestler H, Gebhard F, Dallmeier D, Denkinger M. SURGE-ahead postoperative delirium prediction: external validation and open-source library. Eur Geriatr Med 2025 Mar 10; doi: 10.1007/s41999-025-01180-5

work page doi:10.1007/s41999-025-01180-5 2025

[9] [9]

Combining datasets to improve model fitting

Nguyen T, Khadka R, Phan N, Yazidi A, Halvorsen P, Riegler MA. Combining datasets to improve model fitting. 2023 Int Jt Conf Neural Netw IJCNN IEEE; 2023. p. 1–9

2023

[10] [10]

Handbook of Missing Data Methodology

Molenberghs G, Fitzmaurice G, Kenward MG, Tsiatis A, Verbeke G. Handbook of Missing Data Methodology. CRC Press; 2015

2015

[11] [11]

Flexible Imputation of Missing Data

Van Buuren S. Flexible Imputation of Missing Data. CRC Press; 2018

2018

[12] [12]

MissForest—non-parametric missing value imputation for mixed-type data

Stekhoven DJ, Bühlmann P. MissForest—non-parametric missing value imputation for mixed-type data. Bioinformatics 2012;28(1):112–118

2012

[13] [13]

Improving classification accuracy using data augmentation on small data sets

Moreno-Barea FJ, Jerez JM, Franco L. Improving classification accuracy using data augmentation on small data sets. Expert Syst Appl 2020;161:113696

2020

[14] [14]

Synthetic data generation methods in healthcare: A review on open-source tools and methods

Pezoulas VC, Zaridis DI, Mylona E, Androutsos C, Apostolidis K, Tachos NS, Fotiadis DI. Synthetic data generation methods in healthcare: A review on open-source tools and methods. Comput Struct Biotechnol J 2024;23:2892–2910

2024

[15] [15]

Adversarial random forests for density estimation and generative modeling

Watson DS, Blesch K, Kapar J, Wright MN. Adversarial random forests for density estimation and generative modeling. Proc 26th Int Conf Artif Intell Stat PMLR; 2023. p. 5357–5375

2023

[16] [16]

synthpop: Bespoke creation of synthetic data in R

Nowok B, Raab GM, Dibben C. synthpop: Bespoke creation of synthetic data in R. J Stat Softw 2016;74:1–26

2016

[17] [17]

Standard-of-Care vs

Leinert C, Brefka S, Fotteler ML, Müller-Stierlin AS, Gebhard F, Rahbari N, Bolenz C, Kestler H, Dallmeier D, Denkinger M, Kocar TD. Standard-of-Care vs. Expert- Recommended Discharge Destinations for Geriatric Surgical Inpatients: A Prospective Observational Cohort Study. Eur Geriatr Med 2025;in press

2025

[18] [18]

Generative adversarial nets

Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y. Generative adversarial nets. Adv Neural Inf Process Syst 2014

2014

[19] [19]

Unsupervised learning with random forest predictors

Shi T, Horvath S. Unsupervised learning with random forest predictors. J Comput Graph Stat 2006;15(1):118–138

2006

[20] [20]

CountARFactuals–generating plausible model-agnostic counterfactual explanations with adversarial random forests

Dandl S, Blesch K, Freiesleben T, König G, Kapar J, Bischl B, Wright MN. CountARFactuals–generating plausible model-agnostic counterfactual explanations with adversarial random forests. World Conf Explain Artif Intell Springer; 2024. p. 85– 107

2024

[21] [21]

Conditional feature importance with generative modeling using adversarial random forests

Blesch K, Koenen N, Kapar J, Golchian P, Burk L, Loecher M, Wright MN. Conditional feature importance with generative modeling using adversarial random forests. Proc AAAI Conf Artif Intell 2025. p. 15596–15604

2025

[22] [22]

Missing Value Imputation With Adversarial Random Forests—MissARF

Golchian P, Kapar J, Watson DS, Wright MN. Missing Value Imputation With Adversarial Random Forests—MissARF. Stat Med 2026;45(3–5):e70379. doi: 10.1002/sim.70379

work page doi:10.1002/sim.70379 2026

[23] [23]

Random forests

Breiman L. Random forests. Mach Learn 2001;45(1):5–32

2001

[24] [24]

TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second

Hollmann N, Müller S, Eggensperger K, Hutter F. TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second. NeurIPS 2022 First Table Represent Workshop 2022

2022

[25] [25]

All Models are Wrong, but Many are Useful: Learning a Variable’s Importance by Studying an Entire Class of Prediction Models Simultaneously

Fisher A, Rudin C, Dominici F. All Models are Wrong, but Many are Useful: Learning a Variable’s Importance by Studying an Entire Class of Prediction Models Simultaneously. J Mach Learn Res 2019;20(177):1–81

2019

[26] [26]

Handling imbalanced medical datasets: review of a decade of research

Salmi M, Atif D, Oliva D, Abraham A, Ventura S. Handling imbalanced medical datasets: review of a decade of research. Artif Intell Rev 2024;57(10):273

2024

[27] [27]

Imbalanced data problem in machine learning: A review

Altalhan M, Algarni A, Alouane MT-H. Imbalanced data problem in machine learning: A review. IEEE Access 2025

2025

[28] [28]

Interpretable Machine Learning: A Guide for Making Black Box Models Explainable

Molnar C. Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. 3rd ed. 2025. Available from: https://christophm.github.io/interpretable- ml-book

2025

[29] [29]

Testing conditional independence in supervised learning algorithms

Watson DS, Wright MN. Testing conditional independence in supervised learning algorithms. Mach Learn 2021;110(8):2107–2129

2021

[30] [30]

Identifying key predictors of appropriate discharge destinations for older inpatients in acute care: A scoping review

Leinert C, Fotteler ML, Kocar TD, Wolf J, Beissel L, Grummich K, Dallmeier D, Denkinger M. Identifying key predictors of appropriate discharge destinations for older inpatients in acute care: A scoping review. Interact J Med Res 2025;14(e76582). doi: 10.2196/76582

work page doi:10.2196/76582 2025

[31] [31]

Inference for the Generalization Error

Nadeau C, Bengio Y. Inference for the Generalization Error. Mach Learn 2003;52:239– 281

2003

[32] [32]

Relating the partial dependence plot and permutation feature importance to the data generating process

Molnar C, Freiesleben T, König G, Herbinger J, Reisinger T, Casalicchio G, Wright MN, Bischl B. Relating the partial dependence plot and permutation feature importance to the data generating process. World Conf Explain Artif Intell Springer; 2023. p. 456–479

2023

[33] [33]

Vo, Thu Nguyen, Hugo Lewi Hammer, Michael A

Vo TL, Nguyen T, Hammer HL, Riegler MA, Halvorsen P. Explainability of Machine Learning Models under Missing Data. 2024. Available from: arxiv.org/abs/2407.00411v2

work page arXiv 2024

[34] [34]

The Impact of Missing Data Imputation on Model Performance and Explainability

Erez IB, Flokstra J, Poel M, van Keulen M. The Impact of Missing Data Imputation on Model Performance and Explainability. BNAICBeNeLearn 2024 Jt Int Sci Conf AI Mach Learn 2024

2024

[35] [35]

Imputation Uncertainty in Interpretable Machine Learning Methods

Golchian P, Wright MN. Imputation Uncertainty in Interpretable Machine Learning Methods. 2025. Available from: arxiv.org/abs/2512.17689v1

work page arXiv 2025

[36] [36]

mice: Multivariate imputation by chained equations in R

Buuren S van, Groothuis-Oudshoorn K. mice: Multivariate imputation by chained equations in R. J Stat Softw 2011;45(3):1–67

2011

[37] [37]

A Value for n-Person Games

Shapley LS. A Value for n-Person Games. In: Kuhn HW, Tucker AW, editors. Contrib Theory Games Princeton: Princeton University Press; 1953. p. 307–317

1953

[38] [38]

Rosenau L, Behrend P, Wiedekopf J, Gruendner J, Ingenerf J. Uncovering Harmonization Potential in Health Care Data Through Iterative Refinement of Fast Healthcare Interoperability Resources Profiles Based on Retrospective Discrepancy Analysis: Case Study. JMIR Med Inform 2024 July 23;12:e57005. PMID:39042420

2024

[39] [39]

transfers

Brefka S, Dallmeier D, Mühlbauer V, von Arnim CAF, Bollig C, Onder G, Petrovic M, Schönfeldt-Lecuona C, Seibert M, Torbahn G, Voigt-Radloff S, Haefeli WE, Bauer JM, Denkinger MD, Medication and Quality of Life Research Group. A Proposal for the Retrospective Identification and Categorization of Older People With Functional Impairments in Scientific Studie...

2019