pith. sign in

arxiv: 2604.17250 · v1 · submitted 2026-04-19 · 📊 stat.AP

Improving post-operative discharge destination prediction of geriatric patients with generative data augmentation

Pith reviewed 2026-05-10 05:39 UTC · model grok-4.3

classification 📊 stat.AP
keywords generative data augmentationgeriatric caredischarge destinationadversarial random forestslogistic regressionsynthetic clinical datapost-operative prediction
0
0 comments X p. Extension

The pith

Generative data augmentation with adversarial random forests boosts logistic regression accuracy for predicting geriatric post-operative discharge destinations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines how to overcome limited clinical data when building models to predict where older adults will be discharged after surgery for fall-related injuries. It tests two ways to expand the SURGE-Ahead dataset: merging it with a trauma register plus imputation, and creating synthetic records via adversarial random forests. Logistic regression shows clear gains in accuracy and AUC after augmentation, while random forest and TabPFN already perform strongly and change little. The work targets a practical gap in geriatric perioperative care where better discharge forecasts could support planning and reduce complications.

Core claim

Using synthetic data generated by adversarial random forests from the SURGE-Ahead project and German geriatric trauma register datasets improves multinomial logistic regression performance on post-operative discharge destination prediction, raising accuracy from 0.70 to 0.81 and ROC AUC from 0.85 to 0.92, while random forest and TabPFN reach approximately 0.84 accuracy and 0.94 AUC with minimal effect from the added data.

What carries the argument

Adversarial random forests that produce synthetic patient records to augment the original limited clinical dataset before training discharge prediction models.

Load-bearing premise

The synthetic data accurately mirrors the statistical properties and relationships present in the real geriatric patient records without introducing biases or artifacts.

What would settle it

Evaluating the models on an independent real-world cohort of geriatric patients collected after the study period and finding that augmented-data training yields no accuracy or AUC gain over real-data-only training would disprove the reported benefit.

read the original abstract

Data scarcity challenges the development and implementation of innovative healthcare solutions. In geriatrics, fall-related injuries are a major cause of hospitalization, functional decline, and mortality in older adults. Optimizing post-operative discharge planning can mitigate these outcomes, but limited data hinders predictive model development. Here, we explored generative machine learning approaches to augment data from the SURGE-Ahead project (Supporting SURgery with Geriatric Co-Management and AI), an initiative addressing geriatric perioperative care. Data from the German geriatric trauma register (AltersTraumaZentrum; ATZ) were incorporated using two strategies: (i) combining SURGE-Ahead and ATZ register data with imputation (ComImp) and (ii) generating synthetic data from SURGE-Ahead alone or combined SURGE-Ahead and the ATZ register datasets with Adversarial random forests (ARF). Predictive models, including multinomial logistic regression, random forest, and a prior-fitted transformer (TabPFN), were trained and evaluated using standard performance metrics: accuracy, area under the receiver operating characteristic curve (ROC AUC), Brier score, and the logistic loss. Random forest and TabPFN performed well (accuracy around 0.84 and AUC around 0.94) and were largely unaffected by augmentation. Logistic regression benefited from augmented data, with predictive performance improving from 0.70 to 0.81 for accuracy and 0.85 to 0.92 for AUC. These results highlight generative data augmentation as a viable approach to enhance simpler predictive models in geriatric care and emphasize the importance of method selection when addressing data scarcity in heterogeneous clinical populations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript investigates the use of generative data augmentation with Adversarial Random Forests (ARF) to address data scarcity in predicting post-operative discharge destinations for geriatric patients using data from the SURGE-Ahead project and the German geriatric trauma register (ATZ). It compares combining real datasets with imputation against generating synthetic data, and evaluates multinomial logistic regression, random forest, and TabPFN models using accuracy, ROC AUC, Brier score, and logistic loss. The key finding is that augmentation substantially improves logistic regression (accuracy 0.70 to 0.81, AUC 0.85 to 0.92) while having minimal effect on the stronger baseline models.

Significance. If the synthetic data generation is properly controlled and does not introduce artifacts, this work demonstrates a practical approach to enhancing simpler, interpretable models in data-limited clinical domains such as geriatric perioperative care. The differential benefit to logistic regression versus already-strong tree-based and transformer models is a noteworthy empirical observation that could inform model selection under data scarcity.

major comments (2)
  1. [Methods] Methods: The hyperparameters for the Adversarial Random Forests (ARF) used to generate synthetic data are not specified, nor is any procedure described for fitting them, selecting them, or validating that the synthetic samples match the real data distribution (e.g., via Kolmogorov-Smirnov tests or propensity score checks). This is load-bearing for the central claim, as any spurious correlations introduced by ARF could be disproportionately exploited by logistic regression.
  2. [Results] Results: Performance metrics are reported as single point estimates (e.g., logistic regression accuracy rising from 0.70 to 0.81) without standard errors, confidence intervals, or results across multiple data splits or random seeds. In small-sample geriatric datasets this omission prevents assessment of whether the reported gains are reliable or sensitive to particular train/test partitions.
minor comments (2)
  1. [Abstract] Abstract: While accuracy and AUC improvements are highlighted, the abstract does not report the corresponding Brier score or logistic loss values for the augmented versus baseline settings, even though these metrics are stated to have been computed.
  2. [Introduction] The manuscript uses the abbreviation 'ComImp' for the combined real-data imputation strategy without an explicit expansion on first use, which could reduce immediate clarity for readers unfamiliar with the project.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and insightful comments, which help strengthen the methodological transparency and statistical rigor of our work. We address each major comment below and will revise the manuscript accordingly.

read point-by-point responses
  1. Referee: [Methods] Methods: The hyperparameters for the Adversarial Random Forests (ARF) used to generate synthetic data are not specified, nor is any procedure described for fitting them, selecting them, or validating that the synthetic samples match the real data distribution (e.g., via Kolmogorov-Smirnov tests or propensity score checks). This is load-bearing for the central claim, as any spurious correlations introduced by ARF could be disproportionately exploited by logistic regression.

    Authors: We agree that explicit documentation of the ARF configuration and validation is necessary to support the central claim. In the revised manuscript, we will expand the Methods section to specify all hyperparameters (e.g., number of trees, maximum depth, and other settings from the arf implementation), detail the fitting and synthetic data generation procedure, and add validation steps including Kolmogorov-Smirnov tests on marginal distributions as well as checks for introduced correlations (such as propensity score overlap or pairwise dependency comparisons). These additions will confirm that the augmentation does not introduce artifacts disproportionately benefiting logistic regression. revision: yes

  2. Referee: [Results] Results: Performance metrics are reported as single point estimates (e.g., logistic regression accuracy rising from 0.70 to 0.81) without standard errors, confidence intervals, or results across multiple data splits or random seeds. In small-sample geriatric datasets this omission prevents assessment of whether the reported gains are reliable or sensitive to particular train/test partitions.

    Authors: We acknowledge that single-point estimates limit evaluation of robustness in small-sample settings. In the revised Results, we will report performance metrics (accuracy, AUC, Brier score, and logistic loss) averaged across multiple random seeds and repeated train-test splits (e.g., 10 repetitions of stratified 80/20 splits), accompanied by standard errors and 95% confidence intervals. This will demonstrate the stability of the observed gains for logistic regression while confirming the limited effect on random forest and TabPFN. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper's core result is an empirical comparison: models trained on real data versus real+ARF-augmented data are evaluated with standard held-out accuracy, AUC, Brier score and log-loss on real test cases. No equation or claim reduces a reported performance gain to a fitted parameter by construction, nor does any load-bearing premise rest solely on a self-citation whose content is itself unverified. The augmentation step (ARF fitted to training rows) is independent of the downstream test-set metrics, satisfying the criteria for a non-circular empirical study.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central claim rests on the assumption that generative models can safely augment clinical data; limited information from the abstract prevents a full ledger.

free parameters (1)
  • ARF hyperparameters
    Adversarial random forest training involves tunable parameters that control synthetic data quality and are chosen or fitted during generation.
axioms (1)
  • domain assumption Synthetic data from ARF preserves the joint distribution of real clinical variables sufficiently for downstream prediction improvement
    This assumption is required to interpret the reported gains as genuine rather than artifacts of the generator.

pith-pipeline@v0.9.0 · 5600 in / 1334 out tokens · 58286 ms · 2026-05-10T05:39:48.190813+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

39 extracted references · 7 canonical work pages

  1. [1]

    Health data poverty: an assailable barrier to equitable digital health care

    Ibrahim H, Liu X, Zariffa N, Morris AD, Denniston AK. Health data poverty: an assailable barrier to equitable digital health care. Lancet Digit Health 2021 Apr;3(4):e260–e265. doi: 10.1016/S2589-7500(20)30317-4

  2. [2]

    World Report on Ageing and Health

    World Health Organization. World Report on Ageing and Health. World Health Organization; 2015

  3. [3]

    Global, regional, and national burden of falls among older adults: findings from the Global Burden of Disease Study 2021 and Projections to 2040

    Chen Y, Dai F, Huang S, Qi D, Peng C, Zhang A, Wang Y, Gu Y, Guo J. Global, regional, and national burden of falls among older adults: findings from the Global Burden of Disease Study 2021 and Projections to 2040. Npj Aging 2025 Oct 9;11(1):85. doi: 10.1038/s41514-025-00275-4

  4. [4]

    Physical Performance and Falling Risk Are Associated with Five-Year Mortality in Older Adults: An Observational Cohort Study

    Salis F, Mandas A. Physical Performance and Falling Risk Are Associated with Five-Year Mortality in Older Adults: An Observational Cohort Study. Med Kaunas Lith 2023 May 17;59(5):964. PMID:37241196

  5. [5]

    Discharge planning from hospital

    Gonçalves-Bradley DC, Lannin NA, Clemson L, Cameron ID, Shepperd S. Discharge planning from hospital. Cochrane Database Syst Rev 2022 Feb 24;2(2):CD000313. PMID:35199849

  6. [6]

    Association between continuity of care (COC), healthcare use and costs: what can we learn from claims data? A rapid review

    Nicolet A, Al-Gobari M, Perraudin C, Wagner J, Peytremann-Bridevaux I, Marti J. Association between continuity of care (COC), healthcare use and costs: what can we learn from claims data? A rapid review. BMC Health Serv Res 2022 May 16;22(1):658. PMID:35578226

  7. [7]

    Supporting SURgery with GEriatric Co-Management and AI (SURGE-Ahead): A study protocol for the development of a digital geriatrician

    Leinert C, Fotteler M, Kocar TD, Dallmeier D, Kestler HA, Wolf D, Gebhard F, Uihlein A, Steger F, Kilian R, Mueller-Stierlin AS, Michalski CW, Mihaljevic A, Bolenz C, Zengerling F, Leinert E, Schütze S, Hoffmann TK, Onder G, Andersen-Ranberg K, O’Neill D, Wehling M, Schobel J, Swoboda W, Denkinger M, SURGE-Ahead Study Group. Supporting SURgery with GEriat...

  8. [8]

    SURGE-ahead postoperative delirium prediction: external validation and open-source library

    Kocar TD, Wolf P, Leinert C, Brefka S, Fotteler ML, Uihlein A, Wezel F, Wehling M, Rahbari N, Kestler H, Gebhard F, Dallmeier D, Denkinger M. SURGE-ahead postoperative delirium prediction: external validation and open-source library. Eur Geriatr Med 2025 Mar 10; doi: 10.1007/s41999-025-01180-5

  9. [9]

    Combining datasets to improve model fitting

    Nguyen T, Khadka R, Phan N, Yazidi A, Halvorsen P, Riegler MA. Combining datasets to improve model fitting. 2023 Int Jt Conf Neural Netw IJCNN IEEE; 2023. p. 1–9

  10. [10]

    Handbook of Missing Data Methodology

    Molenberghs G, Fitzmaurice G, Kenward MG, Tsiatis A, Verbeke G. Handbook of Missing Data Methodology. CRC Press; 2015

  11. [11]

    Flexible Imputation of Missing Data

    Van Buuren S. Flexible Imputation of Missing Data. CRC Press; 2018

  12. [12]

    MissForest—non-parametric missing value imputation for mixed-type data

    Stekhoven DJ, Bühlmann P. MissForest—non-parametric missing value imputation for mixed-type data. Bioinformatics 2012;28(1):112–118

  13. [13]

    Improving classification accuracy using data augmentation on small data sets

    Moreno-Barea FJ, Jerez JM, Franco L. Improving classification accuracy using data augmentation on small data sets. Expert Syst Appl 2020;161:113696

  14. [14]

    Synthetic data generation methods in healthcare: A review on open-source tools and methods

    Pezoulas VC, Zaridis DI, Mylona E, Androutsos C, Apostolidis K, Tachos NS, Fotiadis DI. Synthetic data generation methods in healthcare: A review on open-source tools and methods. Comput Struct Biotechnol J 2024;23:2892–2910

  15. [15]

    Adversarial random forests for density estimation and generative modeling

    Watson DS, Blesch K, Kapar J, Wright MN. Adversarial random forests for density estimation and generative modeling. Proc 26th Int Conf Artif Intell Stat PMLR; 2023. p. 5357–5375

  16. [16]

    synthpop: Bespoke creation of synthetic data in R

    Nowok B, Raab GM, Dibben C. synthpop: Bespoke creation of synthetic data in R. J Stat Softw 2016;74:1–26

  17. [17]

    Standard-of-Care vs

    Leinert C, Brefka S, Fotteler ML, Müller-Stierlin AS, Gebhard F, Rahbari N, Bolenz C, Kestler H, Dallmeier D, Denkinger M, Kocar TD. Standard-of-Care vs. Expert- Recommended Discharge Destinations for Geriatric Surgical Inpatients: A Prospective Observational Cohort Study. Eur Geriatr Med 2025;in press

  18. [18]

    Generative adversarial nets

    Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y. Generative adversarial nets. Adv Neural Inf Process Syst 2014

  19. [19]

    Unsupervised learning with random forest predictors

    Shi T, Horvath S. Unsupervised learning with random forest predictors. J Comput Graph Stat 2006;15(1):118–138

  20. [20]

    CountARFactuals–generating plausible model-agnostic counterfactual explanations with adversarial random forests

    Dandl S, Blesch K, Freiesleben T, König G, Kapar J, Bischl B, Wright MN. CountARFactuals–generating plausible model-agnostic counterfactual explanations with adversarial random forests. World Conf Explain Artif Intell Springer; 2024. p. 85– 107

  21. [21]

    Conditional feature importance with generative modeling using adversarial random forests

    Blesch K, Koenen N, Kapar J, Golchian P, Burk L, Loecher M, Wright MN. Conditional feature importance with generative modeling using adversarial random forests. Proc AAAI Conf Artif Intell 2025. p. 15596–15604

  22. [22]

    Missing Value Imputation With Adversarial Random Forests—MissARF

    Golchian P, Kapar J, Watson DS, Wright MN. Missing Value Imputation With Adversarial Random Forests—MissARF. Stat Med 2026;45(3–5):e70379. doi: 10.1002/sim.70379

  23. [23]

    Random forests

    Breiman L. Random forests. Mach Learn 2001;45(1):5–32

  24. [24]

    TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second

    Hollmann N, Müller S, Eggensperger K, Hutter F. TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second. NeurIPS 2022 First Table Represent Workshop 2022

  25. [25]

    All Models are Wrong, but Many are Useful: Learning a Variable’s Importance by Studying an Entire Class of Prediction Models Simultaneously

    Fisher A, Rudin C, Dominici F. All Models are Wrong, but Many are Useful: Learning a Variable’s Importance by Studying an Entire Class of Prediction Models Simultaneously. J Mach Learn Res 2019;20(177):1–81

  26. [26]

    Handling imbalanced medical datasets: review of a decade of research

    Salmi M, Atif D, Oliva D, Abraham A, Ventura S. Handling imbalanced medical datasets: review of a decade of research. Artif Intell Rev 2024;57(10):273

  27. [27]

    Imbalanced data problem in machine learning: A review

    Altalhan M, Algarni A, Alouane MT-H. Imbalanced data problem in machine learning: A review. IEEE Access 2025

  28. [28]

    Interpretable Machine Learning: A Guide for Making Black Box Models Explainable

    Molnar C. Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. 3rd ed. 2025. Available from: https://christophm.github.io/interpretable- ml-book

  29. [29]

    Testing conditional independence in supervised learning algorithms

    Watson DS, Wright MN. Testing conditional independence in supervised learning algorithms. Mach Learn 2021;110(8):2107–2129

  30. [30]

    Identifying key predictors of appropriate discharge destinations for older inpatients in acute care: A scoping review

    Leinert C, Fotteler ML, Kocar TD, Wolf J, Beissel L, Grummich K, Dallmeier D, Denkinger M. Identifying key predictors of appropriate discharge destinations for older inpatients in acute care: A scoping review. Interact J Med Res 2025;14(e76582). doi: 10.2196/76582

  31. [31]

    Inference for the Generalization Error

    Nadeau C, Bengio Y. Inference for the Generalization Error. Mach Learn 2003;52:239– 281

  32. [32]

    Relating the partial dependence plot and permutation feature importance to the data generating process

    Molnar C, Freiesleben T, König G, Herbinger J, Reisinger T, Casalicchio G, Wright MN, Bischl B. Relating the partial dependence plot and permutation feature importance to the data generating process. World Conf Explain Artif Intell Springer; 2023. p. 456–479

  33. [33]

    Vo, Thu Nguyen, Hugo Lewi Hammer, Michael A

    Vo TL, Nguyen T, Hammer HL, Riegler MA, Halvorsen P. Explainability of Machine Learning Models under Missing Data. 2024. Available from: arxiv.org/abs/2407.00411v2

  34. [34]

    The Impact of Missing Data Imputation on Model Performance and Explainability

    Erez IB, Flokstra J, Poel M, van Keulen M. The Impact of Missing Data Imputation on Model Performance and Explainability. BNAICBeNeLearn 2024 Jt Int Sci Conf AI Mach Learn 2024

  35. [35]

    Imputation Uncertainty in Interpretable Machine Learning Methods

    Golchian P, Wright MN. Imputation Uncertainty in Interpretable Machine Learning Methods. 2025. Available from: arxiv.org/abs/2512.17689v1

  36. [36]

    mice: Multivariate imputation by chained equations in R

    Buuren S van, Groothuis-Oudshoorn K. mice: Multivariate imputation by chained equations in R. J Stat Softw 2011;45(3):1–67

  37. [37]

    A Value for n-Person Games

    Shapley LS. A Value for n-Person Games. In: Kuhn HW, Tucker AW, editors. Contrib Theory Games Princeton: Princeton University Press; 1953. p. 307–317

  38. [38]

    Rosenau L, Behrend P, Wiedekopf J, Gruendner J, Ingenerf J. Uncovering Harmonization Potential in Health Care Data Through Iterative Refinement of Fast Healthcare Interoperability Resources Profiles Based on Retrospective Discrepancy Analysis: Case Study. JMIR Med Inform 2024 July 23;12:e57005. PMID:39042420

  39. [39]

    transfers

    Brefka S, Dallmeier D, Mühlbauer V, von Arnim CAF, Bollig C, Onder G, Petrovic M, Schönfeldt-Lecuona C, Seibert M, Torbahn G, Voigt-Radloff S, Haefeli WE, Bauer JM, Denkinger MD, Medication and Quality of Life Research Group. A Proposal for the Retrospective Identification and Categorization of Older People With Functional Impairments in Scientific Studie...