PSyGenTAB: A Privacy-Preserving Framework for Synthetic Clinical Tabular Data Generation via Constrained Optimization
Pith reviewed 2026-06-27 00:58 UTC · model grok-4.3
The pith
PSyGenTAB generates synthetic clinical tabular data by solving a constrained optimization problem that embeds privacy thresholds directly into training.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By casting synthetic data generation as a constrained optimization problem and solving it with the Augmented Lagrangian Method, PSyGenTAB directly incorporates configurable privacy constraints into the training objective. This produces tabular records that preserve inter-feature clinical correlations and minority-class diagnostic patterns while meeting explicit privacy thresholds, yielding downstream model performance comparable to real data and reduced vulnerability to re-identification.
What carries the argument
Formulating synthetic clinical tabular data generation as a constrained optimization problem solved via the Augmented Lagrangian Method, with privacy constraints embedded in the training loop.
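One way to write this formulation down, as a sketch only: the symbols below (generator parameters \theta, utility loss \mathcal{L}_{\mathrm{util}}, privacy scores p_j with thresholds \tau_j, multipliers \lambda_j, penalty weight \rho) are illustrative assumptions and are not notation taken from the paper. The form shown is the standard augmented Lagrangian for inequality constraints, followed by its multiplier update.

```latex
\begin{aligned}
&\min_{\theta}\ \mathcal{L}_{\mathrm{util}}(\theta)
\quad \text{s.t.} \quad g_j(\theta) = \tau_j - p_j(\theta) \le 0, \qquad j = 1,\dots,m, \\
&\mathcal{L}_{\rho}(\theta,\lambda) = \mathcal{L}_{\mathrm{util}}(\theta)
+ \frac{1}{2\rho}\sum_{j=1}^{m}\Big(\max\{0,\ \lambda_j + \rho\, g_j(\theta)\}^{2} - \lambda_j^{2}\Big), \\
&\lambda_j \leftarrow \max\{0,\ \lambda_j + \rho\, g_j(\theta)\}.
\end{aligned}
```

Under this reading, a constraint that is already satisfied with \lambda_j = 0 contributes nothing to the objective, while a violated one is penalized more heavily after each multiplier update.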
If this is right
- Models trained on the synthetic data achieve performance comparable to real-data models on both Train-on-Synthetic/Test-on-Real and Train-on-Real/Test-on-Synthetic evaluations.
- Generated records show reduced exact reproduction of original patient entries.
- The framework demonstrates stronger resistance to membership inference attacks than existing synthetic data methods.
- Inter-feature clinical relationships and minority-class diagnostic patterns remain intact across multiple clinically motivated benchmarks.
Where Pith is reading between the lines
- The same constrained-optimization structure could be adapted to generate synthetic data for non-clinical tabular domains that also require strict privacy controls.
- Institutions could use the framework to create shareable synthetic cohorts that support multi-site model training without exchanging raw records.
- If the privacy constraints prove robust under repeated attacks, regulators might accept synthetic data as a compliant alternative for certain model-development workflows.
Load-bearing premise
Embedding privacy constraints directly into model training through the Augmented Lagrangian Method will simultaneously satisfy minimum privacy thresholds and retain clinically meaningful patterns without later degradation of downstream utility.
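To make the premise concrete, here is a toy sketch of how Augmented Lagrangian dual updates drive a constraint toward satisfaction while an objective is minimized. Everything in it is an assumption for illustration: the scalar objective f stands in for a utility loss, g for a privacy constraint, and the step sizes are arbitrary. It is not the paper's training loop.

```python
# Toy problem: minimize f(x) = (x - 2)^2 subject to g(x) = x - 1 <= 0.
# f is a stand-in for a utility loss, g for a privacy constraint (both hypothetical).

def f_grad(x):
    return 2.0 * (x - 2.0)


def g(x):
    return x - 1.0


def g_grad(x):
    return 1.0


def solve_alm(x=0.0, lam=0.0, rho=10.0, outer_steps=20, inner_steps=200, lr=0.05):
    """Alternate primal gradient descent with dual (multiplier) updates."""
    for _ in range(outer_steps):
        # Primal step: gradient descent on the augmented Lagrangian in x.
        for _ in range(inner_steps):
            active = max(0.0, lam + rho * g(x))
            x -= lr * (f_grad(x) + active * g_grad(x))
        # Dual step: the multiplier grows while the constraint is violated.
        lam = max(0.0, lam + rho * g(x))
    return x, lam


x_star, lam_star = solve_alm()
print(f"x = {x_star:.4f}, g(x) = {g(x_star):.2e}, lambda = {lam_star:.4f}")
```

The unconstrained minimum sits at x = 2, which violates the constraint; the multiplier settles at the value that holds the solution on the boundary x = 1. Whether the same behavior carries over to a non-convex generator with stochastic gradients is exactly what the premise assumes.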
What would settle it
A controlled experiment in which models trained on the synthetic data exhibit statistically significant drops in diagnostic accuracy on held-out real patient records, or in which membership inference attacks recover patient identities at rates above those reported for the real baseline.
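The privacy half of such an experiment could be audited with a simple distance-based membership inference check, sketched below. This is an assumed stand-in attack, not the attack used in the paper: the attacker guesses "member" whenever a candidate record lies within a threshold of some synthetic record. The records, threshold, and function names are hypothetical.

```python
import math


def nearest_distance(record, pool):
    """Euclidean distance from `record` to its closest neighbor in `pool`."""
    return min(math.dist(record, other) for other in pool)


def distance_mia_accuracy(members, non_members, synthetic, threshold):
    """Accuracy of an attacker who flags candidates close to synthetic rows as members."""
    correct = 0
    for record in members:
        if nearest_distance(record, synthetic) <= threshold:
            correct += 1
    for record in non_members:
        if nearest_distance(record, synthetic) > threshold:
            correct += 1
    return correct / (len(members) + len(non_members))


# Hypothetical records: training members and held-out non-members.
members = [(0.0, 0.0), (1.0, 1.0)]
non_members = [(5.0, 5.0), (6.0, 6.0)]

# A generator that nearly copies its training rows versus one that does not.
leaky_synthetic = [(0.0, 0.01), (1.0, 1.01)]
safer_synthetic = [(0.5, 0.5), (5.5, 5.5)]

leaky_acc = distance_mia_accuracy(members, non_members, leaky_synthetic, threshold=0.1)
safer_acc = distance_mia_accuracy(members, non_members, safer_synthetic, threshold=0.1)
print(f"attack accuracy, leaky: {leaky_acc:.2f}; safer: {safer_acc:.2f}")
```

With balanced members and non-members, an attack accuracy near 0.5 is chance level, so a result well above 0.5 on the synthetic data would count against the privacy claim.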
Original abstract
The development of medical AI is constrained by limited access to high-quality clinical data due to institutional silos and strict privacy regulations such as HIPAA and GDPR. Synthetic data generation offers a potential solution, but existing methods lack principled mechanisms to explicitly manage the privacy-utility trade-off, often degrading clinically meaningful patterns or risking patient re-identification. We present PSyGenTAB, a privacy-preserving generative framework that formulates synthetic healthcare data generation as a constrained optimization problem solved using the Augmented Lagrangian Method. By embedding configurable privacy constraints directly into model training, PSyGenTAB enforces minimum privacy thresholds while maximizing clinical data utility. Across multiple clinically motivated benchmarks, PSyGenTAB preserves inter-feature clinical relationships and minority-class diagnostic patterns essential for reliable health AI. Downstream evaluation using Train-on-Synthetic, Test-on-Real and Train-on-Real, Test-on-Synthetic protocols shows that models trained on synthetic data achieve performance comparable to those trained on real patient records. Privacy auditing further demonstrates reduced exact record reproduction and strong resilience to membership inference attacks. These results establish PSyGenTAB as a principled framework for balancing privacy protection and clinical utility in synthetic healthcare data, supporting secure cross-institutional AI development.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces PSyGenTAB, a framework for generating synthetic clinical tabular data that formulates the task as a constrained optimization problem solved using the Augmented Lagrangian Method (ALM). By embedding privacy constraints directly into the training process, it aims to enforce minimum privacy thresholds while maximizing clinical utility. The authors claim that, across clinically motivated benchmarks, the method preserves inter-feature relationships and minority-class patterns, that downstream models achieve performance comparable to those trained on real data under the Train-on-Synthetic/Test-on-Real (TSTR) and Train-on-Real/Test-on-Synthetic (TRTS) protocols, and that privacy metrics improve against both exact record reproduction and membership inference attacks.
Significance. If the claims hold, this work would offer a significant advancement in synthetic data generation for healthcare by providing a principled, optimization-based approach to the privacy-utility trade-off, potentially enabling more secure data sharing for AI development. The use of ALM for explicit constraint handling is a notable methodological choice that could generalize beyond the presented benchmarks.
major comments (2)
- [Abstract] The central claim that ALM enforces minimum privacy thresholds (reduced exact record reproduction and resilience to membership inference) while preserving minority-class diagnostic patterns rests on translating distribution-dependent privacy notions into deterministic, differentiable constraints amenable to dual updates. The abstract provides no formulation details showing how these are expressed as functions of generated samples or their statistics, leaving open whether the method uses hard constraints or soft penalties that may fail to meet thresholds or degrade utility on minority classes.
- [Abstract] The weakest assumption (embedding configurable privacy constraints via ALM to simultaneously enforce thresholds and preserve patterns) is load-bearing; if privacy metrics require post-hoc Monte-Carlo evaluation rather than direct constraint functions, the optimization may not deliver the claimed guarantees without additional verification steps not described in the abstract.
minor comments (1)
- [Abstract] The abstract supplies no implementation details, benchmark definitions, quantitative metrics, or error analysis, making it impossible to verify the optimization's support for the stated claims from the provided text alone.
Simulated Author's Rebuttal
We thank the referee for their constructive comments on the abstract. The concerns highlight the need for clearer high-level formulation details in the abstract itself. We address each point below and will revise the abstract accordingly while preserving its brevity.
- Referee: [Abstract] The central claim that ALM enforces minimum privacy thresholds (reduced exact record reproduction and resilience to membership inference) while preserving minority-class diagnostic patterns rests on translating distribution-dependent privacy notions into deterministic, differentiable constraints amenable to dual updates. The abstract provides no formulation details showing how these are expressed as functions of generated samples or their statistics, leaving open whether the method uses hard constraints or soft penalties that may fail to meet thresholds or degrade utility on minority classes.
  Authors: We agree that the abstract, as a high-level summary, does not include the explicit functional forms. In the full manuscript (Section 3), privacy constraints are expressed as differentiable functions of generated samples: exact record reproduction is penalized via a soft indicator based on Euclidean distance thresholds to real records, and membership inference resilience uses a differentiable approximation of attack success rate via logistic loss on sample statistics. These enter the ALM as inequality constraints with dual variable updates, functioning as soft penalties that asymptotically enforce thresholds. Minority-class patterns are preserved via separate utility constraints on class-conditional statistics. We will revise the abstract to briefly note that privacy notions are translated into differentiable sample-based functions solved via ALM.
  Revision: yes
- Referee: [Abstract] The weakest assumption (embedding configurable privacy constraints via ALM to simultaneously enforce thresholds and preserve patterns) is load-bearing; if privacy metrics require post-hoc Monte-Carlo evaluation rather than direct constraint functions, the optimization may not deliver the claimed guarantees without additional verification steps not described in the abstract.
  Authors: The optimization uses direct, differentiable constraint functions (as detailed in Section 3) rather than post-hoc Monte-Carlo evaluation; the latter is reserved exclusively for final auditing in the experiments. The ALM dual updates operate on the embedded functions to enforce thresholds during training. We acknowledge that the abstract does not draw this distinction, which could lead to the noted ambiguity. We will revise the abstract to clarify that constraints are direct and differentiable (with post-hoc evaluation used only for reporting).
  Revision: yes
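The "soft indicator based on Euclidean distance thresholds" mentioned in the first response might look roughly like the sketch below. This is a guess at the general shape, not the manuscript's Section 3: the sigmoid relaxation, the parameters `tau`, `temperature`, and `max_rate`, and the function names are all assumptions. Plain Python is used for readability; in a training loop the same expression would be written with an autodiff library so that gradients flow to the generator.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_copy_rate(synthetic, real, tau, temperature):
    """Smooth surrogate for the share of synthetic rows within `tau` of a real row."""
    total = 0.0
    for s in synthetic:
        d_min = min(math.dist(s, r) for r in real)
        # Approaches 1 when d_min is well below tau, and 0 when well above it.
        total += sigmoid((tau - d_min) / temperature)
    return total / len(synthetic)


def copy_constraint(synthetic, real, tau=0.5, temperature=0.1, max_rate=0.05):
    """Constraint value g; g <= 0 means the soft copy rate is within budget."""
    return soft_copy_rate(synthetic, real, tau, temperature) - max_rate


# Hypothetical records for illustration.
real = [(0.0, 0.0), (1.0, 1.0)]
copies = [(0.0, 0.0), (1.0, 1.0)]
distant = [(5.0, 5.0), (6.0, 6.0)]

print("g for exact copies:", copy_constraint(copies, real))
print("g for distant rows:", copy_constraint(distant, real))
```

A value of g above zero is what the dual update would react to during training. The post-hoc audit described in the second response would instead count hard matches on the final samples, so the two numbers need not agree exactly.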
Circularity Check
No circularity detected; formulation is an independent modeling choice
The paper frames synthetic data generation as a constrained optimization problem solved via the standard Augmented Lagrangian Method, with privacy constraints embedded as configurable terms. This is an explicit modeling decision rather than a derivation that reduces to fitted inputs or self-citations. Downstream Train-on-Synthetic/Test-on-Real evaluations and privacy audits are presented as separate empirical checks, not forced by construction. No equations, self-citation chains, or renamings appear in the abstract that would create circularity. The approach is self-contained against external benchmarks.
Appendix
Table IX provides an exhaustive, multi-page breakdown of downstream predictive efficacy across all classifiers and sampling strategies. This summary highlights the Train-on-Synthe...