arxiv: 2501.01785 · v1 · pith:CDANHIZHnew · submitted 2025-01-03 · 💻 cs.LG · cs.AI · cs.CY

Can Synthetic Data be Fair and Private? A Comparative Study of Synthetic Data Generation and Fairness Algorithms

Pith reviewed 2026-05-23 06:05 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.CY
keywords synthetic data · algorithmic fairness · privacy · learning analytics · DECAF · machine learning · fairness algorithms · data generation

The pith

DECAF achieves the best privacy-fairness balance among the synthetic data generators compared, and pre-processing fairness algorithms improve fairness more on synthetic data than on real data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper tests whether synthetic data can simultaneously support privacy and fairness in machine learning models for learning analytics (LA). It compares multiple synthetic data generation methods on privacy, fairness, and utility metrics. The study also tests whether standard fairness pre-processing steps work better when applied to synthetic data than to real data. The findings indicate a path to fairer and more private models by combining these techniques, which matters for educational applications where both concerns are acute.

Core claim

The DEbiasing CAusal Fairness (DECAF) algorithm achieves the best balance between privacy and fairness. However, it suffers in utility as reflected in predictive accuracy. Applying pre-processing fairness algorithms to synthetic data improves fairness even more than when applied to real data. These findings suggest that combining synthetic data generation with fairness pre-processing offers a promising approach to creating fairer LA models.
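
To make "pre-processing fairness algorithm" concrete, the following is a minimal sketch of reweighing (Kamiran and Calders), one well-known pre-processor. Whether the paper uses this particular method is an assumption, and the function name and toy arrays are hypothetical. The point is only that such a step takes a table with a sensitive attribute and a label, so it can be applied unchanged to a real or a synthetic dataset.

```python
from collections import Counter

def reweighing_weights(sensitive, labels):
    """Return one weight per row: P(s) * P(y) / P(s, y).

    Under these weights the sensitive attribute and the label are
    statistically independent in the weighted sample.
    """
    n = len(labels)
    count_s = Counter(sensitive)
    count_y = Counter(labels)
    count_sy = Counter(zip(sensitive, labels))
    return [
        (count_s[s] * count_y[y]) / (n * count_sy[(s, y)])
        for s, y in zip(sensitive, labels)
    ]

# Hypothetical toy data: 1 = privileged group / positive outcome.
sensitive = [1, 1, 1, 1, 0, 0, 0, 0]
labels    = [1, 1, 1, 0, 1, 0, 0, 0]
weights = reweighing_weights(sensitive, labels)

def weighted_positive_rate(group):
    """Weighted share of positive labels within one group."""
    num = sum(w for w, s, y in zip(weights, sensitive, labels)
              if s == group and y == 1)
    den = sum(w for w, s in zip(weights, sensitive) if s == group)
    return num / den
```

In the toy data the raw positive rates differ between the groups (3 of 4 versus 1 of 4); after weighting, `weighted_positive_rate(1)` and `weighted_positive_rate(0)` coincide. A downstream classifier would receive `weights` as per-sample weights during training.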

What carries the argument

The DEbiasing CAusal Fairness (DECAF) algorithm, which generates synthetic data while enforcing causal fairness constraints, and the empirical comparison of its performance against other generators and fairness methods on privacy and fairness metrics.

If this is right

  • Synthetic data can enhance both privacy and fairness simultaneously in LA models.
  • Fairness pre-processing applied to synthetic data yields larger fairness gains than the same pre-processing applied to real data.
  • Trade-offs in predictive accuracy must be managed when prioritizing privacy and fairness.
  • This combination provides a practical strategy for fairer learning analytics without direct use of sensitive real data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The results may extend to other sensitive data domains like healthcare if similar metrics hold.
  • Future work could explore whether different fairness definitions alter the observed advantages of DECAF.
  • Integrating utility optimization into DECAF might address its accuracy limitations.
  • This suggests synthetic data generation as a preprocessing step worth standardizing in fair ML pipelines.

Load-bearing premise

The selected privacy, fairness, and utility metrics along with the specific datasets used serve as reliable indicators of performance in actual learning analytics deployments.
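
Since the premise rests on how the metrics are defined, a small sketch of two common group-fairness gaps may help: the true positive rate (TPR) gap and the demographic parity gap. The sign convention (unprivileged minus privileged, so 0 means parity), the function names, and the toy arrays are assumptions for illustration, not the paper's exact definitions.

```python
def rate(values):
    """Mean of a list of 0/1 values; NaN if the list is empty."""
    return sum(values) / len(values) if values else float("nan")

def tpr(y_true, y_pred):
    """True positive rate: share of actual positives predicted positive."""
    return rate([p for t, p in zip(y_true, y_pred) if t == 1])

def group_gaps(y_true, y_pred, sensitive, unprivileged=0, privileged=1):
    """Return (TPR gap, demographic parity gap), unprivileged minus privileged."""
    def select(group):
        rows = [(t, p) for t, p, s in zip(y_true, y_pred, sensitive) if s == group]
        return [t for t, _ in rows], [p for _, p in rows]

    t_u, p_u = select(unprivileged)
    t_p, p_p = select(privileged)
    tpr_gap = tpr(t_u, p_u) - tpr(t_p, p_p)
    dp_gap = rate(p_u) - rate(p_p)  # difference in positive prediction rates
    return tpr_gap, dp_gap

# Hypothetical predictions for eight students, four per group.
y_true    = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred    = [1, 0, 0, 0, 1, 1, 1, 0]
sensitive = [0, 0, 0, 0, 1, 1, 1, 1]
tpr_gap, dp_gap = group_gaps(y_true, y_pred, sensitive)
```

Both gaps are negative in this toy case, meaning the unprivileged group is less likely to receive a positive prediction, both overall and among true positives. A different choice of metric or threshold can change which generator looks best, which is the concern the premise raises.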

What would settle it

Empirical results on additional datasets or with alternative metrics where another generator outperforms DECAF on the privacy-fairness balance or where fairness pre-processing does not improve more on synthetic data.

Figures

Figures reproduced from arXiv: 2501.01785 by George Siemens, Mohammad Khalil, Oscar Deho, Qinyi Liu, Sam Urmian, Srecko Joksimovic.

Figure 1: The overall flow of our experiments. It starts with (1) synthetic data generation and privacy evaluation, (2) training of baseline and fair models on both real and synthetic data, and (3) evaluation of baseline and fair models for fairness and predictive accuracy.

Abstract

The increasing use of machine learning in learning analytics (LA) has raised significant concerns around algorithmic fairness and privacy. Synthetic data has emerged as a dual-purpose tool, enhancing privacy and improving fairness in LA models. However, prior research suggests an inverse relationship between fairness and privacy, making it challenging to optimize both. This study investigates which synthetic data generators can best balance privacy and fairness, and whether pre-processing fairness algorithms, typically applied to real datasets, are effective on synthetic data. Our results highlight that the DEbiasing CAusal Fairness (DECAF) algorithm achieves the best balance between privacy and fairness. However, DECAF suffers in utility, as reflected in its predictive accuracy. Notably, we found that applying pre-processing fairness algorithms to synthetic data improves fairness even more than when applied to real data. These findings suggest that combining synthetic data generation with fairness pre-processing offers a promising approach to creating fairer LA models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper conducts an empirical comparative study of synthetic data generators (including DECAF) and fairness pre-processing algorithms in learning analytics settings. It evaluates trade-offs among privacy, fairness, and utility metrics, concluding that DECAF offers the strongest privacy-fairness balance (at the expense of predictive accuracy) and that applying standard fairness pre-processors to synthetic data yields larger fairness gains than when applied to the original real data.

Significance. If the experimental outcomes are robust, the work supplies practical evidence that synthetic data plus fairness pre-processing can jointly advance privacy and fairness in LA models, a domain where both concerns are acute. The comparative design across multiple generators and algorithms is a strength, as is the explicit reporting of utility degradation for the top privacy-fairness performer. No machine-checked proofs or parameter-free derivations are present, but the falsifiable metric-based claims allow direct replication checks.

major comments (2)
  1. [§4, Table 2] §4 (Experimental Setup) and Table 2: the claim that 'DECAF achieves the best balance' requires an explicit scalar or Pareto criterion; the text does not state whether balance is defined by a weighted sum, dominance count, or threshold on the reported privacy and fairness scores, making it impossible to verify the ranking without re-deriving the ordering from raw numbers.
  2. [§5.2] §5.2 (Fairness Improvement on Synthetic Data): the statement that pre-processing 'improves fairness even more than when applied to real data' is load-bearing for the central recommendation, yet the section supplies no statistical test (e.g., paired t-test or Wilcoxon) or confidence intervals on the fairness deltas, and does not report whether the same random seeds and hyper-parameters were used for the real-data and synthetic-data fairness runs.
minor comments (2)
  1. [Abstract] Abstract: lists no datasets, sample sizes, or exact metric definitions; readers must reach §4 to learn these details.
  2. [Tables] Notation: 'privacy' and 'fairness' scores are used without a single consolidated table that also includes the utility (accuracy) column for every generator-algorithm pair.
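
The first major comment asks for an explicit criterion. One candidate is Pareto dominance, which can be sketched in a few lines. The generator names and score pairs below are hypothetical placeholders (not the paper's numbers), and both metrics are assumed to be lower-is-better.

```python
def dominates(a, b):
    """True if a is no worse than b on every metric and strictly better on one.

    All metrics are treated as lower-is-better.
    """
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(scores):
    """Return the sorted names of generators that no other generator dominates."""
    return sorted(
        name for name, s in scores.items()
        if not any(dominates(other, s)
                   for o, other in scores.items() if o != name)
    )

# Hypothetical (privacy leakage, fairness violation) pairs.
scores = {
    "gen_A": (0.10, 0.05),
    "gen_B": (0.20, 0.08),
    "gen_C": (0.08, 0.12),
    "gen_D": (0.25, 0.15),
}
front = pareto_front(scores)
```

Here `front` contains two generators, because `gen_A` wins on fairness and `gen_C` wins on privacy. This illustrates why the criterion has to be stated: a Pareto front can hold several candidates, and naming a single "best balance" then needs a further rule such as a weighted sum or a threshold.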

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and positive assessment of the work's significance. We address each major comment below and will revise the manuscript accordingly to improve clarity and statistical rigor.

Point-by-point responses
  1. Referee: [§4, Table 2] §4 (Experimental Setup) and Table 2: the claim that 'DECAF achieves the best balance' requires an explicit scalar or Pareto criterion; the text does not state whether balance is defined by a weighted sum, dominance count, or threshold on the reported privacy and fairness scores, making it impossible to verify the ranking without re-deriving the ordering from raw numbers.

    Authors: We agree that an explicit definition of the balance criterion is needed for verifiability. In the original analysis, DECAF was selected because, among the generators tested, it had the lowest privacy leakage (across membership inference and attribute inference attacks) and the lowest fairness violations (demographic parity and equalized odds), so no other generator dominates it in the privacy-fairness plane. We will revise §4 and Table 2 to state explicitly that balance is defined by Pareto dominance (a generator is preferred when no other generator is at least as good on both metrics and strictly better on one), and we will add the raw metric values plus a short dominance table for direct verification. revision: yes

  2. Referee: [§5.2] §5.2 (Fairness Improvement on Synthetic Data): the statement that pre-processing 'improves fairness even more than when applied to real data' is load-bearing for the central recommendation, yet the section supplies no statistical test (e.g., paired t-test or Wilcoxon) or confidence intervals on the fairness deltas, and does not report whether the same random seeds and hyper-parameters were used for the real-data and synthetic-data fairness runs.

    Authors: We acknowledge the need for statistical support on this central claim. The fairness pre-processing runs on real and synthetic data used identical random seeds, hyper-parameters, and train/test splits to ensure comparability. We will add Wilcoxon signed-rank tests on the per-dataset fairness deltas (with p-values) and 95% confidence intervals on the mean improvements, plus an explicit statement confirming the shared experimental controls. These additions will appear in §5.2 and the associated tables. revision: yes
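
The paired test proposed in the second response can be sketched as follows. This is an exact two-sided Wilcoxon signed-rank test written with the standard library only; the per-dataset fairness gains are hypothetical values, and the function names are illustrative rather than taken from the paper.

```python
from itertools import product

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def wilcoxon_signed_rank_exact(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Returns (W_plus, p_value). Zero differences are dropped. All 2**n sign
    patterns are enumerated, so this is only practical for small n.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    if not diffs:
        raise ValueError("all paired differences are zero")
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    center = sum(ranks) / 2  # mean of W_plus under the null hypothesis
    observed = abs(w_plus - center)
    hits = 0
    for signs in product((0, 1), repeat=len(ranks)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - center) >= observed - 1e-12:
            hits += 1
    return w_plus, hits / 2 ** len(ranks)

# Hypothetical fairness gains from pre-processing, one pair per dataset/run.
gain_synthetic = [0.12, 0.09, 0.15, 0.07, 0.11, 0.10]
gain_real      = [0.08, 0.07, 0.09, 0.06, 0.06, 0.07]
w_plus, p_value = wilcoxon_signed_rank_exact(gain_synthetic, gain_real)
```

With n untied pairs, the smallest attainable two-sided p-value is 2 / 2**n, so a comparison over only a handful of datasets cannot reach conventional significance thresholds however consistent the direction of the effect. That is worth keeping in mind when reading any p-values added in revision.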

Circularity Check

0 steps flagged

No significant circularity; empirical study

The paper is a comparative empirical study of synthetic data generators and fairness pre-processing algorithms on learning analytics datasets. All claims rest on reported experimental outcomes for privacy, fairness, and utility metrics rather than any mathematical derivations, fitted parameters renamed as predictions, or self-citation chains. No equations, ansatzes, or uniqueness theorems appear in the abstract or framing; the structure is self-contained against external benchmarks with no load-bearing reductions to the paper's own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are described in the abstract; the work is a standard empirical comparison of existing generators and algorithms.

pith-pipeline@v0.9.0 · 5711 in / 1026 out tokens · 48954 ms · 2026-05-23T06:05:31.916630+00:00 · methodology
