Recognition: unknown
Data Synthesis based on Generative Adversarial Networks
read the original abstract
Privacy is an important concern for our society where sharing data with partners or releasing data to the public is a frequent occurrence. Some of the techniques that are being used to achieve privacy are to remove identifiers, alter quasi-identifiers, and perturb values. Unfortunately, these approaches suffer from two limitations. First, it has been shown that private information can still be leaked if attackers possess some background knowledge or other information sources. Second, they do not take into account the adverse impact these methods will have on the utility of the released data. In this paper, we propose a method that meets both requirements. Our method, called table-GAN, uses generative adversarial networks (GANs) to synthesize fake tables that are statistically similar to the original table yet do not incur information leakage. We show that the machine learning models trained using our synthetic tables exhibit performance that is similar to that of models trained using the original table for unknown testing cases. We call this property model compatibility. We believe that anonymization/perturbation/synthesis methods without model compatibility are of little value. We used four real-world datasets from four different domains for our experiments and conducted in-depth comparisons with state-of-the-art anonymization, perturbation, and generation techniques. Throughout our experiments, only our method consistently shows a balance between privacy level and model compatibility.
This paper has not been read by Pith yet.
Forward citations
Cited by 4 Pith papers
-
Breaking the Quality-Privacy Tradeoff in Tabular Data Generation via In-Context Learning
DiffICL breaks the quality-privacy tradeoff in small-data tabular synthesis by using in-context learning on pretrained structural priors to generate data that is both higher quality and less memorizing of training samples.
-
Generative Augmentation of Imbalanced Flight Records for Flight Diversion Prediction: A Multi-objective Optimisation Framework
Hyperparameter-optimized generative models augment scarce flight diversion records and substantially improve prediction accuracy over real data alone.
-
Synthesizing real-world distributions from high-dimensional Gaussian Noise with Fully Connected Neural Network
Fully connected neural network with randomized loss synthesizes real-world tabular data distributions from Gaussian noise faster than state-of-the-art deep generative models.
-
Synthetic Flight Data Generation Using Generative Models
Synthetic flight data generated by TVAE and Gaussian Copula models supports flight delay prediction models with accuracy comparable to real data.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.