-C., van der Schaar, M.: SynthCity: facilitating innovative use cases of synthetic data in different data modalities
6 Pith papers cite this work.
representative citing papers
LLM tabular generators leak memorized numeric strings, allowing a no-box attack to achieve near-perfect membership inference on some state-of-the-art models.
citing papers explorer
- ReMIA: a Powerful and Efficient Alternative to Membership Inference Attacks against Synthetic Data Generators
  ReMIA offers a practical privacy metric for synthetic data by training two generators and using a classifier to detect source dataset membership, achieving sensitivity comparable to standard MIAs with far less computation.
- GenTS: A Comprehensive Benchmark Library for Generative Time Series Models
  GenTS is a modular benchmark library providing unified data pipelines, generative models, and evaluation metrics for time series synthesis, forecasting, and imputation, with open-source code and initial benchmarking experiments.
- Quality Degradation Attack in Synthetic Data
  Adversaries can degrade synthetic data quality via small manipulations such as label flipping or feature-importance interventions, substantially harming downstream model performance and increasing statistical divergence from real data.
- Can Synthetic Data be Fair and Private? A Comparative Study of Synthetic Data Generation and Fairness Algorithms
  DECAF synthetic data generator best balances privacy and fairness while fairness pre-processing improves outcomes more on synthetic data than real data, though at some cost to predictive accuracy.
- Creating Artificial Students that Never Existed: Leveraging Large Language Models and CTGANs for Synthetic Data Generation
  CTGAN and LLMs generate synthetic student data that passes statistical and predictive utility checks for learning analytics.