-C., van der Schaar, M.: SynthCity: facilitating innovative use cases of synthetic data in different data modalities
5 Pith papers cite this work.
citing papers explorer
- When Tables Leak: Attacking String Memorization in LLM-Based Tabular Data Generation
  LLM tabular generators leak memorized numeric strings, allowing a no-box attack to achieve near-perfect membership inference on some state-of-the-art models.
- GenTS: A Comprehensive Benchmark Library for Generative Time Series Models
  GenTS is a modular benchmark library providing unified data pipelines, generative models, and evaluation metrics for time series synthesis, forecasting, and imputation, with open-source code and initial benchmarking experiments.
- Quality Degradation Attack in Synthetic Data
  Adversaries can degrade synthetic data quality via small manipulations such as label flipping or feature-importance interventions, substantially harming downstream model performance and increasing statistical divergence from real data.
- Can Synthetic Data be Fair and Private? A Comparative Study of Synthetic Data Generation and Fairness Algorithms
  The DECAF synthetic data generator best balances privacy and fairness, and fairness pre-processing improves outcomes more on synthetic data than on real data, though at some cost to predictive accuracy.
- Creating Artificial Students that Never Existed: Leveraging Large Language Models and CTGANs for Synthetic Data Generation
  CTGAN and LLMs generate synthetic student data that passes statistical and predictive utility checks for learning analytics.