pith. machine review for the scientific record.

arxiv: 2512.16284 · v2 · submitted 2025-12-18 · 💻 cs.CR

Recognition: no theorem link

Empirical Evaluation of Structured Synthetic Data Privacy Metrics: Novel experimental framework

Authors on Pith: no claims yet

Pith reviewed 2026-05-16 21:40 UTC · model grok-4.3

classification 💻 cs.CR
keywords synthetic data · privacy metrics · tabular data · privacy quantification · empirical evaluation · risk insertion · no-box threat model · privacy enhancing technology

The pith

A framework evaluates tabular synthetic data privacy metrics by deliberately inserting known risks and measuring detection under no-box conditions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes an experimental framework that inserts specific, controlled privacy risks into synthetic tabular datasets and then applies existing privacy quantification methods to see if those risks are detected. This matters for practitioners because synthetic data is widely promoted as a privacy-enhancing technology, yet without reliable empirical tests it is hard to know which quantification methods actually work or how to compare them. The authors survey current approaches and related legal theory, then demonstrate the framework on public datasets using no-box threat models where the attacker has no access to the original data. If the framework holds, it gives a repeatable way to benchmark privacy claims rather than relying on theoretical assurances alone.

Core claim

The authors claim that deliberately inserting specific privacy risks into synthetic tabular data, then testing whether quantification methods detect them, provides an empirical way to assess the efficacy of those methods under no-box threat models on publicly available datasets.

What carries the argument

Controlled deliberate risk insertion into synthetic tabular data to test privacy quantification methods.
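The mechanism is easy to state concretely. As a minimal sketch (not the paper's implementation; the risk model, metric, and data here are illustrative stand-ins), insert a "leaky" risk by copying training records verbatim into the synthetic table, then check that a distance-to-closest-record (DCR) style metric reports the increased risk:

```python
import numpy as np

def insert_leaky_risk(train, synthetic, leak_fraction, rng):
    """Replace a fraction of synthetic rows with verbatim training rows
    (a 'leaky' risk: direct copies are the clearest privacy failure)."""
    n_leak = int(leak_fraction * len(synthetic))
    out = synthetic.copy()
    idx = rng.choice(len(train), size=n_leak, replace=False)
    out[:n_leak] = train[idx]
    return out

def dcr_score(train, synthetic):
    """Distance to closest record: for each synthetic row, the minimum
    Euclidean distance to any training row. Lower median = higher risk."""
    d = np.linalg.norm(synthetic[:, None, :] - train[None, :, :], axis=-1)
    return np.median(d.min(axis=1))

rng = np.random.default_rng(0)
train = rng.normal(size=(200, 5))
clean = rng.normal(size=(200, 5))   # stand-in for a generator's output
leaky = insert_leaky_risk(train, clean, 0.5, rng)

# A working metric should report a lower (riskier) DCR after insertion.
assert dcr_score(train, leaky) < dcr_score(train, clean)
```

Because the evaluator controls the insertion, the expected direction of every metric's response is known in advance, which is what turns the comparison into a benchmark.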

If this is right

  • Privacy metrics can be ranked by their ability to detect deliberately inserted risks across multiple datasets.
  • The framework supplies a repeatable benchmark for comparing synthetic data generators on privacy grounds.
  • Legal compliance arguments for synthetic data can be grounded in empirical detection rates rather than theoretical properties alone.
  • No-box threat models become testable rather than assumed, allowing direct measurement of residual identification risk.
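The ranking claim in the first bullet can be made operational: score each metric by how well its output tracks the inserted risk level. A hedged sketch, with invented metric scores and a hand-rolled rank correlation standing in for a real evaluation run:

```python
import numpy as np

def spearman(x, y):
    """Rank correlation between inserted risk levels and metric scores
    (no ties assumed, so double-argsort gives the ranks directly)."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean()
    ry -= ry.mean()
    return float(rx @ ry / np.sqrt((rx @ rx) * (ry @ ry)))

# Hypothetical scores from two metrics across increasing inserted-risk levels.
risk_levels = np.array([0.0, 0.1, 0.2, 0.4, 0.8])
metric_a = np.array([0.02, 0.11, 0.19, 0.42, 0.77])  # tracks the risk
metric_b = np.array([0.30, 0.28, 0.33, 0.29, 0.31])  # insensitive to it

scores = {"A": spearman(risk_levels, metric_a),
          "B": spearman(risk_levels, metric_b)}
ranked = sorted(scores, key=scores.get, reverse=True)
assert ranked[0] == "A"   # the risk-tracking metric ranks first
```

Repeating this across datasets and risk models gives the per-metric comparison the bullets anticipate.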

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same insertion approach could be adapted to test privacy metrics on non-tabular data such as time series or graphs.
  • If insertion patterns match known real-world breach vectors, regulators could adopt the framework as a certification test for privacy-enhancing technologies.
  • Extending the method to measure utility loss alongside risk detection would reveal trade-offs that current single-metric evaluations miss.

Load-bearing premise

Deliberately inserting specific risks into synthetic data accurately models real-world privacy threats under no-box conditions.

What would settle it

Run the framework on a dataset where the inserted risks are known to be undetectable by any metric, or on one where real-world attacks succeed despite the metrics reporting low risk; a mismatch between inserted risks and metric scores would falsify the framework's validity.

Figures

Figures reproduced from arXiv: 2512.16284 by Alexander Boudewijn, Andrea Filippo Ferraris, Daniele Panfilo, Diana Sofronieva, Filiberto Brozzetti, Giuseppe D'Acquisto, Luca Bortolussi, Milton Nicolás Plasencia Palacios, Sebastiano Saccani.

Figure 1: Overview of synthetic data privacy quantification methods
Figure 2: Risk assessment methods evaluated using the leaky risk model
Figure 3: Risk assessment methods evaluated using the overfit risk model - RTF [61]
Figure 4: Risk assessment methods evaluated using the overfit risk model - Synthpop [62]
Figure 5: Risk assessment methods evaluated using the DP risk model - PATEGAN [39]
Figure 6: Risk assessment methods evaluated using the DP risk model - AIM [63]
Figure 7: Correlation matrices of risk assessment methods using the leaky risk model (from left to right: Adult, Texas, Census)
Figure 8: Correlation matrices of risk assessment methods using the overfitting risk model - RTF (from left to right: Adult, …)
Figure 9: Correlation matrices of risk assessment methods using the overfitting risk model - Synthpop (from left to right: Adult, …)
Figure 10: Correlation matrices of risk assessment methods using the DP risk model - AIM (from left to right: Adult, Texas, …)
Figure 11: Correlation matrices of risk assessment methods using the DP risk model - PATEGAN (from left to right: Adult, …)
Figure 12: Results with k-NN-based privacy metric for various k using the leaky risk model
Figure 13: Results with k-NN-based privacy metric for various k using the overfit risk model - RTF [61]
Figure 14: Results with k-NN-based privacy metric for various k using the overfit risk model - Synthpop [62]
(Caption spillover from Appendix C, experiments with outlier removal: the local outlier factor (LOF) was applied to embeddings obtained through contrastive learning to identify outliers [75]; different proportions of outliers were removed, under the overfitting risk model, to evaluate their impact on the measurements.)
Figure 15: Results with k-NN-based privacy metric for various k using the DP risk model - PATEGAN [39]
Figure 16: Results with k-NN-based privacy metric for various k using the DP risk model - AIM
Figure 17: IMS, DCR, and MIA with outlier removal in the original dataset prior to generator training, with both no…
Figure 18: GTCAP and ML Inference with outlier removal in the original dataset prior to generator training, with both…
Figure 19: Anonymeter's methods with outlier removal in the original dataset prior to generator training, with both no…
Figure 20: IMS, DCR, and MIA with outlier removal in the original dataset prior to generator training, with both no…
Figure 21: GTCAP and ML Inference with outlier removal in the original dataset prior to generator training, with both…
Figure 22: Anonymeter's methods with outlier removal in the original dataset prior to generator training, with both no…
Figure 23: Comparison between Canary Record Baseline and Risk on the training set using the leaky risk model
Figure 24: Results with GTCAP privacy metric for various radii using the overfit risk model
Figure 25: MLE utility scores of synthetic datasets per dataset and generator (utility of training set for reference)
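The outlier-removal experiments behind Figures 17-22 preprocess the original dataset before generator training; per the appendix, the paper applies the local outlier factor (LOF) to contrastive-learning embeddings [75]. As a rough numpy-only stand-in (a kNN-distance filter rather than true LOF, applied to raw features rather than learned embeddings), the shape of such a step is:

```python
import numpy as np

def knn_outlier_filter(X, k=10, quantile=0.95):
    """Flag records whose mean distance to their k nearest neighbours lies
    in the top tail; a crude proxy for LOF-style density-based removal."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    d.sort(axis=1)
    knn_mean = d[:, 1:k + 1].mean(axis=1)   # column 0 is the self-distance
    keep = knn_mean <= np.quantile(knn_mean, quantile)
    return X[keep], keep

rng = np.random.default_rng(0)
inliers = rng.normal(0, 1, size=(300, 4))
outliers = rng.normal(8, 1, size=(10, 4))   # planted far-away records
X = np.vstack([inliers, outliers])

X_clean, keep = knn_outlier_filter(X)
assert not keep[-10:].any()       # all planted outliers removed
assert keep[:300].mean() > 0.9    # most genuine records survive
```

Varying `quantile` plays the role of the "different proportions of outliers removed" in the appendix experiments.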
Original abstract

Synthetic data generation is gaining traction as a privacy enhancing technology (PET). When properly generated, synthetic data preserve the analytic utility of real data while avoiding the retention of information that would allow the identification of specific individuals. However, the concept of data privacy remains elusive, making it challenging for practitioners to evaluate and benchmark the degree of privacy protection offered by synthetic data. In this paper, we propose a framework to empirically assess the efficacy of tabular synthetic data privacy quantification methods through controlled, deliberate risk insertion. To demonstrate this framework, we survey existing approaches to synthetic data privacy quantification and the related legal theory. We then apply the framework to the main privacy quantification methods with no-box threat models on publicly available datasets.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims to introduce an empirical framework for assessing the efficacy of tabular synthetic data privacy quantification methods by using controlled, deliberate risk insertion to create ground-truth labels for benchmarking under no-box threat models. It surveys existing privacy quantification approaches and legal theory, then applies the framework to main methods on public datasets.

Significance. If the framework is shown to be valid, it would be significant for the field by providing a practical method to empirically evaluate and compare privacy metrics for synthetic data, helping to address the elusive nature of privacy assessment in privacy-enhancing technologies.

major comments (2)
  1. [Proposed framework] The framework's reliance on deliberate risk insertion to simulate no-box attacker capabilities is a load-bearing assumption that lacks justification. Inserting risks requires knowledge of original records or generator internals, which is unavailable in a true no-box setting where only the synthetic table is provided; this may cause the inserted signals to be either undetectable by realistic attacks or detectable only due to procedural artifacts, undermining the ground-truth mapping.
  2. [Abstract] Although the abstract states that the framework is applied to main privacy quantification methods, no concrete results, validation steps, or error analysis are provided, leaving the central empirical claim without demonstrated support.
minor comments (1)
  1. [Survey of approaches] The survey of existing approaches to synthetic data privacy quantification and legal theory would benefit from clearer integration with how they inform the design of the proposed framework.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the major comments point by point below, clarifying the framework design and manuscript content while committing to targeted revisions for improved justification and clarity.

read point-by-point responses
  1. Referee: [Proposed framework] The framework's reliance on deliberate risk insertion to simulate no-box attacker capabilities is a load-bearing assumption that lacks justification. Inserting risks requires knowledge of original records or generator internals, which is unavailable in a true no-box setting where only the synthetic table is provided; this may cause the inserted signals to be either undetectable by realistic attacks or detectable only due to procedural artifacts, undermining the ground-truth mapping.

    Authors: The framework deliberately separates the experimental setup (controlled risk insertion by the evaluator to establish ground-truth labels) from the threat model under which the metrics are tested (strictly no-box, with metrics receiving only the synthetic table). This is analogous to standard benchmarking practices in privacy research, such as membership inference evaluations that use known labels for ground truth while testing black-box attacks. We acknowledge that the manuscript would benefit from explicit discussion of this distinction, potential procedural artifacts, and limitations in simulating realistic no-box conditions. We will revise the methodology section to provide this justification and analysis. revision: yes

  2. Referee: [Abstract] Although the abstract states that the framework is applied to main privacy quantification methods, no concrete results, validation steps, or error analysis are provided, leaving the central empirical claim without demonstrated support.

    Authors: The abstract is intentionally concise and summarizes the application at a high level. Concrete results—including quantitative comparisons of privacy metrics, validation across public datasets, and error analysis—are detailed in the experimental sections of the full manuscript. To improve accessibility, we will revise the abstract to include a brief mention of key empirical outcomes and validation approach. revision: yes
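The benchmarking analogy in the first response can be made concrete: a membership-inference evaluation scores an attack that sees only the released data, while the evaluator's known membership labels supply ground truth. A minimal sketch under invented data, with a nearest-neighbor attack standing in for a real one (not the paper's setup):

```python
import numpy as np

def nn_distance_attack(candidates, synthetic):
    """No-box membership score: negative distance from each candidate record
    to its nearest synthetic record (closer = more likely a training member)."""
    d = np.linalg.norm(candidates[:, None, :] - synthetic[None, :, :], axis=-1)
    return -d.min(axis=1)

def auc(scores, labels):
    """Probability that a random member outscores a random non-member."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    return float((pos[:, None] > neg[None, :]).mean())

rng = np.random.default_rng(1)
members = rng.normal(size=(100, 3))
nonmembers = rng.normal(size=(100, 3))
# A leaky "generator": synthetic data that memorises members up to small noise.
synthetic = members + rng.normal(scale=0.05, size=members.shape)

candidates = np.vstack([members, nonmembers])
labels = np.r_[np.ones(100), np.zeros(100)]
scores = nn_distance_attack(candidates, synthetic)

# Ground-truth labels are known to the evaluator, never to the attack.
assert auc(scores, labels) > 0.9
```

The attack touches only `synthetic`; the labels enter only at scoring time, which is exactly the separation the rebuttal appeals to.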

Circularity Check

0 steps flagged

No circularity: empirical framework applies existing methods without self-referential reduction

full rationale

The paper proposes an empirical evaluation framework that inserts controlled privacy risks into synthetic tabular data and then applies existing no-box privacy quantification methods to public datasets. No equations, derivations, fitted parameters, or uniqueness theorems are present. The central claim rests on the methodological design of risk insertion and benchmarking rather than any reduction to self-citations, self-definitions, or renamed known results. The approach is self-contained as an application of prior independent quantification techniques, with no load-bearing step that collapses to the paper's own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The framework rests on domain assumptions about threat models and the validity of risk insertion as a proxy for privacy evaluation.

axioms (2)
  • domain assumption No-box threat models are appropriate for assessing synthetic data privacy risks
    The paper explicitly applies the framework under no-box threat models.
  • domain assumption Deliberate insertion of known risks can serve as a valid ground truth for metric evaluation
    Central to the proposed empirical assessment method.

pith-pipeline@v0.9.0 · 5452 in / 1092 out tokens · 27838 ms · 2026-05-16T21:40:26.121341+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

75 extracted references · 75 canonical work pages · 1 internal anchor

  1. Mauro Giuffrè and Dennis Shung. Harnessing the power of synthetic data in healthcare: innovation, application, and privacy. npj Digital Medicine, 6, 2023.
  2. Andre Goncalves, Priyadip Ray, Braden Soper, Jennifer Stevens, Linda Coyle, and Ana Paula Sales. Generation and evaluation of synthetic patient data. BMC Medical Research Methodology, 2020.
  3. Vamsi K. Potluru, Daniel Borrajo, Andrea Coletta, Niccolò Dalmasso, Yousef El-Laham, Elizabeth Fons, Mohsen Ghassemi, Sriram Gopalakrishnan, Vikesh Gosai, Eleonora Kreačić, Ganapathy Mani, Saheed Obitayo, Deepak Paramanand, Natraj Raman, Mikhail Solonin, Srijan Sood, Svitlana Vyetrenko, Haibei Zhu, Manuela Veloso, and Tucker Balch. Synthetic data applic…
  4. Michal S. Gal and Orla Lynskey. Synthetic data: Legal implications of the data-generation revolution. Iowa Law Review, 109(1087):1087–1154, 2024.
  5. Jerome P. Reiter. Synthetic data: A look back and a look forward. Trans. Data Priv., 16(1):15–24, 2023.
  6. Ana Beduschi. Synthetic data protection: Towards a paradigm change in data regulation? Big Data & Society, 11(1):20539517241231277, 2024.
  7. Steven M. Bellovin, Preetam K. Dutta, and N. Reitinger. Privacy and synthetic datasets. Stanford Technology Law Review, 2018.
  8. Gillian M. Raab, Beata Nowok, and Chris Dibben. Practical privacy metrics for synthetic data, 2024.
  9. Alexander Boudewijn, Andrea F. Ferraris, Daniele Panfilo, Vanessa Cocca, Sabrina Zinutti, Karel Schepper, and Carlo Chauvenet. Privacy measurement in tabular synthetic data: State of the art and future research directions, 2023.
  10. Matteo Giomi, Franziska Boenisch, Christoph Wehmeyer, and Borbála Tasnádi. A unified framework for quantifying privacy risk in synthetic data, 2022.
  11. A. Gonzales, G. Guruswamy, and S. R. Smith. Synthetic data in health care: A narrative review. PLOS Digital Health, 2(1):e0000082, 2023.
  12. M. Giuffrè and D. L. Shung. Harnessing the power of synthetic data in healthcare: Innovation, application, and privacy. NPJ Digital Medicine, 6(1):186, 2023.
  13. James Jordon, Lukasz Szpruch, Florimond Houssiau, Mirko Bottarelli, Giovanni Cherubin, Carsten Maple, Samuel N. Cohen, and Adrian Weller. Synthetic data - what, why and how?, 2022.
  14. J. Hradec, M. Craglia, M. Di Leo, S. De Nigris, N. Ostlaender, and N. Nicholson. Multipurpose synthetic population for policy applications. JRC technical report, 2022.
  15. Morgan Guillaudeux, Olicia Rousseau, Julien Petot, Zineb Bennis, Charles-Axel Dein, Thomas Goronflot, Nicolas Vince, Sophie Limou, Matilde Karakachoff, Matthieu Wargny, and Pierre-Antoine Gourraud. Patient-centric synthetic data generation, no reason to risk reidentification in biomedical data analysis. NPJ Digital Medicine, 6(37), 2023.
  16. Jean-Francois Rajotte, Robert Bergen, David L. Buckeridge, and Khaled El Emam. Synthetic data as an enabler for machine learning applications in medicine. iScience, 25(11):105331, 2022.
  17. A. Boudewijn and A. F. Ferraris. Legal and regulatory perspectives on synthetic data as an anonymization strategy. Journal of Personal Data Protection Law, 1:17–31, 2024.
  18. James Jordon, Lukasz Szpruch, Florimond Houssiau, Mirko Bottarelli, Giovanni Cherubin, Carsten Maple, Samuel N. Cohen, and Adrian Weller. Synthetic data - what, why and how?, 2022.
  19. Alvaro Figueira and Bruno Vaz. Survey on synthetic data generation, evaluation methods and GANs. Mathematics, 10(15), 2022.
  20. Abdul Jabbar, Xi Li, and Bourahla Omar. A survey on generative adversarial networks: Variants, applications, and training, 2020.
  21. Article 29 Working Party. Opinion 4/2007 on the concept of personal data, 2007.
  22. Article 29 Working Party on the Protection of Individuals with regard to the Processing of Personal Data. Opinion 05/2014 on anonymisation techniques, 2014.
  23. Cesar A. F. López and Abdullah Elbi. On the legal nature of synthetic data. Proceedings of the Thirty-sixth Conference on Neural Information Processing Systems (NeurIPS), 2022.
  24. Jingchen Hu and Claire McKay Bowen. Advancing microdata privacy protection: A review of synthetic data, 2023.
  25. Ram Shankar Siva Kumar, David O'Brien, Kendra Albert, Salomé Viljöen, and Jeffrey Snover. Failure modes in machine learning systems, 2019.
  26. Theresa Stadler, Bristena Oprisanu, and Carmela Troncoso. Synthetic data - anonymisation groundhog day, 2022.
  27. Lisa Pilgram, Fida K. Dankar, Jorg Drechsler, Mark Elliot, Josep Domingo-Ferrer, Paul Francis, Murat Kantarcioglu, Linglong Kong, Bradley Malin, Krishnamurty Muralidhar, Puja Myles, Fabian Prasser, Jean Louis Raisaro, Chao Yan, and Khaled El Emam. A consensus privacy metrics framework for synthetic data, 2025.
  28. Bayrem Kaabachi, Jérémie Despraz, Thierry Meurers, Karen Otte, Mehmed Halilovic, Fabian Prasser, and Jean Louis Raisaro. Can we trust synthetic data in medicine? A scoping review of privacy and utility metrics, 2023.
  29. Pablo A. Osorio-Marulanda, Gorka Epelde, Mikel Hernandez, Imanol Isasa, Nicolas Moreno Reyes, and Andoni Beristain Iraola. Privacy mechanisms and evaluation metrics for synthetic data generation: A systematic review. IEEE Access, 2024.
  30. Mikel Hernandez, Gorka Epelde, Ane Alberdi, Rodrigo Cilla, and Debbie Rankin. Standardised metrics and methods for synthetic tabular data evaluation, 2021.
  31. Chao Yan, Yao Yan, Zhiyu Wan, Ziqi Zhang, Larsson Omberg, Justin Guinney, Sean Mooney, and Bradley Malin. A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications, 13, 2022.
  32. Dario Brunelli, Shalini Kurapati, and Luca Gilli. SURE: A new privacy and utility assessment library for synthetic data. In 2024 IEEE International Conference on Blockchain (Blockchain), pages 643–648, 2024.
  33. Boris van Breugel, Hao Sun, Zhaozhi Qian, and Mihaela van der Schaar. Membership inference attacks against synthetic data through overfitting detection, 2023.
  34. Steven Golob. Privacy Vulnerabilities in Marginals-based Synthetic Data. PhD thesis, 2024.
  35. Matthieu Meeus, Florent Guepin, Ana-Maria Cretu, and Yves-Alexandre de Montjoye. Achilles' heels: Vulnerable record identification in synthetic data publishing, 2023.
  36. Cynthia Dwork, Aaron Roth, et al. The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science, 9(3–4):211–407, 2014.
  37. Lucas Rosenblatt, Xiaoyan Liu, Samira Pouyanfar, Eduardo de Leon, Anuj Desai, and Joshua Allen. Differentially private synthetic data: Applied evaluations and enhancements, 2020.
  38. Liyang Xie, Kaixiang Lin, Shu Wang, Fei Wang, and Jiayu Zhou. Differentially private generative adversarial network, 2018.
  39. Jinsung Yoon, James Jordon, and Mihaela van der Schaar. PATE-GAN: Generating synthetic data with differential privacy guarantees. In International Conference on Learning Representations, 2019.
  40. Robin C. Geyer, Tassilo Klein, and Moin Nabi. Differentially private federated learning: A client level perspective, 2018.
  41. Christine Task, Karan Bhagat, and Gary Howarth. SDNist v2: Deidentified Data Report Tool, March 2023.
  42. Georgi Ganev and Emiliano De Cristofaro. On the inadequacy of similarity-based privacy metrics: Reconstruction attacks against "truly anonymous synthetic data". arXiv preprint arXiv:2312.05114, 2023.
  43. Gillian M. Raab. Utility and disclosure risk for differentially private synthetic categorical data. In Josep Domingo-Ferrer and Maryline Laurent, editors, Privacy in Statistical Databases, pages 250–265, Cham, 2022. Springer International Publishing.
  44. Sina Rashidian, Fusheng Wang, Richard Moffitt, Victor Garcia, Anurag Dutt, Wei Chang, Vishwam Pandya, Janos Hajagos, Mary Saltz, and Joel Saltz. Smooth-GAN: Towards sharp and smooth synthetic EHR data generation. In Martin Michalowski and Robert Moskovitch, editors, Artificial Intelligence in Medicine, pages 37–48, Cham. Springer International Publishing.
  45. Cristóbal Esteban, Stephanie L. Hyland, and Gunnar Rätsch. Real-valued (medical) time series generation with recurrent conditional GANs, 2017.
  46. Daniele Panfilo, Alexander Boudewijn, Sebastiano Saccani, Andrea Coser, Borut Svara, Carlo Rossi Chauvenet, Ciro Antonio Mami, and Eric Medvet. A deep learning-based pipeline for the generation of synthetic tabular data. IEEE Access, pages 1–1, 2023.
  47. Michael Platzer and Thomas Reutterer. Holdout-based empirical assessment of mixed-type synthetic data. Frontiers in Big Data, 2021.
  48. Sobhan Moazemi, Tim Adams, Hwei Geok Ng, Lisa Kühnel, Julian Schneider, Anatol-Fiete Näher, Juliane Fluck, and Holger Fröhlich. NFDI4Health workflow and service for synthetic data generation, assessment and risk management, 2024.
  49. Edward Choi, Siddharth Biswal, Bradley Malin, Jon Duke, Walter F. Stewart, and Jimeng Sun. Generating multi-label discrete patient records using generative adversarial networks, 2018.
  50. G. D'Acquisto and M. Naldi. Big Data e Privacy by design. Anonimizzazione, pseudonimizzazione, sicurezza. Giappichelli, 2017.
  51. Jamie Hayes, Luca Melis, George Danezis, and Emiliano De Cristofaro. LOGAN: Membership inference attacks against generative models, 2018.
  52. Dingfan Chen, Ning Yu, Yang Zhang, and Mario Fritz. GAN-Leaks: A taxonomy of membership inference attacks against generative models. In Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security, pages 343–362, 2020.
  53. Benjamin Hilprecht, Martin Härterich, and Daniel Bernau. Monte Carlo and reconstruction membership inference attacks against generative models. Proceedings on Privacy Enhancing Technologies, 2019.
  54. Florimond Houssiau, James Jordon, Samuel N. Cohen, Owen Daniel, Andrew Elliott, James Geddes, Callum Mole, Camila Rangel-Smith, and Lukasz Szpruch. TAPAS: A toolbox for adversarial privacy auditing of synthetic data, 2022.
  55. Rémy Chapelle and Bruno Falissard. Statistical properties and privacy guarantees of an original distance-based fully synthetic data generation method, 2023.
  56. Mark Elliot. Creating the best risk-utility profile: The synthetic data challenge, 2015.
  57. Yingrui Chen, Jennifer Taub, and Mark Elliot. The trade-off between information utility and disclosure risk in a GA synthetic data generator. Joint UNECE/Eurostat Work Session on Statistical Data Confidentiality, 2019.
  58. Nicholas Carlini, Matthew Jagielski, Chiyuan Zhang, Nicolas Papernot, Andreas Terzis, and Florian Tramer. The privacy onion effect: Memorization is relative, 2022.
  59. Hailong Hu and Jun Pang. Membership inference attacks against GANs by leveraging over-representation regions. In Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security, CCS '21, pages 2387–2389, New York, NY, USA, 2021. Association for Computing Machinery.
  60. Meenatchi Sundaram Muthu Selva Annamalai, Andrea Gadotti, and Luc Rocher. A linear reconstruction approach for attribute inference attacks against synthetic data, 2024.
  61. Aivin V. Solatorio and Olivier Dupriez. REaLTabFormer: Generating realistic relational and tabular data using transformers, 2023.
  62. Beata Nowok, Gillian M. Raab, and Chris Dibben. synthpop: Bespoke creation of synthetic data in R. Journal of Statistical Software, 74(11):1–26, 2016.
  63. Open Differential Privacy (OpenDP). The SmartNoise system: AIM, 2024. https://github.com/opendp/smartnoise-sdk
  64. Nicolas Papernot, Martín Abadi, Ulfar Erlingsson, Ian Goodfellow, and Kunal Talwar. Semi-supervised knowledge transfer for deep learning from private training data. arXiv preprint arXiv:1610.05755, 2016.
  65. Dario Brunelli, Shalini Kurapati, and Luca Gilli. SURE: A new privacy and utility assessment library for synthetic data. In 2024 IEEE International Conference on Blockchain (Blockchain), pages 643–648, 2024.
  66. Zhaozhi Qian, Bogdan-Constantin Cebere, and Mihaela van der Schaar. Synthcity: Facilitating innovative use cases of synthetic data in different data modalities, 2023.
  67. Jennifer Taub, Mark James Elliot, and Gillian M. Raab. Creating the best risk-utility profile: The synthetic data challenge, 2019.
  68. Nicola De Cao, Wilker Aziz, and Ivan Titov. Block neural autoregressive flow. In Uncertainty in Artificial Intelligence, pages 1263–1273. PMLR, 2020.
  69. Barry Becker and Ronny Kohavi. Adult. UCI Machine Learning Repository, 1996. DOI: https://doi.org/10.24432/C5XW20
  70. Center for Health Statistics, Texas Department of State Health Services. Texas hospital discharge data public use data, 2005. https://www.dshs.texas.gov/texas-health-care-information-collection/general-public-information/hospital-discharge-data-public
  71. Steven Ruggles, Sarah Flood, Ronald Goeken, Josiah Grover, Erin Meyer, Jose Pacas, and Matthew Sobek. IPUMS USA: Version 8.0 extract of 1940 Census for U.S. Census Bureau disclosure avoidance research. Minneapolis, MN: IPUMS, 2008. DOI: https://doi.org/10.18128/D010.V8.0.EXT1940USCB
  72. Nicholas Carlini, Chang Liu, Úlfar Erlingsson, Jernej Kos, and Dawn Song. The secret sharer: Evaluating and testing unintended memorization in neural networks, 2019.
  73. Zexi Yao, Nataša Krčo, Georgi Ganev, and Yves-Alexandre de Montjoye. The DCR delusion: Measuring the privacy risk of synthetic data. In European Symposium on Research in Computer Security, pages 469–487. Springer, 2025.
  74. Giuseppe D'Acquisto, Aloni Cohen, Maurizio Naldi, and Kobbi Nissim. From isolation to identification. In Privacy in Statistical Databases: International Conference, PSD 2024, Antibes Juan-les-Pins, France, September 25–27, 2024, Proceedings, pages 3–17, Berlin, Heidelberg, 2024. Springer-Verlag.
  75. Milton Nicolás Plasencia Palacios, Sebastiano Saccani, Gabriele Sgroi, Alexander Boudewijn, and Luca Bortolussi. Contrastive learning-based privacy metrics in tabular synthetic datasets, 2025.
    Milton Nicolás Plasencia Palacios, Sebastiano Saccani, Gabriele Sgroi, Alexander Boudewijn, and Luca Bortolussi. Contrastive learning-based privacy metrics in tabular synthetic datasets, 2025. 22 APREPRINT- DECEMBER19, 2025 A Complete experimental results Figure 2: Risk assessment methods evaluated using the leaky risk model Figure 3: Risk assessment meth...