When Tables Leak: Attacking String Memorization in LLM-Based Tabular Data Generation
Pith reviewed 2026-05-16 23:44 UTC · model grok-4.3
The pith
LLM tabular data generators leak training records through memorized numeric digit strings.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Popular LLM adaptations for tabular data generation memorize and reproduce string sequences of numeric digits drawn from training observations. This memorization allows a simple attack with access solely to the synthetic outputs to infer training-set membership by matching those digit strings, exposing privacy leakage that can reach perfect accuracy on certain models and datasets.
What carries the argument
LevAtt, a no-box membership inference attack that targets memorized string sequences of numeric digits in synthetic observations to classify training-set membership.
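A minimal sketch of how a digit-string matching attack of this kind could work. The summary here does not give LevAtt's exact decision rule, so everything below is an assumption: the name suggests a Levenshtein (edit-distance) comparison, and the helper names `digit_string`, `levenshtein`, and `membership_score` are hypothetical. Each candidate record is scored by how closely its numeric fields, rendered as a string, match the nearest synthetic record; a score of 1.0 means the digits were reproduced verbatim.

```python
from typing import Sequence


def digit_string(record: Sequence[object]) -> str:
    """Render the numeric fields of a record as one '|'-separated string."""
    numeric = [v for v in record
               if isinstance(v, (int, float)) and not isinstance(v, bool)]
    return "|".join(str(v) for v in numeric)


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def membership_score(candidate: Sequence[object],
                     synthetic: Sequence[Sequence[object]]) -> float:
    """Return 1 minus the smallest normalized edit distance to any synthetic row.

    Hypothetical scoring rule: higher means 'more likely a training member'.
    Only the synthetic data is consulted, matching the no-box setting.
    """
    if not synthetic:
        return 0.0
    target = digit_string(candidate)
    best = 1.0
    for row in synthetic:
        other = digit_string(row)
        denom = max(len(target), len(other), 1)
        best = min(best, levenshtein(target, other) / denom)
    return 1.0 - best


synthetic_rows = [(34, 51234.75, "clerk"), (58, 1200.5, "nurse")]
print(membership_score((34, 51234.75, "clerk"), synthetic_rows))  # exact digit match
print(membership_score((41, 77310.2, "pilot"), synthetic_rows))   # no close match
```

An attacker would threshold this score, or rank candidates by it, to decide membership. Normalizing by string length is a design choice made here so that long and short records are comparable.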
If this is right
- Both fine-tuning and in-context prompting regimes for LLM tabular generation exhibit the leakage.
- The attack requires no model weights or training data access, only the synthetic outputs.
- A digit-perturbation sampling strategy during generation defeats the attack while keeping fidelity and utility losses small.
- The vulnerability applies across a wide range of models and tabular datasets.
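The digit-perturbation idea mentioned in the list above can be illustrated with a toy sampler. This is a sketch under stated assumptions, not the paper's method: the summary does not specify how digits are perturbed, so the `epsilon` mixing rule, the `protect_leading` parameter, and the function names are all hypothetical. At each digit position the sampler either follows the model's distribution or, with probability `epsilon`, draws a uniform digit; leading digits are left untouched so the magnitude of the value is preserved.

```python
import random
from typing import Dict, List

DIGITS = "0123456789"


def sample_digit(probs: Dict[str, float], epsilon: float,
                 rng: random.Random) -> str:
    """With probability epsilon draw a uniform digit, else sample the model."""
    if rng.random() < epsilon:
        return rng.choice(DIGITS)
    digits = list(probs)
    weights = [probs[d] for d in digits]
    return rng.choices(digits, weights=weights, k=1)[0]


def generate_number(step_probs: List[Dict[str, float]], epsilon: float,
                    seed: int = 0, protect_leading: int = 1) -> str:
    """Generate one numeric string digit by digit.

    step_probs stands in for the model's per-position digit distributions
    (an assumption; a real LLM would supply these from its logits).
    The first `protect_leading` positions are never perturbed.
    """
    rng = random.Random(seed)
    out = []
    for pos, probs in enumerate(step_probs):
        eps = 0.0 if pos < protect_leading else epsilon
        out.append(sample_digit(probs, eps, rng))
    return "".join(out)


# A fully memorized value: the model puts all mass on one digit per position.
memorized = [{d: 1.0} for d in "51234"]
print(generate_number(memorized, epsilon=0.0))  # reproduces the memorized string
print(generate_number(memorized, epsilon=0.5))  # trailing digits may change
```

Protecting the leading digits is one way to keep fidelity loss small, since perturbing low-order digits changes the value far less than perturbing high-order ones.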
Where Pith is reading between the lines
- The same digit-string leakage may appear in other structured generative tasks that output numeric fields.
- Synthetic data pipelines for privacy-sensitive domains may need routine checks for digit memorization before release.
- Future generators could incorporate explicit anti-memorization steps for numeric sequences without major redesign.
Load-bearing premise
The appearance of particular numeric digit strings in generated records reliably indicates that those records were in the training set rather than arising from model generalization or coincidental patterns.
What would settle it
Run the LevAtt attack on synthetic data produced from a training set whose numeric digit strings have been deliberately randomized or replaced with non-memorized alternatives; if attack accuracy remains high, the claim that digit strings indicate membership would be falsified.
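The randomization step of that control could be sketched as follows. This is an illustrative helper, assuming numeric values are handled as strings; `randomize_digits` is a hypothetical name, and the retraining and attack steps are out of scope here. Every digit is replaced by a uniformly random digit while signs, decimal points, and length stay fixed, so the format of each field is unchanged.

```python
import random

DIGITS = "0123456789"


def randomize_digits(value: str, rng: random.Random) -> str:
    """Replace each ASCII digit with a random digit, keeping other characters."""
    return "".join(rng.choice(DIGITS) if ch in DIGITS else ch for ch in value)


rng = random.Random(0)
row = ["34", "51234.75", "-12.5"]
print([randomize_digits(v, rng) for v in row])
```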
Original abstract
Large Language Models (LLMs) have recently demonstrated remarkable performance in generating high-quality tabular synthetic data. In practice, two primary approaches have emerged for adapting LLMs to tabular data generation: (i) fine-tuning smaller models directly on tabular datasets, and (ii) prompting larger models with examples provided in context. In this work, we show that popular implementations from both regimes exhibit a tendency to compromise privacy by reproducing memorized patterns of numeric digits from their training data. To systematically analyze this risk, we introduce a simple No-box Membership Inference Attack (MIA) called LevAtt that assumes adversarial access to only the generated synthetic data and targets the string sequences of numeric digits in synthetic observations. Using this approach, our attack exposes substantial privacy leakage across a wide range of models and datasets, and in some cases, is even a perfect membership classifier on state-of-the-art models. Our findings highlight a unique privacy vulnerability of LLM-based synthetic data generation and the need for effective defenses. To this end, we propose two methods, including a novel sampling strategy that strategically perturbs digits during generation. Our evaluation demonstrates that this approach can defeat these attacks with minimal loss of fidelity and utility of the synthetic data.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that LLM-based tabular data generators (both fine-tuned small models and prompted large models) leak privacy by reproducing exact numeric digit sequences from their training data. It introduces LevAtt, a simple no-box membership inference attack that flags synthetic rows containing such sequences as training-set members, reports substantial leakage (including perfect classification on some SOTA models) across multiple models and datasets, and proposes two defenses, one of which is a novel digit-perturbation sampling strategy that preserves fidelity.
Significance. If the empirical results hold after the requested controls, the work identifies a concrete and previously under-examined privacy vector in the rapidly adopted setting of LLM tabular synthesis. The no-box threat model and the demonstration that a trivial string-matching rule can serve as a near-perfect classifier on some models are noteworthy; the proposed perturbation defense is a practical contribution that could be adopted quickly.
major comments (3)
- [§4 and §5] §4 (Attack Evaluation) and §5 (Results): the claim of perfect or near-perfect classification on SOTA models is not accompanied by per-column entropy statistics, train/test digit-sequence overlap rates, or false-positive rates measured on held-out non-member records. Without these quantities it is impossible to rule out that the observed leakage is inflated by low-entropy numeric fields whose n-grams occur with non-negligible base rate under the learned marginal distribution.
- [§3.2] §3.2 (LevAtt Definition): the attack treats exact reproduction of any numeric digit string as a membership signal. The manuscript should report an ablation that varies the minimum string length and the column-selection criterion (e.g., only columns whose empirical entropy exceeds a threshold) to demonstrate that the reported AUCs are not artifacts of including trivially predictable fields such as IDs or ages.
- [§6] §6 (Defense Evaluation): the fidelity/utility numbers for the proposed digit-perturbation sampler are given only in aggregate. A per-column breakdown (or at least for the columns that drove the original attack success) is needed to confirm that the defense does not simply trade one form of leakage for another (e.g., by increasing variance in high-entropy columns).
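The three quantities requested in the first major comment are simple to compute. The sketch below is an illustration only; the function names and the choice to compare values as exact strings are assumptions, not details taken from the paper.

```python
import math
from collections import Counter
from typing import Iterable


def column_entropy(values: Iterable[object]) -> float:
    """Empirical Shannon entropy (bits) of a column's values as strings."""
    counts = Counter(str(v) for v in values)
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def overlap_rate(train_values: Iterable[object],
                 holdout_values: Iterable[object]) -> float:
    """Fraction of held-out values whose exact string also occurs in training."""
    train = {str(v) for v in train_values}
    holdout = [str(v) for v in holdout_values]
    if not holdout:
        return 0.0
    return sum(v in train for v in holdout) / len(holdout)


def false_positive_rate(flags_on_non_members: Iterable[bool]) -> float:
    """Share of known non-members that the attack flags as members."""
    flags = list(flags_on_non_members)
    if not flags:
        return 0.0
    return sum(flags) / len(flags)


ages = [30, 30, 41, 41]
print(column_entropy(ages))                   # low entropy: only two distinct values
print(overlap_rate(ages, [30, 99]))           # half of the held-out values collide
print(false_positive_rate([True, False, False, False]))
```

A low-entropy column with a high overlap rate is exactly the case the referee worries about: matches on it say little about membership, and they would show up as false positives on held-out records.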
minor comments (2)
- [Tables/Figures] Table 1 and Figure 2 captions should explicitly state the number of runs and whether error bars represent standard deviation or standard error.
- [§3] The notation for the membership label and the LevAtt decision rule should be introduced once in §3 and used consistently thereafter; currently the same symbol appears with slightly different meanings in the attack pseudocode and the experimental tables.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. The suggested additions of entropy statistics, ablations, and per-column breakdowns will improve the clarity and robustness of our results. We address each major comment below and will revise the manuscript accordingly.
- Referee: [§4 and §5] §4 (Attack Evaluation) and §5 (Results): the claim of perfect or near-perfect classification on SOTA models is not accompanied by per-column entropy statistics, train/test digit-sequence overlap rates, or false-positive rates measured on held-out non-member records. Without these quantities it is impossible to rule out that the observed leakage is inflated by low-entropy numeric fields whose n-grams occur with non-negligible base rate under the learned marginal distribution.
  Authors: We agree that these additional statistics are important to rule out confounding factors. In the revision we will add per-column entropy statistics for all numeric fields, train/test digit-sequence overlap rates, and false-positive rates computed on held-out non-member records. These will be reported in the updated §4 and §5 to demonstrate that the leakage is not driven solely by low-entropy columns.
  Revision: yes
- Referee: [§3.2] §3.2 (LevAtt Definition): the attack treats exact reproduction of any numeric digit string as a membership signal. The manuscript should report an ablation that varies the minimum string length and the column-selection criterion (e.g., only columns whose empirical entropy exceeds a threshold) to demonstrate that the reported AUCs are not artifacts of including trivially predictable fields such as IDs or ages.
  Authors: We appreciate the request for an ablation study. We will include a new ablation in the revised §3.2 that varies the minimum string length (e.g., 4, 6, and 8 digits) and restricts columns to those exceeding an entropy threshold. The resulting AUCs will be reported to show that LevAtt remains effective even when low-entropy or trivially predictable columns are excluded.
  Revision: yes
- Referee: [§6] §6 (Defense Evaluation): the fidelity/utility numbers for the proposed digit-perturbation sampler are given only in aggregate. A per-column breakdown (or at least for the columns that drove the original attack success) is needed to confirm that the defense does not simply trade one form of leakage for another (e.g., by increasing variance in high-entropy columns).
  Authors: We agree that aggregate metrics alone are insufficient. In the revised §6 we will provide a per-column breakdown of fidelity and utility for the digit-perturbation sampler, with emphasis on the columns that contributed most to attack success. This will confirm that the defense does not increase variance or introduce new issues in high-entropy columns.
  Revision: yes
Circularity Check
No circularity: empirical attack defined and evaluated directly on outputs
The paper defines LevAtt as a simple string-matching MIA on numeric digit sequences in LLM-generated tabular rows, then measures its success against explicit held-out membership labels across models and datasets. No equations, fitted parameters, or self-citations are used to derive the attack or its performance; success rates are reported as direct experimental outcomes. The central claims rest on falsifiable empirical results rather than any reduction to inputs by construction, self-definition, or load-bearing self-citation chains.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: LLMs trained or prompted on tabular data can reproduce exact numeric digit sequences from training examples in their outputs.