Quantifying the Privacy of Counterfactuals by Leveraging Membership Inference Attacks Against Synthetic Data

Hadrien Lautraite; Heber H. Arcolezi; Maryam Babaei; Sebastien Gambs; Ulrich Aivodji; Yingke Wang

arxiv: 2606.06334 · v1 · pith:KEPQJHCHnew · submitted 2026-06-04 · 💻 cs.LG

Maryam Babaei, Yingke Wang, Hadrien Lautraite, Heber H. Arcolezi, Ulrich Aivodji, Sebastien Gambs

Pith reviewed 2026-06-28 02:00 UTC · model grok-4.3

classification 💻 cs.LG

keywords privacy · counterfactual explanations · membership inference attacks · synthetic data · machine learning

The pith

Counterfactual explanations can reveal whether a record was in a model's training data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that membership inference attacks created for synthetic data can be transferred directly to counterfactual explanations. These attacks identify training-set membership using only the counterfactuals themselves, without any queries to the original model. The work matters because counterfactuals are commonly released in high-stakes domains to explain decisions, and the same privacy risk that applies to synthetic data now applies to them. A sympathetic reader would conclude that releasing counterfactuals carries an under-appreciated privacy cost.

Core claim

Membership inference attacks designed for synthetic data succeed when applied to counterfactuals generated by a model, and the attacks remain effective when the attacker has access only to the set of counterfactuals and no access to the model.

What carries the argument

Direct transfer of membership inference attacks from synthetic data, treating counterfactuals as realistic substitutes for training records.
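To make the no-model-access setting concrete, here is a minimal Python sketch of one generic family of synthetic-data attacks, a distance-to-closest-record score, applied to a released set of records. It is not taken from the paper: the names `dcr_score` and `toy_experiment`, the toy Gaussian data, and above all the choice to model counterfactuals as small perturbations of training records are illustrative assumptions.

```python
import math
import random
import statistics


def dcr_score(candidate, released):
    """Membership score: higher means the candidate lies closer to a released record."""
    return -min(math.dist(candidate, r) for r in released)


def toy_experiment(n=50, dim=4, noise=0.05, seed=0):
    """Score known members and non-members using only the released set."""
    rng = random.Random(seed)

    def draw():
        return tuple(rng.gauss(0.0, 1.0) for _ in range(dim))

    members = [draw() for _ in range(n)]
    non_members = [draw() for _ in range(n)]
    # ASSUMPTION (toy model only): each released "counterfactual" is a
    # lightly perturbed copy of one training record.
    released = [tuple(x + rng.gauss(0.0, noise) for x in m) for m in members]
    member_scores = [dcr_score(m, released) for m in members]
    non_member_scores = [dcr_score(c, released) for c in non_members]
    return member_scores, non_member_scores


member_scores, non_member_scores = toy_experiment()
print("mean member score:    ", statistics.mean(member_scores))
print("mean non-member score:", statistics.mean(non_member_scores))
```

The attacker here never queries a model; the only input is the released set. Whether real counterfactuals sit this close to training records is exactly the premise the paper's experiments are meant to test, so the gap seen in this toy setup should not be read as a result.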

If this is right

Releasing counterfactuals can produce a privacy breach comparable to releasing synthetic data.
The attacks apply across multiple counterfactual generation methods.
Model developers must consider privacy when sharing counterfactuals with users or the public.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Other forms of model explanations may carry similar membership leakage risks.
Organizations releasing explanations might need new privacy safeguards or access controls.

Load-bearing premise

Counterfactuals function as realistic substitutes for real training data in the same way synthetic data does.

What would settle it

An experiment in which the attack, run on the released counterfactuals, flags true training records as members no more often than it flags randomly chosen non-training records.
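One hedged way to operationalize this check is to score known members and known non-members, then compare the resulting AUC with the chance level of 0.5. The helper below is an illustrative sketch, not the paper's evaluation code; the function name `roc_auc` and its inputs are assumptions. It computes AUC as the probability that a randomly chosen member outscores a randomly chosen non-member, counting ties as half.

```python
def roc_auc(member_scores, non_member_scores):
    """Rank-based AUC: P(member score > non-member score), ties count 0.5."""
    if not member_scores or not non_member_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for m in member_scores:
        for n in non_member_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(non_member_scores))


# Perfect separation gives 1.0; indistinguishable scores give 0.5.
print(roc_auc([0.9, 0.8], [0.1, 0.2]))
print(roc_auc([1, 2], [1, 2]))
```

An AUC that stays statistically indistinguishable from 0.5 across datasets and counterfactual generators would count against the core claim. Because membership attacks are often judged by the true-positive rate at low false-positive rates, a full evaluation would likely report that as well.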

Figures

Figures reproduced from arXiv: 2606.06334 by Hadrien Lautraite, Heber H. Arcolezi, Maryam Babaei, Sebastien Gambs, Ulrich Aivodji, Yingke Wang.

**Figure 1.** Attack pipeline. The attacker has no access to or information about the model or counterfactual generator. They only ...

**Figure 2.** Attack performance comparison. Blue lines show ROC curves for ...

**Figure 3.** Acs_income – distribution comparison between real dataset and counterfactual sets used to perform no-box ensemble ...

**Figure 4.** Compas – distribution comparison between real dataset and counterfactual sets used to perform no-box ensemble ...

**Figure 5.** heloc – distribution comparison between real dataset and counterfactual sets used to perform no-box ensemble ...

**Figure 6.** adult – distribution comparison between real dataset and counterfactual sets used to perform no-box ensemble ...

Original abstract

Counterfactuals are typically used in high-stakes decision areas to explain a machine learning model by showing how changes to the user profiles result in the desired outcome. However, explaining the model's decisions through counterfactuals can also be exploited by an adversary to conduct privacy attacks against the model or its training data. Drawing on the analogy that counterfactuals provide realistic substitutes for real training data, similar to synthetic data, we demonstrate in this paper how it is possible to successfully perform privacy attacks on counterfactuals by drawing on the attacks developed against synthetic data. More precisely, we investigate the effectiveness of the membership inference attacks designed for synthetic data on various types of counterfactuals. Additionally, while existing membership inference attacks against counterfactuals usually require to be able to query the model, we show how it is possible to perform successful membership inference attacks using only a set of counterfactuals, with no access to the model from which they are generated. Our results demonstrate that model developers should be more cautious when releasing counterfactuals to various users, as it can lead to a privacy breach.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper adapts synthetic-data membership inference to counterfactuals with a no-model-access variant, but provides no metrics or justification for why the transfer works.

The main takeaway is that the authors claim membership inference attacks designed for synthetic data can be applied directly to sets of counterfactual explanations, including a version that needs no queries to the original model. If the experiments back this up, it flags a privacy issue for releasing counterfactuals in regulated settings.

They do a reasonable job identifying the risk and noting that counterfactuals can act as stand-ins for training data in the same way synthetic records do. The no-access attack is a practical angle worth exploring.

The soft spots are substantial. The abstract gives no success rates, no dataset names, no baselines, and no controls, so there is no way to assess whether the attacks actually succeed at meaningful levels. The central assumption—that counterfactuals leak membership signals the same way full synthetic datasets do—receives no formal argument or ablation. Counterfactuals are usually sparse, instance-specific, and constrained by validity rules rather than sampled from the marginal, so the density or reconstruction features from synthetic-data attacks may not remain discriminative. The stress-test concern holds up on the given text.

This is for researchers working on privacy attacks in explainable AI. A reader hunting for concrete attack methods on explanations might extract some value if the full experiments are thorough and the transfer is shown to work. It deserves a serious referee because the topic is relevant and the no-access claim, if demonstrated cleanly, would be worth public discussion even as an adaptation.

Recommendation: send it to peer review rather than desk reject, but expect referees to demand the missing experimental details and a direct check on whether the synthetic-data attack features survive the counterfactual regime.

Referee Report

2 major / 1 minor

Summary. The paper claims that counterfactual explanations can leak membership information about training data in a manner analogous to synthetic datasets. It asserts that membership inference attacks originally developed for synthetic data can be directly applied to sets of counterfactuals, achieving successful attacks even without query access to the underlying model. The work investigates this across various counterfactual types and concludes that model developers should exercise caution when releasing counterfactuals due to privacy risks.

Significance. If the empirical transfer of synthetic-data MIA techniques to counterfactuals is valid, the result would extend documented privacy vulnerabilities from data synthesis to the explainable-AI setting, providing a concrete reason to treat released counterfactual sets as potentially sensitive. The analogy itself is a useful framing, but the manuscript supplies no quantitative evidence, baselines, or controls in the visible text, preventing assessment of whether the claimed success is robust or merely an artifact of untested assumptions about sampling regimes.

major comments (2)

[Abstract] Abstract and introduction: the central premise that counterfactuals 'provide realistic substitutes for real training data, similar to synthetic data' is asserted without any ablation, density comparison, or feature-distribution analysis showing that MIA signals (density ratios, reconstruction error, shadow-model outputs) remain discriminative when the data are sparse, validity-constrained, and instance-specific rather than drawn from a model's marginal.
[Abstract] The manuscript states that attacks succeed 'with no access to the model' yet supplies no metrics, attack success rates, baseline comparisons, dataset descriptions, or statistical controls in the provided text, rendering the empirical claim impossible to evaluate for soundness.

minor comments (1)

[Abstract] The abstract would be strengthened by including at least one quantitative result (e.g., attack accuracy or AUC) and the datasets used.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed feedback. We address each major comment point by point below, clarifying the empirical content of the full manuscript while agreeing to strengthen certain aspects in revision.

Referee: [Abstract] Abstract and introduction: the central premise that counterfactuals 'provide realistic substitutes for real training data, similar to synthetic data' is asserted without any ablation, density comparison, or feature-distribution analysis showing that MIA signals (density ratios, reconstruction error, shadow-model outputs) remain discriminative when the data are sparse, validity-constrained, and instance-specific rather than drawn from a model's marginal.

Authors: We agree that the analogy benefits from explicit supporting analysis. The full manuscript reports empirical transfer of MIA techniques with attack success rates on counterfactual sets, but does not include dedicated density or feature-distribution ablations. We will add these comparisons (e.g., KL divergence between counterfactual and training distributions, plus ablation on sparsity/validity constraints) to the revised version. revision: yes

Referee: [Abstract] The manuscript states that attacks succeed 'with no access to the model' yet supplies no metrics, attack success rates, baseline comparisons, dataset descriptions, or statistical controls in the provided text, rendering the empirical claim impossible to evaluate for soundness.

Authors: The full manuscript contains these elements in the experimental evaluation (attack success rates, baseline comparisons against random and shadow-model attacks, dataset details, and statistical significance). The referee's observation correctly notes their absence from the abstract alone. We will incorporate a concise summary of key quantitative results into the abstract during revision. revision: partial
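The KL-divergence comparison mentioned in the first response can be sketched for a single categorical (or pre-binned) feature as follows. This is an assumption-laden illustration rather than the authors' planned analysis: the function name `kl_divergence`, the additive smoothing constant, and the toy feature values are all hypothetical.

```python
import math
from collections import Counter


def kl_divergence(p_samples, q_samples, smoothing=1.0):
    """Estimate KL(P || Q) in nats from samples of one discrete feature.

    Additive smoothing keeps every probability positive, so a value seen
    in only one of the two sets does not produce an infinite divergence.
    """
    support = set(p_samples) | set(q_samples)
    p_counts, q_counts = Counter(p_samples), Counter(q_samples)
    p_total = len(p_samples) + smoothing * len(support)
    q_total = len(q_samples) + smoothing * len(support)
    kl = 0.0
    for value in support:
        p = (p_counts[value] + smoothing) / p_total
        q = (q_counts[value] + smoothing) / q_total
        kl += p * math.log(p / q)
    return kl


# Hypothetical feature values from a training set and a counterfactual set.
training_feature = ["low", "low", "high"]
counterfactual_feature = ["low", "high", "high"]
print(kl_divergence(training_feature, counterfactual_feature))
```

A value near zero would indicate that the counterfactual set tracks the training distribution on that feature; per-feature estimates like this ignore correlations between features, so they can only partly support the "realistic substitute" premise.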

Circularity Check

0 steps flagged

No circularity: empirical attack transfer with no derivations or self-referential reductions.

full rationale

The paper is an empirical demonstration that applies pre-existing membership inference attacks (designed for synthetic data) directly to released counterfactual sets. No equations, fitted parameters, or derivations appear in the provided text. The central claim rests on experimental transfer of attack success rates rather than any self-definition, fitted-input prediction, or load-bearing self-citation chain. The analogy between counterfactuals and synthetic data is stated as a premise for the experiments but is not used to derive results by construction; results are measured outcomes. This matches the default case of a self-contained empirical study.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Review is based on abstract only; no free parameters, axioms, or invented entities are stated or derivable from the provided text.

Reference graph

Works this paper leans on

40 extracted references · 12 canonical work pages · 2 internal anchors

[1] Ulrich Aïvodji, Alexandre Bolot, and Sébastien Gambs. 2020. Model extraction from counterfactual explanations. arXiv preprint arXiv:2009.01884 (2020)
[2] Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner. 2016. Machine Bias: Risk Assessments in Criminal Sentencing. https://github.com/propublica/compas-analysis. Accessed 2025-05-15
[3] Arthur Asuncion and David Newman. 2007. UCI machine learning repository
[4] André Bauer, Simon Trapp, Michael Stenger, Robert Leppich, Samuel Kounev, Mark Leznik, Kyle Chard, and Ian Foster. 2024. Comprehensive exploration of synthetic data generation: A survey. arXiv preprint arXiv:2401.02524 (2024)
[5] Brett K Beaulieu-Jones, Zhiwei Steven Wu, Chris Williams, Ran Lee, Sanjeev P Bhavnani, James Brian Byrd, and Casey S Greene. 2019. Privacy-preserving generative deep neural networks support clinical data sharing. Circulation: Cardiovascular Quality and Outcomes 12, 7 (2019), e005122
[6] Dieter Brughmans, Pieter Leyman, and David Martens. 2023. Nice: an algorithm for nearest instance counterfactual explanations. Data Mining and Knowledge Discovery (2023), 1–39
[7] Dingfan Chen, Ning Yu, Yang Zhang, and Mario Fritz. 2020. Gan-leaks: A taxonomy of membership inference attacks against generative models. In Proceedings of the 2020 ACM SIGSAC conference on computer and communications security. 343–362
[8] Pin-Yu Chen, Huan Zhang, Yash Sharma, Jinfeng Yi, and Cho-Jui Hsieh. 2017. Zoo: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In Proceedings of the 10th ACM workshop on artificial intelligence and security. 15–26
[9] Cynthia Dwork, Aaron Roth, et al. 2014. The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9, 3–4 (2014), 211–407
[10] Sofie Goethals, Kenneth Sörensen, and David Martens. 2023. The Privacy Issue of Counterfactual Explanations: Explanation Linkage Attacks. ACM Trans. Intell. Syst. Technol. 14, 5, Article 83 (aug 2023), 24 pages. doi:10.1145/3608482
[11, 12] Ian J Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative adversarial nets. Advances in neural information processing systems 27 (2014)
[13] Jamie Hayes, Luca Melis, George Danezis, and Emiliano De Cristofaro. 2017. Logan: Membership inference attacks against generative models. arXiv preprint arXiv:1705.07663 (2017)
[14] Benjamin Hilprecht, Martin Härterich, and Daniel Bernau. 2019. Monte carlo and reconstruction membership inference attacks against generative models. Proceedings on Privacy Enhancing Technologies (2019)
[15] Florimond Houssiau, James Jordon, Samuel N Cohen, Owen Daniel, Andrew Elliott, James Geddes, Callum Mole, Camila Rangel-Smith, and Lukasz Szpruch. 2022. TAPAS: a toolbox for adversarial privacy auditing of synthetic data. arXiv preprint arXiv:2211.06550 (2022)
[16] Aoting Hu, Renjie Xie, Zhigang Lu, Aiqun Hu, and Minhui Xue. 2021. Tablegan-mca: Evaluating membership collisions of gan-synthesized tabular data releasing. In Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security. 2096–2112
[17] Catherine Huang, Chelse Swoopes, Christina Xiao, Jiaqi Ma, and Himabindu Lakkaraju. 2023. Accurate, Explainable, and Private Models: Providing Recourse While Minimizing Training Data Leakage. arXiv preprint arXiv:2308.04341 (2023)
[18] Ling Huang, Anthony D Joseph, Blaine Nelson, Benjamin IP Rubinstein, and J Doug Tygar. 2011. Adversarial machine learning. In Proceedings of the 4th ACM workshop on Security and artificial intelligence. 43–58
[19] Amir-Hossein Karimi, Gilles Barthe, Bernhard Schölkopf, and Isabel Valera. 2022. A survey of algorithmic recourse: contrastive explanations and consequential recommendations. Comput. Surveys 55, 5 (2022), 1–29
[20] Aditya Kuppa and Nhien-An Le-Khac. 2021. Adversarial XAI methods in cybersecurity. IEEE transactions on information forensics and security 16 (2021), 4924–4938
[21] Thibault Laugel, Marie-Jeanne Lesot, Christophe Marsala, Xavier Renard, and Marcin Detyniecki. 2019. The dangers of post-hoc interpretability: unjustified counterfactual explanations. In Proceedings of the 28th International Joint Conference on Artificial Intelligence (Macao, China) (IJCAI’19). 2801–2807
[22] Ryan McKenna, Gerome Miklau, and Daniel Sheldon. 2021. Winning the nist contest: A scalable and general approach to differentially private synthetic data. arXiv preprint arXiv:2108.04978 (2021)
[23] Ryan McKenna, Brett Mullins, Daniel Sheldon, and Gerome Miklau. 2022. AIM: an adaptive and iterative mechanism for differentially private synthetic data. Proc. VLDB Endow. 15, 11 (July 2022), 2599–2612. doi:10.14778/3551793.3551817
[24] Ramaravind K Mothilal, Amit Sharma, and Chenhao Tan. 2020. Explaining machine learning classifiers through diverse counterfactual explanations. In Proceedings of the 2020 conference on fairness, accountability, and transparency. 607–617
[25] DJ Nelson. 2022. Privacy-preserving counterfactual explanations to help humans contest AI-based decisions. Master’s thesis. University of Twente
[26] Sergey I Nikolenko et al. 2021. Synthetic data for deep learning. Vol. 174. Springer
[27] OpenML. 2018. FICO-HELOC-cleaned Dataset. https://openml.org/d/45554. Accessed 2025-05-15
[28] Art B. Owen. 2013. Monte Carlo theory, methods and examples. https://artowen.su.domains/mc/. FAccT ’26, June 25–28, 2026, Montreal, QC, Canada Babaei et al
[29] Martin Pawelczyk, Himabindu Lakkaraju, and Seth Neel. 2023. On the privacy risks of algorithmic recourse. In International Conference on Artificial Intelligence and Statistics. PMLR, 9680–9696
[30] Sikha Pentyala, Shubham Sharma, Sanjay Kariyappa, Freddy Lecue, and Daniele Magazzeni. 2023. Privacy-Preserving Algorithmic Recourse. arXiv preprint arXiv:2311.14137 (2023)
[31] Reza Shokri, Martin Strobel, and Yair Zick. 2021. On the privacy risks of model explanations. In Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society. 231–241
[32] Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov. 2017. Membership inference attacks against machine learning models. In 2017 IEEE symposium on security and privacy (SP). IEEE, 3–18
[33] Marlon Tobaben, Hibiki Ito, Joonas Jälkö, Yuan He, and Antti Honkela. 2025. Impact of Dataset Properties on Membership Inference Vulnerability of Deep Transfer Learning. In Advances in Neural Information Processing Systems, Vol. 38. 67486–67537
[34] Boris van Breugel, Hao Sun, Zhaozhi Qian, and Mihaela van der Schaar. 2023. Membership Inference Attacks against Synthetic Data through Overfitting Detection. In International Conference on Artificial Intelligence and Statistics. PMLR, 3493–3514
[35] Giuseppe Vietri, Cedric Archambeau, Sergul Aydore, William Brown, Michael Kearns, Aaron Roth, Ankit Siva, Shuai Tang, and Steven Z Wu. 2022. Private synthetic data for multitask learning and marginal queries. Advances in Neural Information Processing Systems 35 (2022), 18282–18295
[36] Sandra Wachter, Brent Mittelstadt, and Chris Russell. 2017. Counterfactual explanations without opening the black box: Automated decisions and the GDPR. Harv. JL & Tech. 31 (2017), 841
[37] Yongjie Wang, Hangwei Qian, and Chunyan Miao. 2022. DualCF: Efficient Model Extraction Attack from Counterfactual Explanations. In 2022 ACM Conference on Fairness, Accountability, and Transparency. 1318–1329
[38] Joshua Ward, Chi-Hua Wang, and Guang Cheng. 2024. Data plagiarism index: Characterizing the privacy risk of data-copying in tabular generative models. arXiv preprint arXiv:2406.13012 (2024)
[39] Joshua Ward, Chi-Hua Wang, and Guang Cheng. 2025. Privacy Auditing Synthetic Data Release through Local Likelihood Attacks. arXiv preprint arXiv:2508.21146 (2025)
[40] Joshua Ward, Yuxuan Yang, Chi-Hua Wang, and Guang Cheng. 2025. Ensembling Membership Inference Attacks Against Tabular Generative Models. In Proceedings of the 18th ACM Workshop on Artificial Intelligence and Security (AISec ’25). Association for Computing Machinery, New York, NY, USA, 182–193. doi:10.1145/3733799.3762977
