pith. machine review for the scientific record. sign in

arxiv: 2605.01129 · v1 · submitted 2026-05-01 · 💻 cs.CR

Recognition: unknown

Revisiting Privacy Leakage in Machine Unlearning: Membership Inference Beyond the Forgotten Set

Authors on Pith no claims yet

Pith reviewed 2026-05-09 18:34 UTC · model grok-4.3

classification 💻 cs.CR
keywords machine unlearningmembership inference attackprivacy leakageretain settri-class inferencedifferential privacydropoutmodel privacy
0
0 comments X

The pith

Machine unlearning increases privacy risks for the data left in the model after removal requests.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

When a model is asked to forget certain training samples, the remaining data can become easier for an attacker to identify as having been part of the original training set. The authors introduce TC-UMIA, a method that examines how a model's predictions shift before and after unlearning to separate forgotten samples from retained samples and from completely new data. Experiments across five unlearning algorithms and six datasets show that retain-set members become more exposed to membership inference. The work also tests three defenses and identifies accuracy costs in exchange for reduced leakage.

Core claim

Unlearning can introduce additional privacy risks to the retain set, making it more susceptible to membership inference attacks. TC-UMIA is a population-level inference framework that leverages model predictions before and after unlearning to distinguish among the forget, retain, and unseen set and proves effective across a wide range of model architectures, datasets, and MU approaches.

What carries the argument

TC-UMIA, a tri-class membership inference attack that uses population-level signals from changes in model predictions before versus after unlearning to classify samples as forget-set, retain-set, or unseen.

Load-bearing premise

Changes in model predictions before and after unlearning provide a reliable population-level signal to separate retain-set members from unseen data.

What would settle it

A test in which an attacker armed only with pre- and post-unlearning predictions fails to classify retain-set samples versus unseen samples at accuracy significantly above random chance across multiple standard unlearning methods.

Figures

Figures reproduced from arXiv: 2605.01129 by Da Zhong, Jie Fu, Nima Naderloui, Wendy Hui Wang, Yuan Hong.

Figure 1
Figure 1. Figure 1: Illustration of threat model. original model, and likewise, fθ− and f − to refer to the unlearning model throughout the remainder of this paper. Attack Goal. The adversary attempts to determine the membership of a specific data instance (x, y) in the training dataset of f and f −. Specifically, there are three types of membership sets: • Forget set DF , i.e., the set of instances to be removed from the tra… view at source ↗
Figure 2
Figure 2. Figure 2: Visualization of the instances across three membership classes by various types of features (CIFAR-100 dataset). The unseen set, forget set, and retain set are colored in blue, green, and red, respectively. P and P − are the posterior probability vector output by the models before and after unlearning, respectively. Py and P − y are the posterior probability of the ground-truth label y predicted by the mod… view at source ↗
Figure 3
Figure 3. Figure 3: The training process of TC-UMIA. PU (P − U ), PF (P − F ), and PR (P − R) denote the posterior probabilities of the ground-truth labels produced by the original (unlearned) model for samples in the unseen, forget, and retain sets, respectively. which unlearning algorithm is used by the server, they may apply a different unlearning mechanism. This mis￾match can introduce behavioral discrepancies between the… view at source ↗
Figure 4
Figure 4. Figure 4: Transferability of TC-UMIA. The horizontal and vertical axes represent the target model/dataset/MU methods and the shadow model/dataset/MU methods, respectively. (a) Size of forget set (b) Size ratio of three classes (c) Size of attack training set view at source ↗
Figure 5
Figure 5. Figure 5: Impact of various factors on the performance of view at source ↗
Figure 6
Figure 6. Figure 6: TC-UMIA’s accuracy (F1-score) using different features. CP, DF, SM, CT, and CDS stand for P ∥ P −, Py −P − y , Py + P − y , Py ∥ P − y , view at source ↗
Figure 7
Figure 7. Figure 7: The frequency distribution of posterior probability of the true label before and after unlearning (ResNet-18, view at source ↗
Figure 8
Figure 8. Figure 8: Per-class transferability of TC-UMIA while exhibiting slightly worse performance on the retain set when low-uncertainty instances are removed. Such disparate effects across the three sets are likely due to the differing influence these samples have on the model. Low￾uncertainty samples (i.e., those the model predicts with high confidence) tend to have a stronger impact on the model’s decision boundaries. A… view at source ↗
read the original abstract

Machine unlearning (MU) has emerged as a key mechanism for ensuring data privacy and regulatory compliance by enabling models to forget specific training samples. However, recent studies have shown that the removal of data can inadvertently introduce privacy leakages to the retain set,i.e., data that remain in the model after unlearning. In this paper, we extend the scope of privacy analysis in unlearning to the often-overlooked retained data. We introduce TC-UMIA, the first tri-class unlearning membership inference attack. TC-UMIA is a population-level inference framework that leverages model predictions before and after unlearning to distinguish among the forget, retain, and unseen set. Extensive experiments on five state-of-the-art unlearning algorithms and six real-world datasets demonstrate that: (i) unlearning can introduce additional privacy risks to the retain set, making it more susceptible to membership inference attacks; (ii) TC-UMIA is effective across a wide range of model architectures, datasets, and MU approaches. Beyond launching the attack, we rigorously evaluate three defense mechanisms, namely label-only outputs, dropout, and differential privacy, to mitigate the privacy risks posed by TC- UMIA. Our results reveal a fundamental trade-off between privacy protection and model accuracy, with the dropout approach offering the most favorable balance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper claims that machine unlearning can introduce additional privacy risks to the retain set, making it more susceptible to membership inference attacks. It proposes TC-UMIA, a tri-class unlearning membership inference attack that leverages model predictions before and after unlearning to distinguish between forget, retain, and unseen sets. Extensive experiments on five unlearning algorithms and six datasets show TC-UMIA's effectiveness with AUCs consistently above 0.75, outperforming binary baselines, and evaluate defenses including dropout, label-only outputs, and differential privacy, revealing accuracy-privacy trade-offs.

Significance. If the empirical results hold, this work is significant for the field of machine unlearning and privacy, as it identifies a new attack surface on retained data and provides a practical attack framework along with defense evaluations. The broad experimental validation across architectures, datasets, and methods adds to its impact. The paper credits the use of population-level signals from prediction shifts as a reliable distinguisher.

minor comments (2)
  1. [Abstract] Abstract: the claim of 'extensive experiments' and 'rigorous' defense evaluation is not supported by any mention of statistical tests, confidence intervals, or specific AUC values; adding one sentence summarizing key quantitative outcomes would improve the abstract's informativeness.
  2. [Experiments] Experimental sections: while ablation controls for architecture and hyperparameters are reported, the manuscript should explicitly state whether the tri-class formulation's superiority over binary MIA baselines was tested for statistical significance across all six datasets.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive assessment of our work and the recommendation for minor revision. The referee's summary accurately reflects the core contributions regarding privacy risks to the retain set in machine unlearning and the proposed TC-UMIA framework. No specific major comments were included in the report.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper is an empirical demonstration of a membership inference attack (TC-UMIA) that uses observable pre- and post-unlearning prediction shifts as a population-level signal. No mathematical derivations, fitted parameters, or equations are presented that could reduce claims to inputs by construction. Central results consist of reported attack AUCs, ablation studies, and defense evaluations across multiple datasets and unlearning methods; these are externally falsifiable experimental outcomes rather than self-referential definitions or self-citation chains. No load-bearing steps match any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Empirical attack paper; no mathematical axioms, free parameters, or invented entities are invoked in the abstract.

pith-pipeline@v0.9.0 · 5540 in / 942 out tokens · 27930 ms · 2026-05-09T18:34:29.911307+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

85 extracted references · 17 canonical work pages · 1 internal anchor

  1. [1]

    G. D. P. Regulation, “Gdpr,” https://gdpr-info.eu/, 2016

  2. [2]

    T. C. C. P. A. (CCPA), “Ccpa,” https://oag.ca.gov/privacy/ccpa, 2018

  3. [3]

    The privacy onion effect: Memorization is relative,

    N. Carlini, M. Jagielski, C. Zhang, N. Papernot, A. Terzis, and F. Tram `er, “The privacy onion effect: Memorization is relative,” Advances in Neural Information Processing Systems, vol. 35, pp. 13 263–13 276, 2022

  4. [4]

    Remember what you want to forget: Algorithms for machine unlearning,

    A. Sekhari, J. Acharya, G. Kamath, and A. T. Suresh, “Remember what you want to forget: Algorithms for machine unlearning,” Advances in Neural Information Processing Systems, vol. 34, pp. 18 075–18 086, 2021

  5. [5]

    Towards unbounded machine unlearning,

    M. Kurmanji, P. Triantafillou, J. Hayes, and E. Triantafillou, “Towards unbounded machine unlearning,” Advances in neural information processing systems, vol. 36, 2024

  6. [6]

    Unrolling sgd: Understanding factors influencing machine unlearning,

    A. Thudi, G. Deza, V . Chandrasekaran, and N. Papernot, “Unrolling sgd: Understanding factors influencing machine unlearning,” in 2022 IEEE 7th European Symposium on Security and Privacy (EuroS&P). IEEE, 2022, pp. 303–319

  7. [7]

    Certified edge unlearning for graph neural networks,

    K. Wu, J. Shen, Y . Ning, T. Wang, and W. H. Wang, “Certified edge unlearning for graph neural networks,” in Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2023, pp. 2606–2617

  8. [8]

    Gif: A general graph unlearning strategy via influence function,

    J. Wu, Y . Yang, Y . Qian, Y . Sui, X. Wang, and X. He, “Gif: A general graph unlearning strategy via influence function,” in Proceedings of the ACM Web Conference 2023, 2023, pp. 651– 661

  9. [9]

    Recommendation unlearning,

    C. Chen, F. Sun, M. Zhang, and B. Ding, “Recommendation unlearning,” in Proceedings of the ACM web conference 2022, 2022, pp. 2768–2777

  10. [10]

    Making rec- ommender systems forget: Learning and unlearning for erasable recommendation,

    Y . Li, C. Chen, X. Zheng, J. Liu, and J. Wang, “Making rec- ommender systems forget: Learning and unlearning for erasable recommendation,” Knowledge-Based Systems, vol. 283, p. 111124, 2024

  11. [11]

    Rethinking machine unlearning for large language models,

    S. Liu, Y . Yao, J. Jia, S. Casper, N. Baracaldo, P. Hase, Y . Yao, C. Y . Liu, X. Xu, H. Li et al., “Rethinking machine unlearning for large language models,” Nature Machine Intelligence, pp. 1–14, 2025

  12. [12]

    Large language model unlearning,

    Y . Yao, X. Xu, and Y . Liu, “Large language model unlearning,” Advances in Neural Information Processing Systems, vol. 37, pp. 105 425–105 475, 2024

  13. [13]

    When machine unlearning jeopardizes privacy,

    M. Chen, Z. Zhang, T. Wang, M. Backes, M. Humbert, and Y . Zhang, “When machine unlearning jeopardizes privacy,” in Proceedings of the 2021 ACM SIGSAC conference on computer and communications security, 2021, pp. 896–911

  14. [14]

    Forgetting outside the box: Scrubbing deep networks of information accessible from input- output observations,

    A. Golatkar, A. Achille, and S. Soatto, “Forgetting outside the box: Scrubbing deep networks of information accessible from input- output observations,” in European Conference on Computer Vision. Springer, 2020, pp. 383–398

  15. [15]

    Inexact unlearning needs more careful evaluations to avoid a false sense of privacy,

    J. Hayes, I. Shumailov, E. Triantafillou, A. Khalifa, and N. Paper- not, “Inexact unlearning needs more careful evaluations to avoid a false sense of privacy,” arXiv preprint arXiv:2403.01218, 2024

  16. [16]

    Rectifying privacy and efficacy measurements in machine unlearning: A new inference attack perspective,

    N. Naderloui, S. Yan, B. Wang, J. Fu, W. H. Wang, W. Liu, and Y . Hong, “Rectifying privacy and efficacy measurements in machine unlearning: A new inference attack perspective,” in 34th USENIX security symposium (USENIX Security 25), 2025

  17. [17]

    Membership inference attacks against machine learning models,

    R. Shokri, M. Stronati, C. Song, and V . Shmatikov, “Membership inference attacks against machine learning models,” in 2017 IEEE symposium on security and privacy (SP). IEEE, 2017, pp. 3–18

  18. [18]

    ML-Leaks: Model and Data Independent Membership Inference Attacks and Defenses on Machine Learning Models

    A. Salem, Y . Zhang, M. Humbert, P. Berrang, M. Fritz, and M. Backes, “Ml-leaks: Model and data independent membership inference attacks and defenses on machine learning models,” arXiv preprint arXiv:1806.01246, 2018

  19. [19]

    Learn to forget: Machine unlearning via neuron masking,

    Z. Ma, Y . Liu, X. Liu, J. Liu, J. Ma, and K. Ren, “Learn to forget: Machine unlearning via neuron masking,” IEEE Transactions on Dependable and Secure Computing, vol. 20, no. 4, pp. 3194–3207, 2022

  20. [20]

    Amnesiac machine learning,

    L. Graves, V . Nagisetty, and V . Ganesh, “Amnesiac machine learning,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 13, 2021, pp. 11 516–11 524

  21. [21]

    Senate bill 264: Interests of foreign countries,

    T. F. Senate, “Senate bill 264: Interests of foreign countries,” https: //www.flsenate.gov/Session/Bill/2023/264, 2023

  22. [22]

    Senate bill 17: Relating to the purchase or acquisition of an interest in real property by certain aliens or foreign entities,

    T. T. Senate, “Senate bill 17: Relating to the purchase or acquisition of an interest in real property by certain aliens or foreign entities,” https://legiscan.com/TX/text/SB17/2025, 2025

  23. [23]

    Membership inference attacks on machine learning: A survey,

    H. Hu, Z. Salcic, L. Sun, G. Dobbie, P. S. Yu, and X. Zhang, “Membership inference attacks on machine learning: A survey,” ACM Computing Surveys (CSUR), vol. 54, no. 11s, pp. 1–37, 2022

  24. [24]

    Membership inference attacks from first principles,

    N. Carlini, S. Chien, M. Nasr, S. Song, A. Terzis, and F. Tramer, “Membership inference attacks from first principles,” in 2022 IEEE Symposium on Security and Privacy (SP). IEEE, 2022, pp. 1897– 1914

  25. [25]

    Understanding membership in- ferences on well-generalized learning models,

    Y . Long, V . Bindschaedler, L. Wang, D. Bu, X. Wang, H. Tang, C. A. Gunter, and K. Chen, “Understanding membership in- ferences on well-generalized learning models,” arXiv preprint arXiv:1802.04889, 2018

  26. [26]

    Demystifying membership inference attacks in machine learning as a service,

    S. Truex, L. Liu, M. E. Gursoy, L. Yu, and W. Wei, “Demystifying membership inference attacks in machine learning as a service,” IEEE transactions on services computing, vol. 14, no. 6, pp. 2073– 2089, 2019

  27. [27]

    Machine un- learning,

    L. Bourtoule, V . Chandrasekaran, C. A. Choquette-Choo, H. Jia, A. Travers, B. Zhang, D. Lie, and N. Papernot, “Machine un- learning,” in 2021 IEEE Symposium on Security and Privacy (SP). IEEE, 2021, pp. 141–159

  28. [28]

    2022 , month = oct, journal =

    T. T. Nguyen, T. T. Huynh, Z. Ren, P. L. Nguyen, A. W.-C. Liew, H. Yin, and Q. V . H. Nguyen, “A survey of machine unlearning,” arXiv preprint arXiv:2209.02299, 2022

  29. [29]

    Exploring the landscape of machine unlearning: A comprehensive survey and taxonomy,

    T. Shaik, X. Tao, H. Xie, L. Li, X. Zhu, and Q. Li, “Exploring the landscape of machine unlearning: A comprehensive survey and taxonomy,” IEEE Transactions on Neural Networks and Learning Systems, 2024

  30. [30]

    Certi- fied data removal from machine learning models,

    C. Guo, T. Goldstein, A. Hannun, and L. Van Der Maaten, “Certi- fied data removal from machine learning models,” in Proceedings of the 37th International Conference on Machine Learning, 2020, pp. 3832–3842

  31. [31]

    Model sparsity can simplify machine unlearning,

    J. Liu, P. Ram, Y . Yao, G. Liu, Y . Liu, P. Sharma, and S. Liu, “Model sparsity can simplify machine unlearning,” Advances in Neural Information Processing Systems, vol. 36, 2024

  32. [32]

    Machine unlearn- ing: Linear filtration for logit-based classifiers,

    T. Baumhauer, P. Sch ¨ottle, and M. Zeppelzauer, “Machine unlearn- ing: Linear filtration for logit-based classifiers,” Machine Learning, vol. 111, no. 9, pp. 3203–3226, 2022

  33. [33]

    Membership inference attacks by exploiting loss trajectory,

    Y . Liu, Z. Zhao, M. Backes, and Y . Zhang, “Membership inference attacks by exploiting loss trajectory,” in Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, 2022, pp. 2085–2098

  34. [34]

    Enhanced membership inference attacks against machine learning models,

    J. Ye, A. Maddi, S. K. Murakonda, V . Bindschaedler, and R. Shokri, “Enhanced membership inference attacks against machine learning models,” in Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, 2022, pp. 3093–3106

  35. [35]

    Canary in a coalmine: Better member- ship inference with ensembled adversarial queries,

    Y . Wen, A. Bansal, H. Kazemi, E. Borgnia, M. Goldblum, J. Geip- ing, and T. Goldstein, “Canary in a coalmine: Better member- ship inference with ensembled adversarial queries,” arXiv preprint arXiv:2210.10750, 2022

  36. [36]

    Low-cost high-power mem- bership inference attacks,

    S. Zarifzadeh, P. Liu, and R. Shokri, “Low-cost high-power mem- bership inference attacks,” in Forty-first International Conference on Machine Learning, 2024

  37. [37]

    arXiv preprint arXiv:2201.06640 , year =

    S. Goel, A. Prabhu, A. Sanyal, S.-N. Lim, P. Torr, and P. Ku- maraguru, “Towards adversarial evaluations for inexact machine unlearning,” arXiv preprint arXiv:2201.06640, 2022

  38. [38]

    Machine unlearning: Solu- tions and challenges,

    J. Xu, Z. Wu, C. Wang, and X. Jia, “Machine unlearning: Solu- tions and challenges,” IEEE Transactions on Emerging Topics in Computational Intelligence, 2024

  39. [39]

    Datarobot inc

    “Datarobot inc.” https://www.datarobot.com, 2017

  40. [40]

    H2o.ai ai platform,

    “H2o.ai ai platform,” https://h2o.ai, 2015

  41. [41]

    Learning multiple layers of features from tiny images,

    A. Krizhevsky, G. Hinton, “Learning multiple layers of features from tiny images,” 2009

  42. [42]

    Cinic-10 is not imagenet or cifar-10.arXiv preprint arXiv:1810.03505, 2018

    L. N. Darlow, E. J. Crowley, A. Antoniou, and A. J. Storkey, “Cinic-10 is not imagenet or cifar-10,” arXiv preprint arXiv:1810.03505, 2018

  43. [43]

    Imagenet: A large-scale hierarchical image database,

    J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “Imagenet: A large-scale hierarchical image database,” in 2009 IEEE conference on computer vision and pattern recognition. Ieee, 2009, pp. 248–255

  44. [44]

    Recursive deep models for semantic composi- tionality over a sentiment treebank,

    R. Socher, A. Perelygin, J. Wu, J. Chuang, C. D. Manning, A. Y . Ng, and C. Potts, “Recursive deep models for semantic composi- tionality over a sentiment treebank,” in Proceedings of the 2013 conference on empirical methods in natural language processing, 2013, pp. 1631–1642

  45. [45]

    Effective 20 newsgroups dataset cleaning,

    K. Albishre, M. Albathan, and Y . Li, “Effective 20 newsgroups dataset cleaning,” in 2015 IEEE/WIC/ACM International Confer- ence on Web Intelligence and Intelligent Agent Technology (WI- IAT), vol. 3. IEEE, 2015, pp. 98–101

  46. [46]

    Densely connected convolutional networks,

    G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 4700–4708

  47. [47]

    Deep residual learning for image recognition,

    K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778

  48. [48]

    Pythia: A suite for analyzing large language models across training and scaling,

    S. Biderman, H. Schoelkopf, Q. G. Anthony, H. Bradley, K. O’Brien, E. Hallahan, M. A. Khan, S. Purohit, U. S. Prashanth, E. Raff et al., “Pythia: A suite for analyzing large language models across training and scaling,” in International Conference on Machine Learning. PMLR, 2023, pp. 2397–2430

  49. [49]

    Eternal sunshine of the spotless net: Selective forgetting in deep networks,

    A. Golatkar, A. Achille, and S. Soatto, “Eternal sunshine of the spotless net: Selective forgetting in deep networks,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 9304–9312

  50. [50]

    arXiv preprint arXiv:2008.05756 , year=

    M. Grandini, E. Bagli, and G. Visani, “Metrics for multi-class clas- sification: an overview,” arXiv preprint arXiv:2008.05756, 2020

  51. [51]

    Privacy risk in machine learning: Analyzing the connection to overfitting,

    S. Yeom, I. Giacomelli, M. Fredrikson, and S. Jha, “Privacy risk in machine learning: Analyzing the connection to overfitting,” in 2018 IEEE 31st computer security foundations symposium (CSF). IEEE, 2018, pp. 268–282

  52. [52]

    Knowledge unlearning for mitigating privacy risks in language models,

    J. Jang, D. Yoon, S. Yang, S. Cha, M. Lee, L. Logeswaran, and M. Seo, “Knowledge unlearning for mitigating privacy risks in language models,” in 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023. Association for Compu- tational Linguistics (ACL), 2023, pp. 14 389–14 408

  53. [53]

    Editing models with task arithmetic,

    G. Ilharco, M. T. Ribeiro, M. Wortsman, L. Schmidt, H. Hajishirzi, and A. Farhadi, “Editing models with task arithmetic,” in The Eleventh International Conference on Learning Representations, 2022

  54. [54]

    arXiv preprint arXiv:2404.05868 , year=

    R. Zhang, L. Lin, Y . Bai, and S. Mei, “Negative preference optimization: From catastrophic collapse to effective unlearning,” arXiv preprint arXiv:2404.05868, 2024

  55. [55]

    Continual learning and private unlearning,

    B. Liu, Q. Liu, and P. Stone, “Continual learning and private unlearning,” in Conference on Lifelong Learning Agents. PMLR, 2022, pp. 243–254

  56. [56]

    Tofu: A task of fictitious unlearning for llms,

    P. Maini, Z. Feng, A. Schwarzschild, Z. C. Lipton, and J. Z. Kolter, “Tofu: A task of fictitious unlearning for llms,” in First Conference on Language Modeling, 2024

  57. [57]

    Label-only membership inference attacks,

    C. A. Choquette-Choo, F. Tramer, N. Carlini, and N. Papernot, “Label-only membership inference attacks,” in International con- ference on machine learning. PMLR, 2021, pp. 1964–1974

  58. [58]

    On the effectiveness of reg- ularization against membership inference attacks,

    Y . Kaya, S. Hong, and T. Dumitras, “On the effectiveness of reg- ularization against membership inference attacks,” arXiv preprint arXiv:2006.05336, 2020

  59. [59]

    Defending privacy against more knowledgeable membership inference attackers,

    Y . Yin, K. Chen, L. Shou, and G. Chen, “Defending privacy against more knowledgeable membership inference attackers,” in Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, 2021, pp. 2026–2036

  60. [60]

    Improving neural networks with dropout,

    N. Srivastava, “Improving neural networks with dropout,” Univer- sity of Toronto, vol. 182, no. 566, p. 7, 2013

  61. [61]

    Differential privacy,

    C. Dwork, “Differential privacy,” in International colloquium on automata, languages, and programming. Springer, 2006, pp. 1– 12

  62. [62]

    Dpsur: Accelerating differentially private stochastic gradient de- scent using selective update and release,

    J. Fu, Q. Ye, H. Hu, Z. Chen, L. Wang, K. Wang, and X. Ran, “Dpsur: Accelerating differentially private stochastic gradient de- scent using selective update and release,” Proceedings of the VLDB Endowment, vol. 17, no. 6, pp. 1200–1213, 2024

  63. [63]

    Mem- guard: Defending against black-box membership inference attacks via adversarial examples,

    J. Jia, A. Salem, M. Backes, Y . Zhang, and N. Z. Gong, “Mem- guard: Defending against black-box membership inference attacks via adversarial examples,” in Proceedings of the 2019 ACM SIGSAC conference on computer and communications security, 2019, pp. 259–274

  64. [64]

    Local and central differential privacy for robustness and privacy in federated learning.arXiv preprint arXiv:2009.03561, 2020

    M. Naseri, J. Hayes, and E. De Cristofaro, “Local and central dif- ferential privacy for robustness and privacy in federated learning,” arXiv preprint arXiv:2009.03561, 2020

  65. [65]

    Deep learning with differential privacy,

    M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan, I. Mironov, K. Talwar, and L. Zhang, “Deep learning with differential privacy,” in Proceedings of the 2016 ACM SIGSAC conference on computer and communications security, 2016, pp. 308–318

  66. [66]

    arXiv preprint arXiv:2210.01504 , year=

    J. Jang, D. Yoon, S. Yang, S. Cha, M. Lee, L. Logeswaran, and M. Seo, “Knowledge unlearning for mitigating privacy risks in language models,” arXiv preprint arXiv:2210.01504, 2022

  67. [67]

    Who’s harry potter? approximate unlearning in llms, 2023.URL https://arxiv

    R. Eldan and M. Russinovich, “Who’s harry potter? approximate unlearning in llms,” arXiv preprint arXiv:2310.02238, 2023

  68. [68]

    In-context unlearning: Language models as few shot unlearners.arXiv preprint arXiv:2310.07579,

    M. Pawelczyk, S. Neel, and H. Lakkaraju, “In-context unlearn- ing: Language models as few shot unlearners,” arXiv preprint arXiv:2310.07579, 2023

  69. [69]

    Detecting pretraining data from large lan- guage models,

    W. Shi, A. Ajith, M. Xia, Y . Huang, D. Liu, T. Blevins, D. Chen, and L. Zettlemoyer, “Detecting pretraining data from large lan- guage models,” in The Twelfth International Conference on Learn- ing Representations (ICLR), 2024

  70. [70]

    Extracting training data from large language models,

    N. Carlini, F. Tramer, E. Wallace, M. Jagielski, A. Herbert-V oss, K. Lee, A. Roberts, T. Brown, D. Song, U. Erlingsson et al., “Extracting training data from large language models,” in 30th USENIX security symposium (USENIX Security 21), 2021, pp. 2633–2650

  71. [71]

    Recovering private text in federated learning of language models,

    S. Gupta, Y . Huang, Z. Zhong, T. Gao, K. Li, and D. Chen, “Recovering private text in federated learning of language models,” Advances in neural information processing systems, vol. 35, pp. 8130–8143, 2022

  72. [72]

    Membership inference attacks against language models via neighbourhood comparison

    J. Mattern, F. Mireshghallah, Z. Jin, B. Sch ¨olkopf, M. Sachan, and T. Berg-Kirkpatrick, “Membership inference attacks against language models via neighbourhood comparison,” arXiv preprint arXiv:2305.18462, 2023

  73. [73]

    arXiv preprint arXiv:2407.06460 , year=

    W. Shi, J. Lee, Y . Huang, S. Malladi, J. Zhao, A. Holtzman, D. Liu, L. Zettlemoyer, N. A. Smith, and C. Zhang, “Muse: Machine unlearning six-way evaluation for language models,” arXiv preprint arXiv:2407.06460, 2024

  74. [74]

    Do membership inference attacks work on large language models?

    M. Duan, A. Suri, N. Mireshghallah, S. Min, W. Shi, L. Zettle- moyer, Y . Tsvetkov, Y . Choi, D. Evans, and H. Hajishirzi, “Do membership inference attacks work on large language models?” in Conference on Language Modeling (COLM), 2024

  75. [75]

    What neural networks memorize and why: Discovering the long tail via influence estimation,

    V . Feldman and C. Zhang, “What neural networks memorize and why: Discovering the long tail via influence estimation,” Advances in Neural Information Processing Systems, vol. 33, pp. 2881–2891, 2020

  76. [76]

    The privacy onion effect: Memorization is relative,

    N. Carlini, M. Jagielski, C. Zhang, N. Papernot, A. Terzis, and F. Tramer, “The privacy onion effect: Memorization is relative,” in Advances in Neural Information Processing Systems, S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, Eds., vol. 35, 2022, pp. 13 263–13 276

  77. [77]

    A maximum entropy approach to natural language processing,

    A. Berger, S. A. Della Pietra, and V . J. Della Pietra, “A maximum entropy approach to natural language processing,” Computational linguistics, vol. 22, no. 1, pp. 39–71, 1996

  78. [78]

    Truth serum: Poisoning machine learning models to reveal their secrets,

    F. Tram `er, R. Shokri, A. San Joaquin, H. Le, M. Jagielski, S. Hong, and N. Carlini, “Truth serum: Poisoning machine learning models to reveal their secrets,” in Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, 2022, pp. 2779–2792

  79. [79]

    Stolen memories: Leveraging model memorization for calibrated white-box membership inference,

    K. Leino and M. Fredrikson, “Stolen memories: Leveraging model memorization for calibrated white-box membership inference,” in 29th USENIX security symposium (USENIX Security 20), 2020, pp. 1605–1622

  80. [80]

    A baseline for detecting misclas- sified and out-of-distribution examples in neural networks,

    D. Hendrycks and K. Gimpel, “A baseline for detecting misclas- sified and out-of-distribution examples in neural networks,” in International Conference on Learning Representations, 2017

Showing first 80 references.