Revisiting Privacy Leakage in Machine Unlearning: Membership Inference Beyond the Forgotten Set
Pith reviewed 2026-05-09 18:34 UTC · model grok-4.3
The pith
Machine unlearning increases privacy risks for the data left in the model after removal requests.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Unlearning can introduce additional privacy risks to the retain set, making it more susceptible to membership inference attacks. TC-UMIA is a population-level inference framework that leverages model predictions before and after unlearning to distinguish among the forget, retain, and unseen sets, and it proves effective across a wide range of model architectures, datasets, and MU approaches.
What carries the argument
TC-UMIA, a tri-class membership inference attack that uses population-level signals from changes in model predictions before versus after unlearning to classify samples as forget-set, retain-set, or unseen.
Load-bearing premise
Changes in model predictions before and after unlearning provide a reliable population-level signal to separate retain-set members from unseen data.
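To make this premise concrete, here is a minimal, hypothetical sketch of a population-level, tri-class attack in the spirit of TC-UMIA. The feature set (confidence, entropy, and total prediction shift) and the random-forest attack model are illustrative assumptions; the paper's exact construction is not reproduced here.

```python
# Hypothetical sketch of a tri-class unlearning MIA (not the paper's exact method).
# Assumes the attacker observes softmax vectors from the model before and after
# unlearning and fits a 3-class classifier on prediction-shift features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

FORGET, RETAIN, UNSEEN = 0, 1, 2  # label convention for the shadow population

def shift_features(p_before: np.ndarray, p_after: np.ndarray) -> np.ndarray:
    """Per-sample features from (n, C) prediction matrices before/after unlearning."""
    eps = 1e-12
    conf_b = p_before.max(axis=1)                             # top confidence before
    conf_a = p_after.max(axis=1)                              # top confidence after
    ent_b = -(p_before * np.log(p_before + eps)).sum(axis=1)  # entropy before
    ent_a = -(p_after * np.log(p_after + eps)).sum(axis=1)    # entropy after
    l1 = np.abs(p_before - p_after).sum(axis=1)               # total prediction shift
    return np.stack([conf_b, conf_a, conf_a - conf_b, ent_b, ent_a, l1], axis=1)

def fit_attack(p_before: np.ndarray, p_after: np.ndarray, labels: np.ndarray):
    """Train the tri-class attack on a shadow population with known labels."""
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(shift_features(p_before, p_after), labels)
    return clf
```

An attacker would fit this on shadow populations where forget/retain/unseen membership is known, then apply it to the target model's pre- and post-unlearning outputs.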
What would settle it
A test in which an attacker armed only with pre- and post-unlearning predictions fails to distinguish retain-set samples from unseen samples at accuracy significantly above chance, across multiple standard unlearning methods.
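A hedged sketch of that settling experiment, restricted to the retain-versus-unseen decision: compute the attack's AUC and a permutation p-value against the chance hypothesis (AUC = 0.5). The score and label arrays are placeholders, not values from the paper.

```python
# Sketch of the falsification test: does the attack separate retain-set members
# from unseen samples better than chance? Inputs are illustrative placeholders.
import numpy as np
from sklearn.metrics import roc_auc_score

def retain_vs_unseen_auc(scores: np.ndarray, labels: np.ndarray,
                         n_perm: int = 1000, seed: int = 0):
    """AUC for retain (1) vs unseen (0), with a permutation p-value
    against the chance hypothesis AUC = 0.5."""
    rng = np.random.default_rng(seed)
    auc = roc_auc_score(labels, scores)
    null = np.array([roc_auc_score(rng.permutation(labels), scores)
                     for _ in range(n_perm)])
    return auc, float(np.mean(null >= auc))
```

If the attack's AUC falls inside the permutation null across multiple standard unlearning methods, the load-bearing premise fails.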
Original abstract
Machine unlearning (MU) has emerged as a key mechanism for ensuring data privacy and regulatory compliance by enabling models to forget specific training samples. However, recent studies have shown that the removal of data can inadvertently introduce privacy leakages to the retain set, i.e., data that remain in the model after unlearning. In this paper, we extend the scope of privacy analysis in unlearning to the often-overlooked retained data. We introduce TC-UMIA, the first tri-class unlearning membership inference attack. TC-UMIA is a population-level inference framework that leverages model predictions before and after unlearning to distinguish among the forget, retain, and unseen set. Extensive experiments on five state-of-the-art unlearning algorithms and six real-world datasets demonstrate that: (i) unlearning can introduce additional privacy risks to the retain set, making it more susceptible to membership inference attacks; (ii) TC-UMIA is effective across a wide range of model architectures, datasets, and MU approaches. Beyond launching the attack, we rigorously evaluate three defense mechanisms, namely label-only outputs, dropout, and differential privacy, to mitigate the privacy risks posed by TC-UMIA. Our results reveal a fundamental trade-off between privacy protection and model accuracy, with the dropout approach offering the most favorable balance.
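Of the three defenses the abstract evaluates, label-only outputs are the simplest to sketch: the serving layer withholds the confidence vectors that prediction-shift attacks feed on. The `predict_proba` interface below is an assumption (any classifier exposing class probabilities), not the paper's implementation.

```python
# Minimal sketch of the label-only defense: serve hard labels, keep scores server-side.
import numpy as np

def label_only_api(model, x: np.ndarray) -> np.ndarray:
    """Return only predicted classes; confidence vectors never leave the server."""
    probs = model.predict_proba(x)  # assumed probability interface
    return probs.argmax(axis=1)     # attacker sees argmax labels only
```

Label-only serving changes only the attacker's observation channel, whereas dropout and differential privacy perturb the model itself, which is presumably where the accuracy cost in the reported trade-off arises.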
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that machine unlearning can introduce additional privacy risks to the retain set, making it more susceptible to membership inference attacks. It proposes TC-UMIA, a tri-class unlearning membership inference attack that leverages model predictions before and after unlearning to distinguish among the forget, retain, and unseen sets. Extensive experiments on five unlearning algorithms and six datasets show TC-UMIA's effectiveness, with AUCs consistently above 0.75, outperforming binary baselines; the paper also evaluates defenses, including dropout, label-only outputs, and differential privacy, revealing accuracy-privacy trade-offs.
Significance. If the empirical results hold, this work is significant for the field of machine unlearning and privacy: it identifies a new attack surface on retained data and provides a practical attack framework alongside defense evaluations. The broad experimental validation across architectures, datasets, and methods adds to its impact. The paper's load-bearing premise is that population-level signals from prediction shifts serve as a reliable distinguisher.
Minor comments (2)
- [Abstract] The claims of 'extensive experiments' and 'rigorous' defense evaluation are not backed by any mention of statistical tests, confidence intervals, or specific AUC values; one sentence summarizing the key quantitative outcomes would make the abstract more informative.
- [Experiments] While ablation controls for architecture and hyperparameters are reported, the manuscript should state explicitly whether the tri-class formulation's advantage over binary MIA baselines was tested for statistical significance across all six datasets (a sketch of such a test follows).
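As an illustration of the test the comment above asks for, the sketch below pairs tri-class and binary AUCs by dataset and applies a one-sided Wilcoxon signed-rank test. The AUC values are hypothetical placeholders (the report states only that AUCs are consistently above 0.75).

```python
# Paired significance test across the six datasets: is the tri-class attack's
# AUC systematically higher than the binary baseline's? Values are hypothetical.
from scipy.stats import wilcoxon

tri_class_auc = [0.81, 0.78, 0.83, 0.76, 0.79, 0.80]  # placeholder AUCs
binary_auc    = [0.74, 0.72, 0.77, 0.70, 0.73, 0.75]  # placeholder AUCs

# One-sided test: H1 is that the tri-class AUC exceeds the binary AUC per dataset.
stat, p = wilcoxon(tri_class_auc, binary_auc, alternative="greater")
print(f"Wilcoxon statistic={stat:.1f}, p={p:.4f}")
```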
Simulated Author's Rebuttal
We thank the referee for their positive assessment of our work and the recommendation for minor revision. The referee's summary accurately reflects the core contributions regarding privacy risks to the retain set in machine unlearning and the proposed TC-UMIA framework. No specific major comments were included in the report.
Circularity Check
No significant circularity detected
Full rationale
The paper is an empirical demonstration of a membership inference attack (TC-UMIA) that uses observable pre- and post-unlearning prediction shifts as a population-level signal. No mathematical derivations, fitted parameters, or equations are presented that could reduce claims to inputs by construction. Central results consist of reported attack AUCs, ablation studies, and defense evaluations across multiple datasets and unlearning methods; these are externally falsifiable experimental outcomes rather than self-referential definitions or self-citation chains. No load-bearing steps match any of the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
No load-bearing axioms or fitted free parameters were identified: as noted in the circularity rationale, the central results are directly measured attack AUCs, ablations, and defense evaluations.