CoUn: Empowering Machine Unlearning via Contrastive Learning

Hongliang Li; Mehdi Setayesh; Yasser H. Khalil

arXiv: 2509.16391 · v3 · pith:UU77BVZWnew · submitted 2025-09-19 · 💻 cs.LG · cs.AI · cs.CV

Pith reviewed 2026-05-21 22:35 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.CV

keywords: machine unlearning · contrastive learning · retain data · forget data · representation adjustment · semantic similarity · data privacy

The pith

CoUn improves machine unlearning by using contrastive learning on retain data to mimic a model retrained from scratch.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents CoUn as a machine unlearning approach that adjusts learned representations of data through contrastive learning and supervised learning applied only to the retain set. It draws from the observation that retraining a model solely on retain data causes it to classify forget data according to semantic similarities with the kept data. By emulating that behavior indirectly, CoUn aims to remove the influence of specific forget samples more thoroughly than prior methods based on label flips or weight changes. The result is intended to deliver stronger unlearning while preserving accuracy on the data that should remain.

Core claim

CoUn is a machine unlearning framework that emulates the classification behavior of a model retrained from scratch on retain data alone. It does so by leveraging semantic similarity between samples to indirectly adjust forget representations via contrastive learning, while using supervised learning to keep retain representations clustered together, with both steps performed exclusively on retain data.

What carries the argument

Contrastive learning module applied to retain data that indirectly adjusts forget representations according to semantic similarity.
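The contrastive step can be illustrated with a small sketch. The page does not state which contrastive loss CoUn uses, so the block below assumes a SimCLR-style NT-Xent loss over two augmented views of a retain batch; the function names `cosine` and `nt_xent` and the `temperature` default are hypothetical choices for illustration, not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two nonzero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent(z, z_prime, temperature=0.5):
    """Assumed NT-Xent loss: z[i] and z_prime[i] are two views of retain sample i."""
    reps = list(z) + list(z_prime)
    n = len(z)
    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)  # index of the other view of the same sample
        logits = [
            cosine(reps[i], reps[k]) / temperature
            for k in range(2 * n)
            if k != i  # a representation is never contrasted with itself
        ]
        positive_logit = cosine(reps[i], reps[positive]) / temperature
        # Numerically stable log-sum-exp over all other representations.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - positive_logit
    return total / (2 * n)
```

Only retain samples enter this loss. Under the paper's premise, forget representations would move indirectly because they share the feature extractor with the retain samples they resemble.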

If this is right

CoUn outperforms existing machine unlearning baselines on unlearning effectiveness across multiple datasets and model architectures.
Adding the contrastive learning module to prior unlearning methods increases their effectiveness at removing forget data influence.
The method maintains performance on retain data while achieving stronger removal of unwanted information.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same representation-adjustment idea could be tested on class-level unlearning or on sequential forgetting tasks where multiple batches must be removed over time.
Because the method never touches the forget data during its adjustment step, it may reduce privacy risks compared with techniques that require access to the data being forgotten.
The core premise suggests that future unlearning work might benefit from focusing on how representations relate across the entire dataset rather than on direct parameter or label edits.

Load-bearing premise

A model retrained from scratch using only retain data will classify forget data according to their semantic similarity to the retain data.

What would settle it

An experiment that measures how a retrained-from-scratch model actually classifies forget samples and finds that its decisions do not align with semantic similarity to retain clusters.
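One way such a check could look, as a rough sketch: compute retain class centroids in representation space, assign each forget sample to its most cosine-similar centroid, and measure how often that matches the retrained model's prediction. The helper names, and the choice of cosine similarity to centroids as the proxy for "semantic similarity", are assumptions made for illustration.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two nonzero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def class_centroids(retain_reps, retain_labels):
    """Mean representation per retain class."""
    sums = {}
    counts = defaultdict(int)
    for rep, label in zip(retain_reps, retain_labels):
        previous = sums.get(label, [0.0] * len(rep))
        sums[label] = [s + x for s, x in zip(previous, rep)]
        counts[label] += 1
    return {label: [s / counts[label] for s in vec] for label, vec in sums.items()}


def nearest_centroid_label(rep, centroids):
    """Label of the retain centroid most similar to one forget representation."""
    return max(centroids, key=lambda label: cosine(rep, centroids[label]))


def agreement_rate(forget_reps, retrain_predictions, centroids):
    """Fraction of forget samples whose retrained-model prediction matches
    the most similar retain centroid."""
    matches = sum(
        nearest_centroid_label(rep, centroids) == prediction
        for rep, prediction in zip(forget_reps, retrain_predictions)
    )
    return matches / len(forget_reps)
```

A low agreement rate on real Retrain representations would count against the premise; a high one would support it only to the extent that centroid similarity is a fair stand-in for semantic similarity.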

Figures

Figures reproduced from arXiv: 2509.16391 by Hongliang Li, Mehdi Setayesh, Yasser H. Khalil.

**Figure 1.** Representation space of the Retrain model trained with ResNet-18 and CIFAR-10, excluding ‘truck’ class samples (left) and excluding 10% randomly selected samples (right). Small dots represent retain samples from different clusters, while larger dots indicate forget samples classified into clusters of retain samples that exhibit the highest semantic similarity to them. Thus, to closely match the performanc…

**Figure 2.** CoUn framework. Two augmented views are generated from a batch of retain image samples $I$. These views are processed by the feature extractor $f_{\theta_u}$, yielding retain representations $(Z, Z')$. A CL module adjusts the representations, while supervised learning applied via the classifier head $h_{\theta_u}$ enforces their cluster separation. Let $(I, Y)$ denote a batch of images and their corresponding labels sampled from D…

**Figure 3.** Representation space of FT and CoUn unlearned models (rows). Columns correspond to two forgetting scenarios: class-wise (‘truck’) and random (10% forget ratio). The Original model is trained on CIFAR-10 using ResNet-18. Small dots represent retain samples from different clusters, while larger dots indicate forget samples classified into the corresponding clusters. To achieve effective unlearning, CoUn ad…

**Figure 4.** Percentage improvement from integrating CoUn’s CL module into baseline methods. Incorporating our CL module consistently improves baseline unlearning performance compared to the original MU methods (without CL). The performance improvements further increase with a 50% forget ratio.

**Figure 5.** Performance comparison of MU methods on CIFAR-100 with ResNet-18, where 10% (left) and 50% (right) of training data are randomly selected as forget data. The best performance of each method is reported. CoUn outperforms all baselines, and integrating its CL module empowers baseline performance. Although CL increases computational cost, the performance improvement persists even with the same computational …

**Figure 7.** Effect of scaling constant λ. Properly tuning λ in Equation (4) is essential for optimizing CoUn’s performance. [Plot: Average Gap axis; series FT and CoUn.]

**Figure 10.** Effect of batch size n. Different values of n for CoUn result in varying performance. Retrain batch size is set to 256.

**Figure 11.** Representation space of the Original model. The Original model is trained on the entire CIFAR-10 training data (i.e., union of retain and forget data) using ResNet-18. There are no misclassifications for either retain or forget samples since the model was trained on them. A single visualization of the Original model’s representation space is shown for both class-wise and random scenarios, as this model se…

**Figure 12.** Effect of CL transformation on forget data representations. t-SNE visualizations of forget data representations extracted from the penultimate layer of $\theta_u$ on CIFAR-10 with ResNet-18 under a 50% forget data ratio. Left: CoUn with a simple CL transformation ($T_{CL}$ = CHN). Right: CoUn with a strong CL transformation ($T_{CL}$ = CHJGN). The transformation for supervised learning is fixed at $T_{CE}$ = CHN. The CHJGN tran…

Original abstract

Machine unlearning (MU) aims to remove the influence of specific "forget" data from a trained model while preserving its knowledge of the remaining "retain" data. Existing MU methods based on label manipulation or model weight perturbations often achieve limited unlearning effectiveness. To address this, we introduce CoUn, a novel MU framework inspired by the observation that a model retrained from scratch using only retain data classifies forget data based on their semantic similarity to the retain data. CoUn emulates this behavior by adjusting learned data representations through contrastive learning (CL) and supervised learning, applied exclusively to retain data. Specifically, CoUn (1) leverages semantic similarity between data samples to indirectly adjust forget representations using CL, and (2) maintains retain representations within their respective clusters through supervised learning. Extensive experiments across various datasets and model architectures show that CoUn consistently outperforms state-of-the-art MU baselines in unlearning effectiveness. Additionally, integrating our CL module into existing baselines empowers their unlearning effectiveness.
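The two-part objective described in the abstract can be written compactly. The figure captions on this page mention a scaling constant λ in Equation (4) but do not reproduce the equation, so the additive form below is an assumption rather than the paper's stated formula; the symbols $Z$, $Z'$, $Y$ and $h_{\theta_u}$ follow the Figure 2 caption.

```latex
\mathcal{L}(\theta_u)
  = \mathcal{L}_{\mathrm{CE}}\big(h_{\theta_u}(Z),\, Y\big)
  + \lambda \, \mathcal{L}_{\mathrm{CL}}(Z, Z')
```

Under this reading, the cross-entropy term keeps retain representations in their labeled clusters, the contrastive term reshapes the representation space using retain samples only, and λ trades the two off.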

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

CoUn uses contrastive learning only on retain data to mimic retrained-model behavior on forget samples, but the key premise lacks direct checks and the evidence is still thin.

The main point is that CoUn tries to improve machine unlearning by running contrastive learning exclusively on the retain set. The idea is that this will push forget representations into clusters that match what a model trained from scratch on retain data would do, based on semantic similarity. They combine it with supervised learning to hold the retain clusters steady and report that it beats standard baselines while also boosting those baselines when plugged in.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces CoUn, a machine unlearning framework motivated by the claim that a model retrained from scratch on retain data alone will classify forget samples according to their semantic similarity with retain clusters. CoUn emulates this behavior by performing contrastive learning (to push forget representations toward retain-like clusters via semantic similarities among retain samples) and supervised learning (to preserve retain clusters), both applied exclusively to retain data. The paper reports that this yields superior unlearning effectiveness over state-of-the-art baselines across multiple datasets and architectures, and that the contrastive module can be plugged into existing methods to improve them.

Significance. If the motivating observation about retrained-model behavior is empirically substantiated and the reported gains prove robust under standard controls, CoUn would represent a useful addition to the MU literature by offering a representation-level approach that avoids direct forget-data access or aggressive weight perturbation. The modular integration claim, if verified, could have practical value for improving existing baselines.

major comments (2)

[Introduction / motivation] The central premise that scratch-retrained models classify forget data strictly according to semantic similarity with retain clusters is asserted without any direct supporting measurement (e.g., embedding-space nearest-neighbor analysis, cosine-similarity scores between forget samples and retain class centroids, or controlled counter-example tests on datasets with distinctive low-level statistics). Because this observation is invoked to justify performing contrastive learning on retain data alone, its lack of validation is load-bearing for the method's rationale.
[Experiments] The abstract states that 'extensive experiments across various datasets and model architectures show that CoUn consistently outperforms state-of-the-art MU baselines,' yet no quantitative tables, exact unlearning metrics (forget-set accuracy, MIA success rate, retain-set accuracy), error bars, number of runs, or statistical significance tests are referenced. Without these details the outperformance claim cannot be assessed.

minor comments (1)

[Abstract] Abstract: the phrases 'various datasets and model architectures' and 'state-of-the-art MU baselines' are left unspecified; naming the concrete datasets, architectures, and baselines would improve clarity and allow readers to judge scope.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address the major comments point by point below, indicating where revisions will be made to strengthen the manuscript.

Referee: Introduction / motivation section: the central premise that scratch-retrained models classify forget data strictly according to semantic similarity with retain clusters is asserted without any direct supporting measurement (e.g., embedding-space nearest-neighbor analysis, cosine-similarity scores between forget samples and retain class centroids, or controlled counter-example tests on datasets with distinctive low-level statistics). Because this observation is invoked to justify performing contrastive learning on retain data alone, its lack of validation is load-bearing for the method's rationale.

Authors: We agree that direct empirical validation of the motivating observation would improve the paper. In the revision we will add embedding-space nearest-neighbor analysis, cosine-similarity scores between forget samples and retain class centroids, and controlled counter-example tests on datasets with distinctive low-level statistics to substantiate that retrained models classify forget data according to semantic similarity with retain clusters. Revision: yes.
Referee: Experimental section: the abstract states that 'extensive experiments across various datasets and model architectures show that CoUn consistently outperforms state-of-the-art MU baselines,' yet no quantitative tables, exact unlearning metrics (forget-set accuracy, MIA success rate, retain-set accuracy), error bars, number of runs, or statistical significance tests are referenced. Without these details the outperformance claim cannot be assessed.

Authors: Section 4 already presents quantitative tables with exact metrics (forget-set accuracy, MIA success rate, retain-set accuracy) for CoUn and baselines across datasets and architectures, based on multiple runs. To address the concern we will add explicit cross-references to these tables from the abstract, include error bars, state the number of runs, and report statistical significance tests in the revised manuscript. Revision: partial.
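The reporting the referee asks for can be sketched in a few lines. The block below assumes that the "average gap" to Retrain is the mean absolute difference across metrics, and that variability is reported as mean and sample standard deviation over runs; the metric keys and all numbers used with it are hypothetical placeholders, not results from the paper.

```python
from statistics import mean, stdev


def average_gap(method_metrics, retrain_metrics):
    """Mean absolute difference between a method and Retrain over shared metrics.

    Both arguments map a metric name (for example "retain_acc") to a percentage.
    """
    return mean(abs(method_metrics[name] - retrain_metrics[name]) for name in retrain_metrics)


def summarize_runs(gaps):
    """Mean and sample standard deviation of the average gap over repeated runs."""
    return mean(gaps), stdev(gaps)


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    retrain = {"retain_acc": 100.0, "forget_acc": 90.0, "mia": 10.0}
    method = {"retain_acc": 99.0, "forget_acc": 94.0, "mia": 13.0}
    print(average_gap(method, retrain))
    print(summarize_runs([1.0, 2.0, 3.0]))
```

A smaller gap means the unlearned model behaves more like Retrain on every metric at once, which is why a single accuracy number is not enough to judge the outperformance claim.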

Circularity Check

0 steps flagged

No significant circularity detected

The paper motivates CoUn from an external observation that scratch-retrained models classify forget samples by semantic similarity to retain data, then applies contrastive learning plus supervised learning solely on retain data to emulate that behavior. No equations, parameter fits, or derivations are described that define any quantity in terms of itself or rename a fitted input as a prediction. The central claim rests on an asserted empirical premise rather than any self-referential construction or self-citation chain, so the derivation chain remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on one key domain assumption about retrained models and standard supervised/contrastive training procedures; no free parameters or new invented entities are introduced in the abstract.

axioms (1)

Domain assumption: A model retrained from scratch using only retain data classifies forget data based on their semantic similarity to the retain data.
This observation is stated as the inspiration for emulating the behavior through contrastive learning applied exclusively to retain data.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

CoUn emulates this behavior by adjusting learned data representations through contrastive learning (CL) and supervised learning, applied exclusively to retain data... leverages semantic similarity between data samples to indirectly adjust forget representations using CL

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Interference-Aware Multi-Task Unlearning
cs.AI 2026-05 unverdicted novelty 7.0

Introduces interference-aware multi-task unlearning with task-aware gradient projection and instance-level gradient orthogonalization, reducing interference scores by 30.3% and 52.9% on vision benchmarks.

The difference (∆) and the (best) average difference between each method and Retrain are reported.

| Forgetting Scenario | Method | L2 - (∆↓) Automobile | L2 - (∆↓) Airplane | L2 - (∆↓) Ship | Avg. Diff. ↓ |
|---|---|---|---|---|---|
| Class (‘truck’) | Original | 0.93 | 0.97 | 0.96 | - |
| Class (‘truck’) | Retrain | 0.90 (0.00) | 0.96 (0.00) | 0.95 (0.00) | 0.00 |
| Class (‘truck’) | FT | 0.86 (0.04) | 0.94 (0.02) | 0.91 (0.04) | 0.033 |
| Class (‘truck’) | CoUn | 0.87 (0.03) | 0.96 (0.00) | 0.93 (0.02) | 0.017 |

Statistica...

[1] Alessandro Mantelero. The EU proposal for a general data protection regulation and the roots of the ‘right to be forgotten’. Computer Law & Security Review, 29(3):229–235, 2013.

[2] Alessandro Achille, Michael Kearns, Carson Klingenberg, and Stefano Soatto. AI model disgorgement: Methods and choices. Proceedings of the National Academy of Sciences, 121(18):e2307304121, 2024.

[3] Na Li, Chunyi Zhou, Yansong Gao, Hui Chen, Zhi Zhang, Boyu Kuang, and Anmin Fu. Machine unlearning: Taxonomy, metrics, applications, challenges, and prospects. IEEE Transactions on Neural Networks and Learning Systems, 2025.

[4] Thanveer Shaik, Xiaohui Tao, Haoran Xie, Lin Li, Xiaofeng Zhu, and Qing Li. Exploring the landscape of machine unlearning: A comprehensive survey and taxonomy. IEEE Transactions on Neural Networks and Learning Systems, 2024.

[5] Jie Xu, Zihan Wu, Cong Wang, and Xiaohua Jia. Machine unlearning: Solutions and challenges. IEEE Transactions on Emerging Topics in Computational Intelligence, 2024.

[6] Jimmy Z Di, Jack Douglas, Jayadev Acharya, Gautam Kamath, and Ayush Sekhari. Hidden poison: Machine unlearning enables camouflaged poisoning attacks. In NeurIPS ML Safety Workshop, 2022.

[7] Haonan Yan, Xiaoguang Li, Ziyao Guo, Hui Li, Fenghua Li, and Xiaodong Lin. Arcane: An efficient architecture for exact machine unlearning. In IJCAI, volume 6, page 19, 2022.

[8] Yasser H Khalil, Leo Brunswic, Soufiane Lamghari, Xu Li, Mahdi Beitollahi, and Xi Chen. Not: Federated unlearning via weight negation. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 25759–25769, 2025.

[9] Chongyu Fan, Jiancheng Liu, Yihua Zhang, Eric Wong, Dennis Wei, and Sijia Liu. SalUn: Empowering machine unlearning via gradient-based weight saliency in both image classification and generation. In The Twelfth International Conference on Learning Representations, 2024.

[10] Jinghan Jia, Jiancheng Liu, Parikshit Ram, Yuguang Yao, Gaowen Liu, Yang Liu, Pranay Sharma, and Sijia Liu. Model sparsity can simplify machine unlearning. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.

[11] Kairan Zhao, Meghdad Kurmanji, George-Octavian Bărbulescu, Eleni Triantafillou, and Peter Triantafillou. What makes unlearning hard and what to do about it. Advances in Neural Information Processing Systems, 37:12293–12333, 2024.

[12] Min Chen, Weizhuo Gao, Gaoyang Liu, Kai Peng, and Chen Wang. Boundary unlearning: Rapid forgetting of deep networks via shifting the decision boundary. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7766–7775, 2023.

[13] Liwei Song, Reza Shokri, and Prateek Mittal. Privacy risks of securing machine learning models against adversarial examples. In Proceedings of the 2019 ACM SIGSAC conference on computer and communications security, pages 241–257, 2019.

[14] Meghdad Kurmanji, Peter Triantafillou, Jamie Hayes, and Eleni Triantafillou. Towards unbounded machine unlearning. Advances in neural information processing systems, 36:1957–1987, 2023.

[15] Laura Graves, Vineel Nagisetty, and Vijay Ganesh. Amnesiac machine learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 11516–11524, 2021.

[16] Vikram S Chundawat, Ayush K Tarun, Murari Mandal, and Mohan Kankanhalli. Can bad teaching induce forgetting? Unlearning in deep networks using an incompetent teacher. Proceedings of the AAAI Conference on Artificial Intelligence, 37(6):7210–7217, Jun. 2023.

[17] Ziyao Liu, Yu Jiang, Jiyuan Shen, Minyi Peng, Kwok-Yan Lam, Xingliang Yuan, and Xiaoning Liu. A survey on federated unlearning: Challenges, methods, and future directions. ACM Computing Surveys, 57(1):1–38, 2024.

[18] Jie Gui, Tuo Chen, Jing Zhang, Qiong Cao, Zhenan Sun, Hao Luo, and Dacheng Tao. A survey on self-supervised learning: Algorithms, applications, and future trends. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024.

[19] Ashish Jaiswal, Ashwin Ramesh Babu, Mohammad Zaki Zadeh, Debapriya Banerjee, and Fillia Makedon. A survey on contrastive self-supervised learning. Technologies, 9(1):2, 2020.

[20] Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. A simple framework for contrastive learning of visual representations. In International conference on machine learning, pages 1597–1607. PMLR, 2020.

[21] Julien Denize, Jaonary Rabarisoa, Astrid Orcesi, Romain Hérault, and Stéphane Canu. Similarity contrastive estimation for self-supervised soft contrastive learning. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 2706–2716, 2023.

[22] Chen Wei, Huiyu Wang, Wei Shen, and Alan Yuille. CO2: Consistent contrast for unsupervised visual representation learning. In International Conference on Learning Representations, 2021.

[23] Ching-Yao Chuang, Joshua Robinson, Yen-Chen Lin, Antonio Torralba, and Stefanie Jegelka. Debiased contrastive learning. Advances in neural information processing systems, 33:8765–8775, 2020.

[24] Anvith Thudi, Gabriel Deza, Varun Chandrasekaran, and Nicolas Papernot. Unrolling SGD: Understanding factors influencing machine unlearning. In 2022 IEEE 7th European Symposium on Security and Privacy (EuroS&P), pages 303–319. IEEE, 2022.

[25] Aditya Golatkar, Alessandro Achille, and Stefano Soatto. Eternal sunshine of the spotless net: Selective forgetting in deep networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9304–9312, 2020.

[26] Jonathan Frankle and Michael Carbin. The lottery ticket hypothesis: Finding sparse, trainable neural networks. In International Conference on Learning Representations, 2019.

[27] Xiaolong Ma, Geng Yuan, Xuan Shen, Tianlong Chen, Xuxi Chen, Xiaohan Chen, Ning Liu, Minghai Qin, Sijia Liu, Zhangyang Wang, et al. Sanity checks for lottery tickets: Does your winning ticket really win the jackpot? Advances in Neural Information Processing Systems, 34:12749–12760, 2021.

[28] Jack Foster, Stefan Schoepf, and Alexandra Brintrup. Fast machine unlearning without retraining through selective synaptic dampening. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 12043–12051, 2024.

[29] Yonglong Tian, Chen Sun, Ben Poole, Dilip Krishnan, Cordelia Schmid, and Phillip Isola. What makes for good views for contrastive learning? Advances in neural information processing systems, 33:6827–6839, 2020.

[30] Xinlei Chen, Haoqi Fan, Ross Girshick, and Kaiming He. Improved baselines with momentum contrastive learning. arXiv preprint arXiv:2003.04297, 2020.

[31] Jure Zbontar, Li Jing, Ishan Misra, Yann LeCun, and Stéphane Deny. Barlow twins: Self-supervised learning via redundancy reduction. In International conference on machine learning, pages 12310–12320. PMLR, 2021.

[32] Adrien Bardes, Jean Ponce, and Yann LeCun. VICReg: Variance-invariance-covariance regularization for self-supervised learning. In International Conference on Learning Representations, 2022.

[33] Qiuchen Zhang, Carl Yang, Jian Lou, Li Xiong, et al. Contrastive unlearning: A contrastive approach to machine unlearning. arXiv preprint arXiv:2401.10458, 2024.

[34] Nikunj Saunshi, Orestis Plevrakis, Sanjeev Arora, Mikhail Khodak, and Hrishikesh Khandeparkar. A theoretical analysis of contrastive unsupervised representation learning. In International Conference on Machine Learning, pages 5628–5637. PMLR, 2019.

[35] Weiran Huang, Mingyang Yi, Xuyang Zhao, and Zihao Jiang. Towards the generalization of contrastive self-supervised learning. In The Eleventh International Conference on Learning Representations, 2023.

[36] Alex Krizhevsky, Vinod Nair, Geoffrey Hinton, et al. The CIFAR-10 dataset. online: http://www.cs.toronto.edu/kriz/cifar.html, 55(5):2, 2014.

[37] Ya Le and Xuan Yang. Tiny ImageNet visual recognition challenge. CS 231N, 7(7):3, 2015.

[38] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.

[39] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.

[40] Seung Hoon Lee, Seunghyun Lee, and Byung Cheol Song. Vision transformer for small-size datasets. arXiv preprint arXiv:2112.13492, 2021.

[41] Prannay Khosla, Piotr Teterwak, Chen Wang, Aaron Sarna, Yonglong Tian, Phillip Isola, Aaron Maschinot, Ce Liu, and Dilip Krishnan. Supervised contrastive learning. Advances in neural information processing systems, 33:18661–18673, 2020.

[42] Sepp Hochreiter and Jürgen Schmidhuber. Flat minima. Neural computation, 9(1):1–42, 1997.

[43] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. PyTorch: An imperative style, high-performance deep learning library. In Proc. Advances in Neural Inf. Process. Syst. (NeurIPS), Vancouver, Canada, Dec. 2019.

[44] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al. ImageNet large scale visual recognition challenge. International journal of computer vision, 115:211–252, 2015.

We provide more details and results about our work in the appendices. Here are the content...

The difference (∆) and the (best) average difference between each method and Retrain are reported.

Forgetting Scenario   Method     L2 - (∆↓)                                   Avg. Diff.↓
                                 Automobile     Airplane       Ship
Class (‘truck’)       Original   0.93           0.97           0.96          -
                      Retrain    0.90 (0.00)    0.96 (0.00)    0.95 (0.00)   0.00
                      FT         0.86 (0.04)    0.94 (0.02)    0.91 (0.04)   0.033
                      CoUn       0.87 (0.03)    0.96 (0.00)    0.93 (0.02)   0.017

Statistica...