
arxiv: 2509.16391 · v3 · pith:UU77BVZWnew · submitted 2025-09-19 · 💻 cs.LG · cs.AI · cs.CV

CoUn: Empowering Machine Unlearning via Contrastive Learning

Pith reviewed 2026-05-21 22:35 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.CV
keywords machine unlearning, contrastive learning, retain data, forget data, representation adjustment, semantic similarity, data privacy

The pith

CoUn improves machine unlearning by using contrastive learning on retain data to mimic a model retrained from scratch.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents CoUn as a machine unlearning approach that adjusts learned representations of data through contrastive learning and supervised learning applied only to the retain set. It draws from the observation that retraining a model solely on retain data causes it to classify forget data according to semantic similarities with the kept data. By emulating that behavior indirectly, CoUn aims to remove the influence of specific forget samples more thoroughly than prior methods based on label flips or weight changes. The result is intended to deliver stronger unlearning while preserving accuracy on the data that should remain.

Core claim

CoUn is a machine unlearning framework that emulates the classification behavior of a model retrained from scratch on retain data alone. It does so by leveraging semantic similarity between samples to indirectly adjust forget representations via contrastive learning, while using supervised learning to keep retain representations clustered together, with both steps performed exclusively on retain data.
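The two training signals described above can be sketched as a single loss evaluated on a retain batch. This is a minimal illustration, not the authors' implementation: the NT-Xent (SimCLR-style) form of the contrastive term, the temperature, and the additive combination `CE + lam * CL` are assumptions made here (the page only states that a scaling constant λ appears in the paper's Equation (4)), and the function names are hypothetical. Plain Python is used so the sketch has no dependencies.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nt_xent(z, z_prime, temperature=0.5):
    """NT-Xent loss over two augmented views of the same retain batch.

    ASSUMPTION: the contrastive module is modelled as SimCLR-style NT-Xent;
    the source does not spell out the exact contrastive loss.
    """
    reps = z + z_prime                      # 2n representations
    n = len(z)
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)             # the other view of the same image
        logits = [cosine(reps[i], reps[j]) / temperature
                  for j in range(2 * n) if j != i]
        pos_logit = cosine(reps[i], reps[pos]) / temperature
        total += math.log(sum(math.exp(l) for l in logits)) - pos_logit
    return total / (2 * n)

def cross_entropy(class_logits, labels):
    """Mean softmax cross-entropy (supervised term on retain labels)."""
    total = 0.0
    for row, y in zip(class_logits, labels):
        m = max(row)                        # subtract max for numerical stability
        log_z = m + math.log(sum(math.exp(l - m) for l in row))
        total += log_z - row[y]
    return total / len(labels)

def coun_style_loss(z, z_prime, class_logits, labels, lam=1.0, temperature=0.5):
    """ASSUMED combination: supervised term plus lam times contrastive term.

    Every input comes from retain data only; forget samples never appear.
    """
    return cross_entropy(class_logits, labels) + lam * nt_xent(z, z_prime, temperature)
```

Note that the forget set is absent from every argument: in this reading, forget representations move only as a side effect of how the shared feature extractor is updated on retain data.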

What carries the argument

Contrastive learning module applied to retain data that indirectly adjusts forget representations according to semantic similarity.

If this is right

  • CoUn outperforms existing machine unlearning baselines on unlearning effectiveness across multiple datasets and model architectures.
  • Adding the contrastive learning module to prior unlearning methods increases their effectiveness at removing forget data influence.
  • The method maintains performance on retain data while achieving stronger removal of unwanted information.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same representation-adjustment idea could be tested on class-level unlearning or on sequential forgetting tasks where multiple batches must be removed over time.
  • Because the method never touches the forget data during its adjustment step, it may reduce privacy risks compared with techniques that require access to the data being forgotten.
  • The core premise suggests that future unlearning work might benefit from focusing on how representations relate across the entire dataset rather than on direct parameter or label edits.

Load-bearing premise

A model retrained from scratch using only retain data will classify forget data according to their semantic similarity to the retain data.

What would settle it

An experiment that measures how a retrained-from-scratch model actually classifies forget samples and finds that its decisions do not align with semantic similarity to retain clusters.
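One way such an experiment could be scored is sketched below: compute retain class centroids in some embedding space, assign each forget sample to its nearest centroid by cosine similarity, and report how often that assignment agrees with the retrained model's prediction. The function names, the choice of cosine similarity, and the use of class means as centroids are assumptions for illustration; the paper's own analysis may differ.

```python
import math
from collections import defaultdict

def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def class_centroids(retain_feats, retain_labels):
    """Mean feature vector per retain class."""
    sums = {}
    counts = defaultdict(int)
    for feat, label in zip(retain_feats, retain_labels):
        if label not in sums:
            sums[label] = [0.0] * len(feat)
        sums[label] = [s + x for s, x in zip(sums[label], feat)]
        counts[label] += 1
    return {label: [s / counts[label] for s in vec] for label, vec in sums.items()}

def nearest_centroid_agreement(forget_feats, retrain_preds, centroids):
    """Fraction of forget samples whose retrained-model prediction equals
    the retain class with the most similar centroid."""
    hits = 0
    for feat, pred in zip(forget_feats, retrain_preds):
        nearest = max(centroids, key=lambda label: cosine(feat, centroids[label]))
        hits += int(nearest == pred)
    return hits / len(forget_feats)

# Toy usage with made-up 2-D features (illustrative values only).
centroids = class_centroids([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]],
                            [0, 0, 1, 1])
rate = nearest_centroid_agreement([[0.8, 0.2], [0.2, 0.8]], [0, 1], centroids)
```

A low agreement rate would count against the premise. If the features come from the retrained model itself, agreement is partly built in, so features from an independently trained encoder would make for a stricter test.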

Figures

Figures reproduced from arXiv: 2509.16391 by Hongliang Li, Mehdi Setayesh, Yasser H. Khalil.

Figure 1: Representation space of the Retrain model trained with ResNet-18 and CIFAR-10, excluding ‘truck’ class samples (left) and excluding 10% randomly selected samples (right). Small dots represent retain samples from different clusters, while larger dots indicate forget samples classified into clusters of retain samples that exhibit the highest semantic similarity to them. Thus, to closely match the performanc…
Figure 2: CoUn framework. Two augmented views are generated from a batch of retain image samples I. These views are processed by the feature extractor fθu, yielding retain representations (Z, Z′). A CL module adjusts the representations, while supervised learning applied via the classifier head hθu enforces their cluster separation. Let (I, Y) denote a batch of images and their corresponding labels sampled from D…
Figure 3: Representation space of FT and CoUn unlearned models (rows). Columns correspond to two forgetting scenarios: class-wise (‘truck’) and random (10% forget ratio). The Original model is trained on CIFAR-10 using ResNet-18. Small dots represent retain samples from different clusters, while larger dots indicate forget samples classified into the corresponding clusters. To achieve effective unlearning, CoUn ad…
Figure 4: Percentage improvement from integrating CoUn’s CL module into baseline methods. Incorporating our CL module consistently improves baseline unlearning performance compared to the original MU methods (without CL). The performance improvements further increase with a 50% forget ratio.
Figure 5: Performance comparison of MU methods on CIFAR-100 with ResNet-18, where 10% (left) and 50% (right) of training data are randomly selected as forget data. The best performance of each method is reported. CoUn outperforms all baselines, and integrating its CL module empowers baseline performance. Although CL increases computational cost, the performance improvement persists even with the same computational …
Figure 7: Effect of scaling constant λ. Properly tuning λ in Equation (4) is essential for optimizing CoUn’s performance. (Plot: y-axis Average Gap; series FT and CoUn.)
Figure 10: Effect of batch size n. Different values of n for CoUn result in varying performance. Retrain batch size is set to 256. The impact of strong versus simple CL transformations on forget representations is further illustrated in Appendix F.4. Batch size impacts the performance of CoUn.
Figure 11: Representation space of the Original model. The Original model is trained on the entire CIFAR-10 training data (i.e., union of retain and forget data) using ResNet-18. There are no misclassifications for either retain or forget samples since the model was trained on them. A single visualization of the Original model’s representation space is shown for both class-wise and random scenarios, as this model se…
Figure 12: Effect of CL transformation on forget data representations. t-SNE visualizations of forget data representations extracted from the penultimate layer of θu on CIFAR-10 with ResNet-18 under a 50% forget data ratio. Left: CoUn with a simple CL transformation (TCL = CHN). Right: CoUn with a strong CL transformation (TCL = CHJGN). The transformation for supervised learning is fixed at TCE = CHN. The CHJGN tran…
Abstract

Machine unlearning (MU) aims to remove the influence of specific "forget" data from a trained model while preserving its knowledge of the remaining "retain" data. Existing MU methods based on label manipulation or model weight perturbations often achieve limited unlearning effectiveness. To address this, we introduce CoUn, a novel MU framework inspired by the observation that a model retrained from scratch using only retain data classifies forget data based on their semantic similarity to the retain data. CoUn emulates this behavior by adjusting learned data representations through contrastive learning (CL) and supervised learning, applied exclusively to retain data. Specifically, CoUn (1) leverages semantic similarity between data samples to indirectly adjust forget representations using CL, and (2) maintains retain representations within their respective clusters through supervised learning. Extensive experiments across various datasets and model architectures show that CoUn consistently outperforms state-of-the-art MU baselines in unlearning effectiveness. Additionally, integrating our CL module into existing baselines empowers their unlearning effectiveness.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces CoUn, a machine unlearning framework motivated by the claim that a model retrained from scratch on retain data alone will classify forget samples according to their semantic similarity with retain clusters. CoUn emulates this behavior by performing contrastive learning (to push forget representations toward retain-like clusters via semantic similarities among retain samples) and supervised learning (to preserve retain clusters), both applied exclusively to retain data. The paper reports that this yields superior unlearning effectiveness over state-of-the-art baselines across multiple datasets and architectures, and that the contrastive module can be plugged into existing methods to improve them.

Significance. If the motivating observation about retrained-model behavior is empirically substantiated and the reported gains prove robust under standard controls, CoUn would represent a useful addition to the MU literature by offering a representation-level approach that avoids direct forget-data access or aggressive weight perturbation. The modular integration claim, if verified, could have practical value for improving existing baselines.

major comments (2)
  1. [Introduction / motivation] Introduction / motivation section: the central premise that scratch-retrained models classify forget data strictly according to semantic similarity with retain clusters is asserted without any direct supporting measurement (e.g., embedding-space nearest-neighbor analysis, cosine-similarity scores between forget samples and retain class centroids, or controlled counter-example tests on datasets with distinctive low-level statistics). Because this observation is invoked to justify performing contrastive learning on retain data alone, its lack of validation is load-bearing for the method's rationale.
  2. [Experiments] Experimental section: the abstract states that 'extensive experiments across various datasets and model architectures show that CoUn consistently outperforms state-of-the-art MU baselines,' yet no quantitative tables, exact unlearning metrics (forget-set accuracy, MIA success rate, retain-set accuracy), error bars, number of runs, or statistical significance tests are referenced. Without these details the outperformance claim cannot be assessed.
minor comments (1)
  1. [Abstract] Abstract: the phrases 'various datasets and model architectures' and 'state-of-the-art MU baselines' are left unspecified; naming the concrete datasets, architectures, and baselines would improve clarity and allow readers to judge scope.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address the major comments point by point below, indicating where revisions will be made to strengthen the manuscript.

Point-by-point responses
  1. Referee: Introduction / motivation section: the central premise that scratch-retrained models classify forget data strictly according to semantic similarity with retain clusters is asserted without any direct supporting measurement (e.g., embedding-space nearest-neighbor analysis, cosine-similarity scores between forget samples and retain class centroids, or controlled counter-example tests on datasets with distinctive low-level statistics). Because this observation is invoked to justify performing contrastive learning on retain data alone, its lack of validation is load-bearing for the method's rationale.

    Authors: We agree that direct empirical validation of the motivating observation would improve the paper. In the revision we will add embedding-space nearest-neighbor analysis, cosine-similarity scores between forget samples and retain class centroids, and controlled counter-example tests on datasets with distinctive low-level statistics to substantiate that retrained models classify forget data according to semantic similarity with retain clusters. revision: yes

  2. Referee: Experimental section: the abstract states that 'extensive experiments across various datasets and model architectures show that CoUn consistently outperforms state-of-the-art MU baselines,' yet no quantitative tables, exact unlearning metrics (forget-set accuracy, MIA success rate, retain-set accuracy), error bars, number of runs, or statistical significance tests are referenced. Without these details the outperformance claim cannot be assessed.

    Authors: Section 4 already presents quantitative tables with exact metrics (forget-set accuracy, MIA success rate, retain-set accuracy) for CoUn and baselines across datasets and architectures, based on multiple runs. To address the concern we will add explicit cross-references to these tables from the abstract, include error bars, state the number of runs, and report statistical significance tests in the revised manuscript. revision: partial

Circularity Check

0 steps flagged

No significant circularity detected

Full rationale

The paper motivates CoUn from an external observation that scratch-retrained models classify forget samples by semantic similarity to retain data, then applies contrastive learning plus supervised learning solely on retain data to emulate that behavior. No equations, parameter fits, or derivations are described that define any quantity in terms of itself or rename a fitted input as a prediction. The central claim rests on an asserted empirical premise rather than any self-referential construction or self-citation chain, so the derivation chain remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on one key domain assumption about retrained models and standard supervised/contrastive training procedures; no free parameters or new invented entities are introduced in the abstract.

axioms (1)
  • domain assumption: A model retrained from scratch using only retain data classifies forget data based on their semantic similarity to the retain data.
    This observation is stated as the inspiration for emulating the behavior through contrastive learning applied exclusively to retain data.

pith-pipeline@v0.9.0 · 5707 in / 1283 out tokens · 42724 ms · 2026-05-21T22:35:40.363539+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

    Relation between the paper passage and the cited Recognition theorem: unclear.

    Paper passage: "CoUn emulates this behavior by adjusting learned data representations through contrastive learning (CL) and supervised learning, applied exclusively to retain data... leverages semantic similarity between data samples to indirectly adjust forget representations using CL"

What do these tags mean?

  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Interference-Aware Multi-Task Unlearning

    cs.AI · 2026-05 · unverdicted · novelty 7.0

    Introduces interference-aware multi-task unlearning with task-aware gradient projection and instance-level gradient orthogonalization, reducing interference scores by 30.3% and 52.9% on vision benchmarks.


Table fragment (truncated in the source). The difference (∆) and the (best) average difference between each method and Retrain are reported.

Forgetting scenario   Method     L2 - (∆↓)                                   Avg. Diff. ↓
                                 Automobile    Airplane      Ship
Class (‘truck’)       Original   0.93          0.97          0.96          -
                      Retrain    0.90 (0.00)   0.96 (0.00)   0.95 (0.00)   0.00
                      FT         0.86 (0.04)   0.94 (0.02)   0.91 (0.04)   0.033
                      CoUn       0.87 (0.03)   0.96 (0.00)   0.93 (0.02)   0.017

Statistica...
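The average-difference column reported above can be recomputed from the per-class values. The sketch below assumes that ∆ is the absolute difference from Retrain and that the average is an unweighted mean over the three listed classes; both assumptions are consistent with the reported numbers but are not stated in the fragment. The variable and function names are hypothetical.

```python
# Per-class scores copied from the table fragment (class-wise 'truck' scenario).
retrain = {"automobile": 0.90, "airplane": 0.96, "ship": 0.95}
methods = {
    "FT":   {"automobile": 0.86, "airplane": 0.94, "ship": 0.91},
    "CoUn": {"automobile": 0.87, "airplane": 0.96, "ship": 0.93},
}

def average_difference(method_scores, reference_scores):
    """Unweighted mean of absolute per-class gaps to the reference (ASSUMED definition)."""
    gaps = [abs(method_scores[name] - reference_scores[name])
            for name in reference_scores]
    return sum(gaps) / len(gaps)

for name, scores in methods.items():
    print(name, round(average_difference(scores, retrain), 3))
```

Under these assumptions the script reproduces the reported 0.033 for FT and 0.017 for CoUn.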