pith.

arXiv: 2604.15166 · v2 · pith:T6HXDXI5 · submitted 2026-04-16 · 💻 cs.CV · cs.AI · cs.LG

Class Unlearning via Depth-Aware Removal of Forget-Specific Directions

Pith reviewed 2026-05-21 00:04 UTC · model grok-4.3

classification 💻 cs.CV cs.AI cs.LG
keywords: damp, directions, forgetting, class, depth-aware, layers, removal, unlearning

The pith

DAMP removes forget-specific directions from neural networks using depth-aware projections to achieve better class unlearning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Machine unlearning seeks to erase specific knowledge from a model without retraining from scratch. The paper shows that many existing class unlearning methods leave forget-class information encoded in the internal layers. DAMP addresses this by computing prototypes for forget and retain classes at each layer and projecting out the forget directions. It scales the strength of these edits according to the layer depth to balance forgetting with retained utility. On several image classification benchmarks, this results in forgetting that is closer to that of a freshly retrained model.

Core claim

At each network stage, DAMP computes the forget direction as the residual of the forget-class prototype after removing its component in the span of the retain-class prototypes, measured in the input space of the next learnable operator. It then updates that operator's weights by projecting away a depth-scaled portion of this direction, which lowers downstream sensitivity to forget-class inputs.
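The core claim can be made concrete with a small NumPy sketch. It assumes a fully connected operator y = Wx, forms prototypes as per-class mean activations, takes the forget direction as the least-squares residual of the forget prototype against the span of the retain prototypes, and applies the rank-one update W ← W(I − αuuᵀ). The function names, the update form, and the fixed α are illustrative assumptions, not the authors' implementation; convolutional or attention layers would need the edit expressed in their own input space.

```python
import numpy as np

def class_prototypes(acts, labels, classes):
    """Stack per-class mean activations as columns of a (d, len(classes)) matrix."""
    return np.stack([acts[labels == c].mean(axis=0) for c in classes], axis=1)

def forget_direction(forget_proto, retain_protos):
    """Unit residual of a forget prototype after removing its least-squares
    component in the span of the retain prototypes (columns of retain_protos)."""
    coeffs, *_ = np.linalg.lstsq(retain_protos, forget_proto, rcond=None)
    residual = forget_proto - retain_protos @ coeffs
    norm = np.linalg.norm(residual)
    if norm < 1e-12:
        raise ValueError("forget prototype lies inside the retain span")
    return residual / norm

def project_out(weight, u, alpha):
    """Hypothetical rank-one edit W <- W (I - alpha * u u^T) for a layer y = W x."""
    return weight - alpha * np.outer(weight @ u, u)

# Toy demo on synthetic activations: 5 classes, class 3 is forgotten.
rng = np.random.default_rng(0)
d, n_per_class = 16, 50
labels = np.repeat(np.arange(5), n_per_class)
acts = rng.normal(size=(5, d))[labels] + 0.1 * rng.normal(size=(labels.size, d))

R = class_prototypes(acts, labels, [0, 1, 2, 4])   # retain prototypes, shape (d, 4)
p_f = class_prototypes(acts, labels, [3])[:, 0]     # forget prototype, shape (d,)
u = forget_direction(p_f, R)

W = rng.normal(size=(8, d))                         # weights of the next linear operator
W_new = project_out(W, u, alpha=0.75)
print(np.allclose(W_new @ u, 0.25 * (W @ u)), np.allclose(W_new @ R, W @ R))
```

With α = 0.75 the operator's response to u drops to a quarter of its original value, while its response to every retain prototype is unchanged, because the residual is orthogonal to the retain span by construction.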

What carries the argument

Depth-Aware Modulation by Projection (DAMP), a closed-form weight update that removes forget residuals with larger edits in deeper layers.
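The paper states that the per-layer coefficient αℓ is computed from depth and forget-retain probe separability, but the exact rule is not given on this page. Purely as an illustration, the hypothetical rule below multiplies relative depth ℓ/L by a separability score rescaled so that chance-level probe accuracy maps to 0 and perfect accuracy maps to 1. It is an assumption for exposition, not the authors' formula.

```python
def depth_coefficients(probe_acc, chance=0.5):
    """Hypothetical per-layer edit strengths alpha_l (NOT the paper's rule).

    probe_acc[l] is a forget-vs-retain probe accuracy at stage l, ordered from
    the shallowest stage to the deepest. chance is the probe's chance accuracy.
    """
    L = len(probe_acc)
    alphas = []
    for depth, acc in enumerate(probe_acc, start=1):
        # Rescale separability so chance -> 0 and perfect accuracy -> 1, then clip.
        sep = min(max((acc - chance) / (1.0 - chance), 0.0), 1.0)
        # Deeper stages (larger depth / L) receive proportionally larger edits.
        alphas.append((depth / L) * sep)
    return alphas

alphas = depth_coefficients([0.6, 0.75, 0.9, 1.0])
print(alphas)  # approximately [0.05, 0.25, 0.6, 1.0]
```

Both factors lie in [0, 1], so shallow, weakly separable stages receive small edits and deep, highly separable stages receive edits close to a full projection, matching the qualitative behavior described above.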

Load-bearing premise

Computing and removing residuals relative to retain-class prototypes at each layer removes the underlying forget-class information from the network rather than just suppressing it at the output.

What would settle it

Training a linear probe on the network's deep-layer features after DAMP and finding that it can classify forget classes well above chance would show that the information remains in the representations.
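A minimal version of this probing test is sketched below with NumPy, using a closed-form ridge-regression probe on one-hot targets in place of a trained logistic-regression probe. The synthetic features, in which class identity lives on a single known axis, are an assumption chosen so that the outcome is easy to verify; on real activations the probe would be trained on held-out deep-layer features and its forget-class accuracy compared against chance.

```python
import numpy as np

def fit_linear_probe(X, y, n_classes, ridge=1e-3):
    """Closed-form ridge-regression probe onto one-hot targets, with a bias column."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    Y = np.eye(n_classes)[y]
    A = Xb.T @ Xb + ridge * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ Y)

def probe_accuracy(W, X, y):
    """Fraction of samples whose highest-scoring probe output matches the label."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return float(np.mean(np.argmax(Xb @ W, axis=1) == y))

def remove_direction(X, u):
    """Project each row of X onto the orthogonal complement of the unit vector u."""
    return X - np.outer(X @ u, u)

def make_split(seed, n=200, d=8):
    """Toy features: class identity lives only on axis 0 (+2 versus -2)."""
    noise = np.random.default_rng(seed).normal(scale=0.1, size=(n, d))
    noise[:, 0] = 0.0
    e0 = np.eye(d)[0]
    # Paired rows: class 1 reuses class 0's noise, so the classes differ only on axis 0.
    X = np.vstack([noise + 2.0 * e0, noise - 2.0 * e0])
    y = np.repeat([0, 1], n)
    return X, y

X_tr, y_tr = make_split(seed=1)
X_te, y_te = make_split(seed=2)
e0 = np.eye(X_tr.shape[1])[0]   # the (here known) forget direction

before = probe_accuracy(fit_linear_probe(X_tr, y_tr, 2), X_te, y_te)
after = probe_accuracy(
    fit_linear_probe(remove_direction(X_tr, e0), y_tr, 2),
    remove_direction(X_te, e0), y_te)
print(before, after)
```

Here the probe separates the two classes essentially perfectly before the edit and falls to exactly chance (0.5) afterwards, because removing the single informative axis makes each paired class-0 and class-1 feature vector identical. Real networks can encode the same information redundantly or nonlinearly, which is precisely what this test is meant to detect.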

Figures

Figures reproduced from arXiv: 2604.15166 by Arman Hatami, Ilya E. Monosov, Romina Aalishah.

Figure 1: Comparison of …

Figure 2: Comparison of DAMP selectivity with the baseline retrained model, Gradient Ascent Unlearning (GAU), knowledge-distillation unlearning (KDU), Data Deletion Fine-Tuning (DD-FT), Logit Masking (LM), Random Relabeling (RandRelabel), Selective Synaptic Dampening (SSD), and Saliency Unlearning (SalUn) for a 5-layer CNN (CNN-5) on CIFAR-10; forget classes 3 (Cat) and 5 (Dog). Existing methods exhibit weak, near…

Figure 3: The first panel (top-left) shows the difference between the RDM of the original model that has not been subjected to unlearning …

Figure 4: Comparison of DAMP bias shift with the baseline, retrained model, Gradient Ascent Unlearning (GAU), knowledge-distillation unlearning (KDU), Data Deletion Fine-Tuning (DD-FT), Logit Masking (LM), Random Relabeling (RandRelabel), Selective Synaptic Dampening (SSD), and Saliency Unlearning (SalUn) for a 5-layer CNN (CNN-5) on CIFAR-10; forget class 3 (Cat). The asterisk marks the forget class. Existing unl…

Figure 5: Overview of the proposed DAMP. Starting from a pretrained network, we compute class prototypes in the edit space of each stage. For each forget class, its prototype is decomposed into a component explained by the retain span and a forget residual. The resulting residual directions are orthonormalized. In parallel, a layer coefficient αℓ is computed from depth and forget-retain separability. These two quant…
Abstract

Machine unlearning aims to remove targeted knowledge from a trained model without the cost of retraining from scratch. In class unlearning, however, reducing accuracy on forget classes does not necessarily imply true forgetting: forgotten information can remain encoded in internal representations, and apparent forgetting may arise from classifier-head suppression rather than representational removal. We show that existing class-unlearning methods often exhibit weak or negative selectivity, preserve forget-class structure in deep representations, or rely heavily on final-layer bias shifts. We then introduce DAMP (Depth-Aware Modulation by Projection), a one-shot, closed-form weight-surgery method that removes forget-specific directions from a pretrained network without gradient-based optimization. At each stage, DAMP computes class prototypes in the input space of the next learnable operator, extracts forget directions as residuals relative to retain-class prototypes, and applies a projection-based update to reduce downstream sensitivity to those directions. To preserve utility, DAMP uses a parameter-free depth-aware scaling rule derived from probe separability, applying smaller edits in early layers and larger edits in deeper layers. The method naturally extends to multi-class forgetting through low-rank subspace removal. Across MNIST, CIFAR-10, CIFAR-100, and Tiny ImageNet, and across convolutional and transformer architectures, DAMP more closely resembles the retraining gold standard than some of the prior methods, improving selective forgetting while better preserving retain-class performance and reducing residual forget-class structure in deep layers.
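For the multi-class extension mentioned in the abstract, one plausible reading is to stack the forget residuals, orthonormalize them, and remove the resulting low-rank subspace in a single update W ← W(I − αQQᵀ). The sketch below assumes that form and uses a thin SVD for the orthonormalization (a QR factorization would serve equally well); the function names and the tolerance are illustrative assumptions.

```python
import numpy as np

def forget_subspace(forget_protos, retain_protos, tol=1e-8):
    """Orthonormal basis (d, r) for the forget residuals outside span(retain)."""
    # Least-squares fit of every forget prototype on the retain prototypes.
    coeffs, *_ = np.linalg.lstsq(retain_protos, forget_protos, rcond=None)
    residuals = forget_protos - retain_protos @ coeffs   # (d, m), orthogonal to retain span
    # Thin SVD orthonormalizes the residuals; near-zero singular values are dropped.
    U, s, _ = np.linalg.svd(residuals, full_matrices=False)
    return U[:, s > tol]

def project_out_subspace(weight, Q, alpha):
    """Hypothetical low-rank edit W <- W (I - alpha * Q Q^T)."""
    return weight - alpha * (weight @ Q) @ Q.T

rng = np.random.default_rng(0)
d = 32
R = rng.normal(size=(d, 6))    # 6 retain-class prototypes as columns
F = rng.normal(size=(d, 3))    # 3 forget-class prototypes as columns
Q = forget_subspace(F, R)
W = rng.normal(size=(10, d))   # weights of the next linear operator, y = W x
W_new = project_out_subspace(W, Q, alpha=1.0)
```

With α = 1 the edited operator is blind to the whole forget subspace while its action on the retain prototypes is unchanged. The cost of this sketch is one least-squares solve and one thin SVD per stage, plus a rank-r update costing O(d_out · d · r) for a d_out × d weight matrix.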

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces DAMP (Depth-Aware Modulation by Projection), a one-shot closed-form weight-surgery method for class unlearning. At each stage it computes class prototypes in the input space of the next learnable operator, extracts forget directions as residuals relative to retain-class prototypes, and applies a projection update; a parameter-free depth-aware scaling rule modulates edit magnitude by layer depth. The central claim is that across MNIST, CIFAR-10, CIFAR-100 and Tiny ImageNet, and across CNN and transformer backbones, DAMP more closely matches the retraining gold standard than some prior methods by improving selective forgetting, preserving retain-class accuracy, and reducing residual forget-class structure in deep layers.

Significance. If the method genuinely erases forget-class information from internal representations rather than merely shifting decision boundaries at the output head, it would offer an efficient, optimization-free alternative to retraining or gradient-based unlearning while addressing the documented failure mode of residual structure in deep layers. The closed-form derivation and explicit handling of multi-class forgetting via low-rank subspaces are notable strengths.

major comments (3)
  1. [Abstract] The claim that DAMP 'more closely resembles the retraining gold standard' and 'reduc[es] residual forget-class structure in deep layers' is asserted without any quantitative tables, error bars, ablation studies, or numerical comparisons to retraining or baselines, so the central empirical claim cannot be evaluated from the manuscript.
  2. [Method] Depth-aware scaling paragraph: the assertion that the scaling rule is 'parameter-free' and 'derived from probe separability' is not accompanied by an explicit equation or derivation showing how separability measurements translate into per-layer multipliers without introducing fitted quantities or hidden hyperparameters.
  3. [Method] Central assumption (prototype-residual step): the claim that residuals relative to retain-class prototypes isolate forget-specific directions that are both linear and sufficient to capture all encoded forget-class information is load-bearing for the 'true forgetting' versus 'head suppression' distinction, yet the manuscript provides no direct test (e.g., linear probing of internal activations before/after DAMP) to rule out non-linear entanglement or redundant pathways.
minor comments (2)
  1. [Abstract] Notation for 'forget-specific directions' and 'probe separability' is introduced without a formal definition or reference to prior usage.
  2. [Method] The extension to multi-class forgetting via low-rank subspace removal is mentioned but lacks a concrete algorithmic description or complexity statement.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback on our manuscript. We address each major comment point by point below, clarifying the empirical support and methodological details while committing to revisions where the presentation can be strengthened.

Point-by-point responses
  1. Referee: [Abstract] The claim that DAMP 'more closely resembles the retraining gold standard' and 'reduc[es] residual forget-class structure in deep layers' is asserted without any quantitative tables, error bars, ablation studies, or numerical comparisons to retraining or baselines, so the central empirical claim cannot be evaluated from the manuscript.

    Authors: We acknowledge that the abstract's central claims would benefit from more immediate quantitative support. The full manuscript reports results across MNIST, CIFAR-10, CIFAR-100, and Tiny ImageNet for both CNN and transformer backbones, with comparisons to prior methods and retraining from scratch on metrics including forget-class accuracy, retain-class accuracy, and measures of residual structure. To address the concern directly, we will revise the abstract and add a consolidated summary table (with error bars from repeated runs) that explicitly quantifies how DAMP aligns with the retraining baseline relative to other methods. revision: yes

  2. Referee: [Method] Depth-aware scaling paragraph: the assertion that the scaling rule is 'parameter-free' and 'derived from probe separability' is not accompanied by an explicit equation or derivation showing how separability measurements translate into per-layer multipliers without introducing fitted quantities or hidden hyperparameters.

    Authors: We agree that the depth-aware scaling rule requires an explicit derivation to substantiate the 'parameter-free' claim. In the revised method section we will insert the full equation for the per-layer multiplier together with a step-by-step derivation that shows how separability statistics (computed once from probe activations) are mapped to scaling factors. No fitted quantities or hidden hyperparameters are introduced; the rule uses only the observed separability values at each depth. revision: yes

  3. Referee: [Method] Central assumption (prototype-residual step): the claim that residuals relative to retain-class prototypes isolate forget-specific directions that are both linear and sufficient to capture all encoded forget-class information is load-bearing for the 'true forgetting' versus 'head suppression' distinction, yet the manuscript provides no direct test (e.g., linear probing of internal activations before/after DAMP) to rule out non-linear entanglement or redundant pathways.

    Authors: This is a substantive point about the strength of evidence for representational rather than merely head-level forgetting. Our experiments already demonstrate reduced residual forget-class structure in deep layers through representation-level metrics and closer alignment with retraining performance. We will add a direct linear-probing analysis of internal activations before and after DAMP to quantify the removal of linear forget directions. Regarding possible non-linear entanglement or redundant pathways, we note that the method explicitly targets linear subspaces; any remaining non-linear components would constitute a limitation that we will discuss explicitly in the revision. revision: partial

Circularity Check

0 steps flagged

DAMP derivation is self-contained closed-form without circular reduction

Full rationale

The paper presents DAMP as a one-shot closed-form weight-surgery procedure that computes prototypes in per-operator input space, extracts residuals, and applies projection updates with a parameter-free depth-aware scaling rule derived from probe separability. No load-bearing step reduces by construction to a fitted quantity defined inside the paper, nor does any central claim rest on a self-citation chain or imported uniqueness theorem. Empirical comparisons to retraining across datasets and architectures provide external validation, so the derivation chain remains independent of its own outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claim rests on the assumption that forget-class information is linearly separable in the input space to each layer and that removing the residual direction will not destroy retain-class utility. No explicit free parameters are introduced; the scaling rule is stated to be parameter-free. No new physical entities are postulated.
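The ledger's concern that removing the residual might damage retain-class utility can be checked with one line of algebra, assuming the rank-one update form W′ = W(I − αuuᵀ), which is an illustrative assumption rather than the paper's stated equation. Let R hold the retain prototypes as columns (assumed to have full column rank) and let p_f be the forget prototype. Because the residual is orthogonal to every retain prototype, the edit leaves W's response to the retain prototypes untouched; it guarantees nothing, however, for individual retain samples that have a component along u.

```latex
\begin{aligned}
P_R &= R\,(R^\top R)^{-1} R^\top, \qquad r = (I - P_R)\,p_f, \qquad u = r / \lVert r \rVert,\\
R^\top u &= \frac{R^\top (I - P_R)\,p_f}{\lVert r \rVert} = \frac{(R^\top - R^\top)\,p_f}{\lVert r \rVert} = 0,\\
W' &= W\,(I - \alpha\, u u^\top) \;\Longrightarrow\; W' R = W R - \alpha\, (W u)(u^\top R) = W R,\\
W' u &= (1 - \alpha)\, W u .
\end{aligned}
```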

axioms (2)
  • domain assumption: Class prototypes computed in the input space of each learnable operator capture the relevant forget-specific directions.
    Invoked when the method extracts residuals relative to retain-class prototypes at each stage.
  • domain assumption: Projection-based weight updates at each layer remove downstream sensitivity without requiring gradient optimization.
    Core premise of the one-shot closed-form surgery.
invented entities (1)
  • forget-specific directions (no independent evidence)
    purpose: Residual vectors in layer input space used as targets for projection removal.
    Newly defined construct extracted from class prototypes; no independent evidence outside the method itself is provided.

pith-pipeline@v0.9.0 · 5799 in / 1525 out tokens · 32211 ms · 2026-05-21T00:04:39.999953+00:00 · methodology



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Classification-Head Bias in Class-Level Machine Unlearning: Diagnosis, Mitigation, and Evaluation

    cs.LG 2026-05 conditional novelty 7.0

    Class-level unlearning can shortcut forgetting through bias suppression in the classification head; the paper introduces bias-aware training mechanisms and bias-specific metrics to diagnose and reduce this dependence.

Reference graph

Works this paper leans on

39 extracted references · 39 canonical work pages · cited by 1 Pith paper · 6 internal anchors

  1. Guillaume Alain and Yoshua Bengio. Understanding intermediate layers using linear classifier probes. arXiv preprint arXiv:1610.01644, 2016.

  2. Thomas Baumhauer, Pascal Schöttle, and Matthias Zeppelzauer. Machine unlearning: Linear filtration for logit-based classifiers. Machine Learning, 111(9):3203–3226,

  3. Shristi Das Biswas, Arani Roy, and Kaushik Roy. Cure: Concept unlearning via orthogonal representation editing in diffusion models. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025.

  4. Jacopo Bonato, Marco Cotogni, and Luigi Sabetta. Is retain set all you need in machine unlearning? Restoring performance of unlearned models with out-of-distribution images. In European Conference on Computer Vision, pages 1–19. Springer, 2024.

  5. Lucas Bourtoule, Varun Chandrasekaran, Christopher A Choquette-Choo, Hengrui Jia, Adelin Travers, Baiwu Zhang, David Lie, and Nicolas Papernot. Machine unlearning. In 2021 IEEE symposium on security and privacy (SP), pages 141–159. IEEE, 2021.

  6. Xavier F Cadet, Anastasia Borovykh, Mohammad Malekzadeh, Sara Ahmadi-Abhari, and Hamed Haddadi. Deep unlearn: Benchmarking machine unlearning for image classification. In 2025 IEEE 10th European Symposium on Security and Privacy (EuroS&P), pages 939–962. IEEE, 2025.

  7. Yuchen Cai and Ding Cao. O-edit: Orthogonal subspace editing for language model sequential editing. arXiv preprint arXiv:2410.11469, 2024.

  8. Yinzhi Cao and Junfeng Yang. Towards making systems forget with machine unlearning. In 2015 IEEE symposium on security and privacy, pages 463–480. IEEE, 2015.

  9. Min Chen, Weizhuo Gao, Gaoyang Liu, Kai Peng, and Chen Wang. Boundary unlearning. arXiv preprint arXiv:2303.11570, 2023.

  10. Rishav Chourasia and Neil Shah. Forget unlearning: Towards true data-deletion in machine learning. In International conference on machine learning, pages 6028–6073. PMLR, 2023.

  11. Vikram S Chundawat, Ayush K Tarun, Murari Mandal, and Mohan Kankanhalli. Zero-shot machine unlearning. IEEE Transactions on Information Forensics and Security, 18:2345–2354, 2023.

  12. Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020.

  13. Chongyu Fan, Jiancheng Liu, Yihua Zhang, Eric Wong, Dennis Wei, and Sijia Liu. SalUn: Empowering machine unlearning via gradient-based weight saliency in both image classification and generation. arXiv preprint arXiv:2310.12508,

  14. Jack Foster, Stefan Schoepf, and Alexandra Brintrup. Fast machine unlearning without retraining through selective synaptic dampening. In Proceedings of the AAAI conference on artificial intelligence, pages 12043–12051, 2024.

  15. Laura Graves, Vineel Nagisetty, and Vijay Ganesh. Amnesiac machine learning. In Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21), pages 11516–11524, 2021.

  16. Jamie Hayes, Ilia Shumailov, Eleni Triantafillou, Amr Khalifa, and Nicolas Papernot. Inexact unlearning needs more careful evaluations to avoid a false sense of privacy. In 2025 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML), pages 497–519. IEEE, 2025.

  17. Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.

  18. Zhengbao He, Tao Li, Xinwen Cheng, Zhehao Huang, and Xiaolin Huang. Towards natural machine unlearning. IEEE Transactions on Pattern Analysis and Machine Intelligence,

  19. Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015.

  20. Zhehao Huang, Xinwen Cheng, JingHao Zheng, Haoran Wang, Zhengbao He, Tao Li, and Xiaolin Huang. Unified gradient-based machine unlearning with remain geometry enhancement. Advances in Neural Information Processing Systems, 37:26377–26414, 2024.

  21. Jinghan Jia, Jiancheng Liu, Parikshit Ram, Yuguang Yao, Gaowen Liu, Yang Liu, Pranay Sharma, and Sijia Liu. Model sparsity can simplify machine unlearning. Advances in Neural Information Processing Systems, 36:51584–51605, 2023.

  22. Junyaup Kim and Simon S. Woo. Efficient two-stage model retraining for machine unlearning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, pages 4361–4369, 2022.

  23. Sangamesh Kodge, Gobinda Saha, and Kaushik Roy. Deep unlearning: Fast and efficient gradient-free class forgetting. Transactions on Machine Learning Research, 2024.

  24. Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009.

  25. Meghdad Kurmanji, Peter Triantafillou, Jamie Hayes, and Eleni Triantafillou. Towards unbounded machine unlearning. Advances in neural information processing systems, 36:1957–1987, 2023.

  26. Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 2002.

  27. Junde Li and Swaroop Ghosh. Random relabeling for efficient machine unlearning. arXiv preprint arXiv:2305.12320,

  28. Ioannis Mavrothalassitis, Pol Puigdemont, Noam Itzhak Levi, and Volkan Cevher. Ascent fails to forget. arXiv preprint arXiv:2509.26427, 2025.

  29. Kevin Meng, David Bau, Alex Andonian, and Yonatan Belinkov. Locating and editing factual associations in GPT. Advances in neural information processing systems, 35:17359–17372, 2022.

  30. Kevin Meng, Arnab Sen Sharma, Alex Andonian, Yonatan Belinkov, and David Bau. Mass-editing memory in a transformer. arXiv preprint arXiv:2210.07229, 2022.

  31. Nazanin Mohammadi Sepahvand, Eleni Triantafillou, Hugo Larochelle, Doina Precup, James J Clark, Daniel M Roy, and Gintare Karolina Dziugaite. Selective unlearning via representation erasure using domain adversarial training. In The Thirteenth International Conference on Learning Representations, 2025.

  32. Jake Snell, Kevin Swersky, and Richard Zemel. Prototypical networks for few-shot learning. Advances in neural information processing systems, 30, 2017.

  33. Rheeya Uppaal, Apratim Dey, Yiting He, Yiqiao Zhong, and Junjie Hu. Model editing as a robust and denoised variant of DPO: A case study on toxicity, 2024. arXiv preprint arXiv:2405.13967.

  34. Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9(86):2579–2605, 2008.

  35. Weiqi Wang, Zhiyi Tian, Chenhan Zhang, and Shui Yu. Machine unlearning: A comprehensive survey. arXiv preprint arXiv:2405.07406, 2024.

  36. Jiayu Wu, Qixiang Zhang, and Guoxi Xu. Tiny ImageNet challenge. Technical report, 2017.

  37. Jason Yosinski, Jeff Clune, Yoshua Bengio, and Hod Lipson. How transferable are features in deep neural networks? Advances in neural information processing systems, 27, 2014.

  38. Yu Zhou, Dian Zheng, Qijie Mo, Renjie Lu, Kun-Yu Lin, and Wei-Shi Zheng. Decoupled distillation to erase: A general unlearning method for any class-centric tasks. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 20350–20359, 2025.

  39. Jianing Zhu, Bo Han, Jiangchao Yao, Jianliang Xu, Gang Niu, and Masashi Sugiyama. Decoupling the class label and the target concept in machine unlearning. arXiv preprint arXiv:2406.08288, 2024.