Class Unlearning via Depth-Aware Removal of Forget-Specific Directions

Arman Hatami; Ilya E. Monosov; Romina Aalishah

arxiv: 2604.15166 · v2 · pith:T6HXDXI5new · submitted 2026-04-16 · 💻 cs.CV · cs.AI · cs.LG

Pith reviewed 2026-05-21 00:04 UTC · model grok-4.3

keywords: damp, directions, forgetting, class, depth-aware, layers, removal, unlearning

The pith

DAMP removes forget-specific directions from a network's weights with depth-scaled projections, so that class unlearning behaves more like retraining without the forget classes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Machine unlearning seeks to erase specific knowledge from a model without retraining from scratch. The paper shows that many existing class unlearning methods leave forget-class information encoded in the internal layers. DAMP addresses this by computing prototypes for forget and retain classes at each layer and projecting out the forget directions. It scales the strength of these edits according to the layer depth to balance forgetting with retained utility. On several image classification benchmarks, this results in forgetting that is closer to that of a freshly retrained model.

Core claim

At each network stage DAMP calculates the forget direction as the residual of the forget-class prototype minus the retain-class prototype in the input space to the next learnable operator, then updates the weights by projecting away a depth-scaled portion of that direction to lower the downstream sensitivity to forget-class inputs.
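The core claim can be made concrete with a minimal NumPy sketch for a single forget class and a single layer. This is an illustration under stated assumptions, not the authors' implementation: the function names `forget_residual` and `project_out`, the least-squares decomposition against the retain prototypes, and the scalar `alpha` standing in for the depth-scaled strength are all hypothetical choices made here.

```python
import numpy as np

def forget_residual(forget_proto, retain_protos):
    """Part of the forget prototype not explained by the retain prototypes.

    forget_proto: shape (d,); retain_protos: shape (k, d), one row per retain class.
    Assumption: "residual" means the least-squares remainder against the retain span.
    """
    # Least-squares coefficients expressing the forget prototype in the retain span.
    coeffs, *_ = np.linalg.lstsq(retain_protos.T, forget_proto, rcond=None)
    return forget_proto - retain_protos.T @ coeffs

def project_out(weight, direction, alpha):
    """Return weight @ (I - alpha * u u^T), with u the unit-normalized direction.

    weight: shape (out_dim, d), acting on inputs of dimension d.
    alpha: hypothetical per-layer edit strength in [0, 1].
    """
    u = direction / np.linalg.norm(direction)
    return weight - alpha * np.outer(weight @ u, u)

rng = np.random.default_rng(0)
d, out_dim = 16, 8
retain_protos = rng.normal(size=(3, d))   # toy retain-class prototypes
forget_proto = rng.normal(size=d)         # toy forget-class prototype
weight = rng.normal(size=(out_dim, d))    # toy weights of the next operator

residual = forget_residual(forget_proto, retain_protos)
edited = project_out(weight, residual, alpha=1.0)
```

With `alpha=1.0` the edited operator no longer responds to the residual direction, while its response to the retain prototypes is unchanged because the residual is orthogonal to them by construction. Smaller values of `alpha` shrink the response proportionally rather than removing it.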

What carries the argument

Depth-Aware Modulation by Projection (DAMP), a closed-form weight update that removes forget residuals with larger edits in deeper layers.

Load-bearing premise

Computing and removing residuals relative to retain-class prototypes at each layer removes the underlying forget-class information from the network rather than just suppressing it at the output.

What would settle it

Training a linear probe on the network's deep-layer features after DAMP and finding that it can classify forget classes well above chance would show that the information remains in the representations.
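A rough sketch of such a probe test, on synthetic features rather than real activations, might look like the following. Everything here is an assumption made for illustration: the ridge-regression probe, the helper name `probe_accuracy`, and the toy data in which forget-class information lives along a single axis that a successful edit is imagined to zero out.

```python
import numpy as np

def probe_accuracy(train_x, train_y, test_x, test_y, ridge=1e-3):
    """Fit a ridge-regression linear probe on one-hot labels and return test accuracy."""
    n_classes = int(train_y.max()) + 1
    targets = np.eye(n_classes)[train_y]
    # Append a constant column so the probe has a bias term.
    a = np.hstack([train_x, np.ones((len(train_x), 1))])
    b = np.hstack([test_x, np.ones((len(test_x), 1))])
    gram = a.T @ a + ridge * np.eye(a.shape[1])
    coef = np.linalg.solve(gram, a.T @ targets)
    return float(np.mean(np.argmax(b @ coef, axis=1) == test_y))

rng = np.random.default_rng(0)
n, d = 2000, 10
labels = rng.integers(0, 2, size=2 * n)   # 1 = forget class, 0 = retain class
features = rng.normal(size=(2 * n, d))    # stand-in for deep-layer features
features[:, 0] += 6.0 * labels            # forget information lives along axis 0

erased = features.copy()
erased[:, 0] = 0.0                        # stand-in for a successful edit

acc_before = probe_accuracy(features[:n], labels[:n], features[n:], labels[n:])
acc_after = probe_accuracy(erased[:n], labels[:n], erased[n:], labels[n:])
```

In this toy setup the probe separates the classes almost perfectly before the edit and falls to roughly chance afterwards. On a real network, a post-edit accuracy well above chance would indicate that forget-class information survives in the representation.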

Figures

Figures reproduced from arXiv: 2604.15166 by Arman Hatami, Ilya E. Monosov, Romina Aalishah.

**Figure 2.** Comparison of DAMP Selectivity with the baseline retrained model, Gradient Ascent Unlearning (GAU), knowledge distillation unlearning (KDU), Data Deletion Fine-Tuning (DDFT), Logit Masking (LM), Random Relabeling (RandRelabel), Selective Synaptic Dampening (SSD), and Saliency Unlearning (SalUn) for a 5-layer CNN (CNN-5) on CIFAR-10; forget classes 3 (Cat) and 5 (Dog). Existing methods exhibit weak, near…

**Figure 3.** The first panel (top-left) shows the difference between the RDM of the original model that has not been subjected to unlearning…

**Figure 4.** Comparison of DAMP bias shift with the baseline, retrained model, Gradient Ascent Unlearning (GAU), knowledge distillation unlearning (KDU), Data Deletion Fine-Tuning (DDFT), Logit Masking (LM), Random Relabeling (RandRelabel), Selective Synaptic Dampening (SSD), and Saliency Unlearning (SalUn) for a 5-layer CNN (CNN-5) on CIFAR-10; forget class 3 (Cat). The asterisk marks the forget class. Existing unl…

**Figure 5.** Overview of the proposed DAMP. Starting from a pretrained network, we compute class prototypes in the edit space of each stage. For each forget class, its prototype is decomposed into a component explained by the retain span and a forget residual. The resulting residual directions are orthonormalized. In parallel, a layer coefficient αℓ is computed from depth and forget-retain separability. These two quant…

Original abstract

Machine unlearning aims to remove targeted knowledge from a trained model without the cost of retraining from scratch. In class unlearning, however, reducing accuracy on forget classes does not necessarily imply true forgetting: forgotten information can remain encoded in internal representations, and apparent forgetting may arise from classifier-head suppression rather than representational removal. We show that existing class-unlearning methods often exhibit weak or negative selectivity, preserve forget-class structure in deep representations, or rely heavily on final-layer bias shifts. We then introduce DAMP (Depth-Aware Modulation by Projection), a one-shot, closed-form weight-surgery method that removes forget-specific directions from a pretrained network without gradient-based optimization. At each stage, DAMP computes class prototypes in the input space of the next learnable operator, extracts forget directions as residuals relative to retain-class prototypes, and applies a projection-based update to reduce downstream sensitivity to those directions. To preserve utility, DAMP uses a parameter-free depth-aware scaling rule derived from probe separability, applying smaller edits in early layers and larger edits in deeper layers. The method naturally extends to multi-class forgetting through low-rank subspace removal. Across MNIST, CIFAR-10, CIFAR-100, and Tiny ImageNet, and across convolutional and transformer architectures, DAMP more closely resembles the retraining gold standard than some of the prior methods, improving selective forgetting while better preserving retain-class performance and reducing residual forget-class structure in deep layers.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

DAMP gives a clean one-shot projection method for removing forget-class directions layer by layer, but the evidence that it truly erases internal structure rather than just shifting outputs is still thin.

The main thing to know is that this paper puts forward DAMP, a closed-form weight edit that computes class prototypes in each operator's input space, pulls out forget residuals relative to the retain prototypes, and projects them away with a depth-aware scale that grows in deeper layers. It claims this gets closer to full retraining than some prior unlearning baselines across MNIST, CIFAR-10, CIFAR-100, and Tiny ImageNet on both conv nets and transformers, while cutting residual forget structure in deep representations and keeping retain performance higher.

Referee Report

3 major / 2 minor

Summary. The paper introduces DAMP (Depth-Aware Modulation by Projection), a one-shot closed-form weight-surgery method for class unlearning. At each stage it computes class prototypes in the input space of the next learnable operator, extracts forget directions as residuals relative to retain-class prototypes, and applies a projection update; a parameter-free depth-aware scaling rule modulates edit magnitude by layer depth. The central claim is that across MNIST, CIFAR-10, CIFAR-100 and Tiny ImageNet, and across CNN and transformer backbones, DAMP more closely matches the retraining gold standard than prior methods by improving selective forgetting, preserving retain-class accuracy, and reducing residual forget-class structure in deep layers.

Significance. If the method genuinely erases forget-class information from internal representations rather than merely shifting decision boundaries at the output head, it would offer an efficient, optimization-free alternative to retraining or gradient-based unlearning while addressing the documented failure mode of residual structure in deep layers. The closed-form derivation and explicit handling of multi-class forgetting via low-rank subspaces are notable strengths.

major comments (3)

[Abstract] Abstract: the claim that DAMP 'more closely resembles the retraining gold standard' and 'reduc[es] residual forget-class structure in deep layers' is asserted without any quantitative tables, error bars, ablation studies, or numerical comparisons to retraining or baselines, so the central empirical claim cannot be evaluated from the manuscript.
[Method] Method (depth-aware scaling paragraph): the assertion that the scaling rule is 'parameter-free' and 'derived from probe separability' is not accompanied by an explicit equation or derivation showing how separability measurements translate into per-layer multipliers without introducing fitted quantities or hidden hyperparameters.
[Method] Central assumption (prototype-residual step): the claim that residuals relative to retain-class prototypes isolate forget-specific directions that are both linear and sufficient to capture all encoded forget-class information is load-bearing for the 'true forgetting' versus 'head suppression' distinction, yet the manuscript provides no direct test (e.g., linear probing of internal activations before/after DAMP) to rule out non-linear entanglement or redundant pathways.

minor comments (2)

[Abstract] Notation for 'forget-specific directions' and 'probe separability' is introduced without a formal definition or reference to prior usage.
[Method] The extension to multi-class forgetting via low-rank subspace removal is mentioned but lacks a concrete algorithmic description or complexity statement.
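
For readers wondering what the multi-class extension in the last comment could look like, here is a minimal NumPy sketch of low-rank subspace removal. It is an assumption-laden illustration, not the paper's algorithm: the names `forget_subspace` and `remove_subspace`, the use of QR for orthonormalization, and the single strength `alpha` are hypothetical, and the sketch assumes that the retain prototypes, and likewise the forget residuals, are linearly independent.

```python
import numpy as np

def forget_subspace(forget_protos, retain_protos):
    """Orthonormal basis, shape (d, m), for the forget residuals outside the retain span.

    forget_protos: shape (m, d); retain_protos: shape (k, d).
    """
    # Orthonormal basis of the retain span via reduced QR.
    retain_basis, _ = np.linalg.qr(retain_protos.T)
    # Remove the retain-span component from every forget prototype.
    residuals = forget_protos.T - retain_basis @ (retain_basis.T @ forget_protos.T)
    # Orthonormalize the residual directions.
    basis, _ = np.linalg.qr(residuals)
    return basis

def remove_subspace(weight, basis, alpha):
    """Return weight @ (I - alpha * Q Q^T) for an orthonormal basis Q."""
    return weight - alpha * (weight @ basis) @ basis.T

rng = np.random.default_rng(1)
d, out_dim = 32, 10
retain_protos = rng.normal(size=(5, d))   # toy retain-class prototypes
forget_protos = rng.normal(size=(2, d))   # toy prototypes for two forget classes
weight = rng.normal(size=(out_dim, d))    # toy weights of the next operator

basis = forget_subspace(forget_protos, retain_protos)
edited = remove_subspace(weight, basis, alpha=1.0)
```

In this sketch the update has rank at most the number of forget classes, and its cost is dominated by the two QR factorizations and one product of the weight matrix with the basis. Whether the paper's procedure matches these choices cannot be determined from the summary above.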

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback on our manuscript. We address each major comment point by point below, clarifying the empirical support and methodological details while committing to revisions where the presentation can be strengthened.

Referee: [Abstract] Abstract: the claim that DAMP 'more closely resembles the retraining gold standard' and 'reduc[es] residual forget-class structure in deep layers' is asserted without any quantitative tables, error bars, ablation studies, or numerical comparisons to retraining or baselines, so the central empirical claim cannot be evaluated from the manuscript.

Authors: We acknowledge that the abstract's central claims would benefit from more immediate quantitative support. The full manuscript reports results across MNIST, CIFAR-10, CIFAR-100, and Tiny ImageNet for both CNN and transformer backbones, with comparisons to prior methods and retraining from scratch on metrics including forget-class accuracy, retain-class accuracy, and measures of residual structure. To address the concern directly, we will revise the abstract and add a consolidated summary table (with error bars from repeated runs) that explicitly quantifies how DAMP aligns with the retraining baseline relative to other methods.
Revision: yes

Referee: [Method] Method (depth-aware scaling paragraph): the assertion that the scaling rule is 'parameter-free' and 'derived from probe separability' is not accompanied by an explicit equation or derivation showing how separability measurements translate into per-layer multipliers without introducing fitted quantities or hidden hyperparameters.

Authors: We agree that the depth-aware scaling rule requires an explicit derivation to substantiate the 'parameter-free' claim. In the revised method section we will insert the full equation for the per-layer multiplier together with a step-by-step derivation that shows how separability statistics (computed once from probe activations) are mapped to scaling factors. No fitted quantities or hidden hyperparameters are introduced; the rule uses only the observed separability values at each depth.
Revision: yes

Referee: [Method] Central assumption (prototype-residual step): the claim that residuals relative to retain-class prototypes isolate forget-specific directions that are both linear and sufficient to capture all encoded forget-class information is load-bearing for the 'true forgetting' versus 'head suppression' distinction, yet the manuscript provides no direct test (e.g., linear probing of internal activations before/after DAMP) to rule out non-linear entanglement or redundant pathways.

Authors: This is a substantive point about the strength of evidence for representational rather than merely head-level forgetting. Our experiments already demonstrate reduced residual forget-class structure in deep layers through representation-level metrics and closer alignment with retraining performance. We will add a direct linear-probing analysis of internal activations before and after DAMP to quantify the removal of linear forget directions. Regarding possible non-linear entanglement or redundant pathways, we note that the method explicitly targets linear subspaces; any remaining non-linear components would constitute a limitation that we will discuss explicitly in the revision.
Revision: partial

Circularity Check

0 steps flagged

DAMP derivation is self-contained closed-form without circular reduction

The paper presents DAMP as a one-shot closed-form weight-surgery procedure that computes prototypes in per-operator input space, extracts residuals, and applies projection updates with a parameter-free depth-aware scaling rule derived from probe separability. No load-bearing step reduces by construction to a fitted quantity defined inside the paper, nor does any central claim rest on a self-citation chain or imported uniqueness theorem. Empirical comparisons to retraining across datasets and architectures provide external validation, so the derivation chain remains independent of its own outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claim rests on the assumption that forget-class information is linearly separable in the input space to each layer and that removing the residual direction will not destroy retain-class utility. No explicit free parameters are introduced; the scaling rule is stated to be parameter-free. No new physical entities are postulated.

axioms (2)

domain assumption: Class prototypes computed in the input space of each learnable operator capture the relevant forget-specific directions.
Invoked when the method extracts residuals relative to retain-class prototypes at each stage.
domain assumption: Projection-based weight updates at each layer remove downstream sensitivity without requiring gradient optimization.
Core premise of the one-shot closed-form surgery.

invented entities (1)

forget-specific directions (no independent evidence)
purpose: Residual vectors in layer input space used as targets for projection removal.
Newly defined construct extracted from class prototypes; no independent evidence outside the method itself is provided.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear

Paper passage: "computes class prototypes in the input space of the next learnable operator, extracts forget directions as residuals relative to retain-class prototypes, and applies a projection-based update"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Classification-Head Bias in Class-Level Machine Unlearning: Diagnosis, Mitigation, and Evaluation
cs.LG 2026-05 conditional novelty 7.0

Class-level unlearning shortcuts via bias suppression in the classification head; new bias-aware training mechanisms and bias-specific metrics are introduced to diagnose and reduce this dependence.

[1] [1]

Understanding intermediate layers using linear classifier probes

Guillaume Alain and Yoshua Bengio. Understanding inter- mediate layers using linear classifier probes.arXiv preprint arXiv:1610.01644, 2016. 6

work page internal anchor Pith review Pith/arXiv arXiv 2016

[2] [2]

Machine unlearning: Linear filtration for logit- based classifiers.Machine Learning, 111(9):3203–3226,

Thomas Baumhauer, Pascal Sch ¨ottle, and Matthias Zep- pelzauer. Machine unlearning: Linear filtration for logit- based classifiers.Machine Learning, 111(9):3203–3226,

work page

[3] [3]

Cure: Con- cept unlearning via orthogonal representation editing in dif- fusion models

Shristi Das Biswas, Arani Roy, and Kaushik Roy. Cure: Con- cept unlearning via orthogonal representation editing in dif- fusion models. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025. 6

work page 2025

[4] [4]

Is retain set all you need in machine unlearning? restoring perfor- mance of unlearned models with out-of-distribution images

Jacopo Bonato, Marco Cotogni, and Luigi Sabetta. Is retain set all you need in machine unlearning? restoring perfor- mance of unlearned models with out-of-distribution images. InEuropean Conference on Computer Vision, pages 1–19. Springer, 2024. 3, 6, 7

work page 2024

[5] [5]

Machine unlearning

Lucas Bourtoule, Varun Chandrasekaran, Christopher A Choquette-Choo, Hengrui Jia, Adelin Travers, Baiwu Zhang, David Lie, and Nicolas Papernot. Machine unlearning. In 2021 IEEE symposium on security and privacy (SP), pages 141–159. IEEE, 2021. 1, 2, 3

work page 2021

[6] [6]

Deep unlearn: Benchmarking machine unlearning for image classification

Xavier F Cadet, Anastasia Borovykh, Mohammad Malekzadeh, Sara Ahmadi-Abhari, and Hamed Had- dadi. Deep unlearn: Benchmarking machine unlearning for image classification. In2025 IEEE 10th European Symposium on Security and Privacy (EuroS&P), pages 939–962. IEEE, 2025. 1, 2, 3, 7

work page 2025

[7] [7]

O-edit: Orthogonal subspace editing for language model sequential editing.arXiv preprint arXiv:2410.11469, 2024

Yuchen Cai and Ding Cao. O-edit: Orthogonal subspace editing for language model sequential editing.arXiv preprint arXiv:2410.11469, 2024. 6

work page arXiv 2024

[8] [8]

Towards making systems for- get with machine unlearning

Yinzhi Cao and Junfeng Yang. Towards making systems for- get with machine unlearning. In2015 IEEE symposium on security and privacy, pages 463–480. IEEE, 2015. 1, 3

work page 2015

[9] [9]

Boundary unlearning.arXiv preprint arXiv:2303.11570, 2023

Min Chen, Weizhuo Gao, Gaoyang Liu, Kai Peng, and Chen Wang. Boundary unlearning.arXiv preprint arXiv:2303.11570, 2023. 4

work page arXiv 2023

[10] [10]

Forget unlearning: To- wards true data-deletion in machine learning

Rishav Chourasia and Neil Shah. Forget unlearning: To- wards true data-deletion in machine learning. InInterna- tional conference on machine learning, pages 6028–6073. PMLR, 2023. 3, 6, 7

work page 2023

[11] Vikram S Chundawat, Ayush K Tarun, Murari Mandal, and Mohan Kankanhalli. Zero-shot machine unlearning. IEEE Transactions on Information Forensics and Security, 18:2345–2354, 2023.

[12] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020.

[13] Chongyu Fan, Jiancheng Liu, Yihua Zhang, Eric Wong, Dennis Wei, and Sijia Liu. SalUn: Empowering machine unlearning via gradient-based weight saliency in both image classification and generation. arXiv preprint arXiv:2310.12508,

[14] Jack Foster, Stefan Schoepf, and Alexandra Brintrup. Fast machine unlearning without retraining through selective synaptic dampening. In Proceedings of the AAAI conference on artificial intelligence, pages 12043–12051, 2024.

[15] Laura Graves, Vineel Nagisetty, and Vijay Ganesh. Amnesiac machine learning. In Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21), pages 11516–11524, 2021.

[16] Jamie Hayes, Ilia Shumailov, Eleni Triantafillou, Amr Khalifa, and Nicolas Papernot. Inexact unlearning needs more careful evaluations to avoid a false sense of privacy. In 2025 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML), pages 497–519. IEEE, 2025.

[17] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.

[18] Zhengbao He, Tao Li, Xinwen Cheng, Zhehao Huang, and Xiaolin Huang. Towards natural machine unlearning. IEEE Transactions on Pattern Analysis and Machine Intelligence,

[19] Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015.

[20] Zhehao Huang, Xinwen Cheng, JingHao Zheng, Haoran Wang, Zhengbao He, Tao Li, and Xiaolin Huang. Unified gradient-based machine unlearning with remain geometry enhancement. Advances in Neural Information Processing Systems, 37:26377–26414, 2024.

[21] Jinghan Jia, Jiancheng Liu, Parikshit Ram, Yuguang Yao, Gaowen Liu, Yang Liu, Pranay Sharma, and Sijia Liu. Model sparsity can simplify machine unlearning. Advances in Neural Information Processing Systems, 36:51584–51605, 2023.

[22] Junyaup Kim and Simon S. Woo. Efficient two-stage model retraining for machine unlearning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, pages 4361–4369, 2022.

[23] Sangamesh Kodge, Gobinda Saha, and Kaushik Roy. Deep unlearning: Fast and efficient gradient-free class forgetting. Transactions on Machine Learning Research, 2024.

[24] Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009.

[25] Meghdad Kurmanji, Peter Triantafillou, Jamie Hayes, and Eleni Triantafillou. Towards unbounded machine unlearning. Advances in neural information processing systems, 36:1957–1987, 2023.

[26] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.

[27] Junde Li and Swaroop Ghosh. Random relabeling for efficient machine unlearning. arXiv preprint arXiv:2305.12320,

[28] Ioannis Mavrothalassitis, Pol Puigdemont, Noam Itzhak Levi, and Volkan Cevher. Ascent fails to forget. arXiv preprint arXiv:2509.26427, 2025.

[29] Kevin Meng, David Bau, Alex Andonian, and Yonatan Belinkov. Locating and editing factual associations in GPT. Advances in neural information processing systems, 35:17359–17372, 2022.

[30] Kevin Meng, Arnab Sen Sharma, Alex Andonian, Yonatan Belinkov, and David Bau. Mass-editing memory in a transformer. arXiv preprint arXiv:2210.07229, 2022.

[31] Nazanin Mohammadi Sepahvand, Eleni Triantafillou, Hugo Larochelle, Doina Precup, James J Clark, Daniel M Roy, and Gintare Karolina Dziugaite. Selective unlearning via representation erasure using domain adversarial training. In The Thirteenth International Conference on Learning Representations, 2025.

[32] Jake Snell, Kevin Swersky, and Richard Zemel. Prototypical networks for few-shot learning. Advances in neural information processing systems, 30, 2017.

[33] Rheeya Uppaal, Apratim Dey, Yiting He, Yiqiao Zhong, and Junjie Hu. Model editing as a robust and denoised variant of DPO: A case study on toxicity. arXiv preprint arXiv:2405.13967, 2024.

[34] Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9(86):2579–2605, 2008.

[35] Weiqi Wang, Zhiyi Tian, Chenhan Zhang, and Shui Yu. Machine unlearning: A comprehensive survey. arXiv preprint arXiv:2405.07406, 2024.

[36] Jiayu Wu, Qixiang Zhang, and Guoxi Xu. Tiny ImageNet challenge. Technical report, 2017.

[37] Jason Yosinski, Jeff Clune, Yoshua Bengio, and Hod Lipson. How transferable are features in deep neural networks? Advances in neural information processing systems, 27, 2014.

[38] Yu Zhou, Dian Zheng, Qijie Mo, Renjie Lu, Kun-Yu Lin, and Wei-Shi Zheng. Decoupled distillation to erase: A general unlearning method for any class-centric tasks. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 20350–20359, 2025.

[39] Jianing Zhu, Bo Han, Jiangchao Yao, Jianliang Xu, Gang Niu, and Masashi Sugiyama. Decoupling the class label and the target concept in machine unlearning. arXiv preprint arXiv:2406.08288, 2024.

Class Unlearning via Depth-Aware Removal of Forget-Specific Directions

Supplementary Material

A. Additional Analyses and Supplementary Results

A.1. Implementation...