Class Unlearning via Depth-Aware Removal of Forget-Specific Directions
Pith reviewed 2026-05-21 00:04 UTC · model grok-4.3
The pith
DAMP removes forget-specific directions from neural networks using depth-aware projections to achieve better class unlearning.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
At each network stage, DAMP computes class prototypes in the input space of the next learnable operator and takes the forget direction to be the residual of the forget-class prototype relative to the retain-class prototypes. It then updates that operator's weights by projecting away a depth-scaled portion of the direction, which lowers downstream sensitivity to forget-class inputs.
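The prototype-residual and projection steps described above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the paper's exact update: the residual is taken against the mean of the retain prototypes, the update has the form W(I - alpha u u^T), and the names `forget_direction`, `project_out`, and `alpha` are hypothetical.

```python
import numpy as np

def forget_direction(feats, labels, forget_class):
    """Unit residual of the forget prototype relative to the mean retain prototype.

    Assumption: "residual relative to retain-class prototypes" is read here as
    (forget prototype) - (mean of retain prototypes). The paper may differ.
    """
    classes = np.unique(labels)
    protos = {c: feats[labels == c].mean(axis=0) for c in classes}
    retain_mean = np.mean([protos[c] for c in classes if c != forget_class], axis=0)
    residual = protos[forget_class] - retain_mean
    return residual / np.linalg.norm(residual)

def project_out(W, u, alpha):
    """Return W (I - alpha u u^T): the response along u shrinks by (1 - alpha)."""
    return W - alpha * np.outer(W @ u, u)

# Synthetic stand-in for the inputs to one learnable operator (hypothetical data).
rng = np.random.default_rng(0)
d_in, d_out, n = 8, 4, 300
labels = rng.integers(0, 3, size=n)
feats = rng.normal(size=(n, d_in))
feats[labels == 2, 0] += 5.0          # class 2 plays the role of the forget class
W = rng.normal(size=(d_out, d_in))    # weights of the next operator

u = forget_direction(feats, labels, forget_class=2)
W_new = project_out(W, u, alpha=1.0)  # alpha = 1 removes the direction entirely
W_half = project_out(W, u, alpha=0.5) # a smaller edit, as in an early layer
```

With `alpha=1.0` the edited operator no longer responds to `u`, while inputs orthogonal to `u` pass through unchanged; smaller `alpha` values give the partial edits that a depth-aware schedule would assign to early layers.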
What carries the argument
Depth-Aware Modulation by Projection (DAMP), a closed-form weight update that removes forget residuals with larger edits in deeper layers.
Load-bearing premise
Computing and removing residuals relative to retain-class prototypes at each layer removes the underlying forget-class information from the network rather than just suppressing it at the output.
What would settle it
Training a linear probe on the network's deep-layer features after DAMP and finding that it can classify forget classes well above chance would show that the information remains in the representations.
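A linear-probe check of this kind might look like the sketch below. It is a toy under stated assumptions: the features are synthetic rather than activations from an edited network, the probe is ridge regression onto one-hot targets, and "after DAMP" is simulated by stripping the prototype-residual direction from the features directly. The helper names are hypothetical.

```python
import numpy as np

def fit_linear_probe(x, y, n_classes, ridge=1e-3):
    """Ridge regression onto one-hot targets; the last weight row is the bias."""
    xb = np.hstack([x, np.ones((len(x), 1))])
    targets = np.eye(n_classes)[y]
    gram = xb.T @ xb + ridge * np.eye(xb.shape[1])
    return np.linalg.solve(gram, xb.T @ targets)

def probe_accuracy(weights, x, y):
    """Fraction of samples whose highest-scoring class matches the label."""
    xb = np.hstack([x, np.ones((len(x), 1))])
    return float(np.mean(np.argmax(xb @ weights, axis=1) == y))

def make_split(rng, n_per_class, dim, shift):
    """Synthetic features: label 0 = retain, label 1 = forget (shifted on axis 0)."""
    x = rng.normal(size=(2 * n_per_class, dim))
    y = np.repeat([0, 1], n_per_class)
    x[y == 1, 0] += shift
    return x, y

rng = np.random.default_rng(0)
x_tr, y_tr = make_split(rng, 500, 8, shift=6.0)
x_te, y_te = make_split(rng, 500, 8, shift=6.0)

acc_before = probe_accuracy(fit_linear_probe(x_tr, y_tr, 2), x_te, y_te)

# Stand-in for "after DAMP": remove the prototype-residual direction from features.
u = x_tr[y_tr == 1].mean(axis=0) - x_tr[y_tr == 0].mean(axis=0)
u /= np.linalg.norm(u)

def strip(x):
    """Project features onto the orthogonal complement of u."""
    return x - np.outer(x @ u, u)

acc_after = probe_accuracy(fit_linear_probe(strip(x_tr), y_tr, 2), strip(x_te), y_te)
print(f"probe accuracy before: {acc_before:.3f}, after: {acc_after:.3f}")
```

In a real evaluation the two feature sets would come from the same deep layer of the original and edited networks, and an after-edit accuracy well above chance would indicate that forget-class information survives in the representation.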
Original abstract
Machine unlearning aims to remove targeted knowledge from a trained model without the cost of retraining from scratch. In class unlearning, however, reducing accuracy on forget classes does not necessarily imply true forgetting: forgotten information can remain encoded in internal representations, and apparent forgetting may arise from classifier-head suppression rather than representational removal. We show that existing class-unlearning methods often exhibit weak or negative selectivity, preserve forget-class structure in deep representations, or rely heavily on final-layer bias shifts. We then introduce DAMP (Depth-Aware Modulation by Projection), a one-shot, closed-form weight-surgery method that removes forget-specific directions from a pretrained network without gradient-based optimization. At each stage, DAMP computes class prototypes in the input space of the next learnable operator, extracts forget directions as residuals relative to retain-class prototypes, and applies a projection-based update to reduce downstream sensitivity to those directions. To preserve utility, DAMP uses a parameter-free depth-aware scaling rule derived from probe separability, applying smaller edits in early layers and larger edits in deeper layers. The method naturally extends to multi-class forgetting through low-rank subspace removal. Across MNIST, CIFAR-10, CIFAR-100, and Tiny ImageNet, and across convolutional and transformer architectures, DAMP more closely resembles the retraining gold standard than some of the prior methods, improving selective forgetting while better preserving retain-class performance and reducing residual forget-class structure in deep layers.
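The abstract does not give the depth-aware scaling rule itself, so the following is only a hypothetical illustration of how probe separability could be turned into per-layer edit strengths without tunable constants. The mapping used here (probe accuracy rescaled between chance and 1) is an assumption made for this sketch and should not be read as the paper's formula; `depth_scales` and the accuracy values are invented.

```python
import numpy as np

def depth_scales(probe_acc, chance):
    """Hypothetical rule: rescale per-layer probe accuracy to [0, 1] above chance.

    Layers where a linear probe barely beats chance get small edits; layers
    where classes are already linearly separable get edits close to 1.
    """
    acc = np.asarray(probe_acc, dtype=float)
    return np.clip((acc - chance) / (1.0 - chance), 0.0, 1.0)

# Made-up probe accuracies for four stages, ordered shallow to deep.
layer_probe_acc = [0.35, 0.55, 0.80, 0.95]
alphas = depth_scales(layer_probe_acc, chance=0.10)  # chance = 1 / 10 classes
print(alphas)
```

Because separability typically rises with depth, a rule of this shape yields smaller edits in early layers and larger edits in deeper ones, which is the qualitative behaviour the abstract describes.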
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces DAMP (Depth-Aware Modulation by Projection), a one-shot closed-form weight-surgery method for class unlearning. At each stage it computes class prototypes in the input space of the next learnable operator, extracts forget directions as residuals relative to retain-class prototypes, and applies a projection update; a parameter-free depth-aware scaling rule modulates edit magnitude by layer depth. The central claim is that across MNIST, CIFAR-10, CIFAR-100, and Tiny ImageNet, and across CNN and transformer backbones, DAMP more closely matches the retraining gold standard than some prior methods by improving selective forgetting, preserving retain-class accuracy, and reducing residual forget-class structure in deep layers.
Significance. If the method genuinely erases forget-class information from internal representations rather than merely shifting decision boundaries at the output head, it would offer an efficient, optimization-free alternative to retraining or gradient-based unlearning while addressing the documented failure mode of residual structure in deep layers. The closed-form derivation and explicit handling of multi-class forgetting via low-rank subspaces are notable strengths.
major comments (3)
- [Abstract] Abstract: the claim that DAMP 'more closely resembles the retraining gold standard' and 'reduc[es] residual forget-class structure in deep layers' is asserted without any quantitative tables, error bars, ablation studies, or numerical comparisons to retraining or baselines, so the central empirical claim cannot be evaluated from the manuscript.
- [Method] Method (depth-aware scaling paragraph): the assertion that the scaling rule is 'parameter-free' and 'derived from probe separability' is not accompanied by an explicit equation or derivation showing how separability measurements translate into per-layer multipliers without introducing fitted quantities or hidden hyperparameters.
- [Method] Central assumption (prototype-residual step): the claim that residuals relative to retain-class prototypes isolate forget-specific directions that are both linear and sufficient to capture all encoded forget-class information is load-bearing for the 'true forgetting' versus 'head suppression' distinction, yet the manuscript provides no direct test (e.g., linear probing of internal activations before/after DAMP) to rule out non-linear entanglement or redundant pathways.
minor comments (2)
- [Abstract] Notation for 'forget-specific directions' and 'probe separability' is introduced without a formal definition or reference to prior usage.
- [Method] The extension to multi-class forgetting via low-rank subspace removal is mentioned but lacks a concrete algorithmic description or complexity statement.
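For orientation, one common way to realise low-rank subspace removal is sketched below. It is an assumption-laden illustration rather than the paper's algorithm: the forget subspace is taken to be the span of one residual per forget class, an orthonormal basis is obtained by SVD, and the update has the form W(I - alpha U U^T). The function names are hypothetical.

```python
import numpy as np

def forget_subspace(residuals, tol=1e-8):
    """Orthonormal basis (as columns) for the span of the forget residuals."""
    R = np.atleast_2d(residuals)                 # shape (k, d): one row per forget class
    _, s, vt = np.linalg.svd(R, full_matrices=False)
    rank = int(np.sum(s > tol * s.max()))        # drop numerically dependent directions
    return vt[:rank].T                           # shape (d, rank)

def remove_subspace(W, U, alpha=1.0):
    """Return W (I - alpha U U^T), a rank-limited edit of W."""
    return W - alpha * (W @ U) @ U.T

# Synthetic example: three forget classes, one residual each (hypothetical data).
rng = np.random.default_rng(0)
d_in, d_out = 16, 5
residuals = rng.normal(size=(3, d_in))
W = rng.normal(size=(d_out, d_in))

U = forget_subspace(residuals)
W_new = remove_subspace(W, U)
```

Under these assumptions, with k forget classes the SVD of the k-by-d residual matrix costs on the order of k^2 d operations and the weight update on the order of d_out * d * k, so the edit stays cheap as long as k is small relative to the layer width.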
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive feedback on our manuscript. We address each major comment point by point below, clarifying the empirical support and methodological details while committing to revisions where the presentation can be strengthened.
-
Referee: [Abstract] Abstract: the claim that DAMP 'more closely resembles the retraining gold standard' and 'reduc[es] residual forget-class structure in deep layers' is asserted without any quantitative tables, error bars, ablation studies, or numerical comparisons to retraining or baselines, so the central empirical claim cannot be evaluated from the manuscript.
Authors: We acknowledge that the abstract's central claims would benefit from more immediate quantitative support. The full manuscript reports results across MNIST, CIFAR-10, CIFAR-100, and Tiny ImageNet for both CNN and transformer backbones, with comparisons to prior methods and retraining from scratch on metrics including forget-class accuracy, retain-class accuracy, and measures of residual structure. To address the concern directly, we will revise the abstract and add a consolidated summary table (with error bars from repeated runs) that explicitly quantifies how DAMP aligns with the retraining baseline relative to other methods. revision: yes
-
Referee: [Method] Method (depth-aware scaling paragraph): the assertion that the scaling rule is 'parameter-free' and 'derived from probe separability' is not accompanied by an explicit equation or derivation showing how separability measurements translate into per-layer multipliers without introducing fitted quantities or hidden hyperparameters.
Authors: We agree that the depth-aware scaling rule requires an explicit derivation to substantiate the 'parameter-free' claim. In the revised method section we will insert the full equation for the per-layer multiplier together with a step-by-step derivation that shows how separability statistics (computed once from probe activations) are mapped to scaling factors. No fitted quantities or hidden hyperparameters are introduced; the rule uses only the observed separability values at each depth. revision: yes
-
Referee: [Method] Central assumption (prototype-residual step): the claim that residuals relative to retain-class prototypes isolate forget-specific directions that are both linear and sufficient to capture all encoded forget-class information is load-bearing for the 'true forgetting' versus 'head suppression' distinction, yet the manuscript provides no direct test (e.g., linear probing of internal activations before/after DAMP) to rule out non-linear entanglement or redundant pathways.
Authors: This is a substantive point about the strength of evidence for representational rather than merely head-level forgetting. Our experiments already demonstrate reduced residual forget-class structure in deep layers through representation-level metrics and closer alignment with retraining performance. We will add a direct linear-probing analysis of internal activations before and after DAMP to quantify the removal of linear forget directions. Regarding possible non-linear entanglement or redundant pathways, we note that the method explicitly targets linear subspaces; any remaining non-linear components would constitute a limitation that we will discuss explicitly in the revision. revision: partial
Circularity Check
DAMP derivation is self-contained closed-form without circular reduction
The paper presents DAMP as a one-shot closed-form weight-surgery procedure that computes prototypes in per-operator input space, extracts residuals, and applies projection updates with a parameter-free depth-aware scaling rule derived from probe separability. No load-bearing step reduces by construction to a fitted quantity defined inside the paper, nor does any central claim rest on a self-citation chain or imported uniqueness theorem. Empirical comparisons to retraining across datasets and architectures provide external validation, so the derivation chain remains independent of its own outputs.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Class prototypes computed in the input space of each learnable operator capture the relevant forget-specific directions.
- domain assumption: Projection-based weight updates at each layer remove downstream sensitivity without requiring gradient optimization.
invented entities (1)
- forget-specific directions: no independent evidence
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
computes class prototypes in the input space of the next learnable operator, extracts forget directions as residuals relative to retain-class prototypes, and applies a projection-based update
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
-
Classification-Head Bias in Class-Level Machine Unlearning: Diagnosis, Mitigation, and Evaluation
Class-level unlearning shortcuts via bias suppression in the classification head; new bias-aware training mechanisms and bias-specific metrics are introduced to diagnose and reduce this dependence.