Class Unlearning via Depth-Aware Removal of Forget-Specific Directions
Pith reviewed 2026-05-10 11:42 UTC · model grok-4.3
The pith
By projecting out forget-specific directions layer by layer with depth-aware scaling, a closed-form method achieves class unlearning closer to full retraining than prior approaches.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
DAMP computes class prototypes in the input space of each learnable operator, extracts forget directions as residuals relative to retain-class prototypes, and applies projection-based weight updates scaled by a depth-aware rule derived from probe separability. In doing so it removes targeted class information from internal representations without gradient optimization, producing behavior closer to retraining from scratch.
What carries the argument
Depth-Aware Modulation by Projection (DAMP), a one-shot closed-form procedure that isolates forget directions via residuals to retain prototypes and projects them out of the network weights with layer-specific scaling.
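The single-layer projection step can be sketched in a few lines. The helper name `damp_edit`, the strength argument `alpha`, and the exact form of the update are illustrative assumptions rather than the paper's published equations (in particular, a single pooled retain prototype stands in for whatever per-class prototypes the paper uses).

```python
import numpy as np

def damp_edit(W, acts_forget, acts_retain, alpha=1.0):
    """Hypothetical sketch of one DAMP-style edit for a linear layer W (out x in).

    The forget direction is taken as the residual between the forget-class
    prototype (mean input activation) and a retain-class prototype; W is
    then updated so its response along that direction is suppressed.
    """
    mu_f = acts_forget.mean(axis=0)   # forget-class prototype
    mu_r = acts_retain.mean(axis=0)   # retain-class prototype
    v = mu_f - mu_r                   # residual forget direction
    v = v / np.linalg.norm(v)         # unit-normalize
    # Scaled projector on the layer's input space: alpha=1 removes
    # the direction entirely, alpha<1 only attenuates it.
    P = np.eye(W.shape[1]) - alpha * np.outer(v, v)
    return W @ P
```

With `alpha=1` the edited layer maps the forget direction exactly to zero; the depth-aware rule would pick a smaller `alpha` in early layers.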
If this is right
- Selective forgetting on forget classes improves, while retain-class performance is better preserved than under some prior methods.
- Residual forget-class structure detectable in deep representations is reduced.
- The technique applies across convolutional and transformer architectures without modification.
- Multi-class forgetting extends directly via low-rank subspace removal.
- Unlearning completes in one closed-form step with no optimization loop or full retraining data.
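The low-rank multi-class extension above can be sketched by removing the span of several residual directions at once. The SVD-based basis construction and the helper name `damp_multiclass` are assumed mechanics for illustration; the paper's exact low-rank formulation is not reproduced in this review.

```python
import numpy as np

def damp_multiclass(W, residuals, rank=None):
    """Hypothetical multi-class DAMP edit: remove the subspace spanned by
    several forget-direction residuals (rows of `residuals`) from the
    input space of a linear layer W (out x in)."""
    # Orthonormal basis of the forget subspace via truncated SVD.
    _, s, Vt = np.linalg.svd(residuals, full_matrices=False)
    r = rank if rank is not None else int(np.sum(s > 1e-10))
    B = Vt[:r]                        # (r, d) orthonormal rows
    P = np.eye(W.shape[1]) - B.T @ B  # projector onto the complement
    return W @ P
```

Single-direction removal is recovered when `residuals` has one row, so this reduces to the rank-1 projection case.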
Where Pith is reading between the lines
- Forget knowledge appears localized in identifiable directional components of activation space that can be isolated using class prototypes.
- Depth-dependent scaling may prove necessary for other internal model edits beyond unlearning.
- The method implies that intervening at multiple layers is required for genuine representational change rather than output-level suppression alone.
- Similar directional removal could extend to editing other forms of embedded knowledge if suitable prototypes can be estimated.
Load-bearing premise
The forget directions extracted as residuals relative to retain-class prototypes at each layer accurately isolate only the targeted knowledge.
What would settle it
Train a linear probe on activations from deep layers after DAMP application; if the probe still classifies forget-class examples with high accuracy, the claim that forget-specific information has been removed fails.
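The probe test above can be sketched with a closed-form one-vs-all least-squares probe; `probe_accuracy` is a stand-in for whatever probe the paper actually trains.

```python
import numpy as np

def probe_accuracy(acts, labels):
    """Fit a one-vs-all least-squares linear probe on activations and
    return its training accuracy (a rough separability check)."""
    X = np.hstack([acts, np.ones((acts.shape[0], 1))])  # bias column
    Y = np.eye(int(labels.max()) + 1)[labels]           # one-hot targets
    Wp, *_ = np.linalg.lstsq(X, Y, rcond=None)          # closed-form fit
    preds = (X @ Wp).argmax(axis=1)
    return float((preds == labels).mean())
```

If this accuracy on forget-vs-retain activations stays near its pre-edit value after DAMP, the representational-removal claim fails; if it drops toward chance, the claim survives this particular test.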
Original abstract
Machine unlearning aims to remove targeted knowledge from a trained model without the cost of retraining from scratch. In class unlearning, however, reducing accuracy on forget classes does not necessarily imply true forgetting: forgotten information can remain encoded in internal representations, and apparent forgetting may arise from classifier-head suppression rather than representational removal. We show that existing class-unlearning methods often exhibit weak or negative selectivity, preserve forget-class structure in deep representations, or rely heavily on final-layer bias shifts. We then introduce DAMP (Depth-Aware Modulation by Projection), a one-shot, closed-form weight-surgery method that removes forget-specific directions from a pretrained network without gradient-based optimization. At each stage, DAMP computes class prototypes in the input space of the next learnable operator, extracts forget directions as residuals relative to retain-class prototypes, and applies a projection-based update to reduce downstream sensitivity to those directions. To preserve utility, DAMP uses a parameter-free depth-aware scaling rule derived from probe separability, applying smaller edits in early layers and larger edits in deeper layers. The method naturally extends to multi-class forgetting through low-rank subspace removal. Across MNIST, CIFAR-10, CIFAR-100, and Tiny ImageNet, and across convolutional and transformer architectures, DAMP more closely resembles the retraining gold standard than some of the prior methods, improving selective forgetting while better preserving retain-class performance and reducing residual forget-class structure in deep layers.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes DAMP (Depth-Aware Modulation by Projection), a one-shot closed-form weight-surgery technique for class unlearning. It first critiques existing methods for weak selectivity, residual forget-class structure in deep layers, and reliance on final-layer bias shifts rather than true representational removal. DAMP extracts forget directions at each layer as residuals relative to retain-class prototypes in the input space to the next operator, projects them out, and applies a parameter-free depth-aware scaling rule (smaller edits early, larger edits deeper) derived from probe separability to preserve utility. The method extends to multi-class forgetting via low-rank subspaces. Experiments across MNIST, CIFAR-10/100, Tiny ImageNet, and both convolutional and transformer architectures claim that DAMP more closely matches the retraining gold standard than prior approaches in selective forgetting, retain-class preservation, and reduction of deep-layer forget structure.
Significance. If the central claims hold, DAMP offers an efficient, optimization-free alternative to retraining or gradient-based unlearning that targets internal representations rather than classifier heads. Its closed-form nature, extension to multi-class cases, and explicit depth-dependent modulation are strengths that could make verifiable class forgetting more practical in computer vision pipelines. The comparative resemblance to retraining on multiple benchmarks and architectures would be a notable advance if supported by rigorous ablations.
Major comments (2)
- [Method (DAMP extraction and projection steps)] Method description (DAMP procedure): forget directions are defined as residuals relative to retain-class prototypes (class means) at each layer. This first-order statistic may fail to isolate targeted knowledge when class-conditional distributions exhibit substantial intra-class variance or multi-modality, and sequential projections alter the input distribution to later layers, so directions computed on the original model may misalign after earlier edits. This assumption is load-bearing for the claim of true representational removal closer to retraining.
- [Depth-aware scaling rule (abstract and method)] Depth-aware scaling rule: presented as parameter-free, yet derived from probe separability measurements on the data. This introduces a data-dependent step that reduces the independence of the final performance claims and risks new failure modes on retain classes if separability does not track actual utility loss.
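To make the scaling discussion concrete: a rule of the described shape might map per-layer probe separability (which typically grows with depth) to an edit strength, yielding smaller edits early and larger edits deep. The normalization below is purely an illustrative stand-in; the paper's actual parameter-free rule is not reproduced in this review.

```python
import numpy as np

def depth_scale(sep_scores):
    """Illustrative depth-aware scaling: normalize per-layer probe
    separability scores in [0, 1] so the most separable (typically
    deepest) layer receives the full edit strength."""
    s = np.clip(np.asarray(sep_scores, dtype=float), 0.0, 1.0)
    return s / s.max() if s.max() > 0 else s
```

Any rule of this shape is data-dependent in exactly the sense the referee flags: the scales are measured from activations, not fixed constants.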
Minor comments (2)
- [Abstract] Abstract: the phrase 'more closely resembles the retraining gold standard than some of the prior methods' is vague; explicit naming of the strongest baselines and quantitative deltas would improve clarity.
- [Method] The manuscript would benefit from an explicit statement of the precise mathematical form of the projection update and the separability-based scaling formula (including any constants or thresholds) to allow direct reproduction.
Simulated Author's Rebuttal
Thank you for the detailed and constructive review. We address each major comment point by point below, indicating where revisions will be made to improve clarity and rigor.
Point-by-point responses
Referee: Method description (DAMP procedure): forget directions are defined as residuals relative to retain-class prototypes (class means) at each layer. This first-order statistic may fail to isolate targeted knowledge when class-conditional distributions exhibit substantial intra-class variance or multi-modality, and sequential projections alter the input distribution to later layers, so directions computed on the original model may misalign after earlier edits. This assumption is load-bearing for the claim of true representational removal closer to retraining.
Authors: We thank the referee for identifying these methodological assumptions. The use of class prototypes enables the closed-form extraction of forget-specific directions without optimization, and our experiments show consistent reduction of forget-class structure in deep layers across datasets with varying degrees of intra-class variation. Nevertheless, we agree that first-order statistics may be insufficient for highly multi-modal distributions and that sequential application of projections could introduce misalignment. In the revised manuscript we will (i) add an explicit limitations paragraph discussing these cases and (ii) include a new ablation that recomputes directions after each layer edit versus the current one-shot computation, quantifying any difference in forgetting and utility metrics. These additions will better support the claim of representational removal.
Revision: yes.
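The ablation the authors propose in (ii), re-deriving directions from the already-edited network rather than computing them once up front, might look like the sketch below over a stack of linear layers. The helper `damp_sequential`, the propagation scheme, and the update form are all hypothetical, and nonlinearities between layers are omitted for brevity.

```python
import numpy as np

def damp_sequential(layers, x_forget, x_retain, alpha=1.0):
    """Hypothetical sequential variant: after each layer edit, the forget
    and retain activations are propagated through the *edited* layer, so
    the next direction reflects earlier edits instead of the original model."""
    edited, af, ar = [], x_forget, x_retain
    for W in layers:
        v = af.mean(axis=0) - ar.mean(axis=0)   # residual at this depth
        v = v / np.linalg.norm(v)
        W_new = W @ (np.eye(W.shape[1]) - alpha * np.outer(v, v))
        edited.append(W_new)
        af, ar = af @ W_new.T, ar @ W_new.T     # propagate through the edit
    return edited
```

Comparing this against the one-shot computation would quantify the misalignment the referee worries about.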
Referee: Depth-aware scaling rule: presented as parameter-free, yet derived from probe separability measurements on the data. This introduces a data-dependent step that reduces the independence of the final performance claims and risks new failure modes on retain classes if separability does not track actual utility loss.
Authors: We agree that the terminology 'parameter-free' requires clarification. The scaling factors are automatically derived from separability probes rather than being manually tuned, but they do rely on data measurements. In the revision we will update the abstract and method section to describe the rule as 'hyperparameter-free yet data-informed.' We will also add analysis showing the observed correlation between probe separability and retain-class utility across the reported benchmarks, together with a brief discussion of potential edge cases where this correlation might weaken. These changes will make the data dependence transparent without altering the method's practical advantages.
Revision: yes.
Circularity Check
No circularity in derivation chain
Full rationale
The paper introduces DAMP as a closed-form, one-shot projection method for class unlearning. Its steps (prototype computation, residual direction extraction, and depth-aware scaling from probe separability) are presented as algorithmic choices rather than as a derivation or prediction that reduces to its own inputs by construction. Performance claims are empirical comparisons to retraining on held-out benchmarks (MNIST, CIFAR, etc.), with no load-bearing self-citation in the core argument, no fitted parameter renamed as a prediction, and no ansatz or uniqueness theorem invoked circularly. The scaling rule, while data-informed, is an explicit design choice for utility preservation and does not tautologically force the reported outcomes.
Axiom & Free-Parameter Ledger
Axioms (2)
- Domain assumption: Forget-specific directions can be isolated as residuals between forget-class and retain-class prototypes computed at the input to each learnable operator.
- Domain assumption: Probe separability at each depth provides a reliable signal for scaling the magnitude of the edit without harming retain-class performance.
Forward citations
Cited by 1 Pith paper:
- Classification-Head Bias in Class-Level Machine Unlearning: Diagnosis, Mitigation, and Evaluation. Class-level unlearning often shortcuts through bias suppression in the classification head; bias-aware training mechanisms and bias-specific metrics are introduced to diagnose and reduce this dependence.