Erased, but Not Gone: Output Forgetting Is Not True Forgetting

Chee Seng Chan; Teresa Pui Yee Yong; Win Kent Ong

arxiv: 2606.25001 · v1 · pith:BVNAKMEBnew · submitted 2026-06-23 · 💻 cs.LG · cs.AI

Teresa Pui Yee Yong, Win Kent Ong, Chee Seng Chan

Pith reviewed 2026-06-26 00:03 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords machine unlearning · output forgetting · representation space · retraining · forget set · retain set · membership inference

The pith

Output forgetting in machine unlearning often leaves structured representation mismatches relative to retraining.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper questions whether low forget-set accuracy or reduced membership inference on forget data truly certifies that a model has forgotten, in the sense of matching a retrained model. It introduces retraining-consistent representation forgetting as a stronger check, comparing unlearned models to models trained from scratch without the forget data. Results across methods, datasets, and models show that output-level success frequently coexists with systematic residuals in representation space, including forget/retain asymmetry and directional concentration. This matters because current evaluations may then certify apparent rather than actual forgetting, with implications for privacy and data-removal claims. The work demonstrates that comparison with retraining exposes discrepancies hidden by output-only checks.

Core claim

The central claim is that standard output-level evaluation can systematically overestimate unlearning success because output forgetting can coexist with retraining-inconsistent residuals in representation space. Under this lens, methods often partially align with retraining on forget samples, remain more inconsistent on retain samples, and leave residual discrepancy concentrated along retraining-related directions rather than diffuse.

What carries the argument

Retraining-consistent representation forgetting, which treats the model retrained from scratch without the forget data as the operational reference for correct forgetting in representation space.
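Linear CKA [17] is one common way to turn "compare to the retrained reference in representation space" into a number, and the figures report it as CKA_u. The sketch below illustrates that idea under stated assumptions and is not the authors' code. It assumes penultimate-layer features have already been extracted from the unlearned and retrained models on the same inputs. The helper `retrain_consistency` and its index arguments are hypothetical names.

```python
import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA (Kornblith et al., 2019) between two feature matrices.

    x, y: arrays of shape (n_samples, dim_x) and (n_samples, dim_y) whose rows
    are features of the same inputs. Returns a value in [0, 1]; 1 means the two
    representations agree up to an orthogonal transform and isotropic scaling.
    """
    # Center every feature dimension across samples.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # Numerator ||Y^T X||_F^2, normalised by the self-similarities of X and Y.
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


def retrain_consistency(feats_unlearned, feats_retrained, forget_idx, retain_idx):
    """Hypothetical helper: CKA to the retrained reference, split by forget/retain."""
    return {
        "cka_forget": linear_cka(feats_unlearned[forget_idx], feats_retrained[forget_idx]),
        "cka_retain": linear_cka(feats_unlearned[retain_idx], feats_retrained[retain_idx]),
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    retrained = rng.standard_normal((300, 32))
    # Synthetic "unlearned" features: a rotated copy of the reference plus noise.
    rotation, _ = np.linalg.qr(rng.standard_normal((32, 32)))
    unlearned = retrained @ rotation + 0.5 * rng.standard_normal((300, 32))
    print(retrain_consistency(unlearned, retrained, np.arange(100), np.arange(100, 300)))
```

The sketch reports forget and retain scores separately rather than as one pooled number. That split is what allows a forget/retain asymmetry to show up. CKA is also rotation-invariant, so two models that encode the same geometry in different bases are not penalised.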

If this is right

Unlearned models show partial alignment with retraining on forget samples but greater inconsistency on retain samples.
Residual mismatches concentrate along retraining-related directions rather than appearing diffuse in representation space.
Current methods often produce apparent output forgetting without achieving retraining-consistent forgetting.
Standard evaluations based on output accuracy or logit-level inference can overestimate true unlearning progress.
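The claim that residuals concentrate along retraining-related directions rather than spreading diffusely can be checked against a random-subspace baseline. This page does not give the paper's definitions, so the following is a minimal sketch under assumptions. It takes "retraining-related directions" to be the top-k right singular vectors of the shift from original to retrained features. It takes "diffuse" to mean that the residual energy captured by a k-dimensional subspace is close to the chance level k/dim. Both function names are hypothetical.

```python
import numpy as np


def retraining_directions(z_orig, z_retrain, k):
    """Top-k right singular vectors of the original-to-retrained feature shift.

    Returns a (dim, k) matrix with orthonormal columns. Using the SVD of the
    shift is an assumed stand-in for the paper's "retraining-related directions".
    """
    _, _, vt = np.linalg.svd(z_retrain - z_orig, full_matrices=False)
    return vt[:k].T


def residual_concentration(z_unlearn, z_retrain, basis):
    """Share of residual energy inside span(basis), and that share relative to chance.

    A k-dimensional subspace of R^dim captures k/dim of an isotropic residual on
    average, so a ratio near 1 means "diffuse" and a ratio well above 1 means
    "concentrated along the given directions".
    """
    residual = z_unlearn - z_retrain
    captured = np.sum((residual @ basis) ** 2)
    fraction = float(captured / np.sum(residual ** 2))
    dim, k = basis.shape
    return fraction, fraction / (k / dim)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, dim, k = 500, 64, 4
    z_orig = rng.standard_normal((n, dim))
    true_dirs, _ = np.linalg.qr(rng.standard_normal((dim, k)))
    # Synthetic retraining shift confined to a k-dimensional subspace.
    z_retrain = z_orig + 5 * rng.standard_normal((n, k)) @ true_dirs.T
    basis = retraining_directions(z_orig, z_retrain, k)
    concentrated = z_retrain + rng.standard_normal((n, k)) @ true_dirs.T
    diffuse = z_retrain + rng.standard_normal((n, dim))
    print("concentrated:", residual_concentration(concentrated, z_retrain, basis))
    print("diffuse:     ", residual_concentration(diffuse, z_retrain, basis))
```

Under these definitions, a concentration ratio near 1 across methods would be the "indistinguishable from random" outcome discussed under "What would settle it".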

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Evaluation protocols may need to incorporate representation-space comparisons to the retrained reference to avoid overestimating forgetting.
Applications requiring verifiable data removal, such as regulatory compliance, could be affected if only output metrics are used.
Methods that directly optimize for reduced representation mismatch to retraining might address the identified gaps.

Load-bearing premise

The retrained model trained from scratch without the forget data serves as a valid operational reference for what correct forgetting should look like in representation space.

What would settle it

The claim of systematic overestimation would be falsified by an empirical result showing two things across multiple methods. First, the representation discrepancy between unlearned models and the retrained reference is statistically indistinguishable from zero. Second, its residual directions are no more concentrated than random directions.

Figures

Figures reproduced from arXiv: 2606.25001 by Chee Seng Chan, Teresa Pui Yee Yong, Win Kent Ong.

**Figure 1.** Looks forgotten, but not close to retraining. A schematic summary of the evaluation gap studied in this paper. Output-level metrics may suggest successful forgetting, yet the same methods can remain far from the exact retraining reference in representation space. This discrepancy motivates our retraining-consistent analysis of forget/retain asymmetry, directional mismatch, and concentrated residuals. …

**Figure 2.** Output forgetting can appear successful while forget-set information remains recoverable.

**Figure 3.** ρ_logit against ρ_rep on CIFAR-10 with ResNet-18. Methods above the diagonal exhibit more residual leakage in representation space than at the output level.

| Method | Forget set cos(Δu) ↑ | Retain set cos(Δr) ↑ | Asymmetry gap cos(Δu) − cos(Δr) ↓ |
|---|---|---|---|
| SCRUB [1] | 0.945 | −0.345 | 1.290 |
| Boundary Shrink [2] | 0.916 | −0.305 | 1.221 |
| UNSIR [3] | 0.720 | 0.206 | 0.514 |
| Amnesiac [4] | 0.795 | 0.432 | 0.364 |
| SSD [5] | 0.911 | −0.389 | 1.300 |
| POUR-P [14] | 0.854… | | |
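The asymmetry gap listed with Figure 3 is the forget-set cosine minus the retain-set cosine, cos(Δu) − cos(Δr). The sketch below recomputes the gap from the listed values and pairs it with one plausible definition of the cosine itself. That definition is an assumption. It treats Δ as the representation shift away from the original model and compares the unlearning shift with the retraining shift, pooled over a sample set. The paper may instead average per-sample cosines or use a different anchor. Function names are hypothetical.

```python
import numpy as np


def directional_alignment(z_orig, z_unlearn, z_retrain):
    """Cosine between the unlearning shift and the retraining shift (assumed definition).

    All inputs have shape (n_samples, dim) and hold features of the same inputs
    under the original, unlearned, and retrained models.
    """
    d_unlearn = (z_unlearn - z_orig).ravel()
    d_retrain = (z_retrain - z_orig).ravel()
    denom = np.linalg.norm(d_unlearn) * np.linalg.norm(d_retrain)
    return float(d_unlearn @ d_retrain / denom) if denom > 0 else 0.0


def asymmetry_gap(cos_forget, cos_retain):
    """Forget-minus-retain alignment; large values mean retain samples lag behind."""
    return cos_forget - cos_retain


# (forget cos, retain cos) as listed with Figure 3; POUR-P is omitted because its row is truncated.
REPORTED = {
    "SCRUB": (0.945, -0.345),
    "Boundary Shrink": (0.916, -0.305),
    "UNSIR": (0.720, 0.206),
    "Amnesiac": (0.795, 0.432),
    "SSD": (0.911, -0.389),
}

if __name__ == "__main__":
    for name, (cf, cr) in REPORTED.items():
        print(f"{name:16s} gap = {asymmetry_gap(cf, cr):.3f}")
```

Recomputed from the rounded cosines, the gaps match the reported column to three decimals for every method except Amnesiac. Amnesiac gives 0.363 against a reported 0.364, which is consistent with rounding of the underlying values.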

**Figure 4.** Directional alignment and representation alignment on CIFAR-10 with ResNet-18. Forget and retain samples exhibit different patterns of mismatch relative to retraining. (a) ΔMIA_rep; (b) CKA_u.

**Figure 6.** Output-level and representation-level forgetting across dataset complexity with ResNet-18. (a) ΔMIA_rep by subspace; (b) CKA_u by subspace.

**Figure 10.** Directional asymmetry across random seeds.

3.5 Same Structured Mismatch Persists across Scale. We now test whether the diagnosed mismatch is a narrow artifact of one benchmark setting or a stable property of current unlearning behavior. Specifically, we rule out four weaker explanations: easy datasets, underpowered models, convolutional backbones, and favorable class or seed choices. Dataset complex…

**Figure 11.** ASR across MIA configurations on CIFAR-10 with ResNet-18. Lower ASR indicates…
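ASR here is the success rate of a membership-inference attack. This page does not list the exact attack configurations, so the following is a generic stand-in. It applies a single-threshold attack to a per-sample membership score, such as model confidence, or a score computed from penultimate features for a representation-level attack. The attack is scored by balanced accuracy. The `delta_mia` helper subtracts the retrained model's ASR from the unlearned model's ASR. It is a hypothetical reading of ΔMIA_rep, not the paper's definition.

```python
import numpy as np


def threshold_attack_asr(member_scores, nonmember_scores):
    """Best balanced accuracy of an attack that predicts 'member' when score >= t.

    Higher scores are assumed to look more member-like; 0.5 is chance level.
    """
    member_scores = np.asarray(member_scores, dtype=float)
    nonmember_scores = np.asarray(nonmember_scores, dtype=float)
    best = 0.5
    for t in np.unique(np.concatenate([member_scores, nonmember_scores])):
        tpr = np.mean(member_scores >= t)    # forget samples flagged as members
        tnr = np.mean(nonmember_scores < t)  # held-out samples correctly rejected
        best = max(best, 0.5 * (tpr + tnr))
    return float(best)


def delta_mia(forget_scores_u, test_scores_u, forget_scores_r, test_scores_r):
    """Hypothetical ΔMIA: attack success on the unlearned model minus on the retrained model."""
    return (threshold_attack_asr(forget_scores_u, test_scores_u)
            - threshold_attack_asr(forget_scores_r, test_scores_r))
```

A value near zero means the attack finds the forget set no easier to recognise in the unlearned model than in a model that never saw it. A positive value computed from representation-level scores is the kind of residual leakage that output-only checks can miss.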

**Figure 12.** t-SNE visualization of feature representations on CIFAR-10 with ResNet-18. The forget…

**Figure 14.** Residual discrepancy ratio relative to the raw representation, showing how discrepancy is distributed across the parallel and orthogonal components.

On retain samples D_r, the picture changes sharply. Most methods have magnitude ratios above 1, often far above 1, while cosine similarity is near zero or even negative. This indicates that retain-side behavior is not merely misdirected relative to retraining…
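Figure 14 and the text around it refer to three quantities. The first is a split of the residual into components parallel and orthogonal to a reference direction, the second is a magnitude ratio, and the third is a cosine. The sketch below is one way to compute them per sample and then average. It rests on three assumptions that this page does not confirm:

- the reference direction is each sample's retraining shift z_retrain − z_orig;
- the magnitude ratio is ||z_unlearn − z_orig|| / ||z_retrain − z_orig||;
- the "raw representation" normaliser is ||z_orig||.

The function name is hypothetical.

```python
import numpy as np


def residual_decomposition(z_orig, z_unlearn, z_retrain, eps=1e-12):
    """Per-sample parallel/orthogonal split of the unlearned-vs-retrained residual.

    Inputs have shape (n_samples, dim). Returns sample-averaged statistics.
    """
    d_retrain = z_retrain - z_orig            # how retraining moved each sample
    d_unlearn = z_unlearn - z_orig            # how unlearning moved each sample
    residual = z_unlearn - z_retrain          # equals d_unlearn - d_retrain
    n_retrain = np.linalg.norm(d_retrain, axis=1, keepdims=True)
    unit = d_retrain / (n_retrain + eps)      # unit retraining direction per sample
    r_par = np.sum(residual * unit, axis=1, keepdims=True) * unit
    r_perp = residual - r_par
    raw = np.linalg.norm(z_orig, axis=1) + eps  # assumed "raw representation" norm
    n_unlearn = np.linalg.norm(d_unlearn, axis=1)
    cos = np.sum(d_unlearn * d_retrain, axis=1) / (n_unlearn * n_retrain[:, 0] + eps)
    return {
        "parallel_ratio": float(np.mean(np.linalg.norm(r_par, axis=1) / raw)),
        "orthogonal_ratio": float(np.mean(np.linalg.norm(r_perp, axis=1) / raw)),
        "magnitude_ratio": float(np.mean(n_unlearn / (n_retrain[:, 0] + eps))),
        "cosine": float(np.mean(cos)),
    }
```

Under these definitions, the retain-side pattern described in the Figure 14 text has a direct reading. A magnitude ratio well above 1 combined with a cosine near zero or negative means unlearning moved retain samples further than retraining did, and in an unrelated or opposing direction. The residual is then dominated by movement that retraining never made.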

**Figure 15.** Scatter plots of residual discrepancy across the parallel and orthogonal components.

**Figure 16.** Directional alignment for CIFAR-100 / ResNet-18. Plotted: cos(Δ) on forget vs. retain samples for SCRUB, Boundary Shrink, UNSIR, Amnesiac, SSD, POUR-P, and POUR-D.

**Figure 18.** Directional alignment for TinyImageNet / ResNet-18. Amnesiac shows slightly higher alignment on retain samples than on forget samples in this setting, deviating from the more common pattern observed elsewhere. Plotted: cos(Δ) on forget vs. retain samples for SCRUB, Boundary Shrink, UNSIR, Amnesiac, SSD, POUR-P, and POUR-D.

Original abstract

Machine unlearning (MU) is commonly judged by output forgetting, such as low forget-set accuracy or reduced logit-level membership inference. But if output-level success can coexist with retraining-inconsistent residuals in representation space, what kind of forgetting are current evaluations actually certifying? We study this question through retraining-consistent representation forgetting, using the retrained model (i.e., trained from scratch without the forget data) as an operational reference for correct forgetting. Across multiple unlearning methods, datasets, and models, our theoretical analysis and empirical results show that standard output-level evaluation can systematically overestimate the success of unlearning. Under this stronger lens, current methods often appear forgotten at the output layer while exhibiting a structured mismatch relative to retraining. They partially align with retraining on forget samples, remain more inconsistent on retain samples, and leave residual discrepancy concentrated along retraining-related directions rather than diffuse in representation space. This structured mismatch is characterized by forget/retain asymmetry, directional mismatch, and concentrated residuals along retraining-related directions. These results suggest that current MU is often evaluated for apparent forgetting rather than retraining-consistent forgetting. More broadly, retraining reveals what output forgetting hides.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper shows output unlearning can leave structured representation mismatches versus a retrained model, but treats that model as the sole valid reference without enough justification.

The core observation is that several unlearning methods produce models whose outputs look forgotten on standard checks, yet their internal representations still deviate from a retrained-from-scratch baseline in non-random ways: more aligned on forget samples than retain ones, with residuals concentrated along particular directions. That pattern is presented with some empirical breadth across methods and datasets, and it usefully flags that output-only metrics can miss internal structure.

What the work does cleanly is make the comparison to retraining explicit and operational. Using the retrained model as a reference avoids some circularity that self-referential metrics can have, and the reported asymmetry plus directional concentration gives a concrete handle on where the mismatch lives.

The main limitation is the leap from "does not match retraining" to "not true forgetting." The stress-test objection holds: nothing in the abstract rules out that a model in a representationally distinct state could still resist membership inference and reconstruction. Treating retraining as the canonical target needs more defense, either theoretical or by showing that alternatives fail on downstream privacy tasks. Without that defense, the claim that output evaluation "systematically overestimates" success rests on a single reference trajectory rather than a broader argument.

The math and data details are not visible here, so I cannot judge the derivations or error bars, but the framing itself is coherent and engages the literature on evaluation gaps. This is for researchers building or auditing unlearning benchmarks. A reader focused on privacy metrics or regulatory compliance would find the asymmetry results worth checking. It deserves peer review because the empirical pattern raises a real question about current practice, even if the interpretation of what counts as forgetting needs tightening.

Referee Report

2 major / 2 minor

Summary. The paper claims that standard output-level metrics for machine unlearning (e.g., forget-set accuracy, logit-based membership inference) systematically overestimate success. Using the retrained model (trained from scratch on retain data only) as an operational reference for correct representation-space forgetting, the authors present theoretical analysis and empirical results across multiple unlearning methods, datasets, and models. They report that unlearned models exhibit structured mismatches: partial alignment with retraining on forget samples, greater inconsistency on retain samples, and residual discrepancy concentrated along retraining-related directions rather than diffuse.

Significance. If the empirical patterns hold, the work identifies a concrete limitation in current MU evaluation practices and motivates stronger, representation-consistent criteria. The multi-method, multi-dataset empirical component and the operational use of a retrained reference are strengths that make the overestimation claim testable and falsifiable.

major comments (2)

[Abstract, §3] Abstract and §3 (theoretical analysis): the central claim that output forgetting 'overestimates success' is load-bearing on the premise that deviation from the retrained model's representations constitutes incomplete forgetting. The manuscript does not provide a formal argument or empirical test showing that no other representationally distinct trajectories can achieve effective forgetting (zero membership inference, no reconstruction) while differing from the retrained model; this assumption requires explicit justification or a counter-example analysis.
[Empirical results (§4–5)] Empirical results section (likely §4–5): the reported forget/retain asymmetry and directional concentration are interpreted as evidence of incomplete forgetting, but the paper does not report whether these mismatches correlate with actual downstream risks (e.g., increased reconstruction success or membership inference beyond output level). Without that link, the structured mismatch alone does not yet demonstrate overestimation of unlearning success.

minor comments (2)

[Methods] Notation for 'retraining-consistent representation forgetting' is introduced in the abstract but would benefit from an explicit definition or equation in the methods section to avoid ambiguity with standard representation similarity measures.
[Figures] Figure captions should explicitly state the number of runs, random seeds, and statistical significance tests used for the reported asymmetries and directional concentrations.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. The comments highlight important areas for strengthening the justification of our central claim and the empirical linkage to downstream risks. We address each point below and will incorporate revisions accordingly.

Referee: [Abstract, §3] Abstract and §3 (theoretical analysis): the central claim that output forgetting 'overestimates success' is load-bearing on the premise that deviation from the retrained model's representations constitutes incomplete forgetting. The manuscript does not provide a formal argument or empirical test showing that no other representationally distinct trajectories can achieve effective forgetting (zero membership inference, no reconstruction) while differing from the retrained model; this assumption requires explicit justification or a counter-example analysis.

Authors: We agree that the operational use of the retrained model as the reference for true forgetting requires explicit justification. In the revised manuscript, we will expand §3 with a formal argument for why retraining from scratch on the retain set is the appropriate reference: by construction, it is free of forget-set influence. We will also discuss why output-level success on alternative trajectories does not constitute complete forgetting under a representation-consistent definition. revision: yes
Referee: [Empirical results (§4–5)] Empirical results section (likely §4–5): the reported forget/retain asymmetry and directional concentration are interpreted as evidence of incomplete forgetting, but the paper does not report whether these mismatches correlate with actual downstream risks (e.g., increased reconstruction success or membership inference beyond output level). Without that link, the structured mismatch alone does not yet demonstrate overestimation of unlearning success.

Authors: We acknowledge this limitation in the current empirical presentation. In the revised version of §4–5, we will add correlation analyses between the reported representation mismatch metrics (forget/retain asymmetry and directional concentration) and downstream risks, specifically reconstruction attack success rates and advanced membership inference performance beyond output logits. This will directly link the observed structured mismatches to overestimation of unlearning success. revision: yes

Circularity Check

0 steps flagged

No significant circularity; retrained model serves as independent external reference

The paper's core evaluation compares unlearned models against a separately trained retrained model (trained from scratch on data excluding the forget set). This reference is constructed independently of any unlearning method outputs or fitted parameters within the paper. No equations or claims reduce a 'prediction' or result to the inputs by construction, no self-citations are load-bearing for the central argument, and no ansatz or uniqueness theorem is smuggled in. The derivation chain is self-contained against this external benchmark, consistent with the low circularity expectation for papers using independent references.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Review is limited to the abstract; no free parameters, invented entities, or additional axioms are identifiable from the provided text.

axioms (1)

domain assumption: the retrained model serves as the correct operational reference for true forgetting.
Explicitly invoked in the abstract as the basis for measuring representation forgetting.

pith-pipeline@v0.9.1-grok · 5741 in / 1154 out tokens · 34177 ms · 2026-06-26T00:03:32.594864+00:00 · methodology

Reference graph

Works this paper leans on

37 extracted references · 5 linked inside Pith

[1] Meghdad Kurmanji, Peter Triantafillou, Jamie Hayes, and Eleni Triantafillou. Towards unbounded machine unlearning. Advances in Neural Information Processing Systems, 36:1957–1987, 2023.
[2] Min Chen, Weizhuo Gao, Gaoyang Liu, Kai Peng, and Chen Wang. Boundary unlearning: Rapid forgetting of deep networks via shifting the decision boundary. In Proceedings of the IEEE/CVF Conference on CVPR, pages 7766–7775, 2023.
[3] Ayush K Tarun, Vikram S Chundawat, Murari Mandal, and Mohan Kankanhalli. Fast yet effective machine unlearning. IEEE Transactions on Neural Networks and Learning Systems, 35(9):13046–13055, 2023.
[4] Laura Graves, Vineel Nagisetty, and Vijay Ganesh. Amnesiac machine learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 11516–11524, 2021.
[5] Jack Foster, Stefan Schoepf, and Alexandra Brintrup. Fast machine unlearning without retraining through selective synaptic dampening. In Proceedings of the AAAI Conference, volume 38, pages 12043–12051, 2024.
[6] Nicholas Carlini, Steve Chien, Milad Nasr, Shuang Song, Andreas Terzis, and Florian Tramer. Membership inference attacks from first principles. In 2022 IEEE Symposium on Security and Privacy (SP), pages 1897–1914. IEEE, 2022.
[7] Yinzhi Cao and Junfeng Yang. Towards making systems forget with machine unlearning. In 2015 IEEE Symposium on Security and Privacy, pages 463–480. IEEE, 2015.
[8] Lucas Bourtoule, Varun Chandrasekaran, Christopher A Choquette-Choo, Hengrui Jia, Adelin Travers, Baiwu Zhang, David Lie, and Nicolas Papernot. Machine unlearning. In 2021 IEEE Symposium on Security and Privacy (SP), pages 141–159. IEEE, 2021.
[9] Jie Xu, Zihan Wu, Cong Wang, and Xiaohua Jia. Machine unlearning: Solutions and challenges. IEEE Transactions on Emerging Topics in Computational Intelligence, 2024.
[10] Antonio Ginart, Melody Guan, Gregory Valiant, and James Y Zou. Making AI forget you: Data deletion in machine learning. Advances in Neural Information Processing Systems, 32, 2019.
[11] Vikram S Chundawat, Ayush K Tarun, Murari Mandal, and Mohan Kankanhalli. Can bad teaching induce forgetting? Unlearning in deep networks using an incompetent teacher. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pages 7210–7217, 2023.
[12] Anvith Thudi, Gabriel Deza, Varun Chandrasekaran, and Nicolas Papernot. Unrolling SGD: Understanding factors influencing machine unlearning. In 2022 IEEE 7th EuroS&P, pages 303–319. IEEE, 2022.
[13] Win Kent Ong and Chee Seng Chan. Maverick: Collaboration-free federated unlearning for medical privacy. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 358–368. Springer, 2025.
[14] Anjie Le, Can Peng, Yuyuan Liu, and J Alison Noble. POUR: A provably optimal method for unlearning representations via neural collapse. arXiv preprint arXiv:2511.19339, 2025.
[15] Aditya Golatkar, Alessandro Achille, and Stefano Soatto. Eternal sunshine of the spotless net: Selective forgetting in deep networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9304–9312, 2020.
[16] Aditya Golatkar, Alessandro Achille, and Stefano Soatto. Forgetting outside the box: Scrubbing deep networks of information accessible from input-output observations. In European Conference on Computer Vision, pages 383–398. Springer, 2020.
[17] Simon Kornblith, Mohammad Norouzi, Honglak Lee, and Geoffrey Hinton. Similarity of neural network representations revisited. In International Conference on Machine Learning, pages 3519–3529. PMLR, 2019.
[18] Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. 2009.
[19] Yann Le, Xuan Yang, et al. Tiny ImageNet visual recognition challenge. CS 231N, 7(7):3, 2015.
[20] Sangamesh Kodge, Gobinda Saha, and Kaushik Roy. Deep unlearning: Fast and efficient gradient-free class forgetting. Transactions on Machine Learning Research, 2024.
[21] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020.
[22] Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov. Membership inference attacks against machine learning models. In 2017 IEEE Symposium on Security and Privacy (SP), pages 3–18. IEEE, 2017.
[23] Tae-Young Lee, Sundong Park, Minwoo Jeon, Hyoseok Hwang, and Gyeong-Moon Park. ESC: Erasing space concept for knowledge deletion. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 5010–5019, 2025.
[24] Tuan Hoang, Santu Rana, Sunil Gupta, and Svetha Venkatesh. Learn to unlearn for deep neural networks: Minimizing unlearning interference with gradient projection. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 4819–4828, 2024.
[25] Antonio Almudévar and Alfonso Ortega. Representation unlearning: Forgetting through information compression. arXiv preprint arXiv:2601.21564, 2026.
[26] Qiuchen Zhang, Carl Yang, Jian Lou, Li Xiong, et al. Contrastive unlearning: A contrastive approach to machine unlearning. arXiv preprint arXiv:2401.10458, 2024.
[27] Jaewon Lee, Yongwoo Kim, and Donghyun Kim. Erase at the core: Representation unlearning for machine unlearning. arXiv preprint arXiv:2602.05375, 2026.
[28] Marco Cotogni, Jacopo Bonato, Luigi Sabetta, Francesco Pelosin, and Alessandro Nicolosi. DUCK: Distance-based unlearning via centroid kinematics. arXiv preprint arXiv:2312.02052, 2023.
[29] Nazanin Mohammadi Sepahvand, Eleni Triantafillou, Hugo Larochelle, Doina Precup, James J Clark, Daniel M Roy, and Gintare Karolina Dziugaite. Selective unlearning via representation erasure using domain adversarial training. In The Thirteenth International Conference on Learning Representations, 2025.
[30] Tao Guo, Song Guo, Jiewei Zhang, Wenchao Xu, and Junxiao Wang. Efficient attribute unlearning: Towards selective removal of input attributes from feature representations. arXiv preprint arXiv:2202.13295, 2022.
[31] Yongwoo Kim, Sungmin Cha, and Donghyun Kim. Are we truly forgetting? A critical re-examination of machine unlearning evaluation protocols. Engineering Applications of Artificial Intelligence, 167:113785, 2026.
[32] Dongjae Jeon, Wonje Jeung, Taeheon Kim, Albert No, and Jonghyun Choi. An information theoretic evaluation metric for strong unlearning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 40, pages 22173–22181, 2026.
[33] Laurens Van der Maaten and Geoffrey Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9(11), 2008.
[34] Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, et al. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288, 2023.
[35] Jialong Sun, Zeming Wei, Jiaxuan Zou, Jiacheng Gong, Guanheng Wang, Chengyang Dong, Jialong Li, and Bo Liu. Statistical MIA: Rethinking membership inference attack for reliable unlearning auditing. arXiv preprint arXiv:2602.01150, 2026.
[36] Christoforos N Spartalis, Theodoros Semertzidis, Efstratios Gavves, and Petros Daras. Lotus: Large-scale machine unlearning with a taste of uncertainty. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 10046–10055, 2025.
[37] Jiatong Yu, Yinghui He, Anirudh Goyal, and Sanjeev Arora. On the impossibility of retrain equivalence in machine unlearning. arXiv preprint arXiv:2510.16629, 2025.

A Proof of Theorem 1. Definition 2 motivates a retraining-consistent representation lens because output agreement alone does not determine the internal state of a model. A model may match the ...

[1] [1]

Towards unbounded ma- chine unlearning.Advances in neural information processing systems, 36:1957–1987, 2023

Meghdad Kurmanji, Peter Triantafillou, Jamie Hayes, and Eleni Triantafillou. Towards unbounded ma- chine unlearning.Advances in neural information processing systems, 36:1957–1987, 2023

1957

[2] [2]

Boundary unlearning: Rapid forget- ting of deep networks via shifting the decision boundary

Min Chen, Weizhuo Gao, Gaoyang Liu, Kai Peng, and Chen Wang. Boundary unlearning: Rapid forget- ting of deep networks via shifting the decision boundary. InProceedings of the IEEE/CVF Conference on CVPR, pages 7766–7775, 2023

2023

[3] [3]

Fast yet effective ma- chine unlearning.IEEE transactions on neural networks and learning systems, 35(9):13046–13055, 2023

Ayush K Tarun, Vikram S Chundawat, Murari Mandal, and Mohan Kankanhalli. Fast yet effective ma- chine unlearning.IEEE transactions on neural networks and learning systems, 35(9):13046–13055, 2023

2023

[4] [4]

Amnesiac machine learning

Laura Graves, Vineel Nagisetty, and Vijay Ganesh. Amnesiac machine learning. InProceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 11516–11524, 2021

2021

[5] [5]

Fast machine unlearning without retraining through selective synaptic dampening

Jack Foster, Stefan Schoepf, and Alexandra Brintrup. Fast machine unlearning without retraining through selective synaptic dampening. InProceedings of the AAAI Conference, volume 38, pages 12043–12051, 2024

2024

[6] [6]

Member- ship inference attacks from first principles

Nicholas Carlini, Steve Chien, Milad Nasr, Shuang Song, Andreas Terzis, and Florian Tramer. Member- ship inference attacks from first principles. In2022 IEEE Symposium on Security and Privacy (SP), pages 1897–1914. IEEE, 2022

1914

[7] [7]

Towards making systems forget with machine unlearning

Yinzhi Cao and Junfeng Yang. Towards making systems forget with machine unlearning. In2015 IEEE symposium on security and privacy, pages 463–480. IEEE, 2015

2015

[8] [8]

Machine unlearning

Lucas Bourtoule, Varun Chandrasekaran, Christopher A Choquette-Choo, Hengrui Jia, Adelin Travers, Baiwu Zhang, David Lie, and Nicolas Papernot. Machine unlearning. In2021 IEEE symposium on security and privacy (SP), pages 141–159. IEEE, 2021

2021

[9] Jie Xu, Zihan Wu, Cong Wang, and Xiaohua Jia. Machine unlearning: Solutions and challenges. IEEE Transactions on Emerging Topics in Computational Intelligence, 2024.

[10] Antonio Ginart, Melody Guan, Gregory Valiant, and James Y Zou. Making AI forget you: Data deletion in machine learning. Advances in Neural Information Processing Systems, 32, 2019.

[11] Vikram S Chundawat, Ayush K Tarun, Murari Mandal, and Mohan Kankanhalli. Can bad teaching induce forgetting? Unlearning in deep networks using an incompetent teacher. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pages 7210–7217, 2023.

[12] Anvith Thudi, Gabriel Deza, Varun Chandrasekaran, and Nicolas Papernot. Unrolling SGD: Understanding factors influencing machine unlearning. In 2022 IEEE 7th European Symposium on Security and Privacy (EuroS&P), pages 303–319. IEEE, 2022.

[13] Win Kent Ong and Chee Seng Chan. Maverick: Collaboration-free federated unlearning for medical privacy. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 358–368. Springer, 2025.

[14] Anjie Le, Can Peng, Yuyuan Liu, and J Alison Noble. Pour: A provably optimal method for unlearning representations via neural collapse. arXiv preprint arXiv:2511.19339, 2025.

[15] Aditya Golatkar, Alessandro Achille, and Stefano Soatto. Eternal sunshine of the spotless net: Selective forgetting in deep networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9304–9312, 2020.

[16] Aditya Golatkar, Alessandro Achille, and Stefano Soatto. Forgetting outside the box: Scrubbing deep networks of information accessible from input-output observations. In European Conference on Computer Vision, pages 383–398. Springer, 2020.

[17] Simon Kornblith, Mohammad Norouzi, Honglak Lee, and Geoffrey Hinton. Similarity of neural network representations revisited. In International Conference on Machine Learning, pages 3519–3529. PMLR, 2019.

[18] Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. 2009.

[19] Yann Le, Xuan Yang, et al. Tiny ImageNet visual recognition challenge. CS 231N, 7(7):3, 2015.

[20] Sangamesh Kodge, Gobinda Saha, and Kaushik Roy. Deep unlearning: Fast and efficient gradient-free class forgetting. Transactions on Machine Learning Research, 2024.

[21] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020.

[22] Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov. Membership inference attacks against machine learning models. In 2017 IEEE Symposium on Security and Privacy (SP), pages 3–18. IEEE, 2017.

[23] Tae-Young Lee, Sundong Park, Minwoo Jeon, Hyoseok Hwang, and Gyeong-Moon Park. Esc: Erasing space concept for knowledge deletion. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 5010–5019, 2025.

[24] Tuan Hoang, Santu Rana, Sunil Gupta, and Svetha Venkatesh. Learn to unlearn for deep neural networks: Minimizing unlearning interference with gradient projection. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 4819–4828, 2024.

[25] Antonio Almudévar and Alfonso Ortega. Representation unlearning: Forgetting through information compression. arXiv preprint arXiv:2601.21564, 2026.

[26] Qiuchen Zhang, Carl Yang, Jian Lou, Li Xiong, et al. Contrastive unlearning: A contrastive approach to machine unlearning. arXiv preprint arXiv:2401.10458, 2024.

[27] Jaewon Lee, Yongwoo Kim, and Donghyun Kim. Erase at the core: Representation unlearning for machine unlearning. arXiv preprint arXiv:2602.05375, 2026.

[28] Marco Cotogni, Jacopo Bonato, Luigi Sabetta, Francesco Pelosin, and Alessandro Nicolosi. Duck: Distance-based unlearning via centroid kinematics. arXiv preprint arXiv:2312.02052, 2023.

[29] Nazanin Mohammadi Sepahvand, Eleni Triantafillou, Hugo Larochelle, Doina Precup, James J Clark, Daniel M Roy, and Gintare Karolina Dziugaite. Selective unlearning via representation erasure using domain adversarial training. In The Thirteenth International Conference on Learning Representations, 2025.

[30] Tao Guo, Song Guo, Jiewei Zhang, Wenchao Xu, and Junxiao Wang. Efficient attribute unlearning: Towards selective removal of input attributes from feature representations. arXiv preprint arXiv:2202.13295, 2022.

[31] Yongwoo Kim, Sungmin Cha, and Donghyun Kim. Are we truly forgetting? A critical re-examination of machine unlearning evaluation protocols. Engineering Applications of Artificial Intelligence, 167:113785, 2026.

[32] Dongjae Jeon, Wonje Jeung, Taeheon Kim, Albert No, and Jonghyun Choi. An information theoretic evaluation metric for strong unlearning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 40, pages 22173–22181, 2026.

[33] Laurens Van der Maaten and Geoffrey Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9(11), 2008.

[34] Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, et al. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288, 2023.

[35] Jialong Sun, Zeming Wei, Jiaxuan Zou, Jiacheng Gong, Guanheng Wang, Chengyang Dong, Jialong Li, and Bo Liu. Statistical MIA: Rethinking membership inference attack for reliable unlearning auditing. arXiv preprint arXiv:2602.01150, 2026.

[36] Christoforos N Spartalis, Theodoros Semertzidis, Efstratios Gavves, and Petros Daras. Lotus: Large-scale machine unlearning with a taste of uncertainty. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 10046–10055, 2025.

[37] Jiatong Yu, Yinghui He, Anirudh Goyal, and Sanjeev Arora. On the impossibility of retrain equivalence in machine unlearning. arXiv preprint arXiv:2510.16629, 2025.

A Proof of Theorem 1

Definition 2 motivates a retraining-consistent representation lens because output agreement alone does not determine the internal state of a model. A model may match the ...
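The claim that output agreement alone does not determine a model's internal state can be checked in a toy setting. The sketch below is not the construction used in the proof; it is an illustration under stated assumptions. It builds two hypothetical two-layer ReLU networks: model B rescales each hidden unit of model A by a positive factor d_i and divides the matching output weights by d_i. Because ReLU(d·z) = d·ReLU(z) for d > 0, the two models produce the same outputs on every input, yet their hidden layers differ when compared with linear CKA [17]. The weights, scaling factors, and Gaussian inputs are arbitrary illustrative choices, and only the Python standard library is used.

```python
import random

def relu(v):
    return [max(0.0, x) for x in v]

def matvec(M, v):
    # Multiply matrix M (list of rows) by vector v.
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def forward(W1, W2, x):
    """Two-layer ReLU network; returns (hidden representation, output)."""
    h = relu(matvec(W1, x))
    return h, matvec(W2, h)

def linear_cka(X, Y):
    """Linear CKA between n x p and n x q feature matrices (rows = samples)."""
    n = len(X)

    def center(A):
        means = [sum(col) / n for col in zip(*A)]
        return [[a - m for a, m in zip(row, means)] for row in A]

    def cross(A, B):
        # A^T B for row-major sample matrices A (n x p) and B (n x q).
        return [[sum(a[i] * b[j] for a, b in zip(A, B)) for j in range(len(B[0]))]
                for i in range(len(A[0]))]

    def fro(M):
        return sum(x * x for row in M for x in row) ** 0.5

    Xc, Yc = center(X), center(Y)
    return fro(cross(Yc, Xc)) ** 2 / (fro(cross(Xc, Xc)) * fro(cross(Yc, Yc)))

# Model A: hypothetical toy weights (3 inputs, 4 hidden units, 2 outputs).
W1 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, -1.0, 0.0]]
W2 = [[0.5, -1.0, 2.0, 0.3], [1.5, 0.2, -0.7, 1.0]]

# Model B: scale hidden unit i by d_i > 0 and undo the scaling in W2.
d = [1.0, 10.0, 0.1, 5.0]
W1_b = [[di * w for w in row] for di, row in zip(d, W1)]
W2_b = [[w / di for w, di in zip(row, d)] for row in W2]

random.seed(0)
inputs = [[random.gauss(0.0, 1.0) for _ in range(3)] for _ in range(200)]

H_a, H_b, max_out_gap = [], [], 0.0
for x in inputs:
    h_a, y_a = forward(W1, W2, x)
    h_b, y_b = forward(W1_b, W2_b, x)
    H_a.append(h_a)
    H_b.append(h_b)
    max_out_gap = max(max_out_gap, max(abs(p - q) for p, q in zip(y_a, y_b)))

cka = linear_cka(H_a, H_b)
print(f"max output gap: {max_out_gap:.2e}")        # floating-point round-off only
print(f"linear CKA of hidden layers: {cka:.3f}")   # strictly below 1
```

Linear CKA is invariant to orthogonal transforms and isotropic scaling, but not to the non-isotropic rescaling used here. Its value therefore drops below 1 even though the two models agree on every input, and an audit that only inspects outputs could not tell them apart.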