Can Vision Models Truly Forget? Mirage: Representation-Level Certification of Visual Unlearning
Pith reviewed 2026-05-21 07:56 UTC · model grok-4.3
The pith
Current visual unlearning methods retain substantial class structure in representations even after passing output-level certification tests.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Mirage shows that unlearning methods passing output-level certification still retain substantial class structure in their representations. Linear Probe Recovery (LPR) scores exceed the retrained baseline by up to 15.4 points, Centered Kernel Alignment (CKA) indicates greater similarity to the original model than to the retrained reference, and feature separability scores confirm persistent geometric discrimination between classes. Class-level unlearning leaves traces that a linear probe recovers with up to 97 percent accuracy, while sample-level unlearning falls to chance level (LPR around 50 percent). Layer-wise analysis shows residual class information persisting across network depths.
What carries the argument
Mirage, a representation-level auditing framework that applies Linear Probe Recovery, Centered Kernel Alignment, Feature Separability Scoring, and Layer-Wise Recovery Analysis to detect retained class structure after unlearning.
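As a rough illustration of one of the four diagnostics, the sketch below computes linear CKA between two feature matrices (rows are samples, columns are features) in plain Python. It follows the standard linear-CKA formula from the representation-similarity literature. Whether Mirage uses the linear or a kernel variant is not stated on this page, and the function names are illustrative assumptions, not the authors' code.

```python
import math

def center_columns(X):
    """Subtract the per-feature mean from every row."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in X]

def cross_frobenius_sq(X, Y):
    """Squared Frobenius norm of X^T Y for row-aligned matrices X and Y."""
    total = 0.0
    for i in range(len(X[0])):
        for j in range(len(Y[0])):
            s = sum(xr[i] * yr[j] for xr, yr in zip(X, Y))
            total += s * s
    return total

def linear_cka(X, Y):
    """Linear CKA: ||Yc^T Xc||_F^2 / (||Xc^T Xc||_F * ||Yc^T Yc||_F)."""
    if len(X) != len(Y):
        raise ValueError("X and Y must hold features for the same samples")
    Xc, Yc = center_columns(X), center_columns(Y)
    den = math.sqrt(cross_frobenius_sq(Xc, Xc) * cross_frobenius_sq(Yc, Yc))
    if den == 0.0:
        raise ValueError("features are constant; CKA is undefined")
    return cross_frobenius_sq(Xc, Yc) / den
```

In an audit of this kind one would compare `linear_cka(unlearned, original)` against `linear_cka(unlearned, retrained)` on the same inputs; a higher first value corresponds to the pattern the paper reports. Plain lists keep the sketch dependency-free; an array library would be the natural choice at realistic feature sizes.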
If this is right
- Output-level certification alone is insufficient to guarantee that class information has been removed from internal representations.
- No existing method simultaneously achieves high utility, output-level forgetting, and representation-level forgetting.
- Class-level unlearning preserves strong representational traces, while sample-level unlearning is indistinguishable from chance.
- Residual class information persists through multiple layers of the network after unlearning.
- Evaluation standards for federated unlearning should shift to include representation-level checks.
Where Pith is reading between the lines
- Unlearning algorithms may need explicit penalties on feature-space separability to close the observed gap.
- The difference between class-level and sample-level outcomes points to distinct mechanisms that future methods could exploit.
- Similar representation-level gaps could be checked in non-federated or non-vision settings to test generality.
- Production systems may need to fall back to full retraining when representation traces cannot be tolerated.
Load-bearing premise
The four diagnostics accurately detect whether representation-level forgetting has occurred rather than measuring some unrelated property of the features.
What would settle it
Finding an unlearning method where Linear Probe Recovery scores match the retrained baseline, Centered Kernel Alignment aligns with the retrained model, and feature separability drops to chance levels while utility remains high would show that representation-level forgetting is achievable.
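The settling condition above can be written as an explicit check. The sketch below is one hypothetical encoding: the `AuditResult` fields, the tolerances, and the choice to treat the retrained model's separability as the "chance level" reference are all assumptions made for illustration, not thresholds taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class AuditResult:
    lpr: float                     # linear probe accuracy of the unlearned model (percent)
    lpr_retrained: float           # same probe on the retrained reference (percent)
    cka_to_original: float         # CKA between unlearned and original features
    cka_to_retrained: float        # CKA between unlearned and retrained features
    separability: float            # separability score of the unlearned model
    separability_retrained: float  # separability score of the retrained reference
    utility: float                 # accuracy on retained data (percent)
    utility_retrained: float       # retained-data accuracy of the retrained reference

def passes_representation_audit(r, lpr_tol=1.0, sep_tol=0.05, utility_tol=1.0):
    """True only if all four conditions hold; the tolerances are hypothetical."""
    lpr_ok = r.lpr - r.lpr_retrained <= lpr_tol
    cka_ok = r.cka_to_retrained >= r.cka_to_original
    sep_ok = r.separability - r.separability_retrained <= sep_tol
    utility_ok = r.utility_retrained - r.utility <= utility_tol
    return lpr_ok and cka_ok and sep_ok and utility_ok
```

A method that cleared this check on every dataset would be a counterexample to the trilemma as stated; any real audit would need tolerances justified by run-to-run variance.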
Original abstract
Machine unlearning in Vertical Federated Learning (VFL) has attracted growing interest, yet existing methods certify forgetting solely using output-level metrics. We challenge these claims by introducing Mirage, a representation-level auditing framework comprising four complementary diagnostics: Linear Probe Recovery (LPR), Centered Kernel Alignment (CKA), Feature Separability Scoring, and Layer-Wise Recovery Analysis. Through experiments across seven datasets and seven baseline methods following recent VFL unlearning protocols, Mirage reveals three key findings: (i) Forgetting gap: methods that pass output-level certification still retain substantial class structure in their representations, with LPR exceeding the retrained baseline by up to 15.4 points; CKA shows these models remain structurally closer to the original than to the retrained reference, while separability scores indicate persistent geometric discrimination. (ii) Unlearning trilemma: no existing method simultaneously achieves high utility, output-level forgetting, and representation-level forgetting. (iii) Class-sample asymmetry: class-level forgetting leaves strong representational traces (LPR up to 97%), whereas sample-level forgetting is indistinguishable from chance (LPR approx. 50%); layer-wise analysis further shows residual class information persists across network depths. These findings call for representation-aware evaluation standards in federated unlearning research.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that output-level certification of unlearning in Vertical Federated Learning is insufficient because methods that pass such tests still retain substantial class structure in their internal representations. It introduces the Mirage auditing framework consisting of four diagnostics (Linear Probe Recovery (LPR), Centered Kernel Alignment (CKA), Feature Separability Scoring, and Layer-Wise Recovery Analysis). Experiments across seven datasets and seven baselines following recent VFL unlearning protocols reveal a forgetting gap (LPR exceeding retrained baseline by up to 15.4 points), an unlearning trilemma (no method achieves high utility plus both output- and representation-level forgetting), and class-sample asymmetry (strong traces for class-level forgetting with LPR up to 97% vs. chance-level for sample-level).
Significance. If the four diagnostics are shown to specifically isolate retention of the forgotten class rather than general feature separability or training artifacts, the work would be significant for establishing representation-aware evaluation standards in federated unlearning. The broad experimental scope across datasets and baselines, plus the identification of the trilemma and asymmetry, provides a useful empirical foundation that could steer future method design toward more complete forgetting guarantees.
major comments (3)
- §3.2 (Linear Probe Recovery definition): LPR is defined as linear probe accuracy on the forgotten class and is reported to exceed the retrained-from-scratch baseline by up to 15.4 points. This comparison lacks a control that holds overall feature utility fixed while varying only the presence of the specific forgotten class (e.g., label permutation or synthetic data ablation), leaving open whether the gap reflects unlearning failure or differences in optimization trajectory and general feature quality.
- §4.1 and §5 (validation of the four diagnostics): The manuscript presents LPR, CKA, Feature Separability Scoring, and Layer-Wise Recovery Analysis as complementary measures of representation-level forgetting. However, no direct validation is provided (such as correlation with known retention cases or controls for general separability) demonstrating that these metrics isolate specific retention of the unlearned class information rather than detecting unrelated properties of the feature space.
- Abstract and §5.3 (unlearning trilemma claim): The trilemma conclusion that no existing method simultaneously achieves high utility, output-level forgetting, and representation-level forgetting rests on the seven evaluated baselines. The claim would be more robust with an explicit discussion of whether the observed trade-offs are fundamental or potentially addressable by hybrid or novel methods outside the current baseline set.
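One possible reading of the label-permutation control raised in the first major comment is sketched below. It fits a probe to separate forgotten-class features from retained-class features, then repeats the fit with shuffled labels to obtain a chance reference. The nearest-centroid probe is a simple linear stand-in chosen to keep the example short; it is not the paper's probe, and all function names and parameters are assumptions.

```python
import random

def centroid(rows):
    """Per-feature mean of a list of feature vectors."""
    return [sum(r[j] for r in rows) / len(rows) for j in range(len(rows[0]))]

def fit_probe(X, y):
    """Nearest-centroid probe for binary labels (1 = forgotten class, 0 = retained)."""
    c0 = centroid([x for x, t in zip(X, y) if t == 0])
    c1 = centroid([x for x, t in zip(X, y) if t == 1])
    return c0, c1

def predict(probe, x):
    c0, c1 = probe
    d0 = sum((a - b) ** 2 for a, b in zip(x, c0))
    d1 = sum((a - b) ** 2 for a, b in zip(x, c1))
    return 1 if d1 < d0 else 0

def accuracy(probe, X, y):
    return sum(predict(probe, x) == t for x, t in zip(X, y)) / len(y)

def permutation_control(X_train, y_train, X_test, y_test, n_perm=50, seed=0):
    """Return (probe accuracy with true labels, mean accuracy with shuffled labels)."""
    rng = random.Random(seed)
    real = accuracy(fit_probe(X_train, y_train), X_test, y_test)
    null_scores = []
    for _ in range(n_perm):
        y_tr, y_te = y_train[:], y_test[:]
        rng.shuffle(y_tr)  # break the link between features and labels
        rng.shuffle(y_te)
        null_scores.append(accuracy(fit_probe(X_tr := X_train, y_tr), X_test, y_te))
    return real, sum(null_scores) / len(null_scores)
```

If the true-label accuracy sits well above the shuffled-label mean for the unlearned model but not for the retrained reference, the gap is harder to attribute to general feature quality alone. This control does not by itself hold optimization trajectory fixed, which is the referee's second concern.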
minor comments (2)
- Table 1 and Figure 3: Include standard deviations or confidence intervals for all reported LPR, CKA, and separability scores to allow readers to assess the statistical reliability of the 15.4-point gap and other quantitative findings.
- §2 (Related Work): The discussion of prior unlearning evaluation could be expanded with additional citations to representation-level probing techniques from the broader machine learning literature to better contextualize the proposed diagnostics.
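The reporting requested in the first minor comment could look like the sketch below, which summarizes a metric over repeated runs and attaches an interval to a gap between two methods. It assumes independent runs (for example, different seeds) and uses a normal approximation; with only a handful of runs a t-based interval would be more appropriate. The function names are illustrative.

```python
import math
import statistics

def mean_sd_ci(scores, z=1.96):
    """Return (mean, sample standard deviation, CI half-width) for one metric across runs."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two runs to estimate spread")
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)       # sample standard deviation (n - 1 denominator)
    half_width = z * sd / math.sqrt(n)  # normal approximation, roughly 95% for z = 1.96
    return mean, sd, half_width

def gap_with_ci(unlearned, retrained, z=1.96):
    """Difference of means between two sets of runs, with an unpaired interval half-width."""
    gap = statistics.mean(unlearned) - statistics.mean(retrained)
    se = math.sqrt(statistics.variance(unlearned) / len(unlearned)
                   + statistics.variance(retrained) / len(retrained))
    return gap, z * se
```

Reporting the gap as `gap ± half_width` makes it possible to judge whether a difference such as the 15.4-point LPR gap exceeds run-to-run noise.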
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments, which help clarify the presentation and strengthen the empirical claims. We address each major comment below and indicate the revisions we will make to the manuscript.
- Referee: §3.2 (Linear Probe Recovery definition): LPR is defined as linear probe accuracy on the forgotten class and is reported to exceed the retrained-from-scratch baseline by up to 15.4 points. This comparison lacks a control that holds overall feature utility fixed while varying only the presence of the specific forgotten class (e.g., label permutation or synthetic data ablation), leaving open whether the gap reflects unlearning failure or differences in optimization trajectory and general feature quality.
  Authors: We thank the referee for this observation. The retrained-from-scratch model is the standard reference for complete forgetting because it has never observed the forgotten class during training. Nevertheless, we agree that differences in optimization trajectories could contribute to the observed gap. In the revised manuscript we will add a controlled ablation that applies label permutation to the forgotten class while freezing the feature extractor weights from the original model; this isolates class-specific retention while holding general feature quality fixed. The new results and discussion will be placed in §3.2.
  Revision: yes
- Referee: §4.1 and §5 (validation of the four diagnostics): The manuscript presents LPR, CKA, Feature Separability Scoring, and Layer-Wise Recovery Analysis as complementary measures of representation-level forgetting. However, no direct validation is provided (such as correlation with known retention cases or controls for general separability) demonstrating that these metrics isolate specific retention of the unlearned class information rather than detecting unrelated properties of the feature space.
  Authors: We acknowledge that explicit validation strengthens the interpretation of the diagnostics. While each metric draws on prior literature (CKA for representational similarity, linear probes for class separability), we will add a dedicated validation subsection in the revision. This will include: (i) results on the original (non-unlearned) model showing uniformly high retention across all four metrics, (ii) results on a model trained without the forgotten class aligning with the retrained baseline, and (iii) a control that measures the same metrics on non-forgotten classes to confirm specificity to the unlearned class. These additions will appear in §4.1 and §5.
  Revision: yes
- Referee: Abstract and §5.3 (unlearning trilemma claim): The trilemma conclusion that no existing method simultaneously achieves high utility, output-level forgetting, and representation-level forgetting rests on the seven evaluated baselines. The claim would be more robust with an explicit discussion of whether the observed trade-offs are fundamental or potentially addressable by hybrid or novel methods outside the current baseline set.
  Authors: We agree that the trilemma is an empirical observation based on the seven baselines that follow current VFL unlearning protocols. In the revised §5.3 we will explicitly state that the trade-off is demonstrated for existing methods and remains an open question for future work. We will add a paragraph discussing potential avenues such as hybrid regularization that jointly optimizes output-level and representation-level objectives, while noting that our current results do not prove the trilemma is fundamental.
  Revision: partial
Circularity Check
No significant circularity; empirical metrics compared to external baselines
The paper introduces four diagnostics (LPR, CKA, Feature Separability Scoring, Layer-Wise Recovery Analysis) and reports empirical gaps relative to retrained-from-scratch baselines across datasets. No equations or derivations are presented that reduce a claimed result to a fitted parameter or self-referential definition by construction. The central findings rest on direct comparisons to independent reference models rather than on any self-citation chain or ansatz smuggled via prior work. This is a standard empirical auditing study whose claims are falsifiable against the reported baselines and do not exhibit the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: The retrained model without the forgotten data serves as the correct reference for complete forgetting.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · unclear
  Relation between the paper passage and the cited Recognition theorem.
  Mirage evaluates unlearning relative to a retrained-from-scratch baseline and formalizes the forgetting gap... $\mathrm{LPR}(\Theta) = \max_{h \in \mathcal{H}_{\mathrm{lin}}} \mathbb{E}_{x \in \mathcal{D}} \big[ \mathbb{1}[\, h(\phi_l(x)) = \mathbb{1}[y \in \mathcal{Y}_u] \,] \big]$
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem.
  Feature Separability Scoring... $F = \dfrac{\lVert \mu_u - \mu_r \rVert^2}{\operatorname{tr}(\Sigma_u) + \operatorname{tr}(\Sigma_r)}$
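The separability ratio quoted in the passage above can be computed directly from two sets of feature vectors. The sketch below assumes that the subscripts u and r denote the forgotten and retained groups, and that the covariance trace uses the population (1/n) variance; the passage does not specify the normalization, so both points are assumptions, as are the function names.

```python
def mean_vector(rows):
    """Per-feature mean of a list of feature vectors."""
    return [sum(r[j] for r in rows) / len(rows) for j in range(len(rows[0]))]

def covariance_trace(rows, mu):
    """Trace of the covariance matrix, i.e. the sum of per-feature variances (1/n)."""
    n = len(rows)
    return sum(sum((r[j] - mu[j]) ** 2 for r in rows) / n for j in range(len(mu)))

def separability_score(forgotten, retained):
    """F = ||mu_u - mu_r||^2 / (tr(Sigma_u) + tr(Sigma_r))."""
    mu_u, mu_r = mean_vector(forgotten), mean_vector(retained)
    between = sum((a - b) ** 2 for a, b in zip(mu_u, mu_r))
    within = covariance_trace(forgotten, mu_u) + covariance_trace(retained, mu_r)
    if within == 0.0:
        raise ValueError("both groups have zero variance; F is undefined")
    return between / within
```

For example, with forgotten features `[[0, 0], [2, 0]]` and retained features `[[4, 0], [6, 0]]`, the means are (1, 0) and (5, 0), each trace is 1, and F = 16 / 2 = 8. Larger values mean the two groups are further apart relative to their spread.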
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.