Classification-Head Bias in Class-Level Machine Unlearning: Diagnosis, Mitigation, and Evaluation
Pith reviewed 2026-05-12 04:02 UTC · model grok-4.3
The pith
The prediction of forgotten classes can be suppressed by decreasing bias terms in the final classification head.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Retain-set-only optimization tends to reduce the biases of absent classes because of gradient flow under softmax cross-entropy, so predictions of forgotten classes can be suppressed simply by decreasing the corresponding bias terms in the classification head. BiasShift demonstrates this shortcut; TS-BGRM and LB-HR mitigate it; and BSC, MBG, and MBS track the resulting bias stability.
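The shortcut itself is easy to reproduce. A minimal numpy sketch (a hypothetical random 5-class head, not the paper's released implementation) shows that subtracting a large constant from one bias term is enough to keep that class from ever being predicted:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 5-class linear head: logits = W @ x + b; class 4 is "forgotten".
num_classes, dim = 5, 16
W = rng.normal(size=(num_classes, dim))
b = np.zeros(num_classes)
forgotten = 4

def predict(x, b):
    return int(np.argmax(W @ x + b))

# BiasShift-style shortcut: push the forgotten class's bias far down.
b_shift = b.copy()
b_shift[forgotten] -= 100.0

# With the shifted bias, the forgotten class is never the argmax,
# even for inputs that previously mapped to it; the features are untouched.
xs = rng.normal(size=(200, dim))
before = sum(predict(x, b) == forgotten for x in xs)
after = sum(predict(x, b_shift) == forgotten for x in xs)
```

This is exactly why retain/forget accuracy alone cannot distinguish genuine forgetting from a one-parameter edit: the feature extractor is identical in both models.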
What carries the argument
Bias terms in the final classification head, whose reduction under retain-set optimization suppresses predictions for absent classes via analyzed softmax cross-entropy gradient dynamics.
Load-bearing premise
Retain-set-only optimization tends to reduce the biases of absent classes due to the analyzed gradient dynamics under softmax cross-entropy.
What would settle it
Measure the change in classification-head bias values for absent classes after retain-set-only training on a pre-trained model and check whether they decrease substantially.
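This check can be sketched directly. The toy below (pure numpy, a hypothetical 4-class softmax head; the paper's experiments use pre-trained deep networks) trains on a retain set that omits class 3 and confirms that the absent class's bias only decreases:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy check of the load-bearing premise: train a softmax head on a
# retain set that omits class 3 and watch that class's bias fall.
num_classes, dim, steps, lr = 4, 8, 500, 0.1
W = rng.normal(scale=0.1, size=(num_classes, dim))
b = np.zeros(num_classes)

# Retain set: samples labeled 0..2 only (class 3 is absent).
X = rng.normal(size=(300, dim))
y = rng.integers(0, 3, size=300)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

b_absent_start = b[3]
for _ in range(steps):
    p = softmax(X @ W.T + b)            # (300, 4) class probabilities
    p[np.arange(len(y)), y] -= 1.0      # dL/dlogits = p - y
    W -= lr * (p.T @ X) / len(y)
    b -= lr * p.mean(axis=0)            # bias gradient = mean(p_c - y_c)

# For the absent class, p_c > 0 and y_c = 0 at every step, so b[3]
# decreases monotonically even though no forget-set data was used.
```

On a real pre-trained model, the analogous measurement is simply `head.bias[forgotten_classes]` before and after retain-set-only fine-tuning.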
Original abstract
Class-level machine unlearning aims to remove the influence of specified classes while preserving model utility on retained classes. Existing methods are commonly evaluated by retain-set accuracy, forget-set accuracy, and unlearning time, but these metrics provide limited insight into how forgetting is achieved internally. In this paper, we reveal a bias-dominated shortcut in class-level unlearning: the prediction of forgotten classes can be suppressed by decreasing the corresponding bias terms in the final classification head. We first analyze the gradient dynamics of classification-head biases under softmax cross-entropy training, explaining why retain-set-only optimization tends to reduce the biases of absent classes. Based on this observation, we introduce BiasShift as a diagnostic baseline, showing that simple bias manipulation can satisfy conventional unlearning metrics while leaving abnormal bias patterns that reveal forgotten labels. To mitigate excessive forgotten-class bias suppression, we propose two bias-aware mechanisms, namely Two-Stage Bias Gradient Reversal Mechanism (TS-BGRM) and Lower-Bound Hinge Regularization (LB-HR). We further introduce three bias-oriented metrics, including Bias Stability Coefficient (BSC), Median Bias Gap (MBG), and Minimal Bias Score (MBS), to quantify bias dependence and potential leakage. Experiments on CIFAR-10, CIFAR-100, and Tiny-ImageNet demonstrate that the proposed methods maintain competitive unlearning performance while producing more stable bias distributions. We have released our code at {https://github.com/zwd2024/Beyond-the-Shadow-of-Bias-From-Classification-Head-Bias-to-Parameter-Redistribution}.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that class-level machine unlearning frequently exploits a bias-dominated shortcut: predictions for forgotten classes are suppressed simply by decreasing the corresponding bias terms in the final classification head. It derives this from the gradient dynamics of softmax cross-entropy, where the bias gradient for an absent class equals p_c (strictly positive), so retain-set-only optimization steadily drives that bias down regardless of how the feature extractor changes. The authors introduce BiasShift as a diagnostic baseline that satisfies standard retain/forget accuracy metrics via bias manipulation alone while exposing abnormal bias patterns, propose TS-BGRM and LB-HR to limit excessive bias suppression, and define three new bias-oriented metrics (BSC, MBG, MBS). Experiments on CIFAR-10, CIFAR-100, and Tiny-ImageNet show competitive unlearning performance with more stable bias distributions; code is released.
Significance. If the central observation holds, the work is significant for exposing a mechanistic shortcut that conventional metrics overlook, thereby motivating more robust evaluation and mitigation in unlearning. Strengths include the direct, parameter-free gradient analysis (bias gradient = p_c - y_c) that explains the phenomenon without circular fitting, the empirical demonstration that BiasShift alone can satisfy existing metrics, the introduction of bias-aware mitigations and metrics, and the public code release for reproducibility. This could shift unlearning research toward internal-mechanism diagnostics rather than surface-level accuracy checks.
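The gradient identity the analysis rests on can be verified numerically. The sketch below checks by finite differences that the cross-entropy gradient with respect to a logit (and hence the head bias b_c, which enters the logit additively) is p_c − y_c, which is strictly positive for every absent class:

```python
import numpy as np

# Finite-difference check that the softmax cross-entropy gradient with
# respect to logit c (equivalently the head bias b_c) equals p_c - y_c.
def loss(logits, y):
    z = logits - logits.max()                 # stable log-softmax
    logp = z - np.log(np.exp(z).sum())
    return -logp[y]

logits = np.array([1.0, -0.5, 2.0, 0.3])
y = 2                                         # true class; 0, 1, 3 are "absent"
p = np.exp(logits - logits.max())
p /= p.sum()

eps = 1e-6
grad_fd = np.array([
    (loss(logits + eps * np.eye(4)[c], y)
     - loss(logits - eps * np.eye(4)[c], y)) / (2 * eps)
    for c in range(4)
])
analytic = p - np.eye(4)[y]                   # p_c - y_c

# Every class with y_c = 0 gets a positive gradient, so gradient
# descent decreases its bias without any feature-extractor change.
```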
minor comments (4)
- The definitions and exact computation of the new metrics BSC, MBG, and MBS (introduced after the mitigation methods) should include explicit formulas or pseudocode in the main text or appendix to ensure immediate reproducibility.
- The experimental section would benefit from reporting standard deviations or confidence intervals across multiple random seeds for both accuracy and the proposed bias metrics, as single-run results limit assessment of stability claims.
- The hyperparameter choices for the free parameters in TS-BGRM (stage thresholds, reversal strengths) and LB-HR (hinge lower-bound) are listed but their sensitivity analysis or selection procedure could be expanded for clarity.
- Figure captions and axis labels for bias-distribution plots should explicitly reference the new metrics (BSC/MBG/MBS) to connect visuals directly to the quantitative claims.
Simulated Author's Rebuttal
We thank the referee for the positive evaluation of our work, the recognition of its significance in exposing the bias-dominated shortcut in class-level unlearning, and the recommendation for minor revision. We appreciate the acknowledgment of the gradient analysis, BiasShift diagnostic, proposed mitigations, new metrics, and code release.
Circularity Check
No significant circularity; derivation is self-contained
full rationale
The paper's central analysis derives the bias gradient as p_c - y_c under softmax cross-entropy and shows that retain-set-only optimization reduces absent-class biases because the gradient is strictly positive whenever y_c = 0, so gradient descent lowers the bias regardless of any change in the feature extractor. This follows directly from standard loss mathematics with no fitting to target results or self-referential definitions. BiasShift is introduced as an explicit diagnostic baseline that satisfies conventional metrics via bias manipulation alone, exposing the shortcut rather than predicting it. The proposed TS-BGRM, LB-HR, and new metrics (BSC, MBG, MBS) are defined independently to quantify and mitigate bias dependence without reducing to any fitted quantities or prior self-citations. No load-bearing step collapses to an input by construction, and the argument remains externally falsifiable via the released code and standard benchmarks.
Axiom & Free-Parameter Ledger
free parameters (2)
- stage thresholds and reversal strengths in TS-BGRM
- hinge lower-bound value in LB-HR
axioms (1)
- domain assumption: gradient dynamics of classification-head biases under softmax cross-entropy training cause retain-set-only optimization to reduce the biases of absent classes
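The LB-HR free parameter can be made concrete with a hedged sketch. The paper does not reproduce the regularizer's exact form here, so the quadratic hinge below, with hypothetical lower bound `tau` and weight `lam`, is only an illustration of the mechanism: penalize the forgotten-class bias only when it collapses below a floor.

```python
import numpy as np

# Illustrative lower-bound hinge in the spirit of LB-HR (hypothetical
# form and parameter values; not the paper's exact regularizer).
def lb_hr_penalty(b, forgotten, tau=-1.0, lam=1.0):
    """Quadratic hinge: nonzero only when b[forgotten] falls below tau."""
    return lam * max(0.0, tau - b[forgotten]) ** 2

# A bias above the bound incurs no penalty; a collapsed bias is penalized,
# discouraging the BiasShift-style shortcut during unlearning.
ok = lb_hr_penalty(np.array([0.2, 0.1, -0.5]), forgotten=2)
bad = lb_hr_penalty(np.array([0.2, 0.1, -3.0]), forgotten=2)
```

In training, such a term would be added to the unlearning loss, with `tau` and `lam` as the free parameters the ledger records.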
Reference graph
Works this paper leans on
- [1] European Union, "General Data Protection Regulation (GDPR)," 2016. Available: https://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:32016R0679
- [2] California Department of Justice, "California Consumer Privacy Act (CCPA)," 2018. Available: https://oag.ca.gov/privacy/ccpa
- [3] Standing Committee of the National People's Congress, "Data Security Law of the People's Republic of China," National People's Congress Website, 2021. Available: http://www.npc.gov.cn/npc/c2/c30834/202106/t20210610_311888.html
- [4] A. Hatami, R. Aalishah, and I. E. Monosov, "Class unlearning via depth-aware removal of forget-specific directions," arXiv preprint arXiv:2604.15166, 2026.
- [5] S. Panda, S. Sourav et al., "Partially blinded unlearning: Class unlearning for deep networks from Bayesian perspective," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 39, no. 6, 2025, pp. 6372–6380.
- [6] Y. Gao, A. Unal, A. Rangamani, and Z. Zhu, "An illusion of unlearning? Assessing machine unlearning through internal representations," arXiv preprint arXiv:2604.08271, 2026.
- [7] L. Qin, T. Zhu, L. Wang, and W. Zhou, "Machine unlearning on pre-trained models by residual feature alignment using LoRA," IEEE Transactions on Dependable and Secure Computing, 2026.
- [8] A. K. Tarun, V. S. Chundawat, M. Mandal, and M. Kankanhalli, "Fast yet effective machine unlearning," IEEE Transactions on Neural Networks and Learning Systems, vol. 35, no. 9, pp. 13046–13055, 2023.
- [9] V. S. Chundawat, A. K. Tarun, M. Mandal, and M. Kankanhalli, "Can bad teaching induce forgetting? Unlearning in deep networks using an incompetent teacher," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 6, 2023, pp. 7210–7217.
- [10] Y. Zhou, D. Zheng, Q. Mo, R. Lu, K.-Y. Lin, and W.-S. Zheng, "Decoupled distillation to erase: A general unlearning method for any class-centric tasks," in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 20350–20359.
- [11] V. S. Chundawat, A. K. Tarun, M. Mandal, and M. Kankanhalli, "Zero-shot machine unlearning," IEEE Transactions on Information Forensics and Security, vol. 18, pp. 2345–2354, 2023.
- [12] L. Bourtoule, V. Chandrasekaran, C. A. Choquette-Choo, H. Jia, A. Travers, B. Zhang, D. Lie, and N. Papernot, "Machine unlearning," in 2021 IEEE Symposium on Security and Privacy (SP). IEEE, 2021, pp. 141–159.
- [13] H. Yan, X. Li, Z. Guo, H. Li, F. Li, and X. Lin, "ARCANE: An efficient architecture for exact machine unlearning," in IJCAI, vol. 6, 2022, p. 19.
- [14] P. W. Koh and P. Liang, "Understanding black-box predictions via influence functions," in International Conference on Machine Learning. PMLR, 2017, pp. 1885–1894.
- [15] C. Dwork, K. Kenthapadi, F. McSherry, I. Mironov, and M. Naor, "Our data, ourselves: Privacy via distributed noise generation," in Annual International Conference on the Theory and Applications of Cryptographic Techniques. Springer, 2006, pp. 486–503.
- [16] G. Wu, M. Hashemi, and C. Srinivasa, "PUMA: Performance unchanged model augmentation for training data removal," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, no. 8, 2022, pp. 8675–8682.
- [17] A. Golatkar, A. Achille, and S. Soatto, "Eternal sunshine of the spotless net: Selective forgetting in deep networks," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 9304–9312.
- [18] A. Golatkar, A. Achille, A. Ravichandran, M. Polito, and S. Soatto, "Mixed-privacy forgetting in deep networks," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 792–801.
- [19] W. Zheng, W. Zhang, K. Chen, T. Liang, F. Yang, H. Lu, and Y. Pang, "Accurate and fast machine unlearning with Hessian-guided overfitting approximation," Neurocomputing, p. 133369, 2026.
- [20] J. Jia, J. Liu, P. Ram, Y. Yao, G. Liu, Y. Liu, P. Sharma, and S. Liu, "Model sparsity can simplify machine unlearning," Advances in Neural Information Processing Systems, vol. 36, pp. 51584–51605, 2023.
- [21] D. Choi and D. Na, "Towards machine unlearning benchmarks: Forgetting the personal identities in facial recognition systems," arXiv preprint arXiv:2311.02240, 2023.
- [22] C. Fan, J. Liu, Y. Zhang, E. Wong, D. Wei, and S. Liu, "SalUn: Empowering machine unlearning via gradient-based weight saliency in both image classification and generation," arXiv preprint arXiv:2310.12508, 2023.
- [23] M. Kurmanji, P. Triantafillou, J. Hayes, and E. Triantafillou, "Towards unbounded machine unlearning," Advances in Neural Information Processing Systems, vol. 36, pp. 1957–1987, 2023.
- [24] J. Foster, S. Schoepf, and A. Brintrup, "Fast machine unlearning without retraining through selective synaptic dampening," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 11, 2024, pp. 12043–12051.
- [25] J. Brophy and D. Lowd, "Machine unlearning for random forests," in International Conference on Machine Learning. PMLR, 2021, pp. 1092–1104.
- [26] J. Wang, S. Guo, X. Xie, and H. Qi, "Federated unlearning via class-discriminative pruning," in Proceedings of the ACM Web Conference 2022, 2022, pp. 622–632.
- [27] X. Liu, M. Li, G. Yu, X. Wang, W. Ni, L. Li, H. Peng, and R. P. Liu, "BlockFUL: Enabling unlearning in blockchained federated learning," IEEE Transactions on Information Forensics and Security, 2025.
- [28] H. Xiao, K. Rasul, and R. Vollgraf, "Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms," arXiv preprint arXiv:1708.07747, 2017.
- [29] M. Chen, Z. Zhang, T. Wang, M. Backes, M. Humbert, and Y. Zhang, "Graph unlearning," in Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, 2022, pp. 499–513.
- [30] J. Li, Q. Wei, C. Zhang, G. Qi, M. Du, Y. Chen, S. Bi, and F. Liu, "Single image unlearning: Efficient machine unlearning in multimodal large language models," Advances in Neural Information Processing Systems, vol. 37, pp. 35414–35453, 2024.
- [31] Y. Yao and X. Xu, "Large language model unlearning," Advances in Neural Information Processing Systems, vol. 37, pp. 105425–105475, 2024.
- [32] J. Yao, E. Chien, M. Du, X. Niu, T. Wang, Z. Cheng, and X. Yue, "Machine unlearning of pre-trained large language models," in Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024, pp. 8403–8419.
- [33] S. Liu, Y. Yao, J. Jia, S. Casper, N. Baracaldo, P. Hase, Y. Yao, C. Y. Liu, X. Xu, H. Li et al., "Rethinking machine unlearning for large language models," Nature Machine Intelligence, vol. 7, no. 2, pp. 181–194, 2025.
- [34] A. Krizhevsky, V. Nair, and G. Hinton, "CIFAR-10 (Canadian Institute for Advanced Research)," URL http://www.cs.toronto.edu/kriz/cifar.html, vol. 5, no. 4, p. 1, 2010.
- [35] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.