Bias Redistribution in Visual Machine Unlearning: Does Forgetting One Group Harm Another?
Pith reviewed 2026-05-10 17:11 UTC · model grok-4.3
The pith
Machine unlearning in CLIP models does not remove demographic bias; it redistributes it, primarily along gender rather than age lines.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Unlearning does not eliminate bias but redistributes it primarily along gender rather than age boundaries. In particular, removing the dominant Young Female group consistently transfers performance to Old Female across all model scales, revealing a gender-dominant structure in CLIP's embedding space. While the Refusal Vector method reduces redistribution, it fails to achieve complete forgetting and significantly degrades retained performance.
What carries the argument
The redistribution score, together with per-group accuracy shifts and demographic parity gaps measured after unlearning, exposes the gender-dominant structure that organizes CLIP's embedding space.
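The paper's exact formula for the score is not reproduced here (the referee below flags the omission); a minimal Python sketch of one plausible definition, with illustrative group names and hypothetical accuracies:

```python
def redistribution_score(acc_before, acc_after, forgotten):
    """Illustrative redistribution score: each retained group's accuracy
    gain after unlearning, as a share of the accuracy lost on the
    forgotten group. Positive values mean performance 'moved' there."""
    lost = acc_before[forgotten] - acc_after[forgotten]
    return {g: (acc_after[g] - acc_before[g]) / max(lost, 1e-8)
            for g in acc_before if g != forgotten}

def demographic_parity_gap(pos_rate):
    """Max minus min positive-prediction rate across groups."""
    return max(pos_rate.values()) - min(pos_rate.values())

# Hypothetical per-group zero-shot accuracies, before and after unlearning.
before = {"young_female": 0.91, "old_female": 0.62,
          "young_male": 0.78, "old_male": 0.70}
after  = {"young_female": 0.31, "old_female": 0.81,
          "young_male": 0.77, "old_male": 0.69}

print(redistribution_score(before, after, forgotten="young_female"))
# A large share landing on old_female, not on the male groups, is the
# gender-aligned transfer pattern the paper reports.
```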
If this is right
- Unlearning one intersectional group can increase accuracy on a correlated group of the same gender.
- Refusal Vector reduces the amount of redistribution compared with Prompt Erasure and Prompt Reweighting (see the sketch after this list).
- Current unlearning methods risk amplifying bias in retained groups when embedding geometry is ignored.
- Complete forgetting without performance degradation on other groups remains unachieved by the tested methods.
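For concreteness, a minimal sketch of the rank-1 directional ablation that refusal-vector style methods build on (cf. Arditi et al.'s single-direction result for language models); the direction estimate, tensor shapes, and random stand-in features are illustrative assumptions, not the paper's implementation:

```python
import torch

def ablate_direction(emb: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """Remove a single concept direction from embeddings: e' = e - (e.v)v,
    with v unit-norm, so the component along v is zeroed out."""
    v = v / v.norm()
    return emb - (emb @ v).unsqueeze(-1) * v

# Illustrative direction: difference of mean embeddings between the group
# to forget and everything retained (random stand-ins here).
forget_emb = torch.randn(100, 512)   # e.g., Young Female image embeddings
retain_emb = torch.randn(300, 512)   # all retained groups
v = forget_emb.mean(dim=0) - retain_emb.mean(dim=0)

edited = ablate_direction(torch.cat([forget_emb, retain_emb]), v)
# Because one direction is removed from every embedding, some degradation
# on retained groups (as the paper observes) is hard to avoid.
```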
Where Pith is reading between the lines
- Fairness evaluations of unlearning should include explicit checks for cross-group performance transfers rather than only measuring removal success.
- Embedding-space geometry may need to be directly modeled or regularized in future unlearning algorithms to isolate concepts more cleanly.
- The gender-dominant pattern could appear in other multimodal models whose training data contain similar demographic correlations.
Load-bearing premise
The observed performance transfers reflect a general property of the embedding geometry rather than artifacts of the three specific unlearning methods, the redistribution score definition, or the age-gender correlations present in CelebA.
What would settle it
Repeating the unlearning experiments on a dataset where age and gender attributes are statistically independent and finding no consistent accuracy gain on the Old Female group after unlearning Young Female would falsify the claim that the transfer reflects embedding geometry rather than CelebA's label correlations.
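A minimal sketch of how that check could be scored, assuming per-seed Old Female accuracy gains measured on the independent dataset; the numbers and the one-sided t-test are illustrative:

```python
import numpy as np
from scipy import stats

# Hypothetical Old Female accuracy gains after unlearning Young Female,
# one value per seed, on a dataset with independent age and gender.
gains = np.array([0.002, -0.004, 0.001, 0.003, -0.001])

# One-sided test: is the mean gain reliably greater than zero?
t, p = stats.ttest_1samp(gains, popmean=0.0, alternative="greater")
print(f"t={t:.2f}, p={p:.3f}")
# Rejecting here (consistent gain even without label correlations) would
# support the geometry reading; failing to reject would falsify it.
```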
original abstract
Machine unlearning enables models to selectively forget training data, driven by privacy regulations such as GDPR and CCPA. However, its fairness implications remain underexplored: when a model forgets a demographic group, does it neutralize that concept or redistribute it to correlated groups, potentially amplifying bias? We investigate this bias redistribution phenomenon on CelebA using CLIP models (ViT-B/32, ViT-L/14, ViT-B/16) under a zero-shot classification setting across intersectional groups defined by age and gender. We evaluate three unlearning methods (Prompt Erasure, Prompt Reweighting, and Refusal Vector) using per-group accuracy shifts, demographic parity gaps, and a redistribution score. Our results show that unlearning does not eliminate bias but redistributes it primarily along gender rather than age boundaries. In particular, removing the dominant Young Female group consistently transfers performance to Old Female across all model scales, revealing a gender-dominant structure in CLIP's embedding space. While the Refusal Vector method reduces redistribution, it fails to achieve complete forgetting and significantly degrades retained performance. These findings highlight a fundamental limitation of current unlearning methods: without accounting for embedding geometry, they risk amplifying bias in retained groups.
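For orientation, a minimal sketch of the zero-shot setting the abstract describes, using the openai `clip` package; the prompt template, file name, and group labels are illustrative guesses, not the paper's exact setup:

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)  # also ViT-B/16, ViT-L/14

# Illustrative intersectional prompts for the four age-gender groups.
groups = ["young woman", "old woman", "young man", "old man"]
text = clip.tokenize([f"a photo of a {g}" for g in groups]).to(device)
image = preprocess(Image.open("celeba_example.jpg")).unsqueeze(0).to(device)

with torch.no_grad():
    logits_per_image, _ = model(image, text)  # image-text similarity logits
    probs = logits_per_image.softmax(dim=-1)

print(groups[probs.argmax().item()])
```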
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript investigates bias redistribution in machine unlearning for CLIP vision-language models on the CelebA dataset under zero-shot classification. It evaluates three unlearning methods (Prompt Erasure, Prompt Reweighting, Refusal Vector) across ViT variants and intersectional age-gender groups, reporting that unlearning does not eliminate bias but redistributes it primarily along gender boundaries. In particular, forgetting the dominant Young Female group transfers performance to Old Female, which the authors interpret as evidence of a gender-dominant structure in CLIP's embedding space. Metrics include per-group accuracy shifts, demographic parity gaps, and a redistribution score.
Significance. If the gender-dominant redistribution pattern is shown to be intrinsic to embedding geometry rather than method- or data-specific, the result would be significant for fairness research in unlearning and multimodal models. The work merits credit for its consistent empirical patterns across three model scales and three distinct unlearning methods, which strengthens confidence in the observed transfers. However, the broader implications for embedding geometry and method limitations remain provisional without further controls.
major comments (2)
- [Results] The central claim that unlearning reveals a 'gender-dominant structure in CLIP's embedding space' (abstract) is load-bearing on the interpretation of the Young Female to Old Female performance transfer. This interpretation is not yet supported by ablations that would isolate embedding geometry from CelebA label correlations or from artifacts introduced by the three chosen unlearning methods.
- [Evaluation Metrics] The redistribution score and accuracy-shift results are presented without reported statistical significance tests, exact formula for the score, or controls for confounding factors in the age-gender intersections (abstract and implied experimental sections). These omissions directly affect the strength of the redistribution-along-gender conclusion.
minor comments (3)
- Details on data splits, training/validation protocols, and exact hyperparameter choices for the three unlearning methods are missing and would improve reproducibility.
- The manuscript would benefit from a table or figure explicitly showing the redistribution score for all group pairs and all methods to allow direct comparison of gender vs. age effects.
- Clarify whether the demographic parity gaps are computed on the full test set or only on retained groups after unlearning.
Simulated Author's Rebuttal
Thank you for your thorough review and valuable feedback on our manuscript. We have carefully considered each of the major comments and provide point-by-point responses below. We outline the revisions we intend to make to address the concerns raised.
point-by-point responses
Referee: [Results] The central claim that unlearning reveals a 'gender-dominant structure in CLIP's embedding space' (abstract) is load-bearing on the interpretation of the Young Female to Old Female performance transfer. This interpretation is not yet supported by ablations that would isolate embedding geometry from CelebA label correlations or from artifacts introduced by the three chosen unlearning methods.
Authors: We thank the referee for highlighting this important aspect of our interpretation. The observed performance transfer from Young Female to Old Female is consistent across three different unlearning methods and three model scales, which we believe provides evidence against method-specific artifacts. Regarding CelebA label correlations, the fact that the transfer occurs within the same gender but across age groups, rather than to male groups, supports our gender-dominant structure hypothesis. However, we acknowledge that explicit ablations isolating embedding geometry would further strengthen this claim. In the revised manuscript, we will add a new subsection discussing potential confounding factors from dataset correlations and include an analysis of embedding similarities between group prototypes to better isolate the geometric effects.
revision: partial
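A minimal sketch of the promised prototype-similarity analysis, assuming CLIP image embeddings per intersectional group (random stand-ins below); gender-dominant geometry predicts higher same-gender than same-age similarity:

```python
import torch
import torch.nn.functional as F

def prototype(embs: torch.Tensor) -> torch.Tensor:
    """Group prototype: L2-normalized mean embedding."""
    return F.normalize(embs.mean(dim=0), dim=0)

# Stand-in features; replace with real CLIP embeddings per group.
groups = {g: torch.randn(200, 512) for g in
          ["young_female", "old_female", "young_male", "old_male"]}
protos = {g: prototype(e) for g, e in groups.items()}

same_gender = torch.dot(protos["young_female"], protos["old_female"])
same_age    = torch.dot(protos["young_female"], protos["young_male"])
print(f"same-gender sim: {same_gender.item():.3f}, "
      f"same-age sim: {same_age.item():.3f}")
```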
Referee: [Evaluation Metrics] The redistribution score and accuracy-shift results are presented without reported statistical significance tests, exact formula for the score, or controls for confounding factors in the age-gender intersections (abstract and implied experimental sections). These omissions directly affect the strength of the redistribution-along-gender conclusion.
Authors: We agree that including statistical significance tests and the exact formula for the redistribution score will improve the rigor of our results. We will explicitly provide the formula for the redistribution score in the main text. We will also report p-values using paired statistical tests for accuracy shifts across runs and add controls by analyzing the correlation between group transfers and label co-occurrences in CelebA. These changes will be incorporated in the revised experimental and results sections.
revision: yes
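A minimal sketch of the paired test the response commits to, assuming per-seed accuracies for one group before and after unlearning (the numbers are placeholders); a Wilcoxon signed-rank test avoids a normality assumption:

```python
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical Old Female accuracy across seeds, before and after
# unlearning Young Female (replace with measured runs).
acc_before = np.array([0.61, 0.63, 0.60, 0.62, 0.64, 0.61, 0.62, 0.60])
acc_after  = np.array([0.80, 0.82, 0.79, 0.81, 0.83, 0.80, 0.81, 0.79])

# Paired, one-sided: did accuracy on the retained group increase?
stat, p = wilcoxon(acc_after, acc_before, alternative="greater")
print(f"W={stat}, p={p:.4f}")
```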
Circularity Check
No circularity: purely empirical measurements with no derivations or self-referential quantities
full rationale
The paper conducts an empirical study evaluating three unlearning methods (Prompt Erasure, Prompt Reweighting, Refusal Vector) on CLIP ViT variants using CelebA intersectional groups. It reports per-group accuracy shifts, demographic parity gaps, and a redistribution score. No equations, derivations, fitted parameters, or self-citations are invoked as load-bearing steps in any claimed result. The central observation (performance transfer from Young Female to Old Female) is presented as a measured outcome rather than a quantity forced by definition or prior self-citation. The work is self-contained against external benchmarks and does not reduce any prediction to its own inputs.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The three evaluated unlearning methods are representative of current approaches for studying bias effects.