Interference-Aware Multi-Task Unlearning
Pith reviewed 2026-05-20 10:22 UTC · model grok-4.3
The pith
Shared parameters in multi-task models couple forget and retain data, but task-specific gradient projection plus instance-level orthogonalization can decouple the interference.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that shared parameters in multi-task models couple the forget set and the retain set, producing task-level interference on non-target tasks and instance-level interference on other instances. The interference-aware framework counters this coupling by combining task-aware gradient projection, which restricts updates to task-specific subspaces, with instance-level gradient orthogonalization, which reduces direct conflicts between forget and retain signals. On two multi-task computer vision benchmarks the method achieves effective unlearning while maintaining strong generalization performance.
What carries the argument
Interference-aware framework that applies task-aware gradient projection to constrain updates within task-specific subspaces and instance-level gradient orthogonalization to reduce conflicts between forget and retain gradients.
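As a rough illustration, the two operations might compose into a single update like the NumPy sketch below. It rests on assumptions this page does not confirm. Gradients are flattened into vectors. The target task's subspace arrives as a matrix `task_basis` with orthonormal columns. Projection is applied to the forget gradient before orthogonalization. `alpha` is a hypothetical strength knob, where 1.0 gives full orthogonalization. Both gradients are treated as gradients of losses to be minimized, for instance a negated loss on the forget set. All function names are hypothetical, and the paper's exact update rule may differ.

```python
import numpy as np


def project_onto_subspace(g: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Project flat gradient g onto span(basis); basis is d x k with orthonormal columns."""
    return basis @ (basis.T @ g)


def orthogonalize(g: np.ndarray, ref: np.ndarray, alpha: float = 1.0,
                  eps: float = 1e-12) -> np.ndarray:
    """Remove a fraction alpha of g's component along ref (alpha=1.0 gives g . ref ~ 0)."""
    coeff = float(g @ ref) / (float(ref @ ref) + eps)
    return g - alpha * coeff * ref


def interference_aware_direction(g_forget: np.ndarray, g_retain: np.ndarray,
                                 task_basis: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Hypothetical combined direction; the caller steps theta -= lr * direction."""
    # Task-aware projection: keep only the forget-gradient part inside the target task's subspace.
    g_f = project_onto_subspace(g_forget, task_basis)
    # Instance-level orthogonalization: strip the part that overlaps the retain gradient.
    # Note: this can move g_f slightly out of the task subspace if g_retain is not in it.
    g_f = orthogonalize(g_f, g_retain, alpha)
    # Add the retain gradient so the same step also preserves performance on retained data.
    return g_f + g_retain


# Toy usage with random vectors standing in for flattened gradients.
rng = np.random.default_rng(0)
d, k = 10, 3
basis, _ = np.linalg.qr(rng.normal(size=(d, k)))  # d x k, orthonormal columns
g_f = rng.normal(size=d)
g_r = rng.normal(size=d)
step = interference_aware_direction(g_f, g_r, basis)
```

The order is a design choice. Projecting first confines the forget signal to the target task. Orthogonalizing last guarantees that, with `alpha=1.0`, the forget component has no first-order overlap with the retain gradient.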
If this is right
- Reduces unlearning interference score by 30.3 percent relative to the strongest baseline in full-task unlearning.
- Reduces unlearning interference score by 52.9 percent relative to the strongest baseline in partial-task unlearning.
- Preserves strong generalization on the data and tasks that are not targeted for removal.
- Holds across five tasks on two standard multi-task computer vision benchmarks.
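The percentages above are relative reductions against the strongest baseline. This page does not give the UIS formula, so the sketch below assumes UIS is lower-is-better. The helper names are hypothetical, and the scores are illustrative rather than values from the paper.

```python
def relative_reduction(baseline_score: float, method_score: float) -> float:
    """Percent reduction of a lower-is-better score (assumed here for UIS) versus a baseline."""
    if baseline_score <= 0:
        raise ValueError("baseline score must be positive")
    return 100.0 * (baseline_score - method_score) / baseline_score


def implied_score(baseline_score: float, reduction_pct: float) -> float:
    """Method score implied by a baseline score and a reported percent reduction."""
    return baseline_score * (1.0 - reduction_pct / 100.0)


# Illustrative numbers only; absolute UIS values are not reported on this page.
halved = relative_reduction(0.40, 0.20)   # a 50% reduction
full_task = implied_score(1.0, 30.3)      # method UIS as a fraction of the baseline's
```

For example, a 30.3% reduction means the method's UIS is 0.697 times the baseline's. A 52.9% reduction means it is 0.471 times the baseline's.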
Where Pith is reading between the lines
- The same interference pattern may appear in any architecture that re-uses parameters across tasks, such as large language models fine-tuned on multiple domains.
- The projection and orthogonalization steps could be combined with existing continual-learning regularizers to handle sequences of tasks added over time.
- If the method generalizes beyond vision, it might lower the cost of selective data removal in production systems that serve several downstream applications from one backbone.
Load-bearing premise
Constraining updates to task-specific subspaces via gradient projection and orthogonalizing forget and retain gradients at the instance level sufficiently decouples shared-parameter interference without introducing new performance trade-offs.
What would settle it
Applying the method to a new multi-task dataset and finding no reduction in unlearning interference score or a clear drop in generalization on non-target tasks would falsify the central claim.
Original abstract
Machine unlearning aims to remove the contribution of designated training data from a trained model while preserving performance on the remaining data. Existing work mainly focuses on single-task settings, whereas modern models often operate in multi-task setups with shared backbones, where removing supervision for one task or instance can unintentionally affect others. We introduce multi-task unlearning with two settings: full-task unlearning, which removes a target instance from all tasks, and partial-task unlearning, which removes supervision only from selected tasks. We show that shared parameters couple the forget and retain sets, causing task-level interference on non-target tasks and instance-level interference on other instances. To address this issue, we propose an interference-aware framework that combines task-aware gradient projection, which constrains updates within task-specific subspaces, with instance-level gradient orthogonalization, which reduces conflicts between forget and retain signals. Experiments on two multi-task computer vision benchmarks across five tasks show that our method achieves effective unlearning while maintaining strong generalization, reducing UIS compared with the strongest baseline by 30.3% in full-task unlearning and 52.9% in partial-task unlearning.
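The sketch below makes the two settings concrete by partitioning (instance, task) supervision pairs into forget and retain sets. The abstract defines the settings only informally. The function name and the task names are hypothetical.

```python
from typing import Iterable, Optional, Set, Tuple

Pair = Tuple[int, str]  # (instance index, task name)


def split_forget_retain(num_instances: int, tasks: Iterable[str],
                        forget_instances: Iterable[int],
                        forget_tasks: Optional[Iterable[str]] = None) -> Tuple[Set[Pair], Set[Pair]]:
    """Partition (instance, task) supervision pairs into forget and retain sets.

    forget_tasks=None  -> full-task unlearning: every task label of each target instance is forgotten.
    forget_tasks given -> partial-task unlearning: only those task labels are forgotten.
    """
    task_list = list(tasks)
    targets = set(forget_instances)
    dropped = set(task_list) if forget_tasks is None else set(forget_tasks)
    forget: Set[Pair] = set()
    retain: Set[Pair] = set()
    for i in range(num_instances):
        for t in task_list:
            (forget if i in targets and t in dropped else retain).add((i, t))
    return forget, retain


# Hypothetical task names for a dense-prediction benchmark.
TASKS = ["semseg", "depth", "normals"]
full_forget, full_retain = split_forget_retain(4, TASKS, forget_instances=[1])
part_forget, part_retain = split_forget_retain(4, TASKS, forget_instances=[1], forget_tasks=["depth"])
```

In the partial-task case, instance 1 keeps its `semseg` and `normals` labels in the retain set while its `depth` label moves to the forget set. With a shared backbone, the same input therefore feeds both the forget and the retain objectives.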
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces multi-task unlearning in full-task and partial-task settings for models with shared backbones. It identifies task-level and instance-level interference arising from coupled forget/retain sets and proposes an interference-aware framework that combines task-aware gradient projection (constraining updates to task-specific subspaces) with instance-level gradient orthogonalization (reducing forget/retain conflicts). Experiments on two multi-task computer vision benchmarks across five tasks report that the method reduces UIS by 30.3% in full-task unlearning and 52.9% in partial-task unlearning relative to the strongest baseline while preserving generalization.
Significance. If the reported UIS reductions can be reproduced with full experimental controls and component ablations, the work would address a practical gap in extending unlearning to multi-task models with shared parameters. The empirical evaluation across multiple tasks and two settings provides a concrete starting point for interference-aware unlearning, though the current presentation leaves the attribution of gains to the proposed mechanisms unverified.
major comments (2)
- [Abstract and Experiments] Abstract and Experiments section: The central claims of 30.3% and 52.9% UIS reductions versus the strongest baseline are presented without details on baseline implementations, the precise definition of UIS, statistical significance testing, or ablations isolating task-aware gradient projection from instance-level gradient orthogonalization. This leaves the decoupling premise unverified and makes it impossible to rule out that the gains arise from unstated hyperparameter choices or unintended retain-set degradation rather than interference awareness.
- [Method] Method description (likely §3): The task-aware gradient projection is described as constraining updates within task-specific subspaces, but no formal definition of subspace construction, orthogonality strength parameter, or analysis of potential generalization trade-offs on non-target tasks is provided. Without this, it is unclear whether the operations eliminate shared-parameter coupling or merely shift interference in a way that affects the reported metrics.
minor comments (2)
- [Abstract] The acronym UIS is used in the abstract without an explicit definition; this should be introduced at first use in the main text with a clear formula or reference to its computation.
- [Experiments] Figure and table captions in the experimental results should explicitly state the number of runs, random seeds, and error bars to support reproducibility claims.
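The referee asks for multiple seeds and significance testing. A paired test over per-seed scores fits this request because each seed yields both a baseline and a method score. Below is a minimal standard-library sketch of the paired t statistic. The per-seed values are hypothetical. A p-value would also need the Student t CDF with n - 1 degrees of freedom. SciPy's `scipy.stats.ttest_rel(baseline, method)` provides it and tests the same baseline-minus-method difference.

```python
import math
import statistics
from typing import Sequence


def paired_t_statistic(baseline: Sequence[float], method: Sequence[float]) -> float:
    """Paired t statistic on per-seed differences baseline - method (positive = method lower)."""
    if len(baseline) != len(method) or len(baseline) < 2:
        raise ValueError("need two equal-length sequences with at least two seeds")
    diffs = [b - m for b, m in zip(baseline, method)]
    sd = statistics.stdev(diffs)  # sample standard deviation, n - 1 denominator
    if sd == 0:
        raise ValueError("differences are constant; t is undefined")
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))


# Hypothetical per-seed UIS values for five seeds (lower is better).
baseline_uis = [2.0, 3.0, 4.0, 5.0, 6.0]
method_uis = [1.0, 1.0, 1.0, 1.0, 1.0]
t = paired_t_statistic(baseline_uis, method_uis)  # degrees of freedom = 4
```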
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment point by point below and indicate the revisions made to the manuscript.
- Referee: [Abstract and Experiments] Abstract and Experiments section: The central claims of 30.3% and 52.9% UIS reductions versus the strongest baseline are presented without details on baseline implementations, the precise definition of UIS, statistical significance testing, or ablations isolating task-aware gradient projection from instance-level gradient orthogonalization. This leaves the decoupling premise unverified and makes it impossible to rule out that the gains arise from unstated hyperparameter choices or unintended retain-set degradation rather than interference awareness.
Authors: We agree that additional transparency is required. In the revised manuscript we have added a new subsection in Experiments detailing the exact implementation of each baseline (including optimizer settings, learning rates, and any regularization terms used). The UIS metric is now formally defined with its equation in Section 3.3. We report results over five random seeds together with paired t-test p-values confirming statistical significance of the reported reductions. We have also inserted a dedicated ablation study (Section 4.3) that evaluates the full method, the version without task-aware projection, and the version without instance-level orthogonalization; the results show that both components contribute measurably to the interference reduction and that retain-set accuracy remains comparable to the strongest baseline, ruling out unintended degradation. revision: yes
- Referee: [Method] Method description (likely §3): The task-aware gradient projection is described as constraining updates within task-specific subspaces, but no formal definition of subspace construction, orthogonality strength parameter, or analysis of potential generalization trade-offs on non-target tasks is provided. Without this, it is unclear whether the operations eliminate shared-parameter coupling or merely shift interference in a way that affects the reported metrics.
Authors: We accept that a more rigorous presentation is needed. Section 3.2 has been expanded to define the task-specific subspaces formally via the top-k principal components of the per-task gradient matrix collected on a small held-out set; the projection operator is written explicitly. We introduce the scalar hyperparameter α that controls the strength of the orthogonalization term and provide its range and selection procedure. A new paragraph analyzes generalization on non-target tasks, supported by additional experiments showing that accuracy on those tasks does not drop below the pre-unlearning baseline, indicating that interference is reduced rather than merely relocated. revision: yes
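The simulated rebuttal describes task-specific subspaces as the top-k principal components of a per-task gradient matrix collected on a small held-out set, together with an explicit projection operator. Its claims are not confirmed by the paper, so the sketch below carries several assumptions. Whether the matrix is centered is not stated, so `center` is a hypothetical switch. Uncentered SVD of gradient matrices is also common in gradient-projection methods. How k is chosen is not stated either. Flattening granularity (per layer or whole model) is likewise unspecified. The α-weighted orthogonalization term is not shown.

```python
import numpy as np


def task_subspace_basis(per_task_grads: np.ndarray, k: int, center: bool = True) -> np.ndarray:
    """Orthonormal d x k basis of the top-k principal directions of one task's gradients.

    per_task_grads: n x d matrix, one flattened per-example gradient per row.
    """
    if not 1 <= k <= min(per_task_grads.shape):
        raise ValueError("k must be between 1 and min(n, d)")
    g = (per_task_grads - per_task_grads.mean(axis=0, keepdims=True)) if center else per_task_grads
    # Right singular vectors (rows of vt) are the principal directions in parameter space,
    # ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(g, full_matrices=False)
    return vt[:k].T


def project(g: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Apply the projection operator P = B B^T without forming the d x d matrix."""
    return basis @ (basis.T @ g)


rng = np.random.default_rng(0)
G = rng.normal(size=(32, 20))  # stand-in for 32 held-out per-example gradients, d = 20
B = task_subspace_basis(G, k=4)
p = project(rng.normal(size=20), B)
```

Computing `B @ (B.T @ g)` costs O(dk) per projection instead of O(d²). This matters when d is the parameter count of a backbone.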
Circularity Check
No circularity: empirical engineering proposal with independent experimental validation
full rationale
The paper introduces a multi-task unlearning framework via task-aware gradient projection and instance-level orthogonalization, evaluated on CV benchmarks. No equations or derivations are presented that reduce the reported UIS reductions (30.3%, 52.9%) to quantities defined by the same fitted parameters or evaluation data. The central claims rest on empirical results rather than a closed mathematical chain; the method is described as a practical combination of existing gradient techniques without self-definitional loops or load-bearing self-citations that equate the output to the input. This is a standard non-circular ML engineering contribution.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Gradient updates can be meaningfully projected onto task-specific subspaces without destroying useful signal for the retain set.