
arxiv: 2605.19042 · v1 · pith:3TTR6GAMnew · submitted 2026-05-18 · 💻 cs.AI

Interference-Aware Multi-Task Unlearning

Pith reviewed 2026-05-20 10:22 UTC · model grok-4.3

classification 💻 cs.AI
keywords multi-task unlearning · machine unlearning · gradient projection · gradient orthogonalization · shared parameters · computer vision · interference reduction

The pith

Shared parameters in multi-task models couple forget and retain data, but task-specific gradient projection plus instance-level orthogonalization can decouple the interference.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Modern models trained on multiple tasks share parameters across them, so removing the effect of certain training data for one task or instance tends to degrade performance on the others. The paper defines full-task unlearning, which erases an instance from every task, and partial-task unlearning, which erases it from only selected tasks. It shows that shared parameters create both task-level interference on non-target tasks and instance-level interference on other instances. The proposed framework projects gradients onto task-specific subspaces and orthogonalizes forget and retain gradients at the instance level. Experiments on two computer-vision benchmarks with five tasks report a lower unlearning impact score (UIS) than prior methods while preserving generalization.
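The two settings differ only in which (instance, task) supervision pairs land in the forget set. The sketch below is an illustration under stated assumptions, not the paper's code: the function name `split_supervision`, the instance ids, and the task names are all hypothetical.

```python
from itertools import product


def split_supervision(instances, tasks, forget_instances, forget_tasks=None):
    """Split (instance, task) supervision pairs into forget and retain sets.

    forget_tasks=None models full-task unlearning (the instance is removed
    from every task); a subset of tasks models partial-task unlearning.
    """
    target_tasks = set(tasks) if forget_tasks is None else set(forget_tasks)
    forget, retain = set(), set()
    for instance, task in product(instances, tasks):
        if instance in forget_instances and task in target_tasks:
            forget.add((instance, task))
        else:
            retain.add((instance, task))
    return forget, retain


# Hypothetical instance ids and task names, chosen only for illustration.
instances = ["img0", "img1", "img2"]
tasks = ["segmentation", "depth"]

full_forget, full_retain = split_supervision(instances, tasks, {"img0"})
part_forget, part_retain = split_supervision(instances, tasks, {"img0"}, {"depth"})
```

In the partial-task case the pair `("img0", "segmentation")` stays in the retain set, which is where the same instance appears on both sides of the forget/retain split.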

Core claim

The paper claims that shared parameters in multi-task models couple the forget set and the retain set, producing task-level interference on non-target tasks and instance-level interference on other instances. The interference-aware framework counters this coupling by combining task-aware gradient projection, which restricts updates to task-specific subspaces, with instance-level gradient orthogonalization, which reduces direct conflicts between forget and retain signals. On two multi-task computer vision benchmarks the method achieves effective unlearning while maintaining strong generalization performance.

What carries the argument

An interference-aware framework with two parts: task-aware gradient projection, which constrains updates to task-specific subspaces, and instance-level gradient orthogonalization, which reduces conflicts between forget and retain gradients.
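As a rough picture of what orthogonalizing a forget gradient against a retain gradient can mean, the sketch below subtracts the component of one vector along the other. This is a generic vector-projection step and an assumption on our part; the page does not give the paper's exact operator, and the helper names `dot` and `orthogonalize` are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def orthogonalize(forget_grad, retain_grad, eps=1e-12):
    """Remove from forget_grad its component along retain_grad."""
    denom = dot(retain_grad, retain_grad)
    if denom < eps:
        # A (near-)zero retain gradient defines no direction to remove.
        return list(forget_grad)
    scale = dot(forget_grad, retain_grad) / denom
    return [f - scale * r for f, r in zip(forget_grad, retain_grad)]


# Toy gradients, not taken from the paper.
g_forget = [1.0, 2.0, 0.0]
g_retain = [0.0, 1.0, 0.0]
g_clean = orthogonalize(g_forget, g_retain)  # [1.0, 0.0, 0.0]
```

After the step, `g_clean` has zero inner product with `g_retain`, so the two signals no longer conflict along that direction.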

If this is right

  • Reduces the unlearning impact score (UIS) by 30.3 percent relative to the strongest baseline in full-task unlearning.
  • Reduces UIS by 52.9 percent relative to the strongest baseline in partial-task unlearning.
  • Preserves strong generalization on the data and tasks that are not targeted for removal.
  • Works across five tasks on two multi-task computer vision benchmarks.
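For readers checking the percentages, a reduction "relative to the strongest baseline" is usually computed as the drop divided by the baseline value. The snippet assumes that convention (the page does not state the formula), and the input numbers are made up purely to show the arithmetic.

```python
def relative_reduction(baseline, ours):
    """Percentage by which `ours` is lower than `baseline`."""
    return 100.0 * (baseline - ours) / baseline


# Hypothetical UIS values chosen only to illustrate the arithmetic;
# they are not results from the paper.
example = relative_reduction(baseline=10.0, ours=6.97)  # about 30.3
```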

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same interference pattern may appear in any architecture that re-uses parameters across tasks, such as large language models fine-tuned on multiple domains.
  • The projection and orthogonalization steps could be combined with existing continual-learning regularizers to handle sequences of tasks added over time.
  • If the method generalizes beyond vision, it might lower the cost of selective data removal in production systems that serve several downstream applications from one backbone.

Load-bearing premise

Constraining updates to task-specific subspaces via gradient projection and orthogonalizing forget and retain gradients at the instance level sufficiently decouples shared-parameter interference without introducing new performance trade-offs.
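One way to read this premise: to first order, a parameter step changes the retain loss by the inner product of the retain gradient with that step, so an orthogonal step leaves the retain loss unchanged only in the linearized sense. The sketch below checks that on toy vectors; the function names and values are hypothetical, and higher-order effects are deliberately ignored.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def first_order_retain_change(retain_grad, delta):
    """Linearized change in retain loss for a parameter step `delta`."""
    return dot(retain_grad, delta)


lr = 0.1
g_retain = [0.0, 1.0, 0.0]

# Step along a raw forget direction that overlaps with the retain gradient.
raw_step = [lr * x for x in [1.0, 2.0, 0.0]]
# Step along the same direction after its retain component has been removed.
clean_step = [lr * x for x in [1.0, 0.0, 0.0]]

raw_change = first_order_retain_change(g_retain, raw_step)
clean_change = first_order_retain_change(g_retain, clean_step)
```

A zero linearized change does not rule out curvature effects for larger steps, which is one place the premise could fail.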

What would settle it

Applying the method to a new multi-task dataset and finding no reduction in UIS (unlearning impact score), or a clear drop in generalization on non-target tasks, would falsify the central claim.

Figures

Figures reproduced from arXiv: 2605.19042 by Hsi-Wen Chen, Ming-Syan Chen, Rui Fang, Ying-Hua Huang.

Figure 1: Overview of multi-task unlearning.
… removes supervision for a target instance only from selected tasks. For example, an image may be removed from person identification [11] due to privacy requirements while retained for action recognition [29]. Similarly, a user’s interaction may be removed from personalized recommendation [31] while retained for fraud detection [16]. However, our preliminary experiments s…
Figure 2: Unlearning impact score (%) under different unlearn ratios. Lower is better.

Original abstract

Machine unlearning aims to remove the contribution of designated training data from a trained model while preserving performance on the remaining data. Existing work mainly focuses on single-task settings, whereas modern models often operate in multi-task setups with shared backbones, where removing supervision for one task or instance can unintentionally affect others. We introduce multi-task unlearning with two settings: full-task unlearning, which removes a target instance from all tasks, and partial-task unlearning, which removes supervision only from selected tasks. We show that shared parameters couple the forget and retain sets, causing task-level interference on non-target tasks and instance-level interference on other instances. To address this issue, we propose an interference-aware framework that combines task-aware gradient projection, which constrains updates within task-specific subspaces, with instance-level gradient orthogonalization, which reduces conflicts between forget and retain signals. Experiments on two multi-task computer vision benchmarks across five tasks show that our method achieves effective unlearning while maintaining strong generalization, reducing UIS compared with the strongest baseline by 30.3% in full-task unlearning and 52.9% in partial-task unlearning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces multi-task unlearning in full-task and partial-task settings for models with shared backbones. It identifies task-level and instance-level interference arising from coupled forget/retain sets and proposes an interference-aware framework that combines task-aware gradient projection (constraining updates to task-specific subspaces) with instance-level gradient orthogonalization (reducing forget/retain conflicts). Experiments on two multi-task computer vision benchmarks across five tasks report that the method reduces UIS by 30.3% in full-task unlearning and 52.9% in partial-task unlearning relative to the strongest baseline while preserving generalization.

Significance. If the reported UIS reductions can be reproduced with full experimental controls and component ablations, the work would address a practical gap in extending unlearning to multi-task models with shared parameters. The empirical evaluation across multiple tasks and two settings provides a concrete starting point for interference-aware unlearning, though the current presentation leaves the attribution of gains to the proposed mechanisms unverified.

major comments (2)
  1. [Abstract and Experiments] Abstract and Experiments section: The central claims of 30.3% and 52.9% UIS reductions versus the strongest baseline are presented without details on baseline implementations, the precise definition of UIS, statistical significance testing, or ablations isolating task-aware gradient projection from instance-level gradient orthogonalization. This leaves the decoupling premise unverified and makes it impossible to rule out that the gains arise from unstated hyperparameter choices or unintended retain-set degradation rather than interference awareness.
  2. [Method] Method description (likely §3): The task-aware gradient projection is described as constraining updates within task-specific subspaces, but no formal definition of subspace construction, orthogonality strength parameter, or analysis of potential generalization trade-offs on non-target tasks is provided. Without this, it is unclear whether the operations eliminate shared-parameter coupling or merely shift interference in a way that affects the reported metrics.
minor comments (2)
  1. [Abstract] The acronym UIS is used in the abstract without an explicit definition; this should be introduced at first use in the main text with a clear formula or reference to its computation.
  2. [Experiments] Figure and table captions in the experimental results should explicitly state the number of runs, random seeds, and error bars to support reproducibility claims.
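The significance testing requested in major comment 1, together with the per-seed reporting in minor comment 2, could take the form of a paired test over matched seeds. The sketch below computes a paired t statistic with the standard library only. The per-seed values are invented for illustration, and the choice of five seeds and a two-sided 5% level are assumptions.

```python
import math
import statistics


def paired_t_statistic(a, b):
    """t statistic of the paired differences a[i] - b[i], with df = n - 1."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return mean / (sd / math.sqrt(n)), n - 1


# Hypothetical per-seed UIS values (lower is better); not from the paper.
baseline_uis = [10.2, 9.8, 10.5, 10.1, 9.9]
method_uis = [7.1, 6.9, 7.4, 7.0, 7.2]

t_stat, dof = paired_t_statistic(baseline_uis, method_uis)

# Two-sided 5% critical value of Student's t with 4 degrees of freedom.
T_CRIT_DF4 = 2.776
significant = abs(t_stat) > T_CRIT_DF4
```

Pairing by seed removes seed-to-seed variance that both methods share, which matters when only a handful of runs are available.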

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment point by point below and indicate the revisions made to the manuscript.

Point-by-point responses
  1. Referee: [Abstract and Experiments] Abstract and Experiments section: The central claims of 30.3% and 52.9% UIS reductions versus the strongest baseline are presented without details on baseline implementations, the precise definition of UIS, statistical significance testing, or ablations isolating task-aware gradient projection from instance-level gradient orthogonalization. This leaves the decoupling premise unverified and makes it impossible to rule out that the gains arise from unstated hyperparameter choices or unintended retain-set degradation rather than interference awareness.

    Authors: We agree that additional transparency is required. In the revised manuscript we have added a new subsection in Experiments detailing the exact implementation of each baseline (including optimizer settings, learning rates, and any regularization terms used). The UIS metric is now formally defined with its equation in Section 3.3. We report results over five random seeds together with paired t-test p-values confirming statistical significance of the reported reductions. We have also inserted a dedicated ablation study (Section 4.3) that evaluates the full method, the version without task-aware projection, and the version without instance-level orthogonalization; the results show that both components contribute measurably to the interference reduction and that retain-set accuracy remains comparable to the strongest baseline, ruling out unintended degradation. revision: yes

  2. Referee: [Method] Method description (likely §3): The task-aware gradient projection is described as constraining updates within task-specific subspaces, but no formal definition of subspace construction, orthogonality strength parameter, or analysis of potential generalization trade-offs on non-target tasks is provided. Without this, it is unclear whether the operations eliminate shared-parameter coupling or merely shift interference in a way that affects the reported metrics.

    Authors: We accept that a more rigorous presentation is needed. Section 3.2 has been expanded to define the task-specific subspaces formally via the top-k principal components of the per-task gradient matrix collected on a small held-out set; the projection operator is written explicitly. We introduce the scalar hyperparameter α that controls the strength of the orthogonalization term and provide its range and selection procedure. A new paragraph analyzes generalization on non-target tasks, supported by additional experiments showing that accuracy on those tasks does not drop below the pre-unlearning baseline, indicating that interference is reduced rather than merely relocated. revision: yes
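The subspace construction described in the second response can be sketched with a singular value decomposition: take the top-k right singular vectors of a per-task gradient matrix as a basis B and project with P = B Bᵀ. This is a hedged illustration, not the authors' implementation. The function names, the uncentered SVD, the matrix sizes, and the synthetic gradients are all assumptions.

```python
import numpy as np


def task_subspace(grad_matrix, k):
    """Orthonormal basis (d x k) for the top-k right singular directions
    of a per-task gradient matrix of shape (n_samples, d)."""
    _, _, vt = np.linalg.svd(grad_matrix, full_matrices=False)
    return vt[:k].T


def project(grad, basis):
    """Apply the projection operator P = B B^T to a gradient vector."""
    return basis @ (basis.T @ grad)


rng = np.random.default_rng(0)
# Synthetic per-task gradients concentrated in the first two coordinates.
scales = np.array([5.0, 3.0, 0.1, 0.1, 0.1, 0.1])
grads = rng.normal(size=(32, 6)) * scales

basis = task_subspace(grads, k=2)
update = rng.normal(size=6)
projected = project(update, basis)
```

Because the basis columns are orthonormal, applying `project` twice gives the same vector as applying it once, and the projected update is never longer than the original.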

Circularity Check

0 steps flagged

No circularity: empirical engineering proposal with independent experimental validation

Full rationale

The paper introduces a multi-task unlearning framework via task-aware gradient projection and instance-level orthogonalization, evaluated on CV benchmarks. No equations or derivations are presented that reduce the reported UIS reductions (30.3%, 52.9%) to quantities defined by the same fitted parameters or evaluation data. The central claims rest on empirical results rather than a closed mathematical chain; the method is described as a practical combination of existing gradient techniques without self-definitional loops or load-bearing self-citations that equate the output to the input. This is a standard non-circular ML engineering contribution.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework rests on standard gradient-based optimization assumptions and the premise that task subspaces can be identified from gradients; no new physical entities or ad-hoc constants are introduced beyond the algorithmic design choices.

axioms (1)
  • Domain assumption: Gradient updates can be meaningfully projected onto task-specific subspaces without destroying useful signal for the retain set.
    Invoked when describing task-aware gradient projection as a solution to shared-parameter coupling.



Reference graph

Works this paper leans on

72 extracted references · 72 canonical work pages · 4 internal anchors

  [1] A. Agiza, M. Neseem, and S. Reda. Mtlora: Low-rank adaptation approach for efficient multi-task learning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 16196–16205, 2024.

  [2] S. M. Ahmed, U. Y. Basaran, D. S. Raychaudhuri, A. Dutta, R. Kundu, F. F. Niloy, B. Guler, and A. K. Roy-Chowdhury. Towards source-free machine unlearning. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 4948–4957, 2025.

  [3] L. Bourtoule, V. Chandrasekaran, C. A. Choquette-Choo, H. Jia, A. Travers, B. Zhang, D. Lie, and N. Papernot. Machine unlearning. In 2021 IEEE symposium on security and privacy (SP), pages 141–159. IEEE, 2021.

  [4] Y. Cao and J. Yang. Towards making systems forget with machine unlearning. In 2015 IEEE symposium on security and privacy, pages 463–480. IEEE, 2015.

  [5] N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, and S. Zagoruyko. End-to-end object detection with transformers. In European conference on computer vision, pages 213–229. Springer, 2020.

  [6] S. Cha, S. Cho, D. Hwang, and M. Lee. Towards robust and parameter-efficient knowledge unlearning for llms. arXiv preprint arXiv:2408.06621, 2024.

  [7] L.-C. Chen, G. Papandreou, F. Schroff, and H. Adam. Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587, 2017.

  [8] R. Chen, J. Yang, H. Xiong, J. Bai, T. Hu, J. Hao, Y. Feng, J. T. Zhou, J. Wu, and Z. Liu. Fast model debias with machine unlearning. Advances in Neural Information Processing Systems, 36:14516–14539, 2023.

  [9] J. Cheng, P. Liu, Q. Li, and C. Zhang. Machine unlearning under retain-forget entanglement. arXiv preprint arXiv:2603.26569, 2026.

  [10] X. Cheng, Z. Huang, W. Zhou, Z. He, R. Yang, Y. Wu, and X. Huang. Remaining-data-free machine unlearning by suppressing sample contribution. arXiv preprint arXiv:2402.15109, 2024.

  [11] D. Choi and D. Na. Towards machine unlearning benchmarks: Forgetting the personal identities in facial recognition systems. arXiv preprint arXiv:2311.02240, 2023.

  [12] S. B. R. Chowdhury, K. M. Choromanski, A. Sehanobish, K. A. Dubey, and S. Chaturvedi. Towards scalable exact machine unlearning using parameter-efficient fine-tuning. In The Thirteenth International Conference on Learning Representations, 2025.

  [13] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, pages 248–255. IEEE, 2009.

  [14] C. Ding, J. Wu, Y. Yuan, J. Lu, K. Zhang, A. Su, X. Wang, and X. He. Unified parameter-efficient unlearning for llms. arXiv preprint arXiv:2412.00383, 2024.

  [15] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations, 2021.

  [16] M. Du, Z. Chen, C. Liu, R. Oak, and D. Song. Lifelong anomaly detection through unlearning. In Proceedings of the 2019 ACM SIGSAC conference on computer and communications security, pages 1283–1297, 2019.

  [17] Y. Dukler, B. Bowman, A. Achille, A. Golatkar, A. Swaminathan, and S. Soatto. Safe: Machine unlearning with shard graphs. In Proceedings of the IEEE/CVF international conference on computer vision, pages 17108–17118, 2023.

  [18] A. Ebrahimpour-Boroojeny, H. Sundaram, and V. Chandrasekaran. Amun: Adversarial machine unlearning. arXiv preprint arXiv:2503.00917, 2025.

  [19] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The pascal visual object classes (voc) challenge. International Journal of Computer Vision, 88(2):303–338, 2010.

  [20] J. Foster, S. Schoepf, and A. Brintrup. Fast machine unlearning without retraining through selective synaptic dampening. In Proceedings of the AAAI conference on artificial intelligence, volume 38, pages 12043–12051, 2024.

  [21] A. Golatkar, A. Achille, and S. Soatto. Eternal sunshine of the spotless net: Selective forgetting in deep networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9304–9312, 2020.

  [22] L. Graves, V. Nagisetty, and V. Ganesh. Amnesiac machine learning. In Proceedings of the AAAI conference on artificial intelligence, volume 35, pages 11516–11524, 2021.

  [23] C. Guo, T. Goldstein, A. Hannun, and L. Van Der Maaten. Certified data removal from machine learning models. arXiv preprint arXiv:1911.03030, 2019.

  [24] T. Hoang, S. Rana, S. Gupta, and S. Venkatesh. Learn to unlearn for deep neural networks: Minimizing unlearning interference with gradient projection. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 4819–4828, 2024.

  [25] A. Huang, Z. Cai, and Z. Xiong. A survey of machine unlearning in generative ai models: Methods, applications, security, and challenges. IEEE Internet of Things Journal, 2025.

  [26] C. Huang, Q. Liu, B. Y. Lin, T. Pang, C. Du, and M. Lin. Lorahub: Efficient cross-task generalization via dynamic lora composition. arXiv preprint arXiv:2307.13269, 2023.

  [27] A. Kamalesh, A. Lakhotia, P. S. Kulkarni, G. Srinivasa, et al. Unolora: Single low-rank adaptation for efficient multitask fine-tuning. In NeurIPS 2024 Workshop on Fine-Tuning in Modern Machine Learning: Principles and Scalability, 2024.

  [28] Y. H. Khalil, M. Setayesh, and H. Li. Coun: Empowering machine unlearning via contrastive learning. arXiv preprint arXiv:2509.16391, 2025.

  [29] Y. Kong and Y. Fu. Human action recognition and prediction: A survey. International Journal of Computer Vision, 130(5):1366–1401, 2022.

  [30] M. Kurmanji, P. Triantafillou, J. Hayes, and E. Triantafillou. Towards unbounded machine unlearning. Advances in neural information processing systems, 36:1957–1987, 2023.

  [31] Y. Li, X. Feng, C. Chen, and Q. Yang. A survey on recommendation unlearning: Fundamentals, taxonomy, evaluation, and open questions. IEEE Transactions on Knowledge and Data Engineering, 38(2):781–799, 2025.

  [32] S. Lin, X. Zhang, W. Susilo, X. Chen, and J. Liu. Gdr-gma: machine unlearning via direction-rectified and magnitude-adjusted gradients. In Proceedings of the 32nd ACM International Conference on Multimedia, pages 9087–9095, 2024.

  [33] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft coco: Common objects in context. In European conference on computer vision, pages 740–755. Springer, 2014.

  [34] B. Liu, X. Liu, X. Jin, P. Stone, and Q. Liu. Conflict-averse gradient descent for multi-task learning. Advances in neural information processing systems, 34:18878–18890, 2021.

  [35] S. Liu, Y. Liu, N. B. Angel, and E. Triantafillou. Machine unlearning in computer vision: Foundations and applications. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024.

  [36] S. Liu, Y. Yao, J. Jia, S. Casper, N. Baracaldo, P. Hase, Y. Yao, C. Y. Liu, X. Xu, H. Li, et al. Rethinking machine unlearning for large language models. Nature Machine Intelligence, 7(2):181–194, 2025.

  [37] Y. Liu, H. Chen, W. Huang, Y. Ni, and M. Imani. Lune: Efficient llm unlearning via lora fine-tuning with negative examples. arXiv preprint arXiv:2512.07375, 2025.

  [38] Y.-C. Liu, C.-Y. Ma, J. Tian, Z. He, and Z. Kira. Polyhistor: Parameter-efficient multi-task adaptation for dense vision tasks. Advances in neural information processing systems, 35:36889–36901, 2022.

  [39] Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF international conference on computer vision, pages 10012–10022, 2021.

  [40] J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3431–3440, 2015.

  [41] I. Loshchilov and F. Hutter. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101, 2017.

  [42] R. K. Mahabadi, S. Ruder, M. Dehghani, and J. Henderson. Parameter-efficient multi-task fine-tuning for transformers via shared hypernetworks. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 565–576, 2021.

  [43] S. Poppi, S. Sarto, M. Cornia, L. Baraldi, and R. Cucchiara. Unlearning vision transformers without retaining data via low-rank decompositions. In International Conference on Pattern Recognition, pages 147–163. Springer, 2024.

  [44] A. Prabhakar, Y. Li, K. Narasimhan, S. Kakade, E. Malach, and S. Jelassi. Lora soups: Merging loras for practical skill composition tasks. In Proceedings of the 31st International Conference on Computational Linguistics: Industry Track, pages 644–655, 2025.

  [45] W. Qian, C. Zhao, W. Le, M. Ma, and M. Huai. Towards understanding and enhancing robustness of deep learning models against malicious unlearning attacks. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 1932–1942, 2023.

  [46] P. Regulation. Regulation (eu) 2016/679 of the european parliament and of the council. Regulation (eu), 679(2016):10–3, 2016.

  [47] S. Roy, S. Banerjee, V. Verma, S. Dasgupta, D. Gupta, and P. Rai. Novo: Unlearning-compliant vision transformers. arXiv preprint arXiv:2507.03281, 2025.

  [48] G. Saha, I. Garg, and K. Roy. Gradient projection memory for continual learning. arXiv preprint arXiv:2103.09762, 2021.

  [49] A. Sekhari, J. Acharya, G. Kamath, and A. T. Suresh. Remember what you want to forget: Algorithms for machine unlearning. Advances in Neural Information Processing Systems, 34:18075–18086, 2021.

  [50] A. Shamsian, E. Shaar, A. Navon, G. Chechik, and E. Fetaya. Go beyond your means: Unlearning with per-sample gradient orthogonalization. arXiv preprint arXiv:2503.02312, 2025.

  [51] R. Shokri, M. Stronati, C. Song, and V. Shmatikov. Membership inference attacks against machine learning models. In 2017 IEEE symposium on security and privacy (SP), pages 3–18. IEEE, 2017.

  [52] N. Silberman, D. Hoiem, P. Kohli, and R. Fergus. Indoor segmentation and support inference from rgbd images. In European conference on computer vision, pages 746–760. Springer, 2012.

  [53] L. Song and P. Mittal. Systematic evaluation of privacy risks of machine learning models. In 30th USENIX security symposium (USENIX security 21), pages 2615–2632, 2021.

  [54] C. N. Spartalis, T. Semertzidis, E. Gavves, and P. Daras. Lotus: Large-scale machine unlearning with a taste of uncertainty. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 10046–10055, 2025.

  [55] T. Surve and R. Pradhan. Explaining fairness violations using machine unlearning. In EDBT, pages 623–635, 2025.

  [56] Y. Tong, T. Zhang, J. Yuan, Y. Wang, and C. Hu. Lethevit: Selective machine unlearning for vision transformers via attention-guided contrastive learning. arXiv preprint arXiv:2508.01569, 2025.

  [57] Y. Tu, P. Hu, and J. Ma. A reliable cryptographic framework for empirical machine unlearning evaluation. arXiv preprint arXiv:2404.11577, 2024.

  [58] P. Voigt and A. Von dem Bussche. The eu general data protection regulation (gdpr). A practical guide, 1st ed., Cham: Springer International Publishing, 10(3152676):10–5555, 2017.

  [59] W. Wang, Z. Tian, A. Liu, and S. Yu. Tape: Tailored posterior difference for auditing of machine unlearning. In Proceedings of the ACM on Web Conference 2025, pages 3061–3072, 2025.

  [60] X. Wang, T. Chen, Q. Ge, H. Xia, R. Bao, R. Zheng, Q. Zhang, T. Gui, and X.-J. Huang. Orthogonal subspace learning for language model continual learning. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 10658–10671, 2023.

  [61] T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, et al. Huggingface’s transformers: State-of-the-art natural language processing. arXiv preprint arXiv:1910.03771, 2019.

  [62] Y. Xin, J. Du, Q. Wang, Z. Lin, and K. Yan. Vmt-adapter: Parameter-efficient transfer learning for multi-task dense scene understanding. In Proceedings of the AAAI conference on artificial intelligence, volume 38, pages 16085–16093, 2024.

  [63] P. Yadav, D. Tam, L. Choshen, C. A. Raffel, and M. Bansal. Ties-merging: Resolving interference when merging models. Advances in neural information processing systems, 36:7093–7115, 2023.

  [64] H. Yan, X. Li, Z. Guo, H. Li, F. Li, and X. Lin. Arcane: An efficient architecture for exact machine unlearning. In IJCAI, volume 6, page 19, 2022.

  [65] Z. Yang, G. Chen, Y. Yang, A. Zeng, and X. Yang. Disentangling task conflicts in multi-task lora via orthogonal gradient projection. arXiv preprint arXiv:2601.09684, 2026.

  [66] S. Yeom, I. Giacomelli, M. Fredrikson, and S. Jha. Privacy risk in machine learning: Analyzing the connection to overfitting. In 2018 IEEE 31st computer security foundations symposium (CSF), pages 268–282. IEEE, 2018.

  [67] S. Yifei, X. Wei, and Y. Wang. Dislora: Task-specific low-rank adaptation via orthogonal basis from singular value decomposition. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 13751–13766, 2025.

  [68] L. Yu, B. Yu, H. Yu, F. Huang, and Y. Li. Language models are super mario: Absorbing abilities from homologous models as a free lunch. In Forty-first International Conference on Machine Learning, 2024.

  [69] T. Yu, S. Kumar, A. Gupta, S. Levine, K. Hausman, and C. Finn. Gradient surgery for multi-task learning. Advances in neural information processing systems, 33:5824–5836, 2020.

  [70] D. Zhang, S. Pan, T. Hoang, Z. Xing, M. Staples, X. Xu, L. Yao, Q. Lu, and L. Zhu. To be forgotten or to be fair: Unveiling fairness implications of machine unlearning methods. AI and Ethics, 4(1):83–93, 2024.

  [71] H. Zhao, B. Ni, J. Fan, Y. Wang, Y. Chen, G. Meng, and Z. Zhang. Continual forgetting for pre-trained vision models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 28631–28642, 2024.

  [72] K. Zhao, M. Kurmanji, G.-O. Barbulescu, E. Triantafillou, and P. Triantafillou. What makes unlearning hard and what to do about it. Advances in Neural Information Processing Systems, 37:12293–12333, 2024.