arxiv: 2604.18075 · v1 · submitted 2026-04-20 · 💻 cs.CV

Enhancing Continual Learning of Vision-Language Models via Dynamic Prefix Weighting

Pith reviewed 2026-05-10 04:35 UTC · model grok-4.3

classification 💻 cs.CV
keywords continual learning · vision-language models · prefix-tuning · domain incremental learning · adapters · parameter-efficient fine-tuning · gating mechanism

The pith

Dynamically weighting prefixes by token importance improves continual learning for vision-language models

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Vision-language models struggle to adapt to new domains and classes over time without losing earlier knowledge. Prior prefix-tuning approaches normalize the adjustment weights in the same way for every token, even though different tokens need different degrees of adjustment. The paper introduces Dynamic Prefix Weighting (DPW), a method that uses a gating module to scale each prefix weight according to the importance of its input token and weights adapter outputs as a residual of the prefix-tuning weights. This selective, token-aware adaptation delivers state-of-the-art results on domain-class incremental learning benchmarks for VLMs.

Core claim

The authors show that a gating module can assign importance-based weights to prefixes, with adapter weights derived as the residual of the prefix-tuning weights. This enables more precise parameter-efficient updates that outperform uniform-weight prefix-tuning under sequential domain and class shifts.

What carries the argument

The Dynamic Prefix Weighting (DPW) framework: a gating module that scales each prefix weight by the importance of its input token, plus a residual mechanism that activates adapters only when the prefix adjustment falls short.
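
A minimal sketch of how the two pieces could fit together, assuming a sigmoid gate over a linear projection of each token and an adapter weight equal to one minus the gate. Both choices are illustrative assumptions, since the abstract does not state the functional form, and the names `dpw_like_update`, `gate_w`, and `gate_b` are hypothetical rather than taken from the authors' code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dpw_like_update(tokens, prefix_out, adapter_out, gate_w, gate_b):
    """Combine prefix and adapter outputs with token-dependent weights.

    tokens, prefix_out, adapter_out: arrays of shape (n_tokens, dim).
    gate_w (dim,) and gate_b (scalar) are hypothetical gate parameters.
    """
    # Assumed gate: one importance score in (0, 1) per input token.
    gate = sigmoid(tokens @ gate_w + gate_b)        # shape (n_tokens,)
    prefix_weight = gate[:, None]
    # Assumed residual rule: the adapter receives whatever weight the prefix leaves.
    adapter_weight = 1.0 - prefix_weight
    updated = tokens + prefix_weight * prefix_out + adapter_weight * adapter_out
    return updated, gate

rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))
prefix_out = rng.normal(size=(4, 8))
adapter_out = rng.normal(size=(4, 8))
updated, gate = dpw_like_update(tokens, prefix_out, adapter_out, rng.normal(size=8), 0.0)
print(gate.round(3))  # one prefix weight per token
```

Under this assumed rule, a token whose gate saturates near 1 is adjusted by the prefix alone, which matches the stated intent that adapters are used only when necessary.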

If this is right

  • State-of-the-art performance is reached in domain-class incremental learning scenarios for VLMs.
  • Adapters are engaged only when prefix-tuning alone is insufficient, limiting unnecessary changes.
  • The model adapts more effectively to tokens that require different degrees of adjustment.
  • Prior task knowledge is retained better across sequential domain and class shifts.
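
One common way to quantify the retention claim is average forgetting, computed from a task-by-task accuracy matrix. The sketch below is generic and uses made-up numbers. It is not necessarily the metric the paper reports, and the name `average_forgetting` is chosen here for illustration.

```python
import numpy as np

def average_forgetting(acc):
    """acc[t][k] is the accuracy on task k after training on task t (k <= t)."""
    acc = np.asarray(acc, dtype=float)
    n_tasks = acc.shape[0]
    drops = []
    for k in range(n_tasks - 1):
        # Best accuracy on task k before the final training step.
        best_earlier = acc[k:n_tasks - 1, k].max()
        drops.append(best_earlier - acc[n_tasks - 1, k])
    return float(np.mean(drops))

# Placeholder accuracies for three sequential tasks (not from the paper).
acc = [
    [90.0,  0.0,  0.0],
    [85.0, 80.0,  0.0],
    [80.0, 78.0, 70.0],
]
print(average_forgetting(acc))  # 6.0
```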

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same token-aware weighting idea could transfer to other efficient tuning techniques such as low-rank adapters.
  • The approach may scale favorably when the number of sequential tasks grows large.
  • It suggests a route for applying importance-based selection in continual learning settings outside vision-language models.

Load-bearing premise

The gating module can reliably estimate the relative importance of each input token, and the residual adapter weighting supplies additive benefit beyond plain prefix-tuning.

What would settle it

Experiments that replace the dynamic gating with uniform weights and observe no performance loss on the same domain-class incremental benchmarks would undermine the claim.
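
A sketch of what such a control could look like, assuming a sigmoid gate as the dynamic variant and a constant weight matched to its mean as the uniform variant. The function name `prefix_weights`, its parameters, and the mean-matching choice are assumptions made for illustration, not the authors' ablation protocol.

```python
import numpy as np

def prefix_weights(tokens, gate_w, gate_b, mode="dynamic"):
    """Return one prefix weight per token, shape (n_tokens,)."""
    dynamic = 1.0 / (1.0 + np.exp(-(tokens @ gate_w + gate_b)))
    if mode == "dynamic":
        return dynamic
    if mode == "uniform":
        # Control: same average weight, but identical for every token.
        return np.full_like(dynamic, dynamic.mean())
    raise ValueError(f"unknown mode: {mode!r}")

rng = np.random.default_rng(0)
tokens = rng.normal(size=(6, 8))
gate_w = rng.normal(size=8)
dyn = prefix_weights(tokens, gate_w, 0.0, mode="dynamic")
uni = prefix_weights(tokens, gate_w, 0.0, mode="uniform")
print(dyn.round(3), uni.round(3))
```

Matching the mean keeps the overall amount of prefix adjustment comparable, so any gap between the two runs would be attributable to the per-token variation rather than to a different total weight.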

Figures

Figures reproduced from arXiv: 2604.18075 by Hyeonseo Jang, Hyuk Kwon, Kibok Lee.

Figure 1: Comparison of the weighting mechanisms of various popular PEFT methods. Light-green boxes denote the output vectors added
Figure 2: Overall framework of the proposed method. The right side of the figure shows how each input token assigns weights to both the
Figure 3: Comparison of prefix score maps between attention and
Figure 5: UMAP visualization of token embeddings from the
Figure 7: Comparison of mean performance and the number of
Original abstract

We investigate recently introduced domain-class incremental learning scenarios for vision-language models (VLMs). Recent works address this challenge using parameter-efficient methods, such as prefix-tuning or adapters, which facilitate model adaptation to downstream tasks by incorporating task-specific information into input tokens through additive vectors. However, previous approaches often normalize the weights of these vectors, disregarding the fact that different input tokens require different degrees of adjustment. To overcome this issue, we propose Dynamic Prefix Weighting (DPW), a framework that dynamically assigns weights to prefixes, complemented by adapters. DPW consists of 1) a gating module that adjusts the weights of each prefix based on the importance of the corresponding input token, and 2) a weighting mechanism that derives adapter output weights as a residual of prefix-tuning weights, ensuring that adapters are utilized only when necessary. Experimental results demonstrate that our method achieves state-of-the-art performance in domain-class incremental learning scenarios for VLMs. The code is available at: https://github.com/YonseiML/dpw.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript proposes Dynamic Prefix Weighting (DPW) to improve domain-class incremental learning for vision-language models. It augments prefix-tuning with (1) a gating module that computes token-specific weights for the prefixes instead of uniform normalization and (2) a residual adapter-weighting mechanism that activates adapters only when the prefix adjustment is insufficient. The central claim is that this combination yields state-of-the-art performance on the relevant continual-learning benchmarks while remaining parameter-efficient; public code is provided for verification.

Significance. If the reported gains hold under rigorous re-evaluation, the work supplies a concrete, verifiable improvement over standard prefix-tuning and adapter baselines in a practically important setting. The explicit public code release is a notable strength that directly supports reproducibility of the SOTA numbers.

minor comments (3)
  1. The abstract states that the gating module 'adjusts the weights of each prefix based on the importance of the corresponding input token,' but does not specify the exact functional form or training objective of the gate; a short equation or pseudocode block in §3 would remove ambiguity for readers.
  2. Table captions and axis labels should explicitly indicate whether reported metrics are averaged over multiple random seeds and whether error bars or standard deviations are shown; this is especially important for incremental-learning claims where variance can be high.
  3. The residual adapter weighting is described as 'deriving adapter output weights as a residual of prefix-tuning weights.' A one-sentence clarification of the exact residual formula (e.g., whether it is a simple subtraction or a learned scaling) would aid implementation.
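
For comment 2, a minimal sketch of seed-level reporting, assuming one final score per random seed is available. The function name `summarize_over_seeds` and the scores are placeholders, not results from the paper.

```python
import numpy as np

def summarize_over_seeds(scores_by_seed):
    """Return (mean, sample standard deviation) over per-seed scores."""
    scores = np.asarray(list(scores_by_seed.values()), dtype=float)
    # ddof=1 gives the sample standard deviation, the usual choice for a few seeds.
    return float(scores.mean()), float(scores.std(ddof=1))

# Placeholder scores keyed by seed (illustrative only).
scores_by_seed = {0: 70.0, 1: 71.0, 2: 72.0}
mean, std = summarize_over_seeds(scores_by_seed)
print(f"{mean:.2f} ± {std:.2f} (n={len(scores_by_seed)} seeds)")  # 71.00 ± 1.00 (n=3 seeds)
```

Stating the number of seeds alongside the spread lets readers judge whether a reported margin exceeds run-to-run variance.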

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of our work on Dynamic Prefix Weighting for domain-class incremental learning in VLMs, the recognition of its practical importance, and the recommendation for minor revision. The emphasis on reproducibility via public code is appreciated.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper introduces an explicitly new framework (DPW) with two defined components—a gating module for token-specific prefix weights and a residual adapter weighting mechanism—rather than deriving any result from prior equations or self-citations by construction. The central claim is empirical (SOTA on domain-class incremental benchmarks), supported by external evaluation and public code, with no load-bearing mathematical derivation, fitted-parameter prediction, uniqueness theorem, or ansatz smuggled via self-citation. The approach is self-contained against standard prefix-tuning baselines.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

The central claim rests on the empirical effectiveness of two newly introduced components whose benefit is demonstrated experimentally rather than derived from first principles.

axioms (1)
  • domain assumption: Prefix-tuning and adapters remain effective adaptation mechanisms when their weights are made input-dependent via gating and residuals.
    Invoked in the design of DPW as the foundation for the proposed modifications.
invented entities (2)
  • Gating module (no independent evidence)
    purpose: Dynamically assigns weights to each prefix based on the importance of the corresponding input token.
    New component introduced to address the limitation of uniform weighting.
  • Residual adapter weighting mechanism (no independent evidence)
    purpose: Derives adapter output weights as the residual of prefix-tuning weights so adapters activate only when necessary.
    New mechanism introduced to complement prefix-tuning.

pith-pipeline@v0.9.0 · 5477 in / 1268 out tokens · 37488 ms · 2026-05-10T04:35:21.714721+00:00 · methodology



Supplementary material excerpt: The row rank dimension of the LoRA adapter is set to 64 in our default setting (Ours) and reduced to 4 in the parameter-efficient variant (Ours†). Both prefix and adapter modules are integrated into all 12 layers of the visual and text encoders. All experiments are conducted using a single NVIDIA 4090 GPU. For RePA, the bias matrix BG i is initialized t...