Enhancing Continual Learning of Vision-Language Models via Dynamic Prefix Weighting

Hyeonseo Jang; Hyuk Kwon; Kibok Lee

arxiv: 2604.18075 · v1 · submitted 2026-04-20 · 💻 cs.CV

Pith reviewed 2026-05-10 04:35 UTC · model grok-4.3

classification 💻 cs.CV

keywords: continual learning, vision-language models, prefix-tuning, domain incremental learning, adapters, parameter-efficient fine-tuning, gating mechanism


The pith

Dynamically weighting prefixes by token importance improves continual learning for vision-language models

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Vision-language models struggle to adapt to new domains and classes over time without losing earlier knowledge. Prior prefix-tuning approaches normalize the weights of their adjustment vectors, even though some tokens need more change than others. The paper introduces Dynamic Prefix Weighting (DPW). DPW uses a gating module to scale each prefix weight according to the importance of its input token. It then derives the adapter weights as the residual of those prefix weights, so adapters contribute only where the prefixes fall short. The authors report that this selective, token-aware adaptation achieves state-of-the-art results on domain-class incremental learning benchmarks for VLMs.
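A standard rewriting of prefix-tuning (He et al., ICLR 2022, "Towards a Unified View of Parameter-Efficient Transfer Learning") makes this normalization concrete. Take a query token $x$ that attends over prefix keys and values $P_k, P_v$ and over the original context $C$, with the $1/\sqrt{d}$ scale folded into $W_q$. One attention head then splits as shown below. This is background for reading DPW, not the paper's own notation.

```latex
\begin{aligned}
\mathrm{head}(x) &= \mathrm{Attn}\big(xW_q,\ [P_k;\, CW_k],\ [P_v;\, CW_v]\big) \\
&= \big(1-\lambda(x)\big)\,\mathrm{Attn}\big(xW_q,\ CW_k,\ CW_v\big) + \lambda(x)\,\mathrm{Attn}\big(xW_q,\ P_k,\ P_v\big), \\
\lambda(x) &= \frac{\sum_i \exp\big(xW_q P_k^{\top}\big)_i}{\sum_i \exp\big(xW_q P_k^{\top}\big)_i + \sum_j \exp\big(xW_q W_k^{\top} C^{\top}\big)_j}.
\end{aligned}
```

$\lambda(x)$ and $1-\lambda(x)$ share one softmax denominator. The prefix contribution and the frozen model's own attention output are therefore forced to sum to one for every token: more prefix adjustment always means proportionally less of the original output. Reading "normalize the weights" as this coupling is an interpretation. A per-token gate is one way to decouple the two terms.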

Core claim

The authors show that a gating module can assign importance-based weights to prefixes while deriving adapter weights as the residual difference from those prefixes, enabling more precise parameter-efficient updates that outperform uniform-weight prefix-tuning in sequential domain and class shifts.

What carries the argument

Dynamic Prefix Weighting (DPW) framework consisting of a gating module that scales prefix weights by input-token importance and a residual mechanism that activates adapters only when needed.
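The two components can be sketched in plain Python. This is one hypothetical reading, not the authors' implementation (their code is at the repository linked in the abstract). The gate form $g(x) = \mathrm{sigmoid}(w_g \cdot x + b_g)$ is an assumption. So is the split of that total weight across prefixes by a softmax, and so is the choice of $1 - g(x)$ as the residual adapter weight. The names `dpw_like_update`, `gate_w`, `gate_b`, and `toy_adapter` are illustrative only.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]

def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def softmax(scores: Sequence[float]) -> Vector:
    m = max(scores)  # shift by the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dpw_like_update(
    x: Vector,
    prefix_keys: Sequence[Vector],
    prefix_values: Sequence[Vector],
    gate_w: Vector,
    gate_b: float,
    adapter: Callable[[Vector], Vector],
) -> Tuple[Vector, float, Vector, float]:
    """HYPOTHETICAL token-aware update (illustration, not the DPW code).

    g(x) = sigmoid(gate_w . x + gate_b) is a scalar importance for this token.
    Prefix weights w_i = g(x) * softmax_i(x . k_i), so they sum to g(x).
    The adapter weight is the residual 1 - g(x).
    """
    g = sigmoid(dot(gate_w, x) + gate_b)
    split = softmax([dot(x, k) for k in prefix_keys])
    weights = [g * s for s in split]
    adapter_weight = 1.0 - g

    # Accumulate the weighted additive vectors from prefixes and the adapter.
    delta = [0.0] * len(x)
    for w, v in zip(weights, prefix_values):
        delta = [d + w * vi for d, vi in zip(delta, v)]
    delta = [d + adapter_weight * ai for d, ai in zip(delta, adapter(x))]

    updated = [xi + di for xi, di in zip(x, delta)]  # additive update on the token
    return updated, g, weights, adapter_weight

# Toy usage with made-up parameters.
prefix_keys = [[1.0, 0.0], [0.0, 1.0]]
prefix_values = [[0.2, 0.0], [0.0, 0.2]]
gate_w, gate_b = [3.0, 0.0], 0.0  # hypothetical gate parameters

def toy_adapter(v: Vector) -> Vector:
    """Stand-in for a small learned adapter (here just a fixed scaling)."""
    return [0.5 * vi for vi in v]

for token in ([2.0, 0.0], [-2.0, 0.0]):
    _, g, w, a = dpw_like_update(token, prefix_keys, prefix_values, gate_w, gate_b, toy_adapter)
    print(f"token={token}  prefix total={sum(w):.3f}  adapter weight={a:.3f}")
```

In this toy, the first token's gate is close to 1, so the adapter weight is close to 0. The second token's gate is close to 0, so the adapter carries almost all of the adjustment. That is the "adapters only when necessary" behaviour the pith describes, under the assumed formulas.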

If this is right

State-of-the-art performance is reached in domain-class incremental learning scenarios for VLMs.
Adapters are engaged only when prefix-tuning alone is insufficient, limiting unnecessary changes.
The model adapts more effectively to tokens that require different degrees of adjustment.
Prior task knowledge is retained better across sequential domain and class shifts.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same token-aware weighting idea could transfer to other efficient tuning techniques such as low-rank adapters.
The approach may scale favorably when the number of sequential tasks grows large.
It suggests a route for applying importance-based selection in continual learning settings outside vision-language models.

Load-bearing premise

The gating module can reliably estimate the relative importance of each input token, and the residual adapter weighting supplies additive benefit beyond plain prefix-tuning.

What would settle it

Experiments that replace the dynamic gating with uniform weights and observe no performance loss on the same domain-class incremental benchmarks would undermine the claim.

Figures

Figures reproduced from arXiv: 2604.18075 by Hyeonseo Jang, Hyuk Kwon, Kibok Lee.

**Figure 1.** Comparison of the weighting mechanisms of various popular PEFT methods. Light-green boxes denote the output vectors added …

**Figure 2.** Overall framework of the proposed method. The right side of the figure shows how each input token assigns weights to both the …

**Figure 3.** Comparison of prefix score maps between attention and …

**Figure 5.** UMAP visualization of token embeddings from the …

**Figure 7.** Comparison of mean performance and the number of …

Original abstract

We investigate recently introduced domain-class incremental learning scenarios for vision-language models (VLMs). Recent works address this challenge using parameter-efficient methods, such as prefix-tuning or adapters, which facilitate model adaptation to downstream tasks by incorporating task-specific information into input tokens through additive vectors. However, previous approaches often normalize the weights of these vectors, disregarding the fact that different input tokens require different degrees of adjustment. To overcome this issue, we propose Dynamic Prefix Weighting (DPW), a framework that dynamically assigns weights to prefixes, complemented by adapters. DPW consists of 1) a gating module that adjusts the weights of each prefix based on the importance of the corresponding input token, and 2) a weighting mechanism that derives adapter output weights as a residual of prefix-tuning weights, ensuring that adapters are utilized only when necessary. Experimental results demonstrate that our method achieves state-of-the-art performance in domain-class incremental learning scenarios for VLMs. The code is available at: https://github.com/YonseiML/dpw.
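The abstract's remark that previous approaches "normalize the weights of these vectors" can be checked numerically. Under a standard rewriting of prefix-tuning (He et al., ICLR 2022), attention over the concatenated prefix and input keys equals $(1-\lambda)$ times input-only attention plus $\lambda$ times prefix-only attention. Here $\lambda$ is the share of the softmax mass that lands on the prefix keys. The plain-Python toy below uses random vectors and arbitrary sizes to confirm the identity. It illustrates the baseline being criticized and is not code from the DPW repository.

```python
import math
import random
from typing import List, Sequence

def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def attend(q, keys, values) -> List[float]:
    """Single-query softmax attention; the 1/sqrt(d) scale is assumed folded into q."""
    scores = [dot(q, k) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    return [sum(wi * v[j] for wi, v in zip(w, values)) / z for j in range(len(values[0]))]

def prefix_mass(q, prefix_keys, context_keys) -> float:
    """lambda(x): the share of the softmax mass that falls on the prefix keys."""
    sp = [dot(q, k) for k in prefix_keys]
    sc = [dot(q, k) for k in context_keys]
    m = max(sp + sc)  # one shared shift keeps the two partial sums comparable
    zp = sum(math.exp(s - m) for s in sp)
    zc = sum(math.exp(s - m) for s in sc)
    return zp / (zp + zc)

def rand_vectors(rng: random.Random, n: int, d: int) -> List[List[float]]:
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]

rng = random.Random(0)
d = 4
q = rand_vectors(rng, 1, d)[0]
P_k, P_v = rand_vectors(rng, 3, d), rand_vectors(rng, 3, d)  # 3 prefix tokens
C_k, C_v = rand_vectors(rng, 5, d), rand_vectors(rng, 5, d)  # 5 input tokens

# Left side: ordinary prefix-tuning attention over [prefix; input] keys.
joint = attend(q, P_k + C_k, P_v + C_v)

# Right side: convex mix of input-only and prefix-only attention.
lam = prefix_mass(q, P_k, C_k)
mixed = [(1 - lam) * c + lam * p
         for c, p in zip(attend(q, C_k, C_v), attend(q, P_k, P_v))]

print(f"lambda(x) = {lam:.3f}")
print(f"max |joint - mixed| = {max(abs(a - b) for a, b in zip(joint, mixed)):.1e}")
```

The same softmax that scores the input tokens also fixes $\lambda$. As a result, the amount of prefix adjustment a token receives cannot be set independently of its ordinary attention. This is the kind of coupling that a per-token gate would relax.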

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note private letter to a colleague

DPW adds a learned per-token gate to scale prefixes plus a residual adapter rule, which looks like a practical tweak over uniform prefix tuning in continual VLM settings.


The paper's main move is to stop treating all prefix vectors the same way. Instead of normalizing them uniformly, the authors train a small gating module that looks at each input token and decides how much each prefix should matter for it. They also derive the adapter weights as a residual of the prefix weights, so the adapters kick in only when the prefixes aren't enough. That combination is the concrete new piece relative to the prefix-tuning and adapter papers they cite. The motivation is clear: it is plausible that different tokens need different amounts of adjustment when the model learns new domains and classes incrementally.

Public code is a plus for anyone who wants to check the numbers. The abstract claims SOTA on the relevant domain-class incremental benchmarks, which is the kind of result that matters for people deploying VLMs in changing environments. The approach stays parameter-efficient. As far as the abstract describes the protocol, it also shows no obvious information leakage across incremental tasks.

The soft spot is that we only have the abstract and the high-level framework. We lack the full ablation tables, the variance numbers, and direct head-to-heads isolating how much the gate contributes versus the residual rule. Without them, it is hard to know whether the gains are robust or mostly the product of careful tuning. If the experiments hold up under scrutiny, the idea is solid enough to be worth trying in follow-up work. This is aimed at the continual-learning-for-VLMs crowd rather than a broad audience. It is the sort of incremental but testable refinement that deserves a serious referee rather than a desk reject, even if it ends up needing more controls and statistical checks in revision.

Referee Report

0 major / 3 minor

Summary. The manuscript proposes Dynamic Prefix Weighting (DPW) to improve domain-class incremental learning for vision-language models. It augments prefix-tuning with (1) a gating module that computes token-specific weights for the prefixes instead of uniform normalization and (2) a residual adapter-weighting mechanism that activates adapters only when the prefix adjustment is insufficient. The central claim is that this combination yields state-of-the-art performance on the relevant continual-learning benchmarks while remaining parameter-efficient; public code is provided for verification.

Significance. If the reported gains hold under rigorous re-evaluation, the work supplies a concrete, verifiable improvement over standard prefix-tuning and adapter baselines in a practically important setting. The explicit public code release is a notable strength that directly supports reproducibility of the SOTA numbers.

minor comments (3)

The abstract states that the gating module 'adjusts the weights of each prefix based on the importance of the corresponding input token,' but does not specify the exact functional form or training objective of the gate; a short equation or pseudocode block in §3 would remove ambiguity for readers.
Table captions and axis labels should explicitly indicate whether reported metrics are averaged over multiple random seeds and whether error bars or standard deviations are shown; this is especially important for incremental-learning claims where variance can be high.
The residual adapter weighting is described as 'deriving adapter output weights as a residual of prefix-tuning weights.' A one-sentence clarification of the exact residual formula (e.g., whether it is a simple subtraction or a learned scaling) would aid implementation.
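The minor comments above ask for the exact gate and residual formulas, which the abstract does not give. To make the residual question concrete, here is an alternative hypothetical reading. Each prefix gets an independent sigmoid gate, and the adapter weight is a plain, clipped subtraction: $w_A = \max(0,\ 1 - \sum_i w_i)$. Unlike softmax weights, the sigmoid weights need not sum to one. A token can therefore take almost no prefix adjustment and lean on the adapter, or saturate the prefixes and switch the adapter off. The function names, the per-prefix biases, and the clipping at zero are all assumptions for illustration.

```python
import math
from typing import List, Sequence

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def per_prefix_gates(x: Sequence[float], prefix_keys: Sequence[Sequence[float]],
                     biases: Sequence[float]) -> List[float]:
    """Independent sigmoid gate per prefix: w_i = sigmoid(x . k_i + b_i) (hypothetical)."""
    return [sigmoid(sum(xi * ki for xi, ki in zip(x, k)) + b)
            for k, b in zip(prefix_keys, biases)]

def residual_adapter_weight(prefix_weights: Sequence[float]) -> float:
    """Plain subtraction, clipped at zero: w_A = max(0, 1 - sum_i w_i) (hypothetical)."""
    return max(0.0, 1.0 - sum(prefix_weights))

def softmax_residual(scores: Sequence[float]) -> float:
    """The same residual rule applied to softmax weights, which always sum to one."""
    m = max(scores)
    e = [math.exp(s - m) for s in scores]
    total = sum(e)
    return residual_adapter_weight([v / total for v in e])

prefix_keys = [[4.0, 0.0], [0.0, 4.0]]
biases = [0.0, 0.0]
for token in ([1.0, 1.0], [-1.0, -1.0]):
    w = per_prefix_gates(token, prefix_keys, biases)
    print(token, [round(v, 3) for v in w], round(residual_adapter_weight(w), 3))

print(softmax_residual([0.3, -1.2, 2.0]))  # zero up to float round-off
```

The softmax check shows why a subtraction-based residual is only informative once the prefix weights are freed from a shared normalization. With softmax weights the residual is always zero. A learned scale on the residual, the referee's other option, could sit on top of either rule.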

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of our work on Dynamic Prefix Weighting for domain-class incremental learning in VLMs, the recognition of its practical importance, and the recommendation for minor revision. The emphasis on reproducibility via public code is appreciated.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper introduces an explicitly new framework (DPW) with two defined components: a gating module for token-specific prefix weights and a residual adapter weighting mechanism. It does not derive any result from prior equations or self-citations by construction. The central claim is empirical (SOTA on domain-class incremental benchmarks) and is supported by external evaluation and public code. There is no load-bearing mathematical derivation, fitted-parameter prediction, uniqueness theorem, or ansatz smuggled in via self-citation. The approach is self-contained: its claims are tested against standard prefix-tuning baselines.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

The central claim rests on the empirical effectiveness of two newly introduced components whose benefit is demonstrated experimentally rather than derived from first principles.

axioms (1)

domain assumption: Prefix-tuning and adapters remain effective adaptation mechanisms when their weights are made input-dependent via gating and residuals.
Invoked in the design of DPW as the foundation for the proposed modifications.

invented entities (2)

Gating module (no independent evidence)
purpose: Dynamically assigns weights to each prefix based on the importance of the corresponding input token.
New component introduced to address uniform weighting limitation.
Residual adapter weighting mechanism (no independent evidence)
purpose: Derives adapter output weights as the residual of prefix-tuning weights so adapters activate only when necessary.
New mechanism introduced to complement prefix-tuning.

pith-pipeline@v0.9.0 · 5477 in / 1268 out tokens · 37488 ms · 2026-05-10T04:35:21.714721+00:00 · methodology


Reference graph

Works this paper leans on

54 extracted references · 54 canonical work pages

[1]

Memory aware synapses: Learning what (not) to forget

Rahaf Aljundi, Francesca Babiloni, Mohamed Elhoseiny, Marcus Rohrbach, and Tinne Tuytelaars. Memory aware synapses: Learning what (not) to forget. In Proceedings of the European Conference on Computer Vision (ECCV), pages 139–154, 2018.

[2]

Task-free continual learning

Rahaf Aljundi, Klaas Kelchtermans, and Tinne Tuytelaars. Task-free continual learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 11254–11263, 2019.

[3]

Rainbow memory: Continual learning with a memory of diverse samples

Jihwan Bang, Heesu Kim, YoungJoon Yoo, Jung-Woo Ha, and Jonghyun Choi. Rainbow memory: Continual learning with a memory of diverse samples. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 8218–8227, 2021.

[4]

Riemannian walk for incremental learning: Understanding forgetting and intransigence

Arslan Chaudhry, Puneet K. Dokania, Thalaiyasingam Ajanthan, and Philip H. S. Torr. Riemannian walk for incremental learning: Understanding forgetting and intransigence. In Proceedings of the European Conference on Computer Vision (ECCV), pages 532–547, 2018.

[5]

Learning without memorizing

Pramit Dhar, Rajeev Ranjan Singh, Kuan-Chuan Peng, Ziyan Wu, and Rama Chellappa. Learning without memorizing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 5138–5146, 2019.

[6]

DyTox: Transformers for continual learning with dynamic token expansion

A. Douillard, A. Ramé, G. Couairon, and M. Cord. DyTox: Transformers for continual learning with dynamic token expansion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 9285–9295, 2022.

[7]

AdapterBias: Parameter-efficient token-dependent representation shift for adapters in NLP tasks

Chin-Lun Fu, Zih-Ching Chen, Yun-Ru Lee, and Hung-Yi Lee. AdapterBias: Parameter-efficient token-dependent representation shift for adapters in NLP tasks. In Findings of the Association for Computational Linguistics, pages 2608–2621,

[8]

Enhanced continual learning of vision-language models with model fusion

Haoyuan Gao, Zicong Zhang, Yuqi Wei, Linglan Zhao, Guilin Li, Yexin Li, Linghe Kong, and Weiran Huang. Enhanced continual learning of vision-language models with model fusion. In ICLR 2025 Workshop. ICLR, 2025.

[9]

Disentangling and mitigating the impact of task similarity for continual learning

Naoki Hiratani. Disentangling and mitigating the impact of task similarity for continual learning. arXiv preprint, 2024.

[10]

Generating instance-level prompts for rehearsal-free continual learning

D. Jung, D. Han, J. Bang, and H. Song. Generating instance-level prompts for rehearsal-free continual learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 11813–11823, 2023.

[11]

MaPLe: Multi-modal prompt learning

Muhammad Uzair Khattak, Hanoona Rasheed, Muhammad Maaz, Salman Khan, and Fahad Shahbaz Khan. MaPLe: Multi-modal prompt learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 19113–19122, 2023.

[12]

Overcoming catastrophic forgetting in neural networks

James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A. Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, et al. Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences, 114(13):3521–3526, 2017.

[13]

A continual learning survey: Defying forgetting in classification tasks

Matthias De Lange, Rahaf Aljundi, Mateusz Masana, Sophie Parisot, Xu Jia, Ales Leonardis, Gregory Slabaugh, and Tinne Tuytelaars. A continual learning survey: Defying forgetting in classification tasks. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 44(7):3366–3385, 2021.

[14]

Mixture of experts meets prompt-based continual learning

Minh Le, An Nguyen The, Huy Nguyen, Thien Trang Nguyen Vu, Huyen Trang Pham, Linh Ngo Van, and Nhat Ho. Mixture of experts meets prompt-based continual learning. In Advances in Neural Information Processing Systems (NeurIPS),

[15]

Overcoming catastrophic forgetting by incremental moment matching

Sang-Woo Lee, Jin-Hwa Kim, Jaehyun Jun, Jung-Woo Ha, and Byoung-Tak Zhang. Overcoming catastrophic forgetting by incremental moment matching. Advances in Neural Information Processing Systems (NeurIPS), 30, 2017.

[16]

Learn to grow: A continual structure learning framework for overcoming catastrophic forgetting

Xilai Li, Yuezhou Zhou, Tianjun Wu, Richard Socher, and Caiming Xiong. Learn to grow: A continual structure learning framework for overcoming catastrophic forgetting. In Proceedings of the International Conference on Machine Learning (ICML), pages 3925–3934. PMLR, 2019.

[17]

Prefix-tuning: Optimizing continuous prompts for generation

X. L. Li and P. Liang. Prefix-tuning: Optimizing continuous prompts for generation. arXiv preprint arXiv:2101.00119,

[18]

CoLeCLIP: Open-domain continual learning via joint task prompt and vocabulary learning

Yukun Li, Guansong Pang, Wei Suo, Chenchen Jing, Yuling Xi, Lingqiao Liu, Hao Chen, Guoqiang Liang, and Peng Wang. CoLeCLIP: Open-domain continual learning via joint task prompt and vocabulary learning. arXiv preprint, 2024.

[19]

InfLoRA: Interference-free low-rank adaptation for continual learning

Yan-Shuo Liang and Wu-Jun Li. InfLoRA: Interference-free low-rank adaptation for continual learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 23638–23647, 2024.

[20]

C-CLIP: Multimodal continual learning for vision-language model

Wenzhuo Liu, Fei Zhu, Longhui Wei, and Qi Tian. C-CLIP: Multimodal continual learning for vision-language model. In International Conference on Learning Representations (ICLR), 2025.

[21]

Mnemonics training: Multi-class incremental learning without forgetting

Yaoyao Liu, Yuting Su, An-An Liu, Bernt Schiele, and Qianru Sun. Mnemonics training: Multi-class incremental learning without forgetting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 12245–12254, 2020.

[22]

Boosting open-domain continual learning via leveraging intra-domain category-aware prototype

Yadong Lu, Shitian Zhao, Boxiang Yun, Dongsheng Jiang, Yin Li, Qingli Li, and Yan Wang. Boosting open-domain continual learning via leveraging intra-domain category-aware prototype. arXiv preprint, 2024.

[23]

Class-incremental exemplar compression for class-incremental learning

Zilin Luo, Yaoyao Liu, Bernt Schiele, and Qianru Sun. Class-incremental exemplar compression for class-incremental learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 11371–11380, 2023.

[24]

PackNet: Adding multiple tasks to a single network by iterative pruning

Arun Mallya and Svetlana Lazebnik. PackNet: Adding multiple tasks to a single network by iterative pruning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 7765–7773, 2018.

[25]

PiSSA: Principal singular values and singular vectors adaptation of large language models

Fanxu Meng, Zhaohui Wang, and Muhan Zhang. PiSSA: Principal singular values and singular vectors adaptation of large language models. In Advances in Neural Information Processing Systems, 2024.

[26]

On the role of attention in prompt-tuning

Samet Oymak, Ankit Singh Rawat, Mahdi Soltanolkotabi, and Christos Thrampoulidis. On the role of attention in prompt-tuning. arXiv preprint arXiv:2306.03435, 2023.

[27]

Dissecting query-key interaction in vision transformers

Xu Pan, Aaron Philip, Ziqian Xie, and Odelia Schwartz. Dissecting query-key interaction in vision transformers. In Advances in Neural Information Processing Systems (NeurIPS), Spotlight Presentation.

[29]

Learning transferable visual models from natural language supervision

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning transferable visual models from natural language supervision. arXiv preprint,

[30]

Theory, analysis, and best practices for sigmoid self-attention

Jason Ramapuram, Federico Danieli, Eeshan Dhekane, Floris Weers, Dan Busbridge, Pierre Ablin, Tatiana Likhomanenko, Jagrit Digani, Zijin Gu, Amitis Shidani, and Russ Webb. Theory, analysis, and best practices for sigmoid self-attention. arXiv preprint, 2025.

[31]

iCaRL: Incremental classifier and representation learning

Sylvestre-Alvise Rebuffi, Alexander Kolesnikov, Georg Sperl, and Christoph H. Lampert. iCaRL: Incremental classifier and representation learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2001–2010, 2017.

[32]

CODA-Prompt: Continual decomposed attention-based prompting for rehearsal-free continual learning

James Seale Smith, Leonid Karlinsky, Vyshnavi Gutta, Paola Cascante-Bonilla, Donghyun Kim, Assaf Arbelle, Rameswar Panda, Rogerio Feris, and Zsolt Kira. CODA-Prompt: Continual decomposed attention-based prompting for rehearsal-free continual learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR),

[33]

Mind the interference: Retaining pre-trained knowledge in parameter efficient continual learning of vision-language models

Longxiang Tang, Zhuotao Tian, Kai Li, Chunming He, Hantao Zhou, Hengshuang Zhao, Xiu Li, and Jiaya Jia. Mind the interference: Retaining pre-trained knowledge in parameter efficient continual learning of vision-language models. In European Conference on Computer Vision (ECCV), pages 346–365. Springer, 2024.

[34]

HydraLoRA: An asymmetric LoRA architecture for efficient fine-tuning

Chunlin Tian, Zhan Shi, Zhijiang Guo, Li Li, and Cheng zhong Xu. HydraLoRA: An asymmetric LoRA architecture for efficient fine-tuning. In Advances in Neural Information Processing Systems, 2024.

[35]

SCLIP: Rethinking self-attention for dense vision-language inference

Feng Wang, Jieru Mei, and Alan Yuille. SCLIP: Rethinking self-attention for dense vision-language inference. In Proceedings of the European Conference on Computer Vision (ECCV), 2024.

[36]

MiLoRA: Harnessing minor singular components for parameter-efficient LLM fine-tuning

Hanqing Wang, Yixia Li, Shuo Wang, Guanhua Chen, and Yun Chen. MiLoRA: Harnessing minor singular components for parameter-efficient LLM fine-tuning. arXiv preprint arXiv:2406.09044, 2024.

[37]

S-Prompts learning with pre-trained transformers: An Occam's razor for domain incremental learning

Yabin Wang, Zhiwu Huang, and Xiaopeng Hong. S-Prompts learning with pre-trained transformers: An Occam's razor for domain incremental learning. In Advances in Neural Information Processing Systems (NeurIPS), 2022.

[38]

DualPrompt: Complementary prompting for rehearsal-free continual learning

Zifeng Wang, Zizhao Zhang, Sayna Ebrahimi, Ruoyu Sun, Haohan Zhang, Ching-Yao Lee, Xinlei Ren, Guodong Su, Vincent Perot, Jennifer Dy, et al. DualPrompt: Complementary prompting for rehearsal-free continual learning. In European Conference on Computer Vision (ECCV), pages 631–648. Springer, 2022.

[39]

Learning to prompt for continual learning

Zifeng Wang, Zizhao Zhang, Chen-Yu Lee, Han Zhang, Ruoxi Sun, Xiaoqi Ren, Guolong Su, Vincent Perot, Jennifer Dy, and Tomas Pfister. Learning to prompt for continual learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 139–149, 2022.

[40]

Robust fine-tuning of zero-shot models

Mitchell Wortsman, Gabriel Ilharco, Jong Wook Kim, Mike Li, Simon Kornblith, Rebecca Roelofs, Raphael Gontijo-Lopes, Hannaneh Hajishirzi, Ali Farhadi, Hongseok Namkoong, and Ludwig Schmidt. Robust fine-tuning of zero-shot models. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022.

[41]

Synthetic data is an elegant gift for continual vision-language models

Bin Wu, Wuxuan Shi, Jinqiao Wang, and Mang Ye. Synthetic data is an elegant gift for continual vision-language models. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2025.

[42]

Dual modality prompt tuning for vision-language pre-trained model

Yinghui Xing, Qirui Wu, De Cheng, Shizhou Zhang, Guoqiang Liang, Peng Wang, and Yanning Zhang. Dual modality prompt tuning for vision-language pre-trained model. arXiv preprint, 2022.

[43]

Learning Bayesian sparse networks with full experience replay for continual learning

Qingsen Yan, Dong Gong, Yuhang Liu, Anton van den Hengel, and Javen Qinfeng Shi. Learning Bayesian sparse networks with full experience replay for continual learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 109–118, 2022.

[44]

DER: Dynamically expandable representation for class incremental learning

Shipeng Yan, Jiangwei Xie, and Xuming He. DER: Dynamically expandable representation for class incremental learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 3014–3023,

[45]

Recent advances of multimodal continual learning: A comprehensive survey

Dianzhi Yu, Xinni Zhang, Yankai Chen, Aiwei Liu, Yifei Zhang, Philip S. Yu, and Irwin King. Recent advances of multimodal continual learning: A comprehensive survey. arXiv preprint, 2024.

[46]

Boosting continual learning of vision-language models via mixture-of-experts adapters

Jiazuo Yu, Yunzhi Zhuge, Lu Zhang, Ping Hu, Dong Wang, Huchuan Lu, and You He. Boosting continual learning of vision-language models via mixture-of-experts adapters. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 23219–23230,

[47]

Task residual for tuning vision-language models

Tao Yu, Zhihe Lu, Xin Jin, Zhibo Chen, and Xinchao Wang. Task residual for tuning vision-language models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.

[48]

Select and distill: Selective dual-teacher knowledge transfer for continual learning on vision-language models

Yu-Chu Yu, Chi-Pin Huang, Jr-Jen Chen, Kai-Po Chang, Yung-Hsuan Lai, Fu-En Yang, and Yu-Chiang Frank Wang. Select and distill: Selective dual-teacher knowledge transfer for continual learning on vision-language models. In European Conference on Computer Vision (ECCV). Springer,

[49]

Continual learning through synaptic intelligence

Friedemann Zenke, Ben Poole, and Surya Ganguli. Continual learning through synaptic intelligence. In Proceedings of the International Conference on Machine Learning (ICML), pages 3987–3995. PMLR, 2017.

[50]

Preventing zero-shot transfer degradation in continual learning of vision-language models

Zangwei Zheng, Mingyuan Ma, Kai Wang, Ziheng Qin, Xiangyu Yue, and Yang You. Preventing zero-shot transfer degradation in continual learning of vision-language models. arXiv preprint, 2023.

[51]

Continual learning with pre-trained models: A survey

Da-Wei Zhou, Hai-Long Sun, Jingyi Ning, Han-Jia Ye, and De-Chuan Zhan. Continual learning with pre-trained models: A survey. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2024.

[52]

Conditional prompt learning for vision-language models

Kaiyang Zhou, Jingkang Yang, Chen Change Loy, and Ziwei Liu. Conditional prompt learning for vision-language models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 16816–16825,

[53]

iVPT: Improving task-relevant information sharing in visual prompt tuning by cross-layer dynamic connection

Nan Zhou, Jiaxin Chen, and Di Huang. iVPT: Improving task-relevant information sharing in visual prompt tuning by cross-layer dynamic connection. arXiv preprint, 2024.

Supplementary Material: Enhancing Continual Learning of Vision-Language Models via Dynamic Prefix Weighting

[Figure: Transfer Score versus cutoff threshold.]

The rank of the LoRA adapter is set to 64 in our default setting (Ours) and reduced to 4 in the parameter-efficient variant (Ours†). Both prefix and adapter modules are integrated into all 12 layers of the visual and text encoders. All experiments are conducted on a single NVIDIA 4090 GPU. For RePA, the bias matrix $B_i^G$ is initialized t...
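The supplementary excerpt sets the LoRA rank to 64 (Ours) and 4 (Ours†) in all 12 layers of both encoders. A rank-$r$ LoRA update on a $d_{in} \times d_{out}$ matrix adds $r(d_{in} + d_{out})$ parameters, so the adapter budget scales linearly with $r$. Ours† therefore carries 1/16 of the default LoRA parameters whatever the backbone. The absolute counts below rest on assumptions the excerpt does not state. They assume a CLIP ViT-B/16-sized backbone, with hidden width 768 in the visual encoder and 512 in the text encoder. They also assume LoRA is applied to exactly one square projection per layer. Prefix parameters are not counted.

```python
from typing import Tuple

def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one LoRA pair: A (d_in x r) plus B (r x d_out)."""
    return rank * (d_in + d_out)

def total_lora_params(rank: int, layers: int = 12,
                      widths: Tuple[int, ...] = (768, 512)) -> int:
    """ASSUMED layout: one square d x d adapted matrix per layer in each encoder."""
    return sum(layers * lora_params(w, w, rank) for w in widths)

default = total_lora_params(rank=64)  # "Ours"
compact = total_lora_params(rank=4)   # "Ours†"
print(default, compact, default / compact)  # 1966080 122880 16.0
```

If LoRA in fact targets several projections per layer, both counts scale by the same factor. The 16x ratio between the two settings does not change.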

[1] [1]

Memory aware synapses: Learning what (not) to forget

Rahaf Aljundi, Francesca Babiloni, Mohamed Elhoseiny, Marcus Rohrbach, and Tinne Tuytelaars. Memory aware synapses: Learning what (not) to forget. InProceedings of the European Conference on Computer Vision (ECCV), pages 139–154, 2018. 2

work page 2018

[2] [2]

Task-free continual learning

Rahaf Aljundi, Klaas Kelchtermans, and Tinne Tuytelaars. Task-free continual learning. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 11254–11263, 2019. 2

work page 2019

[3] [3]

Rainbow memory: Continual learn- ing with a memory of diverse samples

Jihwan Bang, Heesu Kim, YoungJoon Yoo, Jung-Woo Ha, and Jonghyun Choi. Rainbow memory: Continual learn- ing with a memory of diverse samples. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 8218–8227, 2021. 2

work page 2021

[4] [4]

Dokania, Thalaiyasingam Ajan- than, and Philip H

Arslan Chaudhry, Puneet K. Dokania, Thalaiyasingam Ajan- than, and Philip H. S. Torr. Riemannian walk for incremental learning: Understanding forgetting and intransigence. InPro- ceedings of the European Conference on Computer Vision (ECCV), pages 532–547, 2018. 2

work page 2018

[5] [5]

Learning without memorizing

Pramit Dhar, Rajeev Ranjan Singh, Kuan-Chuan Peng, Ziyan Wu, and Rama Chellappa. Learning without memorizing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 5138–5146, 2019. 1

work page 2019

[6] [6]

Douillard, A

A. Douillard, A. Ramé, G. Couairon, and M. Cord. Dytox: Transformers for continual learning with dynamic token ex- pansion. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 9285–9295, 2022. 2

work page 2022

[7] [7]

Adapterbias: Parameter-efficient token-dependent rep- resentation shift for adapters in nlp tasks

Chin-Lun Fu, Zih-Ching Chen, Yun-Ru Lee, and Hung-Yi Lee. Adapterbias: Parameter-efficient token-dependent rep- resentation shift for adapters in nlp tasks. InFindings of the Association for Computational Linguistics, pages 2608–2621,

work page

[8] [8]

Enhanced continual learning of vision-language models with model fusion

Haoyuan Gao, Zicong Zhang, Yuqi Wei, Linglan Zhao, Guilin Li, Yexin Li, Linghe Kong, and Weiran Huang. Enhanced continual learning of vision-language models with model fusion. InICLR 2025 Workshop. ICLR, 2025. 2

work page 2025

[9] [9]

Disentangling and mitigating the impact of task similarity for continual learning.arXiv preprint, 2024

Naoki Hiratani. Disentangling and mitigating the impact of task similarity for continual learning.arXiv preprint, 2024. 1, 7

work page 2024

[10]

D. Jung, D. Han, J. Bang, and H. Song. Generating instance-level prompts for rehearsal-free continual learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 11813–11823, 2023.

[11]

Muhammad Uzair Khattak, Hanoona Rasheed, Muhammad Maaz, Salman Khan, and Fahad Shahbaz Khan. MaPLe: Multi-modal prompt learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 19113–19122, 2023.

[12]

James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A. Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, et al. Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences, 114(13):3521–3526, 2017.

[13]

Matthias De Lange, Rahaf Aljundi, Marc Masana, Sarah Parisot, Xu Jia, Ales Leonardis, Gregory Slabaugh, and Tinne Tuytelaars. A continual learning survey: Defying forgetting in classification tasks. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 44(7):3366–3385, 2021.

[14]

Minh Le, An Nguyen The, Huy Nguyen, Thien Trang Nguyen Vu, Huyen Trang Pham, Linh Ngo Van, and Nhat Ho. Mixture of experts meets prompt-based continual learning. In Advances in Neural Information Processing Systems (NeurIPS),

[15]

Sang-Woo Lee, Jin-Hwa Kim, Jaehyun Jun, Jung-Woo Ha, and Byoung-Tak Zhang. Overcoming catastrophic forgetting by incremental moment matching. Advances in Neural Information Processing Systems (NeurIPS), 30, 2017.

[16]

Xilai Li, Yingbo Zhou, Tianfu Wu, Richard Socher, and Caiming Xiong. Learn to grow: A continual structure learning framework for overcoming catastrophic forgetting. In Proceedings of the International Conference on Machine Learning (ICML), pages 3925–3934. PMLR, 2019.

[17]

X. L. Li and P. Liang. Prefix-tuning: Optimizing continuous prompts for generation. arXiv preprint arXiv:2101.00119,

[18]

Yukun Li, Guansong Pang, Wei Suo, Chenchen Jing, Yuling Xi, Lingqiao Liu, Hao Chen, Guoqiang Liang, and Peng Wang. CoLeCLIP: Open-domain continual learning via joint task prompt and vocabulary learning. arXiv preprint, 2024.

[19]

Yan-Shuo Liang and Wu-Jun Li. InfLoRA: Interference-free low-rank adaptation for continual learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 23638–23647, 2024.

[20]

Wenzhuo Liu, Fei Zhu, Longhui Wei, and Qi Tian. C-CLIP: Multimodal continual learning for vision-language model. In International Conference on Learning Representations (ICLR), 2025.

[21]

Yaoyao Liu, Yuting Su, An-An Liu, Bernt Schiele, and Qianru Sun. Mnemonics training: Multi-class incremental learning without forgetting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 12245–12254, 2020.

[22]

Yadong Lu, Shitian Zhao, Boxiang Yun, Dongsheng Jiang, Yin Li, Qingli Li, and Yan Wang. Boosting open-domain continual learning via leveraging intra-domain category-aware prototype. arXiv preprint, 2024.

[23]

Zilin Luo, Yaoyao Liu, Bernt Schiele, and Qianru Sun. Class-incremental exemplar compression for class-incremental learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 11371–11380, 2023.

[24]

Arun Mallya and Svetlana Lazebnik. PackNet: Adding multiple tasks to a single network by iterative pruning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 7765–7773, 2018.

[25]

Fanxu Meng, Zhaohui Wang, and Muhan Zhang. PiSSA: Principal singular values and singular vectors adaptation of large language models. In Advances in Neural Information Processing Systems, 2024.

[26]

Samet Oymak, Ankit Singh Rawat, Mahdi Soltanolkotabi, and Christos Thrampoulidis. On the role of attention in prompt-tuning. arXiv preprint arXiv:2306.03435, 2023.

[27]

Xu Pan, Aaron Philip, Ziqian Xie, and Odelia Schwartz. Dissecting query-key interaction in vision transformers. In Advances in Neural Information Processing Systems (NeurIPS), Spotlight Presentation.

[29]

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning transferable visual models from natural language supervision. arXiv preprint,

[30]

Jason Ramapuram, Federico Danieli, Eeshan Dhekane, Floris Weers, Dan Busbridge, Pierre Ablin, Tatiana Likhomanenko, Jagrit Digani, Zijin Gu, Amitis Shidani, and Russ Webb. Theory, analysis, and best practices for sigmoid self-attention. arXiv preprint, 2025.

[31]

Sylvestre-Alvise Rebuffi, Alexander Kolesnikov, Georg Sperl, and Christoph H. Lampert. iCaRL: Incremental classifier and representation learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2001–2010, 2017.

[32]

James Seale Smith, Leonid Karlinsky, Vyshnavi Gutta, Paola Cascante-Bonilla, Donghyun Kim, Assaf Arbelle, Rameswar Panda, Rogerio Feris, and Zsolt Kira. CODA-Prompt: Continual decomposed attention-based prompting for rehearsal-free continual learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR),

[33]

Longxiang Tang, Zhuotao Tian, Kai Li, Chunming He, Hantao Zhou, Hengshuang Zhao, Xiu Li, and Jiaya Jia. Mind the interference: Retaining pre-trained knowledge in parameter efficient continual learning of vision-language models. In European Conference on Computer Vision (ECCV), pages 346–365. Springer, 2024.

[34]

Chunlin Tian, Zhan Shi, Zhijiang Guo, Li Li, and Cheng zhong Xu. HydraLoRA: An asymmetric LoRA architecture for efficient fine-tuning. In Advances in Neural Information Processing Systems, 2024.

[35]

Feng Wang, Jieru Mei, and Alan Yuille. SCLIP: Rethinking self-attention for dense vision-language inference. In Proceedings of the European Conference on Computer Vision (ECCV), 2024.

[36]

Hanqing Wang, Yixia Li, Shuo Wang, Guanhua Chen, and Yun Chen. MiLoRA: Harnessing minor singular components for parameter-efficient LLM finetuning. arXiv preprint arXiv:2406.09044, 2024.

[37]

Yabin Wang, Zhiwu Huang, and Xiaopeng Hong. S-Prompts learning with pre-trained transformers: An Occam's razor for domain incremental learning. In Advances in Neural Information Processing Systems (NeurIPS), 2022.

[38]

Zifeng Wang, Zizhao Zhang, Sayna Ebrahimi, Ruoxi Sun, Han Zhang, Chen-Yu Lee, Xiaoqi Ren, Guolong Su, Vincent Perot, Jennifer Dy, et al. DualPrompt: Complementary prompting for rehearsal-free continual learning. In European Conference on Computer Vision (ECCV), pages 631–648. Springer, 2022.

[39]

Zifeng Wang, Zizhao Zhang, Chen-Yu Lee, Han Zhang, Ruoxi Sun, Xiaoqi Ren, Guolong Su, Vincent Perot, Jennifer Dy, and Tomas Pfister. Learning to prompt for continual learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 139–149, 2022.

[40]

Mitchell Wortsman, Gabriel Ilharco, Jong Wook Kim, Mike Li, Simon Kornblith, Rebecca Roelofs, Raphael Gontijo-Lopes, Hannaneh Hajishirzi, Ali Farhadi, Hongseok Namkoong, and Ludwig Schmidt. Robust fine-tuning of zero-shot models. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022.

[41]

Bin Wu, Wuxuan Shi, Jinqiao Wang, and Mang Ye. Synthetic data is an elegant gift for continual vision-language models. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2025.

[42]

Yinghui Xing, Qirui Wu, De Cheng, Shizhou Zhang, Guoqiang Liang, Peng Wang, and Yanning Zhang. Dual modality prompt tuning for vision-language pre-trained model. arXiv preprint, 2022.

[43]

Qingsen Yan, Dong Gong, Yuhang Liu, Anton van den Hengel, and Javen Qinfeng Shi. Learning Bayesian sparse networks with full experience replay for continual learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 109–118, 2022.

[44]

Shipeng Yan, Jiangwei Xie, and Xuming He. DER: Dynamically expandable representation for class incremental learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 3014–3023,

[45]

Dianzhi Yu, Xinni Zhang, Yankai Chen, Aiwei Liu, Yifei Zhang, Philip S. Yu, and Irwin King. Recent advances of multimodal continual learning: A comprehensive survey. arXiv preprint, 2024.

[46]

Jiazuo Yu, Yunzhi Zhuge, Lu Zhang, Ping Hu, Dong Wang, Huchuan Lu, and You He. Boosting continual learning of vision-language models via mixture-of-experts adapters. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 23219–23230,

[47]

Tao Yu, Zhihe Lu, Xin Jin, Zhibo Chen, and Xinchao Wang. Task residual for tuning vision-language models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.

[48]

Yu-Chu Yu, Chi-Pin Huang, Jr-Jen Chen, Kai-Po Chang, Yung-Hsuan Lai, Fu-En Yang, and Yu-Chiang Frank Wang. Select and distill: Selective dual-teacher knowledge transfer for continual learning on vision-language models. In European Conference on Computer Vision (ECCV). Springer,

[49]

Friedemann Zenke, Ben Poole, and Surya Ganguli. Continual learning through synaptic intelligence. In Proceedings of the International Conference on Machine Learning (ICML), pages 3987–3995. PMLR, 2017.

[50]

Zangwei Zheng, Mingyuan Ma, Kai Wang, Ziheng Qin, Xiangyu Yue, and Yang You. Preventing zero-shot transfer degradation in continual learning of vision-language models. arXiv preprint, 2023.

[51]

Da-Wei Zhou, Hai-Long Sun, Jingyi Ning, Han-Jia Ye, and De-Chuan Zhan. Continual learning with pre-trained models: A survey. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2024.

[52]

Kaiyang Zhou, Jingkang Yang, Chen Change Loy, and Ziwei Liu. Conditional prompt learning for vision-language models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 16816–16825,

[53]

Nan Zhou, Jiaxin Chen, and Di Huang. iVPT: Improving task-relevant information sharing in visual prompt tuning by cross-layer dynamic connection. arXiv preprint, 2024.

Enhancing Continual Learning of Vision-Language Models via Dynamic Prefix Weighting

Supplementary Material

[Figure: Transfer score as a function of the cutoff threshold (x-axis 0.0–0.8; y-axis 69.6–70.4).]

The low-rank dimension of the LoRA adapter is set to 64 in our default setting (Ours) and reduced to 4 in the parameter-efficient variant (Ours†). Both the prefix and adapter modules are integrated into all 12 layers of the visual and text encoders. All experiments are conducted on a single NVIDIA 4090 GPU. For RePA, the bias matrix $B^{G}_{i}$ is initialized t...
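To make the effect of the rank reduction concrete, the Python sketch below counts the trainable parameters a LoRA adapter adds at rank 64 versus rank 4 across 12 layers of both encoders. It is illustrative only. The encoder widths (768 visual, 512 text, as in CLIP ViT-B/16), the choice to adapt exactly one square projection per layer, and the function names are hypothetical assumptions, not details taken from the paper.

```python
"""Hypothetical LoRA parameter-count sketch (not the authors' implementation)."""


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    # A LoRA update W + B @ A uses A with shape (rank, d_in) and B with shape
    # (d_out, rank), so each adapted matrix adds rank * (d_in + d_out) parameters.
    return rank * (d_in + d_out)


def total_lora_params(rank: int, num_layers: int = 12,
                      visual_width: int = 768, text_width: int = 512) -> int:
    # ASSUMPTION: one square (width x width) projection is adapted per layer in
    # each encoder; the widths follow CLIP ViT-B/16 and are not stated in the text.
    visual = num_layers * lora_param_count(visual_width, visual_width, rank)
    text = num_layers * lora_param_count(text_width, text_width, rank)
    return visual + text


if __name__ == "__main__":
    default = total_lora_params(rank=64)   # default setting ("Ours")
    efficient = total_lora_params(rank=4)  # parameter-efficient variant ("Ours†")
    print(default, efficient, default // efficient)
```

Because the count is linear in the rank, the rank-4 variant adds 16 times fewer adapter parameters than the rank-64 default under these assumptions, whatever widths are chosen.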
