Enhancing Continual Learning of Vision-Language Models via Dynamic Prefix Weighting
Pith reviewed 2026-05-10 04:35 UTC · model grok-4.3
The pith
Dynamically weighting prefixes by token importance improves continual learning for vision-language models
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors show that a gating module can assign weights to prefixes according to the importance of each input token, while adapter output weights are derived as the residual of those prefix weights. Together these allow more precise parameter-efficient updates that outperform uniform-weight prefix-tuning under sequential domain and class shifts.
What carries the argument
Dynamic Prefix Weighting (DPW) framework consisting of a gating module that scales prefix weights by input-token importance and a residual mechanism that activates adapters only when needed.
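The two components can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the functional form, so the sigmoid gate over a linear score, the `1 - gate` residual rule, and every name here (`dpw_token_update`, `gate_w`, `prefix_delta`, `adapter_delta`) are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def dpw_token_update(token, prefix_delta, adapter_delta, gate_w, gate_b=0.0):
    """Hypothetical per-token update.

    The gate g in (0, 1) scales the prefix contribution; the adapter
    contribution receives the residual weight (1 - g), so it matters
    most when the prefix weight is small.
    """
    g = sigmoid(dot(gate_w, token) + gate_b)  # assumed gate: linear score + sigmoid
    residual = 1.0 - g                        # assumed residual rule
    out = [x + g * p + residual * a
           for x, p, a in zip(token, prefix_delta, adapter_delta)]
    return out, g, residual

# Toy inputs, chosen only to exercise the function.
token = [1.0, 0.0]
out_neutral, g_neutral, r_neutral = dpw_token_update(
    token, [0.5, 0.5], [-0.2, 0.1], gate_w=[0.0, 0.0])
out_strong, g_strong, r_strong = dpw_token_update(
    token, [0.5, 0.5], [-0.2, 0.1], gate_w=[4.0, 0.0])
```

With a zero gate vector the two branches are weighted equally; with a large positive score the prefix dominates and the adapter weight shrinks toward zero, which is one way to read "adapters only when needed".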
If this is right
- State-of-the-art performance is reached in domain-class incremental learning scenarios for VLMs.
- Adapters are engaged only when prefix-tuning alone is insufficient, limiting unnecessary changes.
- The model adapts more effectively to tokens that require different degrees of adjustment.
- Prior task knowledge is retained better across sequential domain and class shifts.
Where Pith is reading between the lines
- The same token-aware weighting idea could transfer to other efficient tuning techniques such as low-rank adapters.
- The approach may scale favorably when the number of sequential tasks grows large.
- It suggests a route for applying importance-based selection in continual learning settings outside vision-language models.
Load-bearing premise
The gating module can reliably estimate the relative importance of each input token, and the residual adapter weighting supplies additive benefit beyond plain prefix-tuning.
What would settle it
Experiments that replace the dynamic gating with uniform weights and observe no performance loss on the same domain-class incremental benchmarks would undermine the claim.
Original abstract
We investigate recently introduced domain-class incremental learning scenarios for vision-language models (VLMs). Recent works address this challenge using parameter-efficient methods, such as prefix-tuning or adapters, which facilitate model adaptation to downstream tasks by incorporating task-specific information into input tokens through additive vectors. However, previous approaches often normalize the weights of these vectors, disregarding the fact that different input tokens require different degrees of adjustment. To overcome this issue, we propose Dynamic Prefix Weighting (DPW), a framework that dynamically assigns weights to prefixes, complemented by adapters. DPW consists of 1) a gating module that adjusts the weights of each prefix based on the importance of the corresponding input token, and 2) a weighting mechanism that derives adapter output weights as a residual of prefix-tuning weights, ensuring that adapters are utilized only when necessary. Experimental results demonstrate that our method achieves state-of-the-art performance in domain-class incremental learning scenarios for VLMs. The code is available at: https://github.com/YonseiML/dpw.
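One way to read "normalize the weights" is through ordinary softmax attention, where prefix keys and input keys share a single unit of attention mass. The sketch below checks the textbook identity that joint attention over prefix and input keys equals a convex mix of the two separate attentions, with the mixing weight set by the softmax itself. The scores and values are made-up toy numbers, and nothing here is taken from the DPW code.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(scores, values):
    """Softmax-weighted average of the value vectors."""
    w = softmax(scores)
    dim = len(values[0])
    return [sum(wi * v[d] for wi, v in zip(w, values)) for d in range(dim)]

def prefix_share(prefix_scores, input_scores):
    """Fraction of the softmax mass that lands on the prefix keys."""
    m = max(prefix_scores + input_scores)
    p = sum(math.exp(s - m) for s in prefix_scores)
    x = sum(math.exp(s - m) for s in input_scores)
    return p / (p + x)

# Toy query-key scores and value vectors for one query token.
prefix_scores = [0.2, -1.0]
input_scores = [1.5, 0.3, -0.4]
prefix_values = [[1.0, 0.0], [0.0, 1.0]]
input_values = [[0.5, 0.5], [-1.0, 2.0], [3.0, 1.0]]

joint = attend(prefix_scores + input_scores, prefix_values + input_values)
lam = prefix_share(prefix_scores, input_scores)
only_prefix = attend(prefix_scores, prefix_values)
only_input = attend(input_scores, input_values)
mixed = [lam * p + (1.0 - lam) * x for p, x in zip(only_prefix, only_input)]
```

Here `joint` and `mixed` agree up to rounding, so the prefix weight `lam` is fixed by normalization rather than chosen per token. A learned gate, as the abstract proposes, would replace that fixed share with an input-dependent one.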
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes Dynamic Prefix Weighting (DPW) to improve domain-class incremental learning for vision-language models. It augments prefix-tuning with (1) a gating module that computes token-specific weights for the prefixes instead of uniform normalization and (2) a residual adapter-weighting mechanism that activates adapters only when the prefix adjustment is insufficient. The central claim is that this combination yields state-of-the-art performance on the relevant continual-learning benchmarks while remaining parameter-efficient; public code is provided for verification.
Significance. If the reported gains hold under rigorous re-evaluation, the work supplies a concrete, verifiable improvement over standard prefix-tuning and adapter baselines in a practically important setting. The explicit public code release is a notable strength that directly supports reproducibility of the SOTA numbers.
minor comments (3)
- The abstract states that the gating module 'adjusts the weights of each prefix based on the importance of the corresponding input token,' but does not specify the exact functional form or training objective of the gate; a short equation or pseudocode block in §3 would remove ambiguity for readers.
- Table captions and axis labels should explicitly indicate whether reported metrics are averaged over multiple random seeds and whether error bars or standard deviations are shown; this is especially important for incremental-learning claims where variance can be high.
- The residual adapter weighting is described as 'deriving adapter output weights as a residual of prefix-tuning weights.' A one-sentence clarification of the exact residual formula (e.g., whether it is a simple subtraction or a learned scaling) would aid implementation.
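For the second comment, the requested reporting could be as light as the sketch below: a mean and sample standard deviation per metric, with the seed count stated. The function name `summarize` and the numbers are placeholders for illustration only; they are not results from the paper.

```python
from statistics import mean, stdev

def summarize(scores_by_seed):
    """Return (mean, sample std, n) for one metric measured across seeds."""
    values = list(scores_by_seed.values())
    spread = stdev(values) if len(values) > 1 else 0.0
    return mean(values), spread, len(values)

# Placeholder scores keyed by random seed; NOT taken from the paper.
example = {0: 50.0, 1: 52.0, 2: 48.0}
avg, spread, n = summarize(example)
report = f"{avg:.1f} ± {spread:.1f} (n={n} seeds)"
```

Stating `n` alongside the spread lets a reader judge whether a gap between two incremental-learning methods exceeds seed-to-seed variation.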
Simulated Author's Rebuttal
We thank the referee for the positive assessment of our work on Dynamic Prefix Weighting for domain-class incremental learning in VLMs, the recognition of its practical importance, and the recommendation for minor revision. The emphasis on reproducibility via public code is appreciated.
Circularity Check
No significant circularity detected
The paper introduces an explicitly new framework (DPW) with two defined components—a gating module for token-specific prefix weights and a residual adapter weighting mechanism—rather than deriving any result from prior equations or self-citations by construction. The central claim is empirical (SOTA on domain-class incremental benchmarks), supported by external evaluation and public code, with no load-bearing mathematical derivation, fitted-parameter prediction, uniqueness theorem, or ansatz smuggled via self-citation. The approach is self-contained against standard prefix-tuning baselines.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Prefix-tuning and adapters remain effective adaptation mechanisms when their weights are made input-dependent via gating and residuals.
invented entities (2)
- Gating module: no independent evidence
- Residual adapter weighting mechanism: no independent evidence
Supplementary material (excerpt)
The low-rank dimension of the LoRA adapter is set to 64 in our default setting (Ours) and reduced to 4 in the parameter-efficient variant (Ours†). Both prefix and adapter modules are integrated into all 12 layers of the visual and text encoders. All experiments are conducted using a single NVIDIA 4090 GPU. For RePA, the bias matrix BG i is initialized t...
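To give a sense of what the two ranks quoted above (64 and 4) mean for parameter count, the sketch below applies the standard LoRA count of `rank * (d_in + d_out)` per adapted matrix. The 768-wide square projection, the single adapted matrix per layer, and the single 12-layer encoder are assumptions made for the arithmetic; the excerpt does not state which matrices are adapted or their sizes.

```python
def lora_params(d_in, d_out, rank):
    """Trainable parameters of one LoRA pair: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)

# Assumptions (not from the source): 768 x 768 projection,
# one adapted matrix per layer, 12 layers in one encoder.
D, LAYERS = 768, 12
default_count = LAYERS * lora_params(D, D, rank=64)
efficient_count = LAYERS * lora_params(D, D, rank=4)
ratio = default_count // efficient_count
```

Because the count is linear in the rank, the rank-4 variant uses one sixteenth of the adapter parameters of the rank-64 default under any choice of layer width.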