Is Prompt Selection Necessary for Task-Free Online Continual Learning?

Haemin Lee; Hankook Lee; Seoyoung Park

arxiv: 2604.04420 · v1 · submitted 2026-04-06 · 💻 cs.LG · cs.AI

Seoyoung Park, Haemin Lee, Hankook Lee

Pith reviewed 2026-05-10 19:33 UTC · model grok-4.3

keywords: continual learning, task-free learning, online learning, prompt tuning, vision transformer, catastrophic forgetting, streaming data, classifier design

The pith

Prompt selection from a pool is not necessary for state-of-the-art task-free online continual learning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Task-free online continual learning requires models to learn from a non-stationary data stream without task boundaries or revisiting samples. The paper shows that prompt selection strategies, which choose from a pool based on input, often select poor prompts and underperform despite extra training. The authors instead propose using one fixed prompt inserted into every self-attention block, calculating logits via cosine similarity to limit forgetting, and masking out logits of classes absent from the current batch. This straightforward setup delivers top results on standard benchmarks. Readers should care because it simplifies real-time adaptation in dynamic environments where task cues are absent.

Core claim

The authors claim that prompt selection strategies in task-free online continual learning frequently fail to pick suitable prompts, leading to suboptimal performance. They demonstrate that a SinglePrompt method, which injects a single prompt into each self-attention block, uses a cosine similarity-based design for logits to reduce the forgetting effect in classifier weights, and masks logits for unexposed classes in the minibatch, achieves state-of-the-art performance across various online continual learning benchmarks without needing task boundaries or multiple passes over the data.
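The first component, one prompt per self-attention block, can be pictured with a minimal single-head attention sketch. This is an illustration under stated assumptions, not the authors' implementation: the function name `attention_with_prompt` is hypothetical, the usual query/key/value projections and multi-head split are omitted, and the prompt rows are prepended to the keys and values only, as the Figure 1 caption describes.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_with_prompt(queries, keys, values, prompt_k, prompt_v):
    """Single-head attention with prompt rows prepended to keys and values.

    Assumption: projections are omitted, so rows are used directly as
    queries, keys, and values. The output keeps one row per query, so the
    sequence length is unchanged by the prompt.
    """
    full_keys = prompt_k + keys        # prompt rows come first
    full_values = prompt_v + values
    scale = 1.0 / math.sqrt(len(queries[0]))
    dim_v = len(full_values[0])
    outputs = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in full_keys]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, full_values)) for j in range(dim_v)]
        )
    return outputs

# Toy sequence of three 2-dimensional tokens and a prompt of length one.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
prompt_k = [[0.5, 0.5]]
prompt_v = [[2.0, -2.0]]

out = attention_with_prompt(tokens, tokens, tokens, prompt_k, prompt_v)
no_prompt = attention_with_prompt(tokens, tokens, tokens, [], [])
print(len(out), len(out[0]))
```

Because the prompt enters only through the keys and values, the same prompt is applied to every input and no selection step is needed.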

What carries the argument

The SinglePrompt mechanism, which consists of injecting one shared prompt into each self-attention block of a transformer, computing classification logits with cosine similarity instead of dot product, and applying logit masking for unseen classes, carries the argument by focusing optimization on the classifier while avoiding the pitfalls of adaptive selection.
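To make the difference between dot-product and cosine logits concrete, here is a small sketch. It assumes a scale factor (`scale`, a hypothetical hyperparameter not given in the text above) multiplying the cosine, and uses toy vectors chosen so that one class weight has a large norm but a poor direction.

```python
import math

def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))

def dot_logits(feature, class_weights):
    """Standard linear-classifier logits: one dot product per class."""
    return [sum(f * w for f, w in zip(feature, wc)) for wc in class_weights]

def cosine_logits(feature, class_weights, scale=10.0, eps=1e-12):
    """Cosine-similarity logits; `scale` is an assumed temperature-like factor."""
    f_norm = l2_norm(feature) + eps
    logits = []
    for wc in class_weights:
        dot = sum(f * w for f, w in zip(feature, wc))
        logits.append(scale * dot / (f_norm * (l2_norm(wc) + eps)))
    return logits

def argmax(xs):
    return max(range(len(xs)), key=xs.__getitem__)

feature = [1.0, 2.0]
# Class 0 points in the feature's direction; class 1 has a much larger norm.
weights = [[1.0, 2.0], [10.0, 0.0]]

print(argmax(dot_logits(feature, weights)))     # 1
print(argmax(cosine_logits(feature, weights)))  # 0
```

With dot-product logits the large-norm class wins regardless of direction; the cosine form removes the norm from the comparison, which is one plausible reading of how it limits bias in the classifier weights.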

Load-bearing premise

That the failures of prompt selection are general across methods and datasets, rather than specific to the implementations tested, and that the single prompt design with cosine logits and masking is broadly sufficient without any form of task information.

What would settle it

An experiment showing that an improved prompt selection strategy achieves higher accuracy than SinglePrompt on the same continual learning benchmarks, or ablation studies where removing the cosine design or masking causes SinglePrompt to underperform significantly.

Figures

Figures reproduced from arXiv: 2604.04420 by Haemin Lee, Hankook Lee, Seoyoung Park.

**Figure 1.** An overview of the proposed SinglePrompt. When a minibatch $B_t = \{(x_t^{(i)}, y_t^{(i)})\}_{i=1}^{B_t}$ is provided, it passes through a pretrained Vision Transformer encoder. At the $i$-th self-attention block $f_i$, the input sequence $h_{i-1}$ is given, and during the attention operation the learnable prompts $p_i^k$ and $p_i^v$ are prepended to the key and value, respectively. Only the class token from the encoder's output seque…

**Figure 2.** Histograms of prompt selection counts per class on task-free continual learning using CIFAR100 …

**Figure 3.** Prompt selection failures in task-based methods on CI…

**Figure 4.** Visualization of the L2 norms of the weights for each …

**Figure 5.** Ablation study on prompt length. The x-axis denotes the …

**Figure 6.** Anytime inference accuracy curves of SinglePrompt and …

**Figure 7.** Visualization of the Si-blurry scenario. The dataset for …

**Figure 8.** Prompt selection failures on additional datasets, showing …

**Figure 9.** Failure of ConvPrompt [20] selection in task-based continual learning on CIFAR100 [13]. The x-axis represents the task ID of the input sample and the y-axis indicates the average cosine similarity between each task's samples and their assigned keys. (a) Result in the offline continual learning setting, where class information of upcoming tasks is available. Task similarity is computed using class descriptor…
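For context on what Figures 2, 3, 8, and 9 are measuring, the sketch below shows a generic key-matching selection step of the kind pool-based methods use: a query feature is compared to learnable keys by cosine similarity and the best-matching prompt index is returned. It is an assumed, simplified illustration (the names `select_prompt` and `keys` are hypothetical), not the selection rule of any specific method cited here.

```python
import math

def cosine(a, b, eps=1e-12):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + eps)

def select_prompt(query, keys):
    """Return the index of the most similar key and its similarity score."""
    sims = [cosine(query, k) for k in keys]
    best = max(range(len(keys)), key=lambda i: sims[i])
    return best, sims[best]

# Toy pool of three keys; each key would be paired with one prompt.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
idx, sim = select_prompt([0.9, 0.1], keys)
print(idx)
```

A selection failure in this picture is a case where the returned index does not correspond to the prompt trained on the sample's own class or task, or where the similarity to the assigned key is low.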

Original abstract

Task-free online continual learning has recently emerged as a realistic paradigm for addressing continual learning in dynamic, real-world environments, where data arrive in a non-stationary stream without clear task boundaries and can only be observed once. To consider such challenging scenarios, many recent approaches have employed prompt selection, an adaptive strategy that selects prompts from a pool based on input signals. However, we observe that such selection strategies often fail to select appropriate prompts, yielding suboptimal results despite additional training of key parameters. Motivated by this observation, we propose a simple yet effective SinglePrompt that eliminates the need for prompt selection and focuses on classifier optimization. Specifically, we simply (i) inject a single prompt into each self-attention block, (ii) employ a cosine similarity-based logit design to alleviate the forgetting effect inherent in the classifier weights, and (iii) mask logits for unexposed classes in the current minibatch. With this simple task-free design, our framework achieves state-of-the-art performance across various online continual learning benchmarks. Source code is available at https://github.com/efficient-learning-lab/SinglePrompt.
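Step (iii) of the abstract, masking logits for classes that do not appear in the current minibatch, can be sketched as follows. The exact formulation is not given in the abstract, so this assumes the common approach of setting masked logits to negative infinity before a softmax cross-entropy; the function name `masked_cross_entropy` is hypothetical.

```python
import math

def masked_cross_entropy(logits, label, batch_labels):
    """Cross-entropy where classes absent from the minibatch are masked out.

    Assumption: masking sets the logit to -inf, so absent classes get zero
    probability and receive no gradient from this sample. `label` must be
    one of `batch_labels`.
    """
    present = set(batch_labels)
    masked = [z if c in present else float("-inf") for c, z in enumerate(logits)]
    m = max(masked)
    log_sum = m + math.log(sum(math.exp(z - m) for z in masked))
    return log_sum - masked[label]

logits = [2.0, 1.0, 3.0]   # three classes seen so far
batch_labels = [0, 1, 1]   # class 2 does not occur in this minibatch

masked_loss = masked_cross_entropy(logits, 0, batch_labels)
unmasked_loss = masked_cross_entropy(logits, 0, [0, 1, 2])
print(masked_loss < unmasked_loss)
```

The mask is built only from labels in the current minibatch, so in this sketch no information about future classes enters the loss.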

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

SinglePrompt drops prompt selection for a single prompt per attention block plus cosine logits and masking, and claims SOTA on task-free online CL benchmarks with public code.

The paper's main claim is that prompt selection is often unnecessary in task-free online continual learning. The authors note that selection from a prompt pool frequently picks the wrong prompt even with extra training, so they replace it with a single prompt injected into each self-attention block, cosine-similarity logits to cut classifier bias, and per-minibatch masking of logits for unseen classes. This keeps the whole thing strictly task-free with no boundaries or replay buffers.

The approach is new in its minimalism relative to recent pool-based prompt methods, and the design choices line up directly with the problems they flag. Public code is a real plus for anyone wanting to verify the numbers. The results on standard online CL benchmarks look competitive, which is the strongest part of the work.

The soft spots sit in the experiments. The abstract asserts SOTA performance, but without seeing the full tables I want to check the exact baselines, number of runs, error bars, and whether ablations show each of the three changes actually moves the needle. The observation that selection fails needs concrete side-by-side evidence rather than just a statement. If those details hold, the central argument stands; if the gains shrink under stricter controls, the case for dropping selection weakens.

This paper is aimed at people working on continual learning in streaming, non-stationary settings. A reader already following prompt-based or task-free CL work would find the simplification useful to test. It deserves a serious referee because the idea is straightforward, the code is available, and it pushes back on a current trend with reproducible claims. I would send it to review rather than desk-reject.

Referee Report

1 major / 2 minor

Summary. The paper observes that prompt selection strategies in task-free online continual learning often fail to select appropriate prompts despite extra training of key parameters. It proposes a simple SinglePrompt framework that (i) injects a single prompt into each self-attention block, (ii) employs a cosine similarity-based logit design to alleviate forgetting in classifier weights, and (iii) masks logits for unexposed classes per minibatch. This task-free design (no boundaries or replay) is claimed to achieve state-of-the-art performance across standard online continual learning benchmarks, with public code provided.

Significance. If the empirical results hold under rigorous verification, the work demonstrates that complex prompt selection may be unnecessary, shifting focus to minimal classifier optimizations in non-stationary streams. The public repository strengthens the contribution by enabling direct reproducibility checks on the reported SOTA benchmarks.

major comments (1)

[Abstract and §4 (Experiments)] The SOTA claim requires explicit reporting of all baselines, statistical significance tests, number of runs, and ablations on the three proposed components; without these, it is unclear whether the performance gains are attributable to the single-prompt design or to implementation details.

minor comments (2)

[§3.2] The cosine logit formulation should include a short derivation or reference showing how it explicitly counters the bias in classifier weights compared to standard softmax.
[Figure 2 or equivalent] Clarify the masking operation's effect on the loss computation to confirm it does not inadvertently use future class information.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback and the recommendation for minor revision. We address the major comment on empirical rigor and SOTA claims below, agreeing that additional details will strengthen the manuscript.

Referee: [Abstract and §4 (Experiments)] The SOTA claim requires explicit reporting of all baselines, statistical significance tests, number of runs, and ablations on the three proposed components; without these, it is unclear whether the performance gains are attributable to the single-prompt design or to implementation details.

Authors: We agree that rigorous SOTA claims benefit from explicit and comprehensive reporting. In the revised manuscript, we will expand the experiments section (§4) and update the abstract as needed to: (i) explicitly list all baselines with their original citations and any adaptations for the task-free online continual learning setting; (ii) report all results as mean ± standard deviation over a specified number of independent runs (we will use 5 runs for consistency with common practice in the field); (iii) include statistical significance tests (e.g., paired t-tests or Wilcoxon signed-rank tests with p-values) between SinglePrompt and the strongest baselines; and (iv) add a dedicated ablation study isolating the contribution of each of the three components (single prompt per attention block, cosine similarity-based logit design, and per-minibatch logit masking for unexposed classes) along with their cumulative effects. These changes will clarify that performance improvements arise from the proposed design choices rather than implementation artifacts. The publicly available code repository already supports full reproduction of the reported results, which can serve as an immediate verification aid.

Revision: yes
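The reporting promised in points (ii) and (iii) is simple to compute. The sketch below uses only the Python standard library and shows mean ± sample standard deviation over five runs plus a paired t statistic. The accuracy values are made-up placeholders for illustration, not results from the paper, and the function names are hypothetical.

```python
import math
import statistics

def mean_std(values):
    """Mean and sample standard deviation (n - 1 denominator)."""
    return statistics.mean(values), statistics.stdev(values)

def paired_t_statistic(a, b):
    """t statistic for paired samples: mean difference over its standard error."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))

# Placeholder accuracies (percent) for 5 seeds; NOT numbers from the paper.
ours = [70.0, 71.0, 69.0, 72.0, 70.0]
base = [68.0, 70.0, 68.0, 69.0, 69.0]

m, s = mean_std(ours)
t = paired_t_statistic(ours, base)
print(f"{m:.2f} ± {s:.2f}, paired t = {t:.2f} (df = {len(ours) - 1})")
```

Turning the statistic into a p-value requires the Student t distribution, for example through `scipy.stats.ttest_rel`; pairing assumes both methods were run with the same seeds and stream orderings.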

Circularity Check

0 steps flagged

No significant circularity


The paper's chain consists of an empirical observation (prompt selection often fails in task-free online CL) followed by a direct proposal of three concrete architectural choices (single prompt per self-attention block, cosine-similarity logits, per-minibatch logit masking) that are motivated by that observation and then validated on public benchmarks. No equations are presented whose outputs are defined in terms of their own inputs; no parameter is fitted on a subset and then relabeled as a prediction; no uniqueness theorem or ansatz is imported via self-citation to force the design; and the central claim (SOTA performance with a strictly task-free method) remains externally falsifiable via the linked repository. The construction is therefore self-contained and does not reduce to its own premises by definition.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract, no specific free parameters, axioms, or invented entities are detailed. The method builds on standard transformer self-attention and continual learning assumptions like non-stationary data streams.


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "employ a cosine similarity-based logit design to alleviate the forgetting effect inherent in the classifier weights"

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · LogicNat_induction · unclear
Paper passage: "inject a single prompt into each self-attention block"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

26 extracted references · 26 canonical work pages

[1] Jihwan Bang, Heesu Kim, YoungJoon Yoo, Jung-Woo Ha, and Jonghyun Choi. Rainbow memory: Continual learning with a memory of diverse samples. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 8218–8227, 2021.

[2] Pietro Buzzega, Matteo Boschini, Angelo Porrello, Davide Abati, and Simone Calderara. Dark experience for general continual learning: a strong, simple baseline. In Advances in Neural Information Processing Systems, pages 15920–15930. Curran Associates, Inc., 2020.

[3] Lucas Caccia, Rahaf Aljundi, Nader Asadi, Tinne Tuytelaars, Joelle Pineau, and Eugene Belilovsky. New insights on reducing abrupt representation change in online continual learning. arXiv preprint arXiv:2104.05025, 2021.

[4] Shoufa Chen, Chongjian Ge, Zhan Tong, Jiangliu Wang, Yibing Song, Jue Wang, and Ping Luo. Adaptformer: Adapting vision transformers for scalable visual recognition. arXiv preprint arXiv:2205.13535, 2022.

[5] Zhiyuan Chen and Bing Liu. Lifelong machine learning. Morgan & Claypool Publishers, 2018.

[6] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations, 2021.

[7] Dan Hendrycks, Steven Basart, Norman Mu, Saurav Kadavath, Frank Wang, Evan Dorundo, Rahul Desai, Tyler Zhu, Samyak Parajuli, Mike Guo, Dawn Song, Jacob Steinhardt, and Justin Gilmer. The many faces of robustness: A critical analysis of out-of-distribution generalization. ICCV, 2021.

[8] Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models. In International Conference on Learning Representations, 2022.

[9] Menglin Jia, Luming Tang, Bor-Chun Chen, Claire Cardie, Serge Belongie, Bharath Hariharan, and Ser-Nam Lim. Visual prompt tuning. In European Conference on Computer Vision (ECCV), 2022.

[10] Zhiqi Kang, Liyuan Wang, Xingxing Zhang, and Karteek Alahari. Advancing prompt-based methods for replay-independent general continual learning. In The Thirteenth International Conference on Learning Representations, 2025.

[11] James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, et al. Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences, 114(13):3521–3526, 2017.

[12] Hyunseo Koh, Dahyun Kim, Jung-Woo Ha, and Jonghyun Choi. Online continual learning on class incremental blurry task configuration with anytime inference. In ICLR, 2022.

[13] Alex Krizhevsky. Learning multiple layers of features from tiny images. Pages 32–33, 2009.

[14] Yann Le and Xuan Yang. Tiny imagenet visual recognition challenge. CS 231N, 7(7):3, 2015.

[15] Brian Lester, Rami Al-Rfou, and Noah Constant. The power of scale for parameter-efficient prompt tuning. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 3045–3059, Online and Punta Cana, Dominican Republic, 2021. Association for Computational Linguistics.

[16] Xiang Lisa Li and Percy Liang. Prefix-tuning: Optimizing continuous prompts for generation. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, pages 4582–4597, Online, 2021. Association for Computational Linguistics.

[17] Zhizhong Li and Derek Hoiem. Learning without forgetting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(12):2935–2947, 2017.

[18] Jun-Yeong Moon, Keon-Hee Park, Jung Uk Kim, and Gyeong-Moon Park. Online class incremental learning on stochastic blurry task boundary via mask and visual prompt tuning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023.

[19] David Rolnick, Arun Ahuja, Jonathan Schwarz, Timothy Lillicrap, and Gregory Wayne. Experience replay for continual learning. Advances in Neural Information Processing Systems, 32, 2019.

[20] Anurag Roy, Riddhiman Moulick, Vinay Verma, Saptarshi Ghosh, and Abir Das. Convolutional prompting meets language models for continual learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.

[21] Zifeng Wang, Zizhao Zhang, Sayna Ebrahimi, Ruoxi Sun, Han Zhang, Chen-Yu Lee, Xiaoqi Ren, Guolong Su, Vincent Perot, Jennifer Dy, et al. Dualprompt: Complementary prompting for rehearsal-free continual learning. European Conference on Computer Vision, 2022.

[22] Zifeng Wang, Zizhao Zhang, Chen-Yu Lee, Han Zhang, Ruoxi Sun, Xiaoqi Ren, Guolong Su, Vincent Perot, Jennifer Dy, and Tomas Pfister. Learning to prompt for continual learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 139–149,

[23] Xiwen Wei, Guihong Li, and Radu Marculescu. Online-lora: Task-free online continual learning via low rank adaptation. arXiv preprint arXiv:2411.05663, 2024.

[24] Maxime Zanella and Ismail Ben Ayed. Low-rank few-shot adaptation of vision-language models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 1593–1603, 2024.

[25] Da-Wei Zhou, Hai-Long Sun, Jingyi Ning, Han-Jia Ye, and De-Chuan Zhan. Continual learning with pre-trained models: A survey. In IJCAI, pages 8363–8371, 2024.

[26] Da-Wei Zhou, Qi-Wei Wang, Zhi-Hong Qi, Han-Jia Ye, De-Chuan Zhan, and Ziwei Liu. Class-incremental learning: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(12):9851–9873, 2024.
