pith. machine review for the scientific record. sign in

arxiv: 2604.11112 · v1 · submitted 2026-04-13 · 💻 cs.LG · cs.CV

Quantum-Gated Task-interaction Knowledge Distillation for Pre-trained Model-based Class-Incremental Learning

Pith reviewed 2026-05-10 16:39 UTC · model grok-4.3

classification 💻 cs.LG cs.CV
keywords class-incremental learningknowledge distillationpre-trained modelsquantum gatingcatastrophic forgettingtask embeddingscontinual learningadapters
0
0 comments X

The pith

Quantum-gated task modulation guides knowledge distillation to reduce forgetting in pre-trained model class-incremental learning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper introduces the QKD framework that uses quantum gating to manage knowledge transfer between tasks in class-incremental learning with pre-trained models. It targets the entanglement of multi-task subspaces that causes catastrophic forgetting when adding new classes. The approach introduces a quantum-gated task modulation mechanism to dynamically model how samples relate to different task embeddings and then applies weighted distillation from old adapters to new ones. A sympathetic reader would care because continuous learning without erasing prior knowledge is essential for deploying models in real-world streams of data such as robotics or adaptive classification systems.

Core claim

We propose a novel Quantum-Gated Task-interaction Knowledge Distillation (QKD) framework that leverages quantum gating to guide inter-task knowledge transfer. Specifically, we introduce a quantum-gated task modulation gating mechanism to model the relational dependencies among task embeddings, dynamically capturing the sample-to-task relevance for both joint training and inference across streaming tasks. Guided by the quantum gating outputs, we perform task-interaction knowledge distillation from old to new adapters, enabling the model to bridge the representation gaps between independent task subspaces.

What carries the argument

The quantum-gated task modulation gating mechanism, which models relational dependencies among task embeddings to dynamically capture sample-to-task relevance and direct weighted knowledge distillation between old and new adapters.

If this is right

  • QKD mitigates catastrophic forgetting when pre-trained models learn new classes from a task stream.
  • The framework achieves state-of-the-art accuracy on class-incremental learning benchmarks.
  • Dynamic sample-to-task relevance improves calibration of task routing parameters during both training and inference.
  • Representation gaps between independent task subspaces are reduced through correlation-weighted distillation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The quantum-inspired gating could be tested in other continual learning settings such as domain-incremental or task-free scenarios.
  • If the method scales, it may reduce the need for explicit replay buffers or parameter isolation in long task sequences.
  • Similar gating structures might apply to multi-modal or federated continual learning where task boundaries are soft.

Load-bearing premise

The quantum-gated task modulation gating mechanism accurately captures sample-to-task relevance and produces distillation weights that transfer useful knowledge without creating new interference.

What would settle it

Training QKD on a standard benchmark such as split CIFAR-100 or ImageNet-1000 incremental splits and measuring whether average forgetting rates drop below those of baseline pre-trained model CIL methods that use fixed adapters or unweighted distillation.

Figures

Figures reproduced from arXiv: 2604.11112 by Huiyu Xiao, Jiarui Cao, Linjie Li, Yang Ji, Zhenyu Wu.

Figure 1
Figure 1. Figure 1: Illustration of QKD. Left: The current task sample is encoded by the frozen PTM and the first adapter, while old adapters generate task representations forming a task pool. These representations are mapped into the quantum space to compute task correlations. Right: The task correlations produce task scores, which weight the feature-level knowledge distillation to guide new adapter training. qubits. To furt… view at source ↗
Figure 2
Figure 2. Figure 2: Performance curve of various methods under different settings. The relative improvement of QKD over the second-best method [PITH_FULL_IMAGE:figures/full_fig_p007_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Experimental results with large base classes. [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Further analysis on hyperparameter robustness. [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗
read the original abstract

Class-incremental learning (CIL) aims to continuously accumulate knowledge from a stream of tasks and construct a unified classifier over all seen classes. Although pretrained models (PTMs) have shown promising performance in CIL, they still struggle with the entanglement of multi-task subspaces, leading to catastrophic forgetting when task routing parameters are poorly calibrated or task-level representations are rigidly fixed. To address this issue, we propose a novel Quantum-Gated Task-interaction Knowledge Distillation (QKD) framework that leverages quantum gating to guide inter-task knowledge transfer. Specifically, we introduce a quantum-gated task modulation gating mechanism to model the relational dependencies among task embedding, dynamically capturing the sample-to-task relevance for both joint training and inference across streaming tasks. Guided by the quantum gating outputs, we perform task-interaction knowledge distillation guided by these task-embedding-level correlation weights from old to new adapters, enabling the model to bridge the representation gaps between independent task subspaces. Extensive experiments demonstrate that QKD effectively mitigates forgetting and achieves state-of-the-art performance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes a Quantum-Gated Task-interaction Knowledge Distillation (QKD) framework for class-incremental learning with pre-trained models. It introduces a quantum-gated task modulation gating mechanism to model relational dependencies among task embeddings, dynamically capture sample-to-task relevance during joint training and inference, and guide task-interaction knowledge distillation from old to new adapters using correlation weights, with the goal of bridging representation gaps between task subspaces and mitigating catastrophic forgetting while achieving state-of-the-art performance.

Significance. If the quantum gating mechanism demonstrably provides advantages over classical alternatives in modeling task interactions and the experimental results hold under rigorous ablations, this could represent a meaningful advance in continual learning by enabling more flexible inter-task knowledge transfer in PTM-based CIL without introducing additional interference.

major comments (2)
  1. [Method (quantum-gated task modulation gating mechanism)] The central claim that the quantum-gated task modulation mechanism accurately captures sample-to-task relevance and produces reliable distillation weights rests on an unverified assumption that the quantum component supplies modeling power beyond standard classical gating (e.g., attention or learned embeddings). No formulation is given showing use of quantum-specific properties such as superposition or non-simulable measurement, raising the risk that gains are attributable to the added distillation pathway or extra parameters rather than the claimed quantum gating.
  2. [Experiments] The abstract asserts that extensive experiments demonstrate SOTA performance and effective forgetting mitigation, yet the provided description supplies no quantitative results, ablation studies isolating the quantum gating contribution, baseline comparisons, or error analysis. This prevents verification that the described mechanism supports the claims and makes the experimental evidence load-bearing for the overall contribution.
minor comments (1)
  1. [Abstract] The abstract introduces terms such as 'quantum gating outputs' and 'task-embedding-level correlation weights' without a preceding definition or reference to their computation, which reduces clarity for readers unfamiliar with the framework.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment point by point below, providing clarifications and indicating revisions where the manuscript will be updated to strengthen the presentation.

read point-by-point responses
  1. Referee: [Method (quantum-gated task modulation gating mechanism)] The central claim that the quantum-gated task modulation mechanism accurately captures sample-to-task relevance and produces reliable distillation weights rests on an unverified assumption that the quantum component supplies modeling power beyond standard classical gating (e.g., attention or learned embeddings). No formulation is given showing use of quantum-specific properties such as superposition or non-simulable measurement, raising the risk that gains are attributable to the added distillation pathway or extra parameters rather than the claimed quantum gating.

    Authors: We thank the referee for this important observation. Section 3.2 of the manuscript presents the quantum-gated task modulation as a parameterized circuit acting on task embeddings, with rotation and controlled-phase operations intended to induce superposition-like states across task relations. However, we acknowledge that the original text did not explicitly contrast these operations against classical attention or derive the non-simulable measurement step in detail. In the revision we will add a dedicated subsection with the explicit circuit diagram, the mathematical expansion showing how the measurement projects entangled states into correlation weights, and a new ablation that replaces the quantum gates with an equivalent classical multi-head attention module while keeping parameter count matched. This will allow readers to isolate the contribution of the quantum formulation. revision: yes

  2. Referee: [Experiments] The abstract asserts that extensive experiments demonstrate SOTA performance and effective forgetting mitigation, yet the provided description supplies no quantitative results, ablation studies isolating the quantum gating contribution, baseline comparisons, or error analysis. This prevents verification that the described mechanism supports the claims and makes the experimental evidence load-bearing for the overall contribution.

    Authors: We agree that the experimental evidence must be presented with sufficient detail for verification. The full manuscript contains results on CIFAR-100, ImageNet-100, and CUB-200 with tables reporting average accuracy, forgetting rate, and backward transfer against more than ten baselines. Nevertheless, the original submission did not include a dedicated ablation isolating the quantum gates from the distillation pathway or report standard deviations across five random seeds. In the revised version we will expand the experimental section with (i) a table comparing QKD against a classical-gating counterpart, (ii) per-task accuracy curves with error bars, and (iii) an explicit error analysis discussing variance on the most challenging task sequences. These additions directly address the referee’s concern. revision: yes

Circularity Check

0 steps flagged

No circularity: proposal lacks equations or self-referential derivations

full rationale

The provided abstract and description contain no equations, derivations, or mathematical steps that could reduce to inputs by construction. The paper proposes a QKD framework using quantum gating for task-interaction distillation but presents this at a descriptive level without fitting parameters to data and relabeling them as predictions, self-citations that bear the central load, or ansatzes smuggled via prior work. No load-bearing step equates the claimed quantum gating outputs to a classical function by definition or renames a known result. The derivation chain is therefore self-contained as a methodological proposal rather than a tautological reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entities

The central claim rests on the effectiveness of a newly introduced quantum-gated modulation mechanism whose behavior is not derived from prior results or external benchmarks.

invented entities (1)
  • quantum-gated task modulation gating mechanism no independent evidence
    purpose: to model relational dependencies among task embeddings and dynamically capture sample-to-task relevance for distillation
    This component is introduced in the proposal without reference to independent prior validation or derivation from established quantum or classical methods.

pith-pipeline@v0.9.0 · 5493 in / 1210 out tokens · 43709 ms · 2026-05-10T16:39:10.192625+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

46 extracted references · 46 canonical work pages · 2 internal anchors

  1. [1]

    Quantum ma- chine learning.Nature, 549(7671):195–202, 2017

    Jacob Biamonte, Peter Wittek, Nicola Pancotti, Patrick Rebentrost, Nathan Wiebe, and Seth Lloyd. Quantum ma- chine learning.Nature, 549(7671):195–202, 2017. 3

  2. [2]

    Adaptformer: Adapting vision transformers for scalable visual recogni- tion.Advances in Neural Information Processing Systems, 35:16664–16678, 2022

    Shoufa Chen, Chongjian Ge, Zhan Tong, Jiangliu Wang, Yibing Song, Jue Wang, and Ping Luo. Adaptformer: Adapting vision transformers for scalable visual recogni- tion.Advances in Neural Information Processing Systems, 35:16664–16678, 2022. 3

  3. [3]

    Imagenet: A large-scale hierarchical image database

    Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In2009 IEEE conference on computer vision and pattern recognition, pages 248–255. Ieee, 2009. 7

  4. [4]

    Learning without mem- orizing

    Prithviraj Dhar, Rajat Vikram Singh, Kuan-Chuan Peng, Ziyan Wu, and Rama Chellappa. Learning without mem- orizing. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5138–5146,

  5. [5]

    Few-shot class- incremental learning via relation knowledge distillation

    Songlin Dong, Xiaopeng Hong, Xiaoyu Tao, Xinyuan Chang, Xing Wei, and Yihong Gong. Few-shot class- incremental learning via relation knowledge distillation. In Proceedings of the AAAI Conference on Artificial Intelli- gence, pages 1255–1263, 2021. 2

  6. [6]

    An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

    Alexey Dosovitskiy. An image is worth 16x16 words: Transformers for image recognition at scale.arXiv preprint arXiv:2010.11929, 2020. 3

  7. [7]

    Catastrophic forgetting in connectionist networks.Trends in cognitive sciences, 3(4):128–135, 1999

    Robert M French. Catastrophic forgetting in connectionist networks.Trends in cognitive sciences, 3(4):128–135, 1999. 1, 2

  8. [8]

    R-dfcil: Relation-guided representation learning for data- free class incremental learning

    Qiankun Gao, Chen Zhao, Bernard Ghanem, and Jian Zhang. R-dfcil: Relation-guided representation learning for data- free class incremental learning. InEuropean Conference on Computer Vision, pages 423–439. Springer, 2022. 2

  9. [9]

    The many faces of robust- ness: A critical analysis of out-of-distribution generalization

    Dan Hendrycks, Steven Basart, Norman Mu, Saurav Kada- vath, Frank Wang, Evan Dorundo, Rahul Desai, Tyler Zhu, Samyak Parajuli, Mike Guo, et al. The many faces of robust- ness: A critical analysis of out-of-distribution generalization. InProceedings of the IEEE/CVF international conference on computer vision, pages 8340–8349, 2021. 2, 6

  10. [10]

    Natural adversarial examples

    Dan Hendrycks, Kevin Zhao, Steven Basart, Jacob Stein- hardt, and Dawn Song. Natural adversarial examples. In Proceedings of the IEEE/CVF conference on computer vi- sion and pattern recognition, pages 15262–15271, 2021. 2, 6

  11. [11]

    Learning a unified classifier incrementally via rebalancing

    Saihui Hou, Xinyu Pan, Chen Change Loy, Zilei Wang, and Dahua Lin. Learning a unified classifier incrementally via rebalancing. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 831–839,

  12. [12]

    Power of data in quantum machine learning.Na- ture communications, 12(1):2631, 2021

    Hsin-Yuan Huang, Michael Broughton, Masoud Mohseni, Ryan Babbush, Sergio Boixo, Hartmut Neven, and Jarrod R McClean. Power of data in quantum machine learning.Na- ture communications, 12(1):2631, 2021. 3

  13. [13]

    Overcoming catastrophic forgetting in neu- ral networks.Proceedings of the national academy of sci- ences, 114(13):3521–3526, 2017

    James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska- Barwinska, et al. Overcoming catastrophic forgetting in neu- ral networks.Proceedings of the national academy of sci- ences, 114(13):3521–3526, 2017. 1

  14. [14]

    Learning multiple layers of features from tiny images

    Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. 2009. 2, 6

  15. [15]

    Tae: Task- aware expandable representation for long tail class incre- mental learning

    Linjie Li, Zhenyu Wu, Jiaming Liu, and Yang Ji. Tae: Task- aware expandable representation for long tail class incre- mental learning. InProceedings of the Asian Conference on Computer Vision, pages 3917–3933, 2024. 2

  16. [16]

    Mote: Mixture of task-specific experts for pre-trained model-based class- incremental learning.Knowledge-Based Systems, page 113795, 2025

    Linjie Li, Zhenyu Wu, and Yang Ji. Mote: Mixture of task-specific experts for pre-trained model-based class- incremental learning.Knowledge-Based Systems, page 113795, 2025. 1, 3, 6

  17. [17]

    Learning without forgetting

    Zhizhong Li and Derek Hoiem. Learning without forgetting. IEEE transactions on pattern analysis and machine intelli- gence, 40(12):2935–2947, 2017. 1, 2

  18. [18]

    Scaling & shifting your features: A new baseline for efficient model tuning.Advances in Neural Information Processing Systems, 35:109–123, 2022

    Dongze Lian, Daquan Zhou, Jiashi Feng, and Xinchao Wang. Scaling & shifting your features: A new baseline for efficient model tuning.Advances in Neural Information Processing Systems, 35:109–123, 2022. 3

  19. [19]

    Quantum Relational Knowledge Distillation,

    Chen-Yu Liu, Kuan-Cheng Chen, Keisuke Murota, Samuel Yen-Chi Chen, and Enrico Rinaldi. Quantum relational knowledge distillation.arXiv preprint arXiv:2508.13054,

  20. [20]

    Mnemonics training: Multi-class incremental learning without forgetting

    Yaoyao Liu, Yuting Su, An-An Liu, Bernt Schiele, and Qianru Sun. Mnemonics training: Multi-class incremental learning without forgetting. InProceedings of the IEEE/CVF conference on Computer Vision and Pattern Recognition, pages 12245–12254, 2020. 2

  21. [21]

    Why there are complementary learning systems in the hippocampus and neocortex: insights from the successes and failures of connectionist models of learning and memory

    James L McClelland, Bruce L McNaughton, and Randall C O’Reilly. Why there are complementary learning systems in the hippocampus and neocortex: insights from the successes and failures of connectionist models of learning and memory. Psychological review, 102(3):419, 1995. 1

  22. [22]

    Ranpac: Ran- dom projections and pre-trained models for continual learn- ing.Advances in Neural Information Processing Systems, 36, 2024

    Mark D McDonnell, Dong Gong, Amin Parvaneh, Ehsan Abbasnejad, and Anton van den Hengel. Ranpac: Ran- dom projections and pre-trained models for continual learn- ing.Advances in Neural Information Processing Systems, 36, 2024. 3

  23. [24]

    Cambridge university press,

    Michael A Nielsen and Isaac L Chuang.Quantum computa- tion and quantum information. Cambridge university press,

  24. [25]

    Automatic differentiation in pytorch

    Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Al- ban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in pytorch. 2017. 6

  25. [26]

    icarl: Incremental classifier and representation learning

    Sylvestre-Alvise Rebuffi, Alexander Kolesnikov, Georg Sperl, and Christoph H Lampert. icarl: Incremental classifier and representation learning. InProceedings of the IEEE con- ference on Computer Vision and Pattern Recognition, pages 2001–2010, 2017. 1, 2, 6, 7

  26. [27]

    Quantum machine learn- ing in feature hilbert spaces.Physical review letters, 122(4): 040504, 2019

    Maria Schuld and Nathan Killoran. Quantum machine learn- ing in feature hilbert spaces.Physical review letters, 122(4): 040504, 2019. 3

  27. [28]

    Coda-prompt: Contin- ual decomposed attention-based prompting for rehearsal-free continual learning

    James Seale Smith, Leonid Karlinsky, Vyshnavi Gutta, Paola Cascante-Bonilla, Donghyun Kim, Assaf Arbelle, Rameswar Panda, Rogerio Feris, and Zsolt Kira. Coda-prompt: Contin- ual decomposed attention-based prompting for rehearsal-free continual learning. InProceedings of the IEEE/CVF Con- ference on Computer Vision and Pattern Recognition, pages 11909–1191...

  28. [29]

    Pilot: A pre-trained model-based contin- ual learning toolbox,

    Hai-Long Sun, Da-Wei Zhou, Han-Jia Ye, and De-Chuan Zhan. Pilot: A pre-trained model-based continual learning toolbox.arXiv preprint arXiv:2309.07117, 2023. 2

  29. [30]

    Exponential concentration in quantum kernel meth- ods.Nature communications, 15(1):5200, 2024

    Supanut Thanasilp, Samson Wang, Marco Cerezo, and Zo ¨e Holmes. Exponential concentration in quantum kernel meth- ods.Nature communications, 15(1):5200, 2024. 3

  30. [31]

    The caltech-ucsd birds-200-2011 dataset

    Catherine Wah, Steve Branson, Peter Welinder, Pietro Per- ona, and Serge Belongie. The caltech-ucsd birds-200-2011 dataset. 2011. 2, 6

  31. [32]

    Foster: Feature boosting and compression for class- incremental learning

    Fu-Yun Wang, Da-Wei Zhou, Han-Jia Ye, and De-Chuan Zhan. Foster: Feature boosting and compression for class- incremental learning. InEuropean conference on computer vision, pages 398–414. Springer, 2022. 1, 7

  32. [33]

    Memory replay with data compression for continual learning.arXiv preprint arXiv:2202.06592, 2022

    Liyuan Wang, Xingxing Zhang, Kuo Yang, Longhui Yu, Chongxuan Li, Lanqing Hong, Shifeng Zhang, Zhen- guo Li, Yi Zhong, and Jun Zhu. Memory replay with data compression for continual learning.arXiv preprint arXiv:2202.06592, 2022. 2

  33. [34]

    Hierarchical decomposition of prompt-based continual learning: Rethinking obscured sub- optimality.Advances in Neural Information Processing Sys- tems, 36, 2024

    Liyuan Wang, Jingyi Xie, Xingxing Zhang, Mingyi Huang, Hang Su, and Jun Zhu. Hierarchical decomposition of prompt-based continual learning: Rethinking obscured sub- optimality.Advances in Neural Information Processing Sys- tems, 36, 2024. 2

  34. [35]

    A comprehensive survey of continual learning: theory, method and application.IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024

    Liyuan Wang, Xingxing Zhang, Hang Su, and Jun Zhu. A comprehensive survey of continual learning: theory, method and application.IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024. 1, 2

  35. [36]

    Dualprompt: Complementary prompting for rehearsal-free continual learning

    Zifeng Wang, Zizhao Zhang, Sayna Ebrahimi, Ruoxi Sun, Han Zhang, Chen-Yu Lee, Xiaoqi Ren, Guolong Su, Vin- cent Perot, Jennifer Dy, et al. Dualprompt: Complementary prompting for rehearsal-free continual learning. InEuropean Conference on Computer Vision, pages 631–648. Springer,

  36. [37]

    Learning to prompt for con- tinual learning

    Zifeng Wang, Zizhao Zhang, Chen-Yu Lee, Han Zhang, Ruoxi Sun, Xiaoqi Ren, Guolong Su, Vincent Perot, Jen- nifer Dy, and Tomas Pfister. Learning to prompt for con- tinual learning. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 139–149,

  37. [38]

    Large scale incre- mental learning

    Yue Wu, Yinpeng Chen, Lijuan Wang, Yuancheng Ye, Zicheng Liu, Yandong Guo, and Yun Fu. Large scale incre- mental learning. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 374–382,

  38. [39]

    Learning similarity with cosine similarity ensemble.Information sciences, 307: 39–52, 2015

    Peipei Xia, Li Zhang, and Fanzhang Li. Learning similarity with cosine similarity ensemble.Information sciences, 307: 39–52, 2015. 7

  39. [40]

    Der: Dynam- ically expandable representation for class incremental learn- ing

    Shipeng Yan, Jiangwei Xie, and Xuming He. Der: Dynam- ically expandable representation for class incremental learn- ing. InProceedings of the IEEE/CVF Conference on Com- puter Vision and Pattern Recognition, pages 3014–3023,

  40. [41]

    A Large-scale Study of Representation Learning with the Visual Task Adaptation Benchmark

    Xiaohua Zhai, Joan Puigcerver, Alexander Kolesnikov, Pierre Ruyssen, Carlos Riquelme, Mario Lucic, Josip Djo- longa, Andre Susano Pinto, Maxim Neumann, Alexey Doso- vitskiy, et al. A large-scale study of representation learning with the visual task adaptation benchmark.arXiv preprint arXiv:1910.04867, 2019. 2, 6

  41. [42]

    Slca: Slow learner with classifier align- ment for continual learning on a pre-trained model

    Gengwei Zhang, Liyuan Wang, Guoliang Kang, Ling Chen, and Yunchao Wei. Slca: Slow learner with classifier align- ment for continual learning on a pre-trained model. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 19148–19158, 2023. 3

  42. [43]

    A model or 603 exemplars: Towards memory-efficient class-incremental learning

    Da-Wei Zhou, Qi-Wei Wang, Han-Jia Ye, and De-Chuan Zhan. A model or 603 exemplars: Towards memory-efficient class-incremental learning. InThe Eleventh International Conference on Learning Representations, 2022. 1, 2, 7

  43. [44]

    Revisiting class-incremental learning with pre- trained models: Generalizability and adaptivity are all you need.International Journal of Computer Vision, pages 1– 21, 2024

    Da-Wei Zhou, Zi-Wen Cai, Han-Jia Ye, De-Chuan Zhan, and Ziwei Liu. Revisiting class-incremental learning with pre- trained models: Generalizability and adaptivity are all you need.International Journal of Computer Vision, pages 1– 21, 2024. 3, 6

  44. [45]

    Continual learning with pre-trained mod- els: A survey

    Da-Wei Zhou, Hai-Long Sun, Jingyi Ning, Han-Jia Ye, and De-Chuan Zhan. Continual learning with pre-trained mod- els: A survey. InIJCAI, pages 8363–8371, 2024. 1, 2, 3

  45. [46]

    Expandable subspace ensemble for pre-trained model- based class-incremental learning

    Da-Wei Zhou, Hai-Long Sun, Han-Jia Ye, and De-Chuan Zhan. Expandable subspace ensemble for pre-trained model- based class-incremental learning. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 23554–23564, 2024. 1, 3, 6

  46. [47]

    Class-incremental learning: A survey.IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024

    Da-Wei Zhou, Qi-Wei Wang, Zhi-Hong Qi, Han-Jia Ye, De- Chuan Zhan, and Ziwei Liu. Class-incremental learning: A survey.IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024. 1, 2, 3