pith. machine review for the scientific record.

arxiv: 2512.03537 · v3 · submitted 2025-12-03 · 💻 cs.LG · stat.ML

Recognition: 2 theorem links · Lean Theorem

Pushing the Limits of Distillation-Based Continual Learning via Classifier-Proximal Lightweight Plugins

Authors on Pith: no claims yet

Pith reviewed 2026-05-17 01:47 UTC · model grok-4.3

classification 💻 cs.LG stat.ML
keywords continual learning · knowledge distillation · lightweight plugins · stability-plasticity dilemma · classifier-proximal layer · residual correction · parameter efficiency

The pith

Distillation-aware Lightweight Components add small residual plugins near the classifier to improve accuracy in continual learning with minimal extra parameters.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Distillation-aware Lightweight Components to ease the stability-plasticity tension in distillation-based continual learning. Small residual plugins sit in the classifier-proximal layer of the feature extractor so they can adjust semantic outputs for new tasks without rewriting earlier feature layers. A separate lightweight weighting unit scores and combines the different plugin outputs at inference time. On large benchmarks the approach raises accuracy while adding only a small fraction of the original backbone parameters and remains compatible with other continual-learning add-ons.

Core claim

DLC deploys lightweight residual plugins into the base feature extractor's classifier-proximal layer, enabling semantic-level residual correction for better classification accuracy while minimizing disruption to the overall feature extraction process. During inference, plugin-enhanced representations are aggregated to produce classification predictions using a learned weighting unit that assigns importance scores to different plugins, delivering up to 8 percent accuracy gain on large-scale benchmarks with only a 4 percent increase in backbone parameters.
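
To make the mechanism concrete, here is a minimal PyTorch-style sketch of how a classifier-proximal residual plugin and a learned weighting unit could be wired together. The class names, bottleneck size, and softmax aggregation are illustrative assumptions, not the paper's published implementation.

```python
# Illustrative sketch only; the paper's actual plugin design and aggregation
# rule may differ. Assumed names: ResidualPlugin, DLCHead.
import torch
import torch.nn as nn


class ResidualPlugin(nn.Module):
    """Lightweight bottleneck plugin applied at the classifier-proximal layer."""

    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.ReLU()

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # Residual correction: adjust the semantic representation without
        # rewriting the shared feature extractor that produced h.
        return h + self.up(self.act(self.down(h)))


class DLCHead(nn.Module):
    """Aggregates per-task plugin outputs with learned importance scores."""

    def __init__(self, dim: int, num_classes: int, num_tasks: int):
        super().__init__()
        self.plugins = nn.ModuleList(ResidualPlugin(dim) for _ in range(num_tasks))
        self.gate = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, num_tasks))
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        enhanced = torch.stack([p(h) for p in self.plugins], dim=1)  # (B, T, D)
        scores = torch.softmax(self.gate(h), dim=-1).unsqueeze(-1)   # (B, T, 1)
        combined = (scores * enhanced).sum(dim=1)                    # weighted aggregation
        return self.classifier(combined)
```

A call like DLCHead(dim=512, num_classes=100, num_tasks=10)(features), with features taken from the backbone's penultimate layer, would then produce class logits; the backbone itself stays untouched, which is the point of keeping the plugins classifier-proximal.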

What carries the argument

Lightweight residual plugins inserted at the classifier-proximal layer of the shared feature extractor together with a learned weighting unit that combines their outputs.

If this is right

  • Distillation methods can now trade a modest parameter budget for higher final accuracy without changing the core training objectives.
  • The same plugin pattern can be stacked with existing plug-and-play continual-learning modules to obtain additive improvements.
  • Inference cost stays close to the original backbone because only a few extra lightweight modules are evaluated.
  • Storage overhead remains low since no separate model copies or large replay buffers are required; a rough parameter-accounting sketch follows this list.
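
As a rough sanity check on the last two points, the parameter overhead can be estimated directly from module sizes. The backbone and plugin sizes below are illustrative assumptions, not figures taken from the paper.

```python
# Back-of-the-envelope overhead check with assumed sizes (not from the paper).
def overhead_fraction(backbone_params: int, plugin_params: int, num_tasks: int) -> float:
    """Fraction of extra parameters added by one plugin set per task."""
    return num_tasks * plugin_params / backbone_params


# Example: a ResNet-18-scale backbone (~11.2M parameters) with ten ~45K-parameter
# plugin sets adds roughly 4% extra parameters, the same order as the paper's
# reported overhead (numbers here are illustrative only).
print(f"{overhead_fraction(11_200_000, 45_000, 10):.1%}")  # -> 4.0%
```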

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same classifier-proximal correction idea could be tested in other settings where a frozen backbone must adapt to shifting output distributions.
  • If the weighting unit learns to ignore outdated plugins, the method might naturally support longer task sequences without explicit forgetting mitigation.
  • Measuring how much of the accuracy lift comes from the residual correction versus the weighting unit would help isolate the most effective part of the design.

Load-bearing premise

The assumption that residual corrections applied only at the classifier-proximal layer can improve semantic accuracy without disturbing the earlier feature extraction that distillation has already stabilized.

What would settle it

A controlled ablation on the same large-scale benchmarks: if removing the plugins and the weighting unit leaves accuracy unchanged or higher, the claimed gains do not come from the proposed components; if accuracy drops by roughly the reported margin, they do.
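
One way to structure that experiment is an ablation grid over the proposed components, which also separates the plugin contribution from the weighting-unit contribution as suggested above. The sketch below assumes a run_variant training routine supplied by the reader's own continual-learning pipeline (for example a PyCIL-style runner), so it only illustrates the comparison, not the paper's actual protocol.

```python
# Sketch of the ablation grid; run_variant is a placeholder to be backed by a
# real continual-learning training/evaluation pipeline.
from typing import Dict


def run_variant(use_plugins: bool, use_weighting: bool, seed: int = 0) -> float:
    """Train the distillation baseline with/without DLC parts; return final accuracy."""
    raise NotImplementedError("plug in the actual training pipeline here")


VARIANTS: Dict[str, dict] = {
    "distillation baseline":      dict(use_plugins=False, use_weighting=False),
    "plugins, uniform averaging": dict(use_plugins=True, use_weighting=False),
    "plugins + weighting unit":   dict(use_plugins=True, use_weighting=True),
}

# Identical seeds, task order, and hyper-parameters across variants, so any
# accuracy gap is attributable to the proposed components rather than noise.
# results = {name: run_variant(**cfg) for name, cfg in VARIANTS.items()}
```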

Figures

Figures reproduced from arXiv: 2512.03537 by Baile Xu, Furao Shen, Jian Zhao, Suorong Yang, Zhiming Xu.

Figure 1. Parameter-accuracy comparison of different CIL methods.
Figure 2. The proposed DLC framework. Left: training. When a new task t arrives, a dedicated plugin set Lt is created; the feature extractor and Lt are then trained sequentially. Right: test. The feature extractor sequentially loads all task plugins Lt to produce enhanced representations, which are concatenated, weighted by a gating unit, and classified.
Figure 3. Confusion matrix heatmaps of iCaRL with and without DLC.
Figure 4. Comparison of AT with and without the weighting unit for DLC-enhanced methods on CIFAR-100.
Figure 6. Comparison of inference FLOPs and per-sample inference time.
read the original abstract

Continual learning requires models to learn continuously while preserving prior knowledge under evolving data streams. Distillation-based methods are appealing for retaining past knowledge in a shared single-model framework with low storage overhead. However, they remain constrained by the stability-plasticity dilemma: knowledge acquisition and preservation are still optimized through coupled objectives, and existing enhancement methods do not alter this underlying bottleneck. To address this issue, we propose a plugin extension paradigm termed Distillation-aware Lightweight Components (DLC) for distillation-based CL. DLC deploys lightweight residual plugins into the base feature extractor's classifier-proximal layer, enabling semantic-level residual correction for better classification accuracy while minimizing disruption to the overall feature extraction process. During inference, plugin-enhanced representations are aggregated to produce classification predictions. To mitigate interference from non-target plugins, we further introduce a lightweight weighting unit that learns to assign importance scores to different plugin-enhanced representations. DLC could deliver a significant 8% accuracy gain on large-scale benchmarks while introducing only a 4% increase in backbone parameters, highlighting its exceptional efficiency. Moreover, DLC is compatible with other plug-and-play CL enhancements and delivers additional gains when combined with them.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes Distillation-aware Lightweight Components (DLC) as a plugin extension for distillation-based continual learning. DLC inserts lightweight residual plugins into the classifier-proximal layer of the base feature extractor to perform semantic-level residual correction while minimizing disruption to feature extraction. During inference, a learned lightweight weighting unit aggregates the plugin-enhanced representations by assigning importance scores to mitigate interference from non-target plugins. The authors claim this yields an 8% accuracy improvement on large-scale benchmarks at the cost of only a 4% increase in backbone parameters and is compatible with other plug-and-play continual-learning enhancements.

Significance. If the empirical gains are confirmed with full experimental details, the work could advance distillation-based continual learning by providing a modular, low-overhead mechanism that decouples plasticity from stability through targeted residual corrections at the classifier-proximal stage. The reported efficiency (8% accuracy for 4% parameters) and composability with existing methods would be practically relevant strengths.

major comments (2)
  1. [Abstract] Abstract: the headline claim of an 8% accuracy gain on large-scale benchmarks with a 4% parameter increase is presented without naming the specific datasets, task counts, baselines, or error bars; this information is load-bearing for the central empirical result and must be supplied with the full experimental protocol.
  2. [Method] Method description: the architecture and training objective of the lightweight weighting unit that learns importance scores are not specified in sufficient detail to verify that it adequately mitigates interference from non-target plugins, which is a key assumption underlying the reported gains.
minor comments (2)
  1. [Abstract] Abstract: the wording 'DLC could deliver' is conditional; replace with a direct statement of the observed experimental outcome.
  2. Ensure all notation for residual plugins and weighting scores is introduced consistently and referenced to the corresponding equations or figures.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We agree that greater specificity in the abstract and additional details on the weighting unit will improve clarity and verifiability. We will revise the manuscript accordingly to address both points.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the headline claim of an 8% accuracy gain on large-scale benchmarks with a 4% parameter increase is presented without naming the specific datasets, task counts, baselines, or error bars; this information is load-bearing for the central empirical result and must be supplied with the full experimental protocol.

    Authors: We agree that the abstract would benefit from more concrete details to support the central claim. In the revised manuscript we will update the abstract to explicitly name the primary large-scale benchmarks (CIFAR-100 split into 10 tasks and ImageNet-100 split into 10 tasks), the main baselines (LwF, iCaRL, and DER++), and note that all reported gains include standard deviations over three random seeds. The complete experimental protocol, including hyper-parameters, memory budgets, and evaluation metrics, is already provided in Section 4; we will add a forward reference in the abstract to this section. revision: yes

  2. Referee: [Method] Method description: the architecture and training objective of the lightweight weighting unit that learns importance scores are not specified in sufficient detail to verify that it adequately mitigates interference from non-target plugins, which is a key assumption underlying the reported gains.

    Authors: We acknowledge that the current description of the lightweight weighting unit is insufficiently detailed. In the revised manuscript we will expand the relevant subsection to fully specify the architecture (a two-layer MLP with hidden dimension 64 and ReLU activations) and the training objective (a supervised classification loss on the aggregated representation plus an L2 regularization term on the importance scores). We will also include a short derivation showing how the learned scores reduce interference from non-target plugins and add a pseudocode snippet for the forward pass during inference. revision: yes
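
For readers who want to see what that specification would amount to, here is a minimal sketch of the weighting unit and objective as the simulated rebuttal describes them: a two-layer MLP with hidden dimension 64 and ReLU activations, trained with a classification loss on the aggregated representation plus an L2 penalty on the importance scores. Since the rebuttal itself is simulated, every detail here is an assumption rather than the authors' released code.

```python
# Sketch following the simulated rebuttal's description; all details are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class WeightingUnit(nn.Module):
    """Two-layer MLP (hidden dim 64, ReLU) scoring the per-plugin representations."""

    def __init__(self, dim: int, num_plugins: int):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, num_plugins))

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return torch.softmax(self.mlp(h), dim=-1)  # (batch, num_plugins) importance scores


def weighting_objective(logits, labels, scores, l2_coeff: float = 1e-3):
    # Supervised loss on the aggregated prediction plus an L2 term on the scores,
    # discouraging any single non-target plugin from dominating the mixture.
    return F.cross_entropy(logits, labels) + l2_coeff * scores.pow(2).sum(dim=-1).mean()
```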

Circularity Check

0 steps flagged

No significant circularity; empirical proposal with independent experimental validation

full rationale

The paper introduces DLC as a plugin-based extension to distillation-based continual learning, placing lightweight residual plugins in the classifier-proximal layer and adding a learned weighting unit to aggregate representations. Claims of 8% accuracy gains and 4% parameter overhead are presented as direct outcomes of experiments on large-scale benchmarks, not as reductions from equations or fitted parameters. No derivation chain, self-definitional constructs, or load-bearing self-citations appear in the abstract or method description. The construction is a concrete architectural proposal validated externally via benchmarks, remaining self-contained without circular reduction to inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms, or invented entities beyond the high-level description of DLC components and weighting unit.

pith-pipeline@v0.9.0 · 5514 in / 1061 out tokens · 32576 ms · 2026-05-17T01:47:12.050844+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

43 extracted references · 43 canonical work pages · 2 internal anchors

  1. [1]

    Prototype-based continual learning with label-free replay buffer and cluster preservation loss

    Agil Aghasanli, Yi Li, and Plamen Angelov. Prototype-based continual learning with label-free replay buffer and cluster preservation loss. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 6545–6554, 2025.

  2. [2]

    Uncertainty-based continual learning with adaptive regularization

    Hongjoon Ahn, Sungmin Cha, Donggyu Lee, and Taesup Moon. Uncertainty-based continual learning with adaptive regularization. Advances in Neural Information Processing Systems, 32, 2019.

  3. [3]

    Make continual learning stronger via c-flat

    Ang Bian, Wei Li, Hangjie Yuan, Mang Wang, Zixiang Zhao, Aojun Lu, Pengliang Ji, Tao Feng, et al. Make continual learning stronger via c-flat. Advances in Neural Information Processing Systems, 37:7608–7630, 2024.

  4. [4]

    Large-margin contrastive learning with distance polarization regularizer

    Shuo Chen, Gang Niu, Chen Gong, Jun Li, Jian Yang, and Masashi Sugiyama. Large-margin contrastive learning with distance polarization regularizer. In International Conference on Machine Learning, pages 1673–1683. PMLR, 2021.

  5. [5]

    Learning contrastive embedding in low-dimensional space

    Shuo Chen, Chen Gong, Jun Li, Jian Yang, Gang Niu, and Masashi Sugiyama. Learning contrastive embedding in low-dimensional space. Advances in Neural Information Processing Systems, 35:6345–6357, 2022.

  6. [6]

    AutoAugment: Learning Augmentation Policies from Data

    Ekin D Cubuk, Barret Zoph, Dandelion Mane, Vijay Vasudevan, and Quoc V Le. AutoAugment: Learning augmentation policies from data. arXiv preprint arXiv:1805.09501, 2018.

  7. [7]

    Continual prototype evolution: Learning online from non-stationary data streams

    Matthias De Lange and Tinne Tuytelaars. Continual prototype evolution: Learning online from non-stationary data streams. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 8250–8259, 2021.

  8. [8]

    A continual learning survey: Defying forgetting in classification tasks

    Matthias De Lange, Rahaf Aljundi, Marc Masana, Sarah Parisot, Xu Jia, Aleš Leonardis, Gregory Slabaugh, and Tinne Tuytelaars. A continual learning survey: Defying forgetting in classification tasks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(7):3366–3385, 2021.

  9. [9]

    ImageNet: A large-scale hierarchical image database

    Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255. IEEE, 2009.

  10. [10]

    Maintaining fairness in logit-based knowledge distillation for class-incremental learning

    Zijian Gao, Shanhao Han, Xingxing Zhang, Kele Xu, Dulan Zhou, Xinjun Mao, Yong Dou, and Huaimin Wang. Maintaining fairness in logit-based knowledge distillation for class-incremental learning. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 16763–16771.

  11. [11]

    A survey on ensemble learning for data stream classification

    Heitor Murilo Gomes, Jean Paul Barddal, Fabrício Enembreck, and Albert Bifet. A survey on ensemble learning for data stream classification. ACM Computing Surveys (CSUR), 50(2):1–36, 2017.

  12. [12]

    Gradient reweighting: Towards imbalanced class-incremental learning

    Jiangpeng He. Gradient reweighting: Towards imbalanced class-incremental learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16668–16677, 2024.

  13. [13]

    LoRA: Low-rank adaptation of large language models

    Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. LoRA: Low-rank adaptation of large language models. ICLR, 1(2):3, 2022.

  14. [14]

    Multimodal human–computer interaction: A survey

    Alejandro Jaimes and Nicu Sebe. Multimodal human–computer interaction: A survey. Computer Vision and Image Understanding, 108(1-2):116–134, 2007.

  15. [15]

    Learning multiple layers of features from tiny images

    Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. Technical report, 2009.

  16. [16]

    Re-Fed+: A better replay strategy for federated incremental learning

    Yichen Li, Haozhao Wang, Yining Qi, Wei Liu, and Ruixuan Li. Re-Fed+: A better replay strategy for federated incremental learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025.

  17. [17]

    Loss decoupling for task-agnostic continual learning

    Yan-Shuo Liang and Wu-Jun Li. Loss decoupling for task-agnostic continual learning. Advances in Neural Information Processing Systems, 36:11151–11167, 2024.

  18. [18]

    Gradient episodic memory for continual learning

    David Lopez-Paz and Marc'Aurelio Ranzato. Gradient episodic memory for continual learning. Advances in Neural Information Processing Systems, 30, 2017.

  19. [19]

    Piggyback: Adapting a single network to multiple tasks by learning to mask weights

    Arun Mallya, Dillon Davis, and Svetlana Lazebnik. Piggyback: Adapting a single network to multiple tasks by learning to mask weights. In Proceedings of the European Conference on Computer Vision (ECCV), pages 67–82, 2018.

  20. [20]

    Rethinking momentum knowledge distillation in online continual learning

    Nicolas Michel, Maorong Wang, Ling Xiao, and Toshihiko Yamasaki. Rethinking momentum knowledge distillation in online continual learning. In International Conference on Machine Learning, pages 35607–35622. PMLR, 2024.

  21. [21]

    Variational Continual Learning

    Cuong V Nguyen, Yingzhen Li, Thang D Bui, and Richard E Turner. Variational continual learning. arXiv preprint arXiv:1710.10628, 2017.

  22. [22]

    Federated class-incremental learning: A hybrid approach using latent exemplars and data-free techniques to address local and global forgetting

    Milad Khademi Nori, Il-Min Kim, and Guanghui Wang. Federated class-incremental learning: A hybrid approach using latent exemplars and data-free techniques to address local and global forgetting. arXiv preprint arXiv:2501.15356.

  23. [23]

    iCaRL: Incremental classifier and representation learning

    Sylvestre-Alvise Rebuffi, Alexander Kolesnikov, Georg Sperl, and Christoph H Lampert. iCaRL: Incremental classifier and representation learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2001–2010, 2017.

  24. [24]

    Experience replay for continual learning

    David Rolnick, Arun Ahuja, Jonathan Schwarz, Timothy Lillicrap, and Gregory Wayne. Experience replay for continual learning. Advances in Neural Information Processing Systems, 32, 2019.

  25. [25]

    Overcoming catastrophic forgetting with hard attention to the task

    Joan Serra, Didac Suris, Marius Miron, and Alexandros Karatzoglou. Overcoming catastrophic forgetting with hard attention to the task. In International Conference on Machine Learning, pages 4548–4557. PMLR, 2018.

  26. [26]

    On learning the geodesic path for incremental learning

    Christian Simon, Piotr Koniusz, and Mehrtash Harandi. On learning the geodesic path for incremental learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1591–1600, 2021.

  27. [27]

    Mos: Model surgery for pre-trained model-based class-incremental learning

    Hai-Long Sun, Da-Wei Zhou, Hanbin Zhao, Le Gan, De-Chuan Zhan, and Han-Jia Ye. Mos: Model surgery for pre-trained model-based class-incremental learning. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 20699–20707, 2025.

  28. [28]

    Matching networks for one shot learning

    Oriol Vinyals, Charles Blundell, Timothy Lillicrap, Daan Wierstra, et al. Matching networks for one shot learning. Advances in Neural Information Processing Systems, 29, 2016.

  29. [29]

    Foster: Feature boosting and compression for class-incremental learning

    Fu-Yun Wang, Da-Wei Zhou, Han-Jia Ye, and De-Chuan Zhan. Foster: Feature boosting and compression for class-incremental learning. In European Conference on Computer Vision, pages 398–414. Springer, 2022.

  30. [30]

    Improving plasticity in online continual learning via collaborative learning

    Maorong Wang, Nicolas Michel, Ling Xiao, and Toshihiko Yamasaki. Improving plasticity in online continual learning via collaborative learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 23460–23469, 2024.

  31. [31]

    Large scale incremental learning

    Yue Wu, Yinpeng Chen, Lijuan Wang, Yuancheng Ye, Zicheng Liu, Yandong Guo, and Yun Fu. Large scale incremental learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 374–382.

  32. [32]

    Reinforced continual learning

    Ju Xu and Zhanxing Zhu. Reinforced continual learning. Advances in Neural Information Processing Systems, 31, 2018.

  33. [33]

    Der: Dynamically expandable representation for class incremental learning

    Shipeng Yan, Jiangwei Xie, and Xuming He. Der: Dynamically expandable representation for class incremental learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3014–3023.

  34. [34]

    EntAugment: Entropy-driven adaptive data augmentation framework for image classification

    Suorong Yang, Furao Shen, and Jian Zhao. EntAugment: Entropy-driven adaptive data augmentation framework for image classification. In European Conference on Computer Vision, pages 197–214. Springer, 2024.

  35. [35]

    Supervised contrastive learning with prototype distillation for data incremental learning

    Suorong Yang, Tianyue Zhang, Zhiming Xu, Peijia Li, Baile Xu, Furao Shen, and Jian Zhao. Supervised contrastive learning with prototype distillation for data incremental learning. Neural Networks, page 107651, 2025.

  36. [36]

    Learning multiple local metrics: Global consideration helps

    Han-Jia Ye, De-Chuan Zhan, Nan Li, and Yuan Jiang. Learning multiple local metrics: Global consideration helps. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(7):1698–1712, 2019.

  37. [37]

    Terrasap: Spatially aware prompt-based framework for few-shot class-incremental learning in remote sensing image classification

    Jiaxing Zeng, Yifeng Tan, Lina Yang, Siwei Zhang, and Lianhui Liang. Terrasap: Spatially aware prompt-based framework for few-shot class-incremental learning in remote sensing image classification. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 19:3143–3156, 2025.

  38. [38]

    Maintaining discrimination and fairness in class incremental learning

    Bowen Zhao, Xi Xiao, Guojun Gan, Bin Zhang, and Shu-Tao Xia. Maintaining discrimination and fairness in class incremental learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13208–13217, 2020.

  39. [39]

    Task-agnostic guided feature expansion for class-incremental learning

    Bowen Zheng, Da-Wei Zhou, Han-Jia Ye, and De-Chuan Zhan. Task-agnostic guided feature expansion for class-incremental learning. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 10099–10109.

  40. [40]

    A model or 603 exemplars: Towards memory-efficient class-incremental learning

    Da-Wei Zhou, Qi-Wei Wang, Han-Jia Ye, and De-Chuan Zhan. A model or 603 exemplars: Towards memory-efficient class-incremental learning. arXiv preprint arXiv:2205.13218, 2022.

  41. [41]

    PyCIL: A Python toolbox for class-incremental learning

    Da-Wei Zhou, Fu-Yun Wang, Han-Jia Ye, and De-Chuan Zhan. PyCIL: A Python toolbox for class-incremental learning, 2023.

  42. [42]

    Expandable subspace ensemble for pre-trained model-based class-incremental learning

    Da-Wei Zhou, Hai-Long Sun, Han-Jia Ye, and De-Chuan Zhan. Expandable subspace ensemble for pre-trained model-based class-incremental learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 23554–23564, 2024.

  43. [43]

    Class-incremental learning: A survey

    Da-Wei Zhou, Qi-Wei Wang, Zhi-Hong Qi, Han-Jia Ye, De-Chuan Zhan, and Ziwei Liu. Class-incremental learning: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024.