pith. machine review for the scientific record.

arxiv: 2512.03537 · v3 · submitted 2025-12-03 · 💻 cs.LG · stat.ML

Recognition: 2 theorem links · Lean Theorem

Pushing the Limits of Distillation-Based Continual Learning via Classifier-Proximal Lightweight Plugins

Authors on Pith: no claims yet

Pith reviewed 2026-05-17 01:47 UTC · model grok-4.3

classification 💻 cs.LG stat.ML
keywords continual learning · knowledge distillation · lightweight plugins · stability-plasticity dilemma · classifier-proximal layer · residual correction · parameter efficiency

The pith

Distillation-aware Lightweight Components add small residual plugins near the classifier to improve accuracy in continual learning with minimal extra parameters.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Distillation-aware Lightweight Components to ease the stability-plasticity tension in distillation-based continual learning. Small residual plugins sit in the classifier-proximal layer of the feature extractor so they can adjust semantic outputs for new tasks without rewriting earlier feature layers. A separate lightweight weighting unit scores and combines the different plugin outputs at inference time. On large benchmarks the approach raises accuracy while adding only a small fraction of the original backbone parameters and remains compatible with other continual-learning add-ons.

Core claim

DLC deploys lightweight residual plugins into the base feature extractor's classifier-proximal layer, enabling semantic-level residual correction for better classification accuracy while minimizing disruption to the overall feature extraction process. During inference, plugin-enhanced representations are aggregated to produce classification predictions using a learned weighting unit that assigns importance scores to different plugins, delivering up to 8 percent accuracy gain on large-scale benchmarks with only a 4 percent increase in backbone parameters.
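
To make the mechanism concrete, here is a minimal PyTorch-style sketch of how a classifier-proximal residual plugin and a learned weighting unit could be wired together. The class names, bottleneck size, and softmax aggregation are illustrative assumptions, not the paper's published implementation.

```python
# Illustrative sketch only; the paper's actual plugin design and aggregation
# rule may differ. Assumed names: ResidualPlugin, DLCHead.
import torch
import torch.nn as nn


class ResidualPlugin(nn.Module):
    """Lightweight bottleneck plugin applied at the classifier-proximal layer."""

    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.ReLU()

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # Residual correction: adjust the semantic representation without
        # rewriting the shared feature extractor that produced h.
        return h + self.up(self.act(self.down(h)))


class DLCHead(nn.Module):
    """Aggregates per-task plugin outputs with learned importance scores."""

    def __init__(self, dim: int, num_classes: int, num_tasks: int):
        super().__init__()
        self.plugins = nn.ModuleList(ResidualPlugin(dim) for _ in range(num_tasks))
        self.gate = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, num_tasks))
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        enhanced = torch.stack([p(h) for p in self.plugins], dim=1)  # (B, T, D)
        scores = torch.softmax(self.gate(h), dim=-1).unsqueeze(-1)   # (B, T, 1)
        combined = (scores * enhanced).sum(dim=1)                    # weighted aggregation
        return self.classifier(combined)
```

A call like DLCHead(dim=512, num_classes=100, num_tasks=10)(features), with features taken from the backbone's penultimate layer, would then produce class logits; the backbone itself stays untouched, which is the point of keeping the plugins classifier-proximal.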

What carries the argument

Lightweight residual plugins inserted at the classifier-proximal layer of the shared feature extractor together with a learned weighting unit that combines their outputs.

If this is right

  • Distillation methods can now trade a modest parameter budget for higher final accuracy without changing the core training objectives.
  • The same plugin pattern can be stacked with existing plug-and-play continual-learning modules to obtain additive improvements.
  • Inference cost stays close to the original backbone because only a few extra lightweight modules are evaluated.
  • Storage overhead remains low since no separate model copies or large replay buffers are required; a rough parameter-accounting sketch follows this list.
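
As a rough sanity check on the last two points, the parameter overhead can be estimated directly from module sizes. The backbone and plugin sizes below are illustrative assumptions, not figures taken from the paper.

```python
# Back-of-the-envelope overhead check with assumed sizes (not from the paper).
def overhead_fraction(backbone_params: int, plugin_params: int, num_tasks: int) -> float:
    """Fraction of extra parameters added by one plugin set per task."""
    return num_tasks * plugin_params / backbone_params


# Example: a ResNet-18-scale backbone (~11.2M parameters) with ten ~45K-parameter
# plugin sets adds roughly 4% extra parameters, the same order as the paper's
# reported overhead (numbers here are illustrative only).
print(f"{overhead_fraction(11_200_000, 45_000, 10):.1%}")  # -> 4.0%
```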

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same classifier-proximal correction idea could be tested in other settings where a frozen backbone must adapt to shifting output distributions.
  • If the weighting unit learns to ignore outdated plugins, the method might naturally support longer task sequences without explicit forgetting mitigation.
  • Measuring how much of the accuracy lift comes from the residual correction versus the weighting unit would help isolate the most effective part of the design.

Load-bearing premise

The assumption that residual corrections applied only at the classifier-proximal layer can improve semantic accuracy without disturbing the earlier feature extraction that distillation has already stabilized.

What would settle it

A controlled ablation on the same large-scale benchmarks: if removing the plugins and the weighting unit leaves accuracy unchanged or higher, the claimed gains do not come from the proposed components; if accuracy drops by roughly the reported margin, they do.
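
One way to structure that experiment is an ablation grid over the proposed components, which also separates the plugin contribution from the weighting-unit contribution as suggested above. The sketch below assumes a run_variant training routine supplied by the reader's own continual-learning pipeline (for example a PyCIL-style runner), so it only illustrates the comparison, not the paper's actual protocol.

```python
# Sketch of the ablation grid; run_variant is a placeholder to be backed by a
# real continual-learning training/evaluation pipeline.
from typing import Dict


def run_variant(use_plugins: bool, use_weighting: bool, seed: int = 0) -> float:
    """Train the distillation baseline with/without DLC parts; return final accuracy."""
    raise NotImplementedError("plug in the actual training pipeline here")


VARIANTS: Dict[str, dict] = {
    "distillation baseline":      dict(use_plugins=False, use_weighting=False),
    "plugins, uniform averaging": dict(use_plugins=True, use_weighting=False),
    "plugins + weighting unit":   dict(use_plugins=True, use_weighting=True),
}

# Identical seeds, task order, and hyper-parameters across variants, so any
# accuracy gap is attributable to the proposed components rather than noise.
# results = {name: run_variant(**cfg) for name, cfg in VARIANTS.items()}
```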

Figures

Figures reproduced from arXiv: 2512.03537 by Baile Xu, Furao Shen, Jian Zhao, Suorong Yang, Zhiming Xu.

Figure 1. Parameter-accuracy comparison of different CIL methods.
Figure 2. The proposed DLC framework. Left: training. When a new task t arrives, a dedicated plugin set Lt is created; the feature extractor and Lt are then trained sequentially. Right: test. The feature extractor sequentially loads all task plugins Lt to produce enhanced representations, which are concatenated, weighted by a gating unit, and classified.
Figure 3. Confusion matrix heatmaps of iCaRL with and without DLC.
Figure 4. Comparison of AT with and without the weighting unit for DLC-enhanced methods on CIFAR-100.
Figure 6. Comparison of inference FLOPs and per-sample inference time.
read the original abstract

Continual learning requires models to learn continuously while preserving prior knowledge under evolving data streams. Distillation-based methods are appealing for retaining past knowledge in a shared single-model framework with low storage overhead. However, they remain constrained by the stability-plasticity dilemma: knowledge acquisition and preservation are still optimized through coupled objectives, and existing enhancement methods do not alter this underlying bottleneck. To address this issue, we propose a plugin extension paradigm termed Distillation-aware Lightweight Components (DLC) for distillation-based CL. DLC deploys lightweight residual plugins into the base feature extractor's classifier-proximal layer, enabling semantic-level residual correction for better classification accuracy while minimizing disruption to the overall feature extraction process. During inference, plugin-enhanced representations are aggregated to produce classification predictions. To mitigate interference from non-target plugins, we further introduce a lightweight weighting unit that learns to assign importance scores to different plugin-enhanced representations. DLC could deliver a significant 8% accuracy gain on large-scale benchmarks while introducing only a 4% increase in backbone parameters, highlighting its exceptional efficiency. Moreover, DLC is compatible with other plug-and-play CL enhancements and delivers additional gains when combined with them.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes Distillation-aware Lightweight Components (DLC) as a plugin extension for distillation-based continual learning. DLC inserts lightweight residual plugins into the classifier-proximal layer of the base feature extractor to perform semantic-level residual correction while minimizing disruption to feature extraction. During inference, a learned lightweight weighting unit aggregates the plugin-enhanced representations by assigning importance scores to mitigate interference from non-target plugins. The authors claim this yields an 8% accuracy improvement on large-scale benchmarks at the cost of only a 4% increase in backbone parameters and is compatible with other plug-and-play continual-learning enhancements.

Significance. If the empirical gains are confirmed with full experimental details, the work could advance distillation-based continual learning by providing a modular, low-overhead mechanism that decouples plasticity from stability through targeted residual corrections at the classifier-proximal stage. The reported efficiency (8% accuracy for 4% parameters) and composability with existing methods would be practically relevant strengths.

major comments (2)
  1. [Abstract] Abstract: the headline claim of an 8% accuracy gain on large-scale benchmarks with a 4% parameter increase is presented without naming the specific datasets, task counts, baselines, or error bars; this information is load-bearing for the central empirical result and must be supplied with the full experimental protocol.
  2. [Method] Method description: the architecture and training objective of the lightweight weighting unit that learns importance scores are not specified in sufficient detail to verify that it adequately mitigates interference from non-target plugins, which is a key assumption underlying the reported gains.
minor comments (2)
  1. [Abstract] Abstract: the wording 'DLC could deliver' is conditional; replace with a direct statement of the observed experimental outcome.
  2. Ensure all notation for residual plugins and weighting scores is introduced consistently and referenced to the corresponding equations or figures.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We agree that greater specificity in the abstract and additional details on the weighting unit will improve clarity and verifiability. We will revise the manuscript accordingly to address both points.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the headline claim of an 8% accuracy gain on large-scale benchmarks with a 4% parameter increase is presented without naming the specific datasets, task counts, baselines, or error bars; this information is load-bearing for the central empirical result and must be supplied with the full experimental protocol.

    Authors: We agree that the abstract would benefit from more concrete details to support the central claim. In the revised manuscript we will update the abstract to explicitly name the primary large-scale benchmarks (CIFAR-100 split into 10 tasks and ImageNet-100 split into 10 tasks), the main baselines (LwF, iCaRL, and DER++), and note that all reported gains include standard deviations over three random seeds. The complete experimental protocol, including hyper-parameters, memory budgets, and evaluation metrics, is already provided in Section 4; we will add a forward reference in the abstract to this section. revision: yes

  2. Referee: [Method] Method description: the architecture and training objective of the lightweight weighting unit that learns importance scores are not specified in sufficient detail to verify that it adequately mitigates interference from non-target plugins, which is a key assumption underlying the reported gains.

    Authors: We acknowledge that the current description of the lightweight weighting unit is insufficiently detailed. In the revised manuscript we will expand the relevant subsection to fully specify the architecture (a two-layer MLP with hidden dimension 64 and ReLU activations) and the training objective (a supervised classification loss on the aggregated representation plus an L2 regularization term on the importance scores). We will also include a short derivation showing how the learned scores reduce interference from non-target plugins and add a pseudocode snippet for the forward pass during inference. revision: yes
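
For readers who want to see what that specification would amount to, here is a minimal sketch of the weighting unit and objective as the simulated rebuttal describes them: a two-layer MLP with hidden dimension 64 and ReLU activations, trained with a classification loss on the aggregated representation plus an L2 penalty on the importance scores. Since the rebuttal itself is simulated, every detail here is an assumption rather than the authors' released code.

```python
# Sketch following the simulated rebuttal's description; all details are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class WeightingUnit(nn.Module):
    """Two-layer MLP (hidden dim 64, ReLU) scoring the per-plugin representations."""

    def __init__(self, dim: int, num_plugins: int):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, num_plugins))

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return torch.softmax(self.mlp(h), dim=-1)  # (batch, num_plugins) importance scores


def weighting_objective(logits, labels, scores, l2_coeff: float = 1e-3):
    # Supervised loss on the aggregated prediction plus an L2 term on the scores,
    # discouraging any single non-target plugin from dominating the mixture.
    return F.cross_entropy(logits, labels) + l2_coeff * scores.pow(2).sum(dim=-1).mean()
```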

Circularity Check

0 steps flagged

No significant circularity; empirical proposal with independent experimental validation

full rationale

The paper introduces DLC as a plugin-based extension to distillation-based continual learning, placing lightweight residual plugins in the classifier-proximal layer and adding a learned weighting unit to aggregate representations. Claims of 8% accuracy gains and 4% parameter overhead are presented as direct outcomes of experiments on large-scale benchmarks, not as reductions from equations or fitted parameters. No derivation chain, self-definitional constructs, or load-bearing self-citations appear in the abstract or method description. The construction is a concrete architectural proposal validated externally via benchmarks, remaining self-contained without circular reduction to inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms, or invented entities beyond the high-level description of DLC components and weighting unit.

pith-pipeline@v0.9.0 · 5514 in / 1061 out tokens · 32576 ms · 2026-05-17T01:47:12.050844+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

43 extracted references · 43 canonical work pages · 2 internal anchors

  1. [1]

    Prototype-based continual learning with label-free replay buffer and cluster preservation loss

    Agil Aghasanli, Yi Li, and Plamen Angelov. Prototype-based continual learning with label-free replay buffer and cluster preservation loss. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 6545–6554, 2025.

  2. [2]

    Uncertainty-based continual learning with adaptive regularization

    Hongjoon Ahn, Sungmin Cha, Donggyu Lee, and Taesup Moon. Uncertainty-based continual learning with adaptive regularization. Advances in Neural Information Processing Systems, 32, 2019.

  3. [3]

    Make continual learning stronger via c-flat

    Ang Bian, Wei Li, Hangjie Yuan, Mang Wang, Zixiang Zhao, Aojun Lu, Pengliang Ji, Tao Feng, et al. Make continual learning stronger via c-flat. Advances in Neural Information Processing Systems, 37:7608–7630, 2024.

  4. [4]

    Large-margin contrastive learning with distance polarization regularizer

    Shuo Chen, Gang Niu, Chen Gong, Jun Li, Jian Yang, and Masashi Sugiyama. Large-margin contrastive learning with distance polarization regularizer. In International Conference on Machine Learning, pages 1673–1683. PMLR, 2021.

  5. [5]

    Learning contrastive embedding in low-dimensional space

    Shuo Chen, Chen Gong, Jun Li, Jian Yang, Gang Niu, and Masashi Sugiyama. Learning contrastive embedding in low-dimensional space. Advances in Neural Information Processing Systems, 35:6345–6357, 2022.

  6. [6]

    AutoAugment: Learning Augmentation Policies from Data

    Ekin D Cubuk, Barret Zoph, Dandelion Mane, Vijay Vasudevan, and Quoc V Le. AutoAugment: Learning augmentation policies from data. arXiv preprint arXiv:1805.09501, 2018.

  7. [7]

    Continual prototype evolution: Learning online from non-stationary data streams

    Matthias De Lange and Tinne Tuytelaars. Continual prototype evolution: Learning online from non-stationary data streams. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 8250–8259, 2021.

  8. [8]

    A continual learning survey: Defying forgetting in classification tasks

    Matthias De Lange, Rahaf Aljundi, Marc Masana, Sarah Parisot, Xu Jia, Aleš Leonardis, Gregory Slabaugh, and Tinne Tuytelaars. A continual learning survey: Defying forgetting in classification tasks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(7):3366–3385, 2021.

  9. [9]

    ImageNet: A large-scale hierarchical image database

    Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255. IEEE, 2009.

  10. [10]

    Maintaining fairness in logit-based knowledge distillation for class-incremental learning

    Zijian Gao, Shanhao Han, Xingxing Zhang, Kele Xu, Dulan Zhou, Xinjun Mao, Yong Dou, and Huaimin Wang. Maintaining fairness in logit-based knowledge distillation for class-incremental learning. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 16763–16771.

  11. [11]

    A survey on ensemble learning for data stream classification

    Heitor Murilo Gomes, Jean Paul Barddal, Fabrício Enembreck, and Albert Bifet. A survey on ensemble learning for data stream classification. ACM Computing Surveys (CSUR), 50(2):1–36, 2017.

  12. [12]

    Gradient reweighting: Towards imbalanced class-incremental learning

    Jiangpeng He. Gradient reweighting: Towards imbalanced class-incremental learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16668–16677, 2024.

  13. [13]

    LoRA: Low-rank adaptation of large language models

    Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. LoRA: Low-rank adaptation of large language models. ICLR, 1(2):3, 2022.

  14. [14]

    Multimodal human–computer interaction: A survey

    Alejandro Jaimes and Nicu Sebe. Multimodal human–computer interaction: A survey. Computer Vision and Image Understanding, 108(1-2):116–134, 2007.

  15. [15]

    Learning multiple layers of features from tiny images

    Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. Technical report, 2009.

  16. [16]

    Re-Fed+: A better replay strategy for federated incremental learning

    Yichen Li, Haozhao Wang, Yining Qi, Wei Liu, and Ruixuan Li. Re-Fed+: A better replay strategy for federated incremental learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025.

  17. [17]

    Loss decoupling for task-agnostic continual learning

    Yan-Shuo Liang and Wu-Jun Li. Loss decoupling for task-agnostic continual learning. Advances in Neural Information Processing Systems, 36:11151–11167, 2024.

  18. [18]

    Gradient episodic memory for continual learning

    David Lopez-Paz and Marc'Aurelio Ranzato. Gradient episodic memory for continual learning. Advances in Neural Information Processing Systems, 30, 2017.

  19. [19]

    Piggyback: Adapting a single network to multiple tasks by learning to mask weights

    Arun Mallya, Dillon Davis, and Svetlana Lazebnik. Piggyback: Adapting a single network to multiple tasks by learning to mask weights. In Proceedings of the European Conference on Computer Vision (ECCV), pages 67–82, 2018.

  20. [20]

    Rethinking momentum knowledge distillation in online continual learning

    Nicolas Michel, Maorong Wang, Ling Xiao, and Toshihiko Yamasaki. Rethinking momentum knowledge distillation in online continual learning. In International Conference on Machine Learning, pages 35607–35622. PMLR, 2024.

  21. [21]

    Variational Continual Learning

    Cuong V Nguyen, Yingzhen Li, Thang D Bui, and Richard E Turner. Variational continual learning. arXiv preprint arXiv:1710.10628, 2017.

  22. [22]

    Federated class-incremental learning: A hybrid approach using latent exemplars and data-free techniques to address local and global forgetting

    Milad Khademi Nori, Il-Min Kim, and Guanghui Wang. Federated class-incremental learning: A hybrid approach using latent exemplars and data-free techniques to address local and global forgetting. arXiv preprint arXiv:2501.15356.

  23. [23]

    iCaRL: Incremental classifier and representation learning

    Sylvestre-Alvise Rebuffi, Alexander Kolesnikov, Georg Sperl, and Christoph H Lampert. iCaRL: Incremental classifier and representation learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2001–2010, 2017.

  24. [24]

    Experience replay for continual learning

    David Rolnick, Arun Ahuja, Jonathan Schwarz, Timothy Lillicrap, and Gregory Wayne. Experience replay for continual learning. Advances in Neural Information Processing Systems, 32, 2019.

  25. [25]

    Overcoming catastrophic forgetting with hard attention to the task

    Joan Serra, Didac Suris, Marius Miron, and Alexandros Karatzoglou. Overcoming catastrophic forgetting with hard attention to the task. In International Conference on Machine Learning, pages 4548–4557. PMLR, 2018.

  26. [26]

    On learning the geodesic path for incremental learning

    Christian Simon, Piotr Koniusz, and Mehrtash Harandi. On learning the geodesic path for incremental learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1591–1600, 2021.

  27. [27]

    Mos: Model surgery for pre-trained model-based class-incremental learning

    Hai-Long Sun, Da-Wei Zhou, Hanbin Zhao, Le Gan, De-Chuan Zhan, and Han-Jia Ye. Mos: Model surgery for pre-trained model-based class-incremental learning. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 20699–20707, 2025.

  28. [28]

    Matching networks for one shot learning

    Oriol Vinyals, Charles Blundell, Timothy Lillicrap, Daan Wierstra, et al. Matching networks for one shot learning. Advances in Neural Information Processing Systems, 29, 2016.

  29. [29]

    Foster: Feature boosting and compression for class-incremental learning

    Fu-Yun Wang, Da-Wei Zhou, Han-Jia Ye, and De-Chuan Zhan. Foster: Feature boosting and compression for class-incremental learning. In European Conference on Computer Vision, pages 398–414. Springer, 2022.

  30. [30]

    Improving plasticity in online continual learning via collaborative learning

    Maorong Wang, Nicolas Michel, Ling Xiao, and Toshihiko Yamasaki. Improving plasticity in online continual learning via collaborative learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 23460–23469, 2024.

  31. [31]

    Large scale incremental learning

    Yue Wu, Yinpeng Chen, Lijuan Wang, Yuancheng Ye, Zicheng Liu, Yandong Guo, and Yun Fu. Large scale incremental learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 374–382.

  32. [32]

    Reinforced continual learning

    Ju Xu and Zhanxing Zhu. Reinforced continual learning. Advances in Neural Information Processing Systems, 31, 2018.

  33. [33]

    Der: Dynamically expandable representation for class incremental learning

    Shipeng Yan, Jiangwei Xie, and Xuming He. Der: Dynamically expandable representation for class incremental learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3014–3023.

  34. [34]

    EntAugment: Entropy-driven adaptive data augmentation framework for image classification

    Suorong Yang, Furao Shen, and Jian Zhao. EntAugment: Entropy-driven adaptive data augmentation framework for image classification. In European Conference on Computer Vision, pages 197–214. Springer, 2024.

  35. [35]

    Supervised contrastive learning with prototype distillation for data incremental learning

    Suorong Yang, Tianyue Zhang, Zhiming Xu, Peijia Li, Baile Xu, Furao Shen, and Jian Zhao. Supervised contrastive learning with prototype distillation for data incremental learning. Neural Networks, page 107651, 2025.

  36. [36]

    Learning multiple local metrics: Global consideration helps

    Han-Jia Ye, De-Chuan Zhan, Nan Li, and Yuan Jiang. Learning multiple local metrics: Global consideration helps. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(7):1698–1712, 2019.

  37. [37]

    Terrasap: Spatially aware prompt-based framework for few-shot class-incremental learning in remote sensing image classification

    Jiaxing Zeng, Yifeng Tan, Lina Yang, Siwei Zhang, and Lianhui Liang. Terrasap: Spatially aware prompt-based framework for few-shot class-incremental learning in remote sensing image classification. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 19:3143–3156, 2025.

  38. [38]

    Maintaining discrimination and fairness in class incremental learning

    Bowen Zhao, Xi Xiao, Guojun Gan, Bin Zhang, and Shu-Tao Xia. Maintaining discrimination and fairness in class incremental learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13208–13217, 2020.

  39. [39]

    Task-agnostic guided feature expansion for class-incremental learning

    Bowen Zheng, Da-Wei Zhou, Han-Jia Ye, and De-Chuan Zhan. Task-agnostic guided feature expansion for class-incremental learning. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 10099–10109.

  40. [40]

    A model or 603 exemplars: Towards memory-efficient class-incremental learning

    Da-Wei Zhou, Qi-Wei Wang, Han-Jia Ye, and De-Chuan Zhan. A model or 603 exemplars: Towards memory-efficient class-incremental learning. arXiv preprint arXiv:2205.13218, 2022.

  41. [41]

    PyCIL: A Python toolbox for class-incremental learning

    Da-Wei Zhou, Fu-Yun Wang, Han-Jia Ye, and De-Chuan Zhan. PyCIL: A Python toolbox for class-incremental learning, 2023.

  42. [42]

    Expandable subspace ensemble for pre-trained model-based class-incremental learning

    Da-Wei Zhou, Hai-Long Sun, Han-Jia Ye, and De-Chuan Zhan. Expandable subspace ensemble for pre-trained model-based class-incremental learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 23554–23564, 2024.

  43. [43]

    Class-incremental learning: A survey

    Da-Wei Zhou, Qi-Wei Wang, Zhi-Hong Qi, Han-Jia Ye, De-Chuan Zhan, and Ziwei Liu. Class-incremental learning: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024.