
arxiv: 2503.04638 · v3 · submitted 2025-03-06 · 💻 cs.LG

No Forgetting Learning: Buffer-free Continual Learning Classification

Pith reviewed 2026-05-23 00:47 UTC · model grok-4.3

classification 💻 cs.LG
keywords continual learning, buffer-free, class-incremental learning, task-incremental learning, knowledge distillation, overparameterized networks, no forgetting

The pith

A buffer-free continual learning method matches memory-based accuracy on sequential image tasks by freezing and distilling overparameterized networks instead of storing examples.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces No Forgetting Learning, a framework for class- and task-incremental learning that eliminates the need for replay buffers. It decomposes the network into a shared backbone and task-specific heads, then applies stepwise freezing to isolate new capabilities while using knowledge distillation to protect prior performance. An extension called NFL+ adds an under-complete auto-encoder to preserve features and correct imbalance bias, and NFL+LoRA adapts the approach to Vision Transformers with low-rank updates. Tests on CIFAR-100, Tiny-ImageNet, and ImageNet-1000 with up to 50 tasks show the method beats other buffer-free approaches and equals buffer-based ones while using far less memory. The work also defines a Plasticity-Stability score to assess the trade-off between learning new tasks and retaining old ones.

Core claim

Overparameterized networks contain enough redundancy that a decomposed architecture can support new tasks through stepwise freezing of new heads, distillation-based adaptation of the shared backbone, and joint refinement under dual soft targets, allowing prior task performance to be retained without any stored exemplars.
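The distillation step in this claim can be illustrated with the standard temperature-softened formulation from Hinton et al. What follows is a minimal sketch, not the paper's loss: the function names, the default temperature of 2.0, and in particular `dual_soft_target_loss` (a weighted sum of two distillation terms) are assumptions made here for illustration, since the text reviewed does not give the exact form of the dual soft-target anchoring.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-softened softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy between softened teacher and student distributions."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(p_teacher, p_student))


def dual_soft_target_loss(old_pair, new_pair, weight=0.5, temperature=2.0):
    """Hypothetical combination of two soft-target terms (assumption).

    Each pair is (student_logits, teacher_logits); one pair anchors the
    old classes and the other anchors the newly learned classes.
    """
    old_term = distillation_loss(old_pair[0], old_pair[1], temperature)
    new_term = distillation_loss(new_pair[0], new_pair[1], temperature)
    return weight * old_term + (1.0 - weight) * new_term


teacher = [2.0, 0.5, -1.0]
print(distillation_loss(teacher, teacher))           # lowest value for this teacher
print(distillation_loss([-1.0, 0.5, 2.0], teacher))  # larger: student disagrees
```

The loss is smallest when the student reproduces the teacher's softened distribution, which is what lets it stand in for the stored exemplars a replay buffer would otherwise supply.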

What carries the argument

The stepwise freezing protocol that isolates new task capabilities in dedicated heads, adapts shared representations under knowledge distillation, and refines all components jointly with dual soft-target anchoring.
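One way to picture the three stages named above is as a schedule of which parameter groups receive gradient updates. The sketch below is an illustration under stated assumptions: the phase names (`isolate`, `adapt`, `refine`), the group names, and the choice to keep old heads frozen until joint refinement are all hypothetical, since the reviewed text does not spell out the per-phase details.

```python
def trainable_flags(phase, num_old_heads):
    """Return a dict mapping parameter-group name to a trainable flag.

    Phases (names are assumptions made for this sketch):
      "isolate": only the new task head trains; everything else is frozen.
      "adapt":   the new head is frozen; the shared backbone trains
                 (under a distillation loss in the described method).
      "refine":  all groups train jointly.
    """
    groups = {"backbone": False, "new_head": False}
    for index in range(num_old_heads):
        groups[f"old_head_{index}"] = False

    if phase == "isolate":
        groups["new_head"] = True
    elif phase == "adapt":
        groups["backbone"] = True
    elif phase == "refine":
        for name in groups:
            groups[name] = True
    else:
        raise ValueError(f"unknown phase: {phase!r}")
    return groups


for phase in ("isolate", "adapt", "refine"):
    flags = trainable_flags(phase, num_old_heads=2)
    print(phase, sorted(name for name, on in flags.items() if on))
```

In a deep learning framework these flags would typically map onto per-parameter gradient switches; the plain dictionary here only makes the ordering of the stages explicit.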

If this is right

  • The method outperforms all buffer-free baselines on CIFAR-100, Tiny-ImageNet, and ImageNet-1000 across up to 50 incremental tasks.
  • Performance matches memory-based methods while using only 2.53 percent of their model size.
  • NFL+LoRA keeps backbone memory cost constant regardless of task count when applied to pre-trained Vision Transformers.
  • The Plasticity-Stability score offers a balanced metric for evaluating continual learning trade-offs.
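The low-rank idea behind the NFL+LoRA bullet can be made concrete with a parameter count. This is a generic LoRA sketch, not the paper's configuration: the 768 by 768 layer size and rank 8 are hypothetical values chosen for illustration, and the resulting ratio is unrelated to the 2.53 percent figure quoted above.

```python
def dense_params(d_out, d_in):
    """Number of weights in a full d_out x d_in matrix."""
    return d_out * d_in


def lora_params(d_out, d_in, rank):
    """Weights in a rank-r update B @ A, with B: d_out x r and A: r x d_in."""
    return rank * (d_out + d_in)


def apply_lora(weight, b, a):
    """Return weight + b @ a using plain nested lists."""
    rank = len(a)
    rows, cols = len(weight), len(weight[0])
    return [
        [weight[i][j] + sum(b[i][k] * a[k][j] for k in range(rank))
         for j in range(cols)]
        for i in range(rows)
    ]


full = dense_params(768, 768)           # hypothetical layer size
low_rank = lora_params(768, 768, 8)     # hypothetical rank
print(full, low_rank, low_rank / full)  # 589824 12288 0.020833...

# Tiny rank-1 example: the frozen weight stays fixed, only b and a change.
print(apply_lora([[0.0, 0.0], [0.0, 0.0]], [[1.0], [2.0]], [[3.0, 4.0]]))
# [[3.0, 4.0], [6.0, 8.0]]
```

Because only `b` and `a` are updated, the frozen backbone weight does not grow with the number of tasks, which is the sense in which backbone memory can stay constant.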

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Eliminating the replay buffer removes a source of privacy risk in regulated continual learning settings.
  • The same redundancy-based isolation strategy could be tested on sequence models in natural language tasks where storage of prior examples is restricted.

Load-bearing premise

Overparameterized networks contain sufficient inherent redundancy to allow new task capabilities to be isolated via stepwise freezing and distillation without degrading prior task performance.

What would settle it

Running the NFL protocol on ImageNet-1000 split into 50 tasks and finding that accuracy on the earliest tasks falls well below the level maintained by any memory-based baseline.
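A check of this kind is usually read off an accuracy matrix recorded after each training stage. The sketch below uses the common forgetting measure (best earlier accuracy on a task minus its final accuracy); it is not the paper's Plasticity-Stability score, whose definition is not given in the text reviewed here, and the accuracy values are made up for illustration.

```python
def forgetting(acc, task):
    """Best earlier accuracy on `task` minus its final accuracy.

    acc[i][j] is the accuracy on task j after training through task i,
    so row i has i + 1 entries. Defined only for tasks before the last.
    """
    if task >= len(acc) - 1:
        raise ValueError("forgetting is undefined for the last task")
    best_earlier = max(row[task] for row in acc[:-1] if len(row) > task)
    return best_earlier - acc[-1][task]


def final_average_accuracy(acc):
    """Mean accuracy over all tasks after the last training stage."""
    return sum(acc[-1]) / len(acc[-1])


# Hypothetical three-task run (values invented for this example).
acc = [
    [0.90],
    [0.85, 0.88],
    [0.80, 0.84, 0.87],
]
print(forgetting(acc, task=0))
print(final_average_accuracy(acc))
```

For the test described above, the quantity of interest is the final-row entry for the earliest tasks, compared against the same entry for a memory-based baseline trained on the same task split.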

Figures

Figures reproduced from arXiv: 2503.04638 by Mohammad Ali Vahedifar, Qi Zhang.

Figure 1: A Conceptual Illustration of NFL.
Accompanying text (truncated in the source): "... high performance on the combined task T = T_t ∪ T_{t+1} formed by c classes in the class set C = C_t ∪ C_{t+1}, without access to the data and targets of the old dataset D_t(X_t, Y_t). NFL follows a five-step process, summarized in Algorithm 1. In step one, we introduce the data samples X_{t+1} to the trained NN1 to obtain the logits (i.e., the outputs of the NN1 before the soft…"
Figure 2: ACC comparison for Class-IL using CIFAR-100 (10 …
Figure 3: ACC comparison for Class-IL using TinyImageNet (20 …
Figure 5: Memory size for ImageNet-1000 for different compari…
Figure 6: ACC comparison for Class-IL using CIFAR-100 (5 in…
Figure 7: Training time (h) of CIFAR-100, TinyImageNet and ImageNet-1000 Class-IL and Task-IL experiments for all methods, e.g., …
Original abstract

Most Continual Learning (CL) methods maintain performance on earlier tasks by storing exemplars in a replay buffer, introducing memory overhead that scales with the number of tasks and raising privacy concerns in regulated domains. We propose No Forgetting Learning (NFL), a buffer-free framework for class- and task-incremental learning that instead exploits the inherent redundancy of overparameterized networks. NFL decomposes the network into a shared backbone and task-specific heads, then applies a stepwise freezing protocol: new capabilities are first isolated, shared representations are adapted under knowledge distillation, and all components are jointly refined with dual soft-target anchoring. NFL+ augments this pipeline with an under-complete auto-encoder that preserves informative features from previous tasks and corrects the prediction bias caused by class imbalance. NFL+LoRA further extends the framework to pre-trained Vision Transformers by confining updates to a low-rank subspace with Fisher-weighted regularization, maintaining constant backbone memory cost regardless of the number of tasks. On CIFAR-100, Tiny-ImageNet, and ImageNet-1000 across up to 50 incremental tasks, NFL+ outperforms all buffer-free baselines and matches memory-based methods while requiring only 2.53% of their model size. We also propose a Plasticity-Stability score for more balanced trade-off evaluation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes No Forgetting Learning (NFL), a buffer-free continual learning method for class- and task-incremental settings. It decomposes the network into a shared backbone plus task-specific heads, applies stepwise freezing with knowledge distillation and dual soft-target anchoring, augments with an under-complete auto-encoder in NFL+ to handle feature preservation and bias, and extends to NFL+LoRA for ViTs. It reports that NFL+ outperforms buffer-free baselines and matches memory-based methods on CIFAR-100, Tiny-ImageNet, and ImageNet-1000 (up to 50 tasks) while using 2.53% of their model size, and introduces a Plasticity-Stability score.

Significance. A buffer-free approach that matches memory-based performance on class-incremental benchmarks with constant memory cost would be significant for continual learning, especially in privacy-regulated domains. The exploitation of overparameterization redundancy and the new evaluation metric are potentially valuable contributions if the central claims hold.

major comments (2)
  1. [Abstract] Abstract: The central claim that NFL supports class-incremental learning (no task ID at inference) is load-bearing yet contradicted by the architecture, which decomposes into task-specific heads whose selection at test time requires task identity. This construction is standard for task-incremental but incompatible with class-incremental protocols, so the reported class-incremental benchmark numbers do not substantiate the headline claim unless an alternative routing mechanism is provided.
  2. [Method description] Method (stepwise freezing and head isolation protocol): No explicit mechanism is described for merging task-specific heads or performing inference without task labels, which is required to support the class-incremental results across 50 tasks on CIFAR-100, Tiny-ImageNet, and ImageNet-1000. This omission directly affects verifiability of the buffer-free class-incremental performance.
minor comments (2)
  1. [Abstract] Abstract: The claim of 'only 2.53% of their model size' should specify the exact baseline (e.g., total parameters of a memory-based method) and clarify whether this includes the auto-encoder overhead.
  2. The manuscript provides no implementation details, statistical significance tests, or ablation studies on the distillation and anchoring components, which limits independent verification of the empirical claims.
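The distinction behind both major comments can be shown in a few lines. This sketch is illustrative only: `predict_task_il` and `predict_class_il` are hypothetical names, and taking the argmax over concatenated head logits is one common way to infer without a task ID, not a mechanism the reviewed text attributes to NFL.

```python
def argmax(values):
    """Index of the largest value in a non-empty list."""
    return max(range(len(values)), key=values.__getitem__)


def predict_task_il(head_logits, class_offsets, task_id):
    """Task-IL: the task ID picks the head, then argmax within that head."""
    return class_offsets[task_id] + argmax(head_logits[task_id])


def predict_class_il(head_logits, class_offsets):
    """Class-IL via concatenation (assumption): argmax over all heads.

    No task ID is used, so logits from different heads are compared
    directly and must be on a comparable scale.
    """
    best_class, best_logit = None, None
    for offset, logits in zip(class_offsets, head_logits):
        local = argmax(logits)
        if best_logit is None or logits[local] > best_logit:
            best_class, best_logit = offset + local, logits[local]
    return best_class


# Two heads with two classes each; global class IDs are 0..3.
head_logits = [[2.0, 0.5], [0.1, 3.0]]
class_offsets = [0, 2]

print(predict_task_il(head_logits, class_offsets, task_id=0))  # 0
print(predict_class_il(head_logits, class_offsets))            # 3
```

The two calls disagree on the same logits: with the task ID the first head's answer is used, while without it the second head's larger logit wins. Any class-incremental result from a multi-head model depends on how this cross-head comparison is made, which is the detail the referee asks the authors to specify.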

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive comments. The points raised correctly identify a lack of explicit description regarding inference in the class-incremental setting, which requires clarification to support the claims made in the abstract and experiments. We address each comment below and will revise the manuscript accordingly.

Point-by-point responses
  1. Referee: [Abstract] Abstract: The central claim that NFL supports class-incremental learning (no task ID at inference) is load-bearing yet contradicted by the architecture, which decomposes into task-specific heads whose selection at test time requires task identity. This construction is standard for task-incremental but incompatible with class-incremental protocols, so the reported class-incremental benchmark numbers do not substantiate the headline claim unless an alternative routing mechanism is provided.

    Authors: We agree that the architecture description using task-specific heads creates an inconsistency with the class-incremental claim in the abstract, as no alternative routing or merging mechanism is provided to enable inference without task identity. The manuscript does not describe such a mechanism, so the class-incremental benchmark results cannot be fully substantiated as presented. We will revise the abstract to accurately reflect the supported settings (primarily task-incremental with the described architecture) and clarify or adjust the class-incremental claims and reporting. revision: yes

  2. Referee: [Method description] Method (stepwise freezing and head isolation protocol): No explicit mechanism is described for merging task-specific heads or performing inference without task labels, which is required to support the buffer-free class-incremental performance.

    Authors: The referee is correct: the method section provides no explicit mechanism for merging task-specific heads or for inference without task labels. This omission means the buffer-free class-incremental performance on the reported benchmarks cannot be verified from the current text. We will revise the method section to add a clear description of the inference procedure (or note its absence and revise the experimental claims if no such procedure was used). revision: yes

Circularity Check

0 steps flagged

No circularity in derivation chain

full rationale

The paper introduces an empirical framework (NFL/NFL+) for buffer-free continual learning and reports benchmark results on CIFAR-100, Tiny-ImageNet, and ImageNet-1000. No equations, derivations, fitted parameters, or self-citations appear in the provided text that would reduce any claimed performance or property to a quantity defined by the method itself. All evaluations rest on external baselines and standard datasets, rendering the central claims self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Ledger is minimal because the review uses only the abstract, which states the core premise but supplies no numerical parameters or additional axioms.

axioms (1)
  • domain assumption: Overparameterized networks possess inherent redundancy that permits task isolation without catastrophic forgetting when using the described freezing and distillation protocol.
    This premise is invoked as the basis for avoiding a replay buffer.

pith-pipeline@v0.9.0 · 5756 in / 1312 out tokens · 34265 ms · 2026-05-23T00:47:36.603458+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Shift Detection and Adaptation for Network Intrusion Detection

    cs.CR 2025-08 unverdicted novelty 5.0

    NetSight continually detects distribution shifts in network intrusion data and adapts a supervised model using pseudo-labeling and knowledge distillation, achieving up to 11.72% F1 improvement over methods requiring m...

Supplementary material (excerpts)

No Forgetting Learning: Memory-free Continual Learning. The supplementary material mainly contains additional materials and experiments that cannot be reported due to the page...

Evaluation Protocol. The two main experimental scenarios typically used to evaluate the performance of methods are the following: Task Incremental Learning (Task-IL): In Task-IL, the training data is divided into multiple tasks, each with a unique set of classes. The crucial aspect of Task-IL is that the model is provided with information about which t...

Additional Results. We report Class-IL for the CIFAR-100 dataset, where all 100 classes were trained under the configuration of 20 tasks, with each corresponding to 5 incremental classes. The result is illustrated in Fig. 6. This helps us understand how the number of tasks or classes affects performance. By comparing it to Fig. 2, we see that performan...