No Forgetting Learning: Buffer-free Continual Learning Classification

Mohammad Ali Vahedifar; Qi Zhang

arxiv: 2503.04638 · v3 · submitted 2025-03-06 · 💻 cs.LG

No Forgetting Learning: Buffer-free Continual Learning Classification

Mohammad Ali Vahedifar , Qi Zhang This is my paper

Pith reviewed 2026-05-23 00:47 UTC · model grok-4.3

classification 💻 cs.LG

keywords continual learningbuffer-freeclass-incremental learningtask-incremental learningknowledge distillationoverparameterized networksno forgetting

0 comments

The pith

A buffer-free continual learning method matches memory-based accuracy on sequential image tasks by freezing and distilling overparameterized networks instead of storing examples.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces No Forgetting Learning, a framework for class- and task-incremental learning that eliminates the need for replay buffers. It decomposes the network into a shared backbone and task-specific heads, then applies stepwise freezing to isolate new capabilities while using knowledge distillation to protect prior performance. An extension called NFL+ adds an under-complete auto-encoder to preserve features and correct imbalance bias, and NFL+LoRA adapts the approach to Vision Transformers with low-rank updates. Tests on CIFAR-100, Tiny-ImageNet, and ImageNet-1000 with up to 50 tasks show the method beats other buffer-free approaches and equals buffer-based ones while using far less memory. The work also defines a Plasticity-Stability score to assess the trade-off between learning new tasks and retaining old ones.

Core claim

Overparameterized networks contain enough redundancy that a decomposed architecture can support new tasks through stepwise freezing of new heads, distillation-based adaptation of the shared backbone, and joint refinement under dual soft targets, allowing prior task performance to be retained without any stored exemplars.

What carries the argument

The stepwise freezing protocol that isolates new task capabilities in dedicated heads, adapts shared representations under knowledge distillation, and refines all components jointly with dual soft-target anchoring.

If this is right

The method outperforms all buffer-free baselines on CIFAR-100, Tiny-ImageNet, and ImageNet-1000 across up to 50 incremental tasks.
Performance matches memory-based methods while using only 2.53 percent of their model size.
NFL+LoRA keeps backbone memory cost constant regardless of task count when applied to pre-trained Vision Transformers.
The Plasticity-Stability score offers a balanced metric for evaluating continual learning trade-offs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Eliminating the replay buffer removes a source of privacy risk in regulated continual learning settings.
The same redundancy-based isolation strategy could be tested on sequence models in natural language tasks where storage of prior examples is restricted.

Load-bearing premise

Overparameterized networks contain sufficient inherent redundancy to allow new task capabilities to be isolated via stepwise freezing and distillation without degrading prior task performance.

What would settle it

Running the NFL protocol on ImageNet-1000 split into 50 tasks and finding that accuracy on the earliest tasks falls well below the level maintained by any memory-based baseline.

Figures

Figures reproduced from arXiv: 2503.04638 by Mohammad Ali Vahedifar, Qi Zhang.

**Figure 1.** Figure 1: A Conceptual Illustration of NFL. high performance on the combined task T = Tt S Tt+1 formed by c classes in the class set C = Ct S Ct+1 classes, without access to the data and targets of the old dataset Dt(Xt, Yt). NFL follows a five-step process, summarized in Algorithm 1. In step one, we introduce the data samples Xt+1 to the trained NN1 to obtain the logits (i.e., the outputs of the NN1 before the soft… view at source ↗

**Figure 2.** Figure 2: ACC comparison for Class-IL using CIFAR-100 (10 [PITH_FULL_IMAGE:figures/full_fig_p007_2.png] view at source ↗

**Figure 3.** Figure 3: ACC comparison for Class-IL using TinyImageNet (20 [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗

**Figure 5.** Figure 5: Memory size for ImageNet-1000 for different compari [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗

**Figure 6.** Figure 6: ACC comparison for Class-IL using CIFAR-100 (5 in [PITH_FULL_IMAGE:figures/full_fig_p011_6.png] view at source ↗

**Figure 7.** Figure 7: Training time (h) of CIFAR-100, TinyImageNet and ImageNet-1000 Class-IL and Task-IL experiments for all methods, e.g., [PITH_FULL_IMAGE:figures/full_fig_p012_7.png] view at source ↗

read the original abstract

Most Continual Learning (CL) methods maintain performance on earlier tasks by storing exemplars in a replay buffer, introducing memory overhead that scales with the number of tasks and raising privacy concerns in regulated domains. We propose No Forgetting Learning (NFL), a buffer-free framework for class- and task-incremental learning that instead exploits the inherent redundancy of overparameterized networks. NFL decomposes the network into a shared backbone and task-specific heads, then applies a stepwise freezing protocol: new capabilities are first isolated, shared representations are adapted under knowledge distillation, and all components are jointly refined with dual soft-target anchoring. NFL+ augments this pipeline with an under-complete auto-encoder that preserves informative features from previous tasks and corrects the prediction bias caused by class imbalance. NFL+LoRA further extends the framework to pre-trained Vision Transformers by confining updates to a low-rank subspace with Fisher-weighted regularization, maintaining constant backbone memory cost regardless of the number of tasks. On CIFAR-100, Tiny-ImageNet, and ImageNet-1000 across up to 50 incremental tasks, NFL+ outperforms all buffer-free baselines and matches memory-based methods while requiring only 2.53\% of their model size. We also propose a Plasticity--Stability score for more balanced trade-off evaluation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The class-incremental results are probably invalid because the task-specific heads require task identity at test time.

read the letter

The main takeaway is that NFL introduces a buffer-free pipeline with stepwise freezing, dual soft-target anchoring, and an under-complete auto-encoder for bias correction, plus a LoRA variant for ViTs. This specific combination of decomposition, isolation of new capabilities, and joint refinement is not in the prior buffer-free work cited in the abstract, and the reported numbers on CIFAR-100, Tiny-ImageNet, and ImageNet-1000 up to 50 tasks show it matching memory-based methods at 2.53% of their size while beating other buffer-free baselines. The Plasticity-Stability score is a minor but useful addition for evaluation. The paper does a solid job laying out how overparameterized nets can be exploited without replay buffers or privacy issues. The central soft spot is the architecture itself. Task-specific heads mean inference needs the current task ID to pick the right head, which is standard for task-incremental but incompatible with class-incremental where no task label is available. The abstract claims both regimes, yet nothing in the description supplies a merging or routing mechanism that works without task identity. If the class-incremental numbers were run with task IDs supplied, they do not support that part of the claim. The redundancy assumption is plausible but secondary to this mismatch. This is squarely for the continual learning subfield. A reader working on memory-efficient CL would find the protocol details worth examining if the full paper includes ablations and code. It deserves peer review because the method is concrete and the benchmarks are standard, even though the class-incremental portion needs clarification or correction.

Referee Report

2 major / 2 minor

Summary. The paper proposes No Forgetting Learning (NFL), a buffer-free continual learning method for class- and task-incremental settings. It decomposes the network into a shared backbone plus task-specific heads, applies stepwise freezing with knowledge distillation and dual soft-target anchoring, augments with an under-complete auto-encoder in NFL+ to handle feature preservation and bias, and extends to NFL+LoRA for ViTs. It reports that NFL+ outperforms buffer-free baselines and matches memory-based methods on CIFAR-100, Tiny-ImageNet, and ImageNet-1000 (up to 50 tasks) while using 2.53% of their model size, and introduces a Plasticity-Stability score.

Significance. A buffer-free approach that matches memory-based performance on class-incremental benchmarks with constant memory cost would be significant for continual learning, especially in privacy-regulated domains. The exploitation of overparameterization redundancy and the new evaluation metric are potentially valuable contributions if the central claims hold.

major comments (2)

[Abstract] Abstract: The central claim that NFL supports class-incremental learning (no task ID at inference) is load-bearing yet contradicted by the architecture, which decomposes into task-specific heads whose selection at test time requires task identity. This construction is standard for task-incremental but incompatible with class-incremental protocols, so the reported class-incremental benchmark numbers do not substantiate the headline claim unless an alternative routing mechanism is provided.
[Method description] Method (stepwise freezing and head isolation protocol): No explicit mechanism is described for merging task-specific heads or performing inference without task labels, which is required to support the class-incremental results across 50 tasks on CIFAR-100, Tiny-ImageNet, and ImageNet-1000. This omission directly affects verifiability of the buffer-free class-incremental performance.

minor comments (2)

[Abstract] Abstract: The claim of 'only 2.53% of their model size' should specify the exact baseline (e.g., total parameters of a memory-based method) and clarify whether this includes the auto-encoder overhead.
The manuscript provides no implementation details, statistical significance tests, or ablation studies on the distillation and anchoring components, which limits independent verification of the empirical claims.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive comments. The points raised correctly identify a lack of explicit description regarding inference in the class-incremental setting, which requires clarification to support the claims made in the abstract and experiments. We address each comment below and will revise the manuscript accordingly.

read point-by-point responses

Referee: [Abstract] Abstract: The central claim that NFL supports class-incremental learning (no task ID at inference) is load-bearing yet contradicted by the architecture, which decomposes into task-specific heads whose selection at test time requires task identity. This construction is standard for task-incremental but incompatible with class-incremental protocols, so the reported class-incremental benchmark numbers do not substantiate the headline claim unless an alternative routing mechanism is provided.

Authors: We agree that the architecture description using task-specific heads creates an inconsistency with the class-incremental claim in the abstract, as no alternative routing or merging mechanism is provided to enable inference without task identity. The manuscript does not describe such a mechanism, so the class-incremental benchmark results cannot be fully substantiated as presented. We will revise the abstract to accurately reflect the supported settings (primarily task-incremental with the described architecture) and clarify or adjust the class-incremental claims and reporting. revision: yes
Referee: [Method description] Method (stepwise freezing and head isolation protocol): No explicit mechanism is described for merging task-specific heads or performing inference without task labels, which is required to support the buffer-free class-incremental performance.

Authors: The referee is correct: the method section provides no explicit mechanism for merging task-specific heads or for inference without task labels. This omission means the buffer-free class-incremental performance on the reported benchmarks cannot be verified from the current text. We will revise the method section to add a clear description of the inference procedure (or note its absence and revise the experimental claims if no such procedure was used). revision: yes

Circularity Check

0 steps flagged

No circularity in derivation chain

full rationale

The paper introduces an empirical framework (NFL/NFL+) for buffer-free continual learning and reports benchmark results on CIFAR-100, Tiny-ImageNet, and ImageNet-1000. No equations, derivations, fitted parameters, or self-citations appear in the provided text that would reduce any claimed performance or property to a quantity defined by the method itself. All evaluations rest on external baselines and standard datasets, rendering the central claims self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Ledger is minimal because the review uses only the abstract, which states the core premise but supplies no numerical parameters or additional axioms.

axioms (1)

domain assumption Overparameterized networks possess inherent redundancy that permits task isolation without catastrophic forgetting when using the described freezing and distillation protocol.
This premise is invoked as the basis for avoiding a replay buffer.

pith-pipeline@v0.9.0 · 5756 in / 1312 out tokens · 34265 ms · 2026-05-23T00:47:36.603458+00:00 · methodology

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Shift Detection and Adaptation for Network Intrusion Detection
cs.CR 2025-08 unverdicted novelty 5.0

NetSight continually detects distribution shifts in network intrusion data and adapts a supervised model using pseudo-labeling and knowledge distillation, achieving up to 11.72% F1 improvement over methods requiring m...

Reference graph

Works this paper leans on

52 extracted references · 52 canonical work pages · cited by 1 Pith paper · 3 internal anchors

[1]

Expert gate: Lifelong learning with a network of experts

Rahaf Aljundi, Punarjay Chakravarty, and Tinne Tuytelaars. Expert gate: Lifelong learning with a network of experts. In CVPR, 2017. 2

work page 2017
[2]

Memory Aware Synapses: Learning what (not) to forget

Rahaf Aljundi, Francesca Babiloni, Mohamed Elhoseiny, Marcus Rohrbach, and Tinne Tuytelaars. Memory Aware Synapses: Learning what (not) to forget . In ECCV, 2018. 1

work page 2018
[3]

Dark experience for general continual learning: a strong, simple baseline

Pietro Buzzega, Matteo Boschini, Angelo Porrello, Davide Abati, and Simone Calderara. Dark experience for general continual learning: a strong, simple baseline. In NIPS, pages 15920–15930, 2020. 1, 2, 6

work page 2020
[4]

End-to-end incre- mental learning

Francisco M Castro, Manuel J Mar ´ın-Jim´enez, Nicol´as Guil, Cordelia Schmid, and Karteek Alahari. End-to-end incre- mental learning. In ECCV, pages 233–248, 2018. 2

work page 2018
[5]

Dokania, Thalaiyasingam Ajan- than, and Philip H

Arslan Chaudhry, Puneet K. Dokania, Thalaiyasingam Ajan- than, and Philip H. S. Torr. Riemannian walk for incremen- tal learning: Understanding forgetting and intransigence. In ECCV, 2018. 5, 6

work page 2018
[6]

Learning without mem- orizing

Prithviraj Dhar, Rajat Vikram Singh, Kuan-Chuan Peng, Ziyan Wu, and Rama Chellappa. Learning without mem- orizing. In CVPR, pages 5138–5146, 2019. 2

work page 2019
[7]

Podnet: Pooled outputs distil- lation for small-tasks incremental learning

Arthur Douillard, Matthieu Cord, Charles Ollion, Thomas Robert, and Eduardo Valle. Podnet: Pooled outputs distil- lation for small-tasks incremental learning. In ECCV, pages 86–102, 2020. 2

work page 2020
[8]

DyTox: Transformers for Continual Learning With DYnamic TOken eXpansion

Arthur Douillard, Alexandre Ram ´e, Guillaume Couairon, and Matthieu Cord. DyTox: Transformers for Continual Learning With DYnamic TOken eXpansion. InCVPR, pages 9285–9295, 2022. 1

work page 2022
[9]

Don't forget, there is more than forgetting: new metrics for Continual Learning

Natalia D ´ıaz-Rodr´ıguez, Vincenzo Lomonaco, David Fil- liat, and Davide Maltoni. Don’t forget, there is more than forgetting: new metrics for continual learning, 2018. https://arxiv.org/abs/1810.13166. 6

work page internal anchor Pith review Pith/arXiv arXiv 2018
[10]

Deep Learning

Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016. 4

work page 2016
[11]

Knowledge distillation: A survey

Jianping Gou, Baosheng Yu, Stephen J Maybank, and Dacheng Tao. Knowledge distillation: A survey. IJCV, 129 (6):1789–1819, 2021. 2

work page 2021
[12]

Delving deep into rectifiers: Surpassing human-level perfor- mance on imagenet classification

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Delving deep into rectifiers: Surpassing human-level perfor- mance on imagenet classification. In ICCV, 2015. 6

work page 2015
[13]

Deep residual learning for image recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR,

work page
[14]

Distilling the Knowledge in a Neural Network

Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Dis- tilling the knowledge in a neural network, 2015. https://arxiv.org/abs/1503.02531. 2, 3

work page internal anchor Pith review Pith/arXiv arXiv 2015
[15]

Learning a unified classifier incrementally via rebalancing

Saihui Hou, Xinyu Pan, Chen Change Loy, Zilei Wang, and Dahua Lin. Learning a unified classifier incrementally via rebalancing. In CVPR, pages 831–839, 2019. 2

work page 2019
[16]

Memory-efficient incremental learning through feature adaptation

Ahmet Iscen, Jeffrey Zhang, Svetlana Lazebnik, and Cordelia Schmid. Memory-efficient incremental learning through feature adaptation. In ECCV, pages 699–715, 2020. 2

work page 2020
[17]

Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska- Barwinska, Demis Hassabis, Claudia Clopath, Dharshan Ku- maran, and Raia Hadsell

James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A. Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska- Barwinska, Demis Hassabis, Claudia Clopath, Dharshan Ku- maran, and Raia Hadsell. Overcoming catastrophic for- getting in neural networks. Proceedings of the National Academy of Sciences, 114(13)...

work page 2017
[18]

Learning multiple layers of features from tiny images

Alex Krizhevsky. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009. 6

work page 2009
[19]

Imagenet classification with deep convolutional neural net- works

Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural net- works. In NIPS, pages 1401–1476, 2012. 6

work page 2012
[20]

Ya Le and Xuan S. Yang. Tiny imagenet visual recognition challenge, 2015. 6

work page 2015
[21]

Overcoming catastrophic forgetting with unlabeled data in the wild

Kibok Lee, Kimin Lee, Jinwoo Shin, and Honglak Lee. Overcoming catastrophic forgetting with unlabeled data in the wild. In ICCV, pages 312–321, 2019. 2

work page 2019
[22]

Learning without forgetting

Zhizhong Li and Derek Hoiem. Learning without forgetting. PAMI, 40(12):2935–2947, 2018. 1, 2, 6

work page 2018
[23]

L ´opez, and Andrew D

Xialei Liu, Marc Masana, Luis Herranz, Joost Van de Wei- jer, Antonio M. L ´opez, and Andrew D. Bagdanov. Ro- tate your Networks: Better Weight Consolidation and Less Catastrophic Forgetting. In ICPR, pages 2262–2268, 2018. 1

work page 2018
[24]

RMM: rein- forced memory management for class-incremental learning

Yaoyao Liu, Bernt Schiele, and Qianru Sun. RMM: rein- forced memory management for class-incremental learning. In NIPS, 2024. 2

work page 2024
[25]

Gradient episodic memory for continual learning

David Lopez-Paz and Marc' Aurelio Ranzato. Gradient episodic memory for continual learning. In NIPS, 2017. 5, 6

work page 2017
[26]

Continual lifelong learning with neural networks: A review

German I Parisi, Ronald Kemker, Jose L Part, Christopher Kanan, and Stefan Wermter. Continual lifelong learning with neural networks: A review. Neural Networks, 113:54–71,

work page
[27]

Pf ¨ulb and A

B. Pf ¨ulb and A. Gepperth. A comprehensive, application- oriented study of catastrophic forgetting in DNNs. In ICLR,

work page
[28]

Recent Advances of Contin- ual Learning in Computer Vision: An Overview, 2024

Haoxuan Qu, Hossein Rahmani, Li Xu, Bryan Williams, and Jun Liu. Recent Advances of Contin- ual Learning in Computer Vision: An Overview, 2024. https://arxiv.org/abs/2109.11369. 1

work page arXiv 2024
[29]

Encoder-based lifelong learning

Atoum Rannen, Rahaf Aljundi, Matthew B Blaschko, and Tinne Tuytelaars. Encoder-based lifelong learning. In ICCV, pages 1320–1328, 2017. 1, 2

work page 2017
[30]

Sylvestre-Alvise Rebuffi, Alexander Kolesnikov, Georg Sperl, and Christoph H. Lampert. icarl: Incremental classi- fier and representation learning. InCVPR, pages 2001–2010,

work page 2001
[31]

Continual learning via sequential function-space variational inference

Tim GJ Rudner, Freddie Bickford Smith, Qixuan Feng, Yee Whye Teh, and Yarin Gal. Continual learning via sequential function-space variational inference. In ICML, pages 18871–18887, 2022. 2

work page 2022
[32]

Progressive Neural Networks

Andrei A. Rusu, Neil C. Rabinowitz, Guillaume Desjardins, Hubert Soyer, James Kirkpatrick, Koray Kavukcuoglu, Raz- van Pascanu, and Raia Hadsell. Progressive Neural Net- works, 2022. https://arxiv.org/abs/1606.04671. 1

work page internal anchor Pith review Pith/arXiv arXiv 2022
[33]

Gradient pro- jection memory for continual learning

Gobinda Saha, Isha Garg, and Kaushik Roy. Gradient pro- jection memory for continual learning. In ICLR, 2021. 1

work page 2021
[34]

Exploring Example Influence in Continual Learning

Qing Sun, Fan Lyu, Fanhua Shang, Wei Feng, and Liang Wan. Exploring Example Influence in Continual Learning. In NIPS, pages 27075–27086, 2022. 2

work page 2022
[35]

Towards lifelong deep learning: A review of continual learning and unlearning methods, 2025

Mohammad Ali Vahedifar, Qi Zhang, and Alexandros Iosifidis. Towards lifelong deep learning: A review of continual learning and unlearning methods, 2025. https://doi.org/10.5281/zenodo.14631802. 1, 6

work page doi:10.5281/zenodo.14631802 2025
[36]

Hayes, Eyke H ¨ullermeier, Christopher Kanan, Dhireesha Kudithipudi, Christoph H

Eli Verwimp, Rahaf Aljundi, Shai Ben-David, Matthias Bethge, Andrea Cossu, Alexander Gepperth, Tyler L. Hayes, Eyke H ¨ullermeier, Christopher Kanan, Dhireesha Kudithipudi, Christoph H. Lampert, Martin Mundt, Razvan Pascanu, Adrian Popescu, Andreas S. Tolias, Joost van de Weijer, Bing Liu, Vincenzo Lomonaco, Tinne Tuytelaars, and Gido M. van de Ven. Conti...

work page 2024
[37]

A comprehensive survey of continual learning: theory, method and application

Liyuan Wang, Xingxing Zhang, Hang Su, and Jun Zhu. A comprehensive survey of continual learning: theory, method and application. PAMI, 46(8):5362–5383, 2024. 6

work page 2024
[38]

Con- tinual learning through retrieval and imagination

Zhen Wang, Liu Liu, Yiqun Duan, and Dacheng Tao. Con- tinual learning through retrieval and imagination. In AAAI,

work page
[39]

A comprehensive survey of forgetting in deep learning beyond continual learning, 2023

Zhenyi Wang, Enneng Yang, Li Shen, and Heng Huang. A comprehensive survey of forgetting in deep learning beyond continual learning, 2023. https://arxiv.org/abs/2307.09218. 1, 2

work page arXiv 2023
[40]

Continual learning: A review of techniques, challenges and future directions

Buddhi Wickramasinghe, Gobinda Saha, and Kaushik Roy. Continual learning: A review of techniques, challenges and future directions. TNNLS, pages 123–140, 2024. 6, 1

work page 2024
[41]

Memory replay gans: Learning to generate new categories without forgetting

Chenshen Wu, Luis Herranz, Xialei Liu, Joost van de Weijer, and Bogdan Raducanu. Memory replay gans: Learning to generate new categories without forgetting. In NIPS, 2018. 2

work page 2018
[42]

Large scale in- cremental learning

Yue Wu, Yinpeng Chen, Lijuan Wang, Yuancheng Ye, Zicheng Liu, Yandong Guo, and Yun Fu. Large scale in- cremental learning. In CVPR, 2019. 1, 2

work page 2019
[43]

Lifelong Learning with Dynamically Expandable Networks

Jaehong Yoon, Eunho Yang, Jeongtae Lee, and Sung Ju Hwang. Lifelong Learning with Dynamically Expandable Networks. In ICLR, 2018. 1

work page 2018
[44]

Contin- ual learning of context-dependent processing in neural net- works

Guanxiong Zeng, Yang Chen, Bo Cui, and Shan Yu. Contin- ual learning of context-dependent processing in neural net- works. Nature Machine Intelligence , 1(8):364–372, 2019. 1

work page 2019
[45]

Contin- ual learning through synaptic intelligence

Friedemann Zenke, Ben Poole, and Surya Ganguli. Contin- ual learning through synaptic intelligence. In ICML, pages 3987–3995, 2017. 1

work page 2017
[46]

Lifelong gan: Continual learning for conditional image generation

Mengyao Zhai, Lei Chen, Frederick Tung, Jiawei He, Megha Nawhal, and Greg Mori. Lifelong gan: Continual learning for conditional image generation. In ICCV, pages 2759– 2768, 2019. 2

work page 2019
[47]

Co- Transport for Class-Incremental Learning

Da-Wei Zhou, Han-Jia Ye, and De-Chuan Zhan. Co- Transport for Class-Incremental Learning. InACMMM, page 1645–1654, 2021. 2, 6

work page 2021
[48]

A model or 603 exemplars: Towards memory-efficient class-incremental learning

Da-Wei Zhou, Qi-Wei Wang, Han-Jia Ye, and De-Chuan Zhan. A model or 603 exemplars: Towards memory-efficient class-incremental learning. In ICLR, 2023. 2, 6

work page 2023
[49]

Continual Learning with Pre-Trained Mod- els: A Survey

Da-Wei Zhou, Hai-Long Sun, Jingyi Ning, Han-Jia Ye, and De-Chuan Zhan. Continual Learning with Pre-Trained Mod- els: A Survey. In IJCAI, pages 8363–8371, 2024. 1

work page 2024
[50]

Class-incremental learning: A survey

Da-Wei Zhou, Qi-Wei Wang, Zhi-Hong Qi, Han-Jia Ye, De- Chuan Zhan, and Ziwei Liu. Class-incremental learning: A survey. PAMI, 46(12):9851–9873, 2024. 1, 6, 8 No Forgetting Learning: Memory-free Continual Learning Supplementary Material The supplementary material mainly contains additional ma- terials and experiments that cannot be reported due to the page...

work page 2024
[51]

The crucial aspect of Task-IL is that the model is provided with information about which task it is handling during training and testing

Evaluation Protocol The two main experimental scenarios typically used to eval- uate the performance of methods are the following: • Task Incremental Learning (Task-IL): In Task-IL, the training data is divided into multiple tasks, each with a unique set of classes. The crucial aspect of Task-IL is that the model is provided with information about which t...

work page
[52]

The re- sult is illustrated in Fig

Additional Results We report Class-IL for the CIFAR-100 dataset, where all 100 classes were trained under the configuration of 20 tasks, with each corresponding to 5 incremental classes. The re- sult is illustrated in Fig. 6. This helps us understand how the number of tasks or classes affects performance. By compar- ing it to Fig. 2, we see that performan...

work page 2000

[1] [1]

Expert gate: Lifelong learning with a network of experts

Rahaf Aljundi, Punarjay Chakravarty, and Tinne Tuytelaars. Expert gate: Lifelong learning with a network of experts. In CVPR, 2017. 2

work page 2017

[2] [2]

Memory Aware Synapses: Learning what (not) to forget

Rahaf Aljundi, Francesca Babiloni, Mohamed Elhoseiny, Marcus Rohrbach, and Tinne Tuytelaars. Memory Aware Synapses: Learning what (not) to forget . In ECCV, 2018. 1

work page 2018

[3] [3]

Dark experience for general continual learning: a strong, simple baseline

Pietro Buzzega, Matteo Boschini, Angelo Porrello, Davide Abati, and Simone Calderara. Dark experience for general continual learning: a strong, simple baseline. In NIPS, pages 15920–15930, 2020. 1, 2, 6

work page 2020

[4] [4]

End-to-end incre- mental learning

Francisco M Castro, Manuel J Mar ´ın-Jim´enez, Nicol´as Guil, Cordelia Schmid, and Karteek Alahari. End-to-end incre- mental learning. In ECCV, pages 233–248, 2018. 2

work page 2018

[5] [5]

Dokania, Thalaiyasingam Ajan- than, and Philip H

Arslan Chaudhry, Puneet K. Dokania, Thalaiyasingam Ajan- than, and Philip H. S. Torr. Riemannian walk for incremen- tal learning: Understanding forgetting and intransigence. In ECCV, 2018. 5, 6

work page 2018

[6] [6]

Learning without mem- orizing

Prithviraj Dhar, Rajat Vikram Singh, Kuan-Chuan Peng, Ziyan Wu, and Rama Chellappa. Learning without mem- orizing. In CVPR, pages 5138–5146, 2019. 2

work page 2019

[7] [7]

Podnet: Pooled outputs distil- lation for small-tasks incremental learning

Arthur Douillard, Matthieu Cord, Charles Ollion, Thomas Robert, and Eduardo Valle. Podnet: Pooled outputs distil- lation for small-tasks incremental learning. In ECCV, pages 86–102, 2020. 2

work page 2020

[8] [8]

DyTox: Transformers for Continual Learning With DYnamic TOken eXpansion

Arthur Douillard, Alexandre Ram ´e, Guillaume Couairon, and Matthieu Cord. DyTox: Transformers for Continual Learning With DYnamic TOken eXpansion. InCVPR, pages 9285–9295, 2022. 1

work page 2022

[9] [9]

Don't forget, there is more than forgetting: new metrics for Continual Learning

Natalia D ´ıaz-Rodr´ıguez, Vincenzo Lomonaco, David Fil- liat, and Davide Maltoni. Don’t forget, there is more than forgetting: new metrics for continual learning, 2018. https://arxiv.org/abs/1810.13166. 6

work page internal anchor Pith review Pith/arXiv arXiv 2018

[10] [10]

Deep Learning

Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016. 4

work page 2016

[11] [11]

Knowledge distillation: A survey

Jianping Gou, Baosheng Yu, Stephen J Maybank, and Dacheng Tao. Knowledge distillation: A survey. IJCV, 129 (6):1789–1819, 2021. 2

work page 2021

[12] [12]

Delving deep into rectifiers: Surpassing human-level perfor- mance on imagenet classification

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Delving deep into rectifiers: Surpassing human-level perfor- mance on imagenet classification. In ICCV, 2015. 6

work page 2015

[13] [13]

Deep residual learning for image recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR,

work page

[14] [14]

Distilling the Knowledge in a Neural Network

Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Dis- tilling the knowledge in a neural network, 2015. https://arxiv.org/abs/1503.02531. 2, 3

work page internal anchor Pith review Pith/arXiv arXiv 2015

[15] [15]

Learning a unified classifier incrementally via rebalancing

Saihui Hou, Xinyu Pan, Chen Change Loy, Zilei Wang, and Dahua Lin. Learning a unified classifier incrementally via rebalancing. In CVPR, pages 831–839, 2019. 2

work page 2019

[16] [16]

Memory-efficient incremental learning through feature adaptation

Ahmet Iscen, Jeffrey Zhang, Svetlana Lazebnik, and Cordelia Schmid. Memory-efficient incremental learning through feature adaptation. In ECCV, pages 699–715, 2020. 2

work page 2020

[17] [17]

Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska- Barwinska, Demis Hassabis, Claudia Clopath, Dharshan Ku- maran, and Raia Hadsell

James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A. Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska- Barwinska, Demis Hassabis, Claudia Clopath, Dharshan Ku- maran, and Raia Hadsell. Overcoming catastrophic for- getting in neural networks. Proceedings of the National Academy of Sciences, 114(13)...

work page 2017

[18] [18]

Learning multiple layers of features from tiny images

Alex Krizhevsky. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009. 6

work page 2009

[19] [19]

Imagenet classification with deep convolutional neural net- works

Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural net- works. In NIPS, pages 1401–1476, 2012. 6

work page 2012

[20] [20]

Ya Le and Xuan S. Yang. Tiny imagenet visual recognition challenge, 2015. 6

work page 2015

[21] [21]

Overcoming catastrophic forgetting with unlabeled data in the wild

Kibok Lee, Kimin Lee, Jinwoo Shin, and Honglak Lee. Overcoming catastrophic forgetting with unlabeled data in the wild. In ICCV, pages 312–321, 2019. 2

work page 2019

[22] [22]

Learning without forgetting

Zhizhong Li and Derek Hoiem. Learning without forgetting. PAMI, 40(12):2935–2947, 2018. 1, 2, 6

work page 2018

[23] [23]

L ´opez, and Andrew D

Xialei Liu, Marc Masana, Luis Herranz, Joost Van de Wei- jer, Antonio M. L ´opez, and Andrew D. Bagdanov. Ro- tate your Networks: Better Weight Consolidation and Less Catastrophic Forgetting. In ICPR, pages 2262–2268, 2018. 1

work page 2018

[24] [24]

RMM: rein- forced memory management for class-incremental learning

Yaoyao Liu, Bernt Schiele, and Qianru Sun. RMM: rein- forced memory management for class-incremental learning. In NIPS, 2024. 2

work page 2024

[25] [25]

Gradient episodic memory for continual learning

David Lopez-Paz and Marc' Aurelio Ranzato. Gradient episodic memory for continual learning. In NIPS, 2017. 5, 6

work page 2017

[26] [26]

Continual lifelong learning with neural networks: A review

German I Parisi, Ronald Kemker, Jose L Part, Christopher Kanan, and Stefan Wermter. Continual lifelong learning with neural networks: A review. Neural Networks, 113:54–71,

work page

[27] [27]

Pf ¨ulb and A

B. Pf ¨ulb and A. Gepperth. A comprehensive, application- oriented study of catastrophic forgetting in DNNs. In ICLR,

work page

[28] [28]

Recent Advances of Contin- ual Learning in Computer Vision: An Overview, 2024

Haoxuan Qu, Hossein Rahmani, Li Xu, Bryan Williams, and Jun Liu. Recent Advances of Contin- ual Learning in Computer Vision: An Overview, 2024. https://arxiv.org/abs/2109.11369. 1

work page arXiv 2024

[29] [29]

Encoder-based lifelong learning

Atoum Rannen, Rahaf Aljundi, Matthew B Blaschko, and Tinne Tuytelaars. Encoder-based lifelong learning. In ICCV, pages 1320–1328, 2017. 1, 2

work page 2017

[30] [30]

Sylvestre-Alvise Rebuffi, Alexander Kolesnikov, Georg Sperl, and Christoph H. Lampert. icarl: Incremental classi- fier and representation learning. InCVPR, pages 2001–2010,

work page 2001

[31] [31]

Continual learning via sequential function-space variational inference

Tim GJ Rudner, Freddie Bickford Smith, Qixuan Feng, Yee Whye Teh, and Yarin Gal. Continual learning via sequential function-space variational inference. In ICML, pages 18871–18887, 2022. 2

work page 2022

[32] [32]

Progressive Neural Networks

Andrei A. Rusu, Neil C. Rabinowitz, Guillaume Desjardins, Hubert Soyer, James Kirkpatrick, Koray Kavukcuoglu, Raz- van Pascanu, and Raia Hadsell. Progressive Neural Net- works, 2022. https://arxiv.org/abs/1606.04671. 1

work page internal anchor Pith review Pith/arXiv arXiv 2022

[33] [33]

Gradient pro- jection memory for continual learning

Gobinda Saha, Isha Garg, and Kaushik Roy. Gradient pro- jection memory for continual learning. In ICLR, 2021. 1

work page 2021

[34] [34]

Exploring Example Influence in Continual Learning

Qing Sun, Fan Lyu, Fanhua Shang, Wei Feng, and Liang Wan. Exploring Example Influence in Continual Learning. In NIPS, pages 27075–27086, 2022. 2

work page 2022

[35] [35]

Towards lifelong deep learning: A review of continual learning and unlearning methods, 2025

Mohammad Ali Vahedifar, Qi Zhang, and Alexandros Iosifidis. Towards lifelong deep learning: A review of continual learning and unlearning methods, 2025. https://doi.org/10.5281/zenodo.14631802. 1, 6

work page doi:10.5281/zenodo.14631802 2025

[36] [36]

Hayes, Eyke H ¨ullermeier, Christopher Kanan, Dhireesha Kudithipudi, Christoph H

Eli Verwimp, Rahaf Aljundi, Shai Ben-David, Matthias Bethge, Andrea Cossu, Alexander Gepperth, Tyler L. Hayes, Eyke H ¨ullermeier, Christopher Kanan, Dhireesha Kudithipudi, Christoph H. Lampert, Martin Mundt, Razvan Pascanu, Adrian Popescu, Andreas S. Tolias, Joost van de Weijer, Bing Liu, Vincenzo Lomonaco, Tinne Tuytelaars, and Gido M. van de Ven. Conti...

work page 2024

[37] [37]

A comprehensive survey of continual learning: theory, method and application

Liyuan Wang, Xingxing Zhang, Hang Su, and Jun Zhu. A comprehensive survey of continual learning: theory, method and application. PAMI, 46(8):5362–5383, 2024. 6

work page 2024

[38] [38]

Con- tinual learning through retrieval and imagination

Zhen Wang, Liu Liu, Yiqun Duan, and Dacheng Tao. Con- tinual learning through retrieval and imagination. In AAAI,

work page

[39] [39]

A comprehensive survey of forgetting in deep learning beyond continual learning, 2023

Zhenyi Wang, Enneng Yang, Li Shen, and Heng Huang. A comprehensive survey of forgetting in deep learning beyond continual learning, 2023. https://arxiv.org/abs/2307.09218. 1, 2

work page arXiv 2023

[40] [40]

Continual learning: A review of techniques, challenges and future directions

Buddhi Wickramasinghe, Gobinda Saha, and Kaushik Roy. Continual learning: A review of techniques, challenges and future directions. TNNLS, pages 123–140, 2024. 6, 1

work page 2024

[41] [41]

Memory replay gans: Learning to generate new categories without forgetting

Chenshen Wu, Luis Herranz, Xialei Liu, Joost van de Weijer, and Bogdan Raducanu. Memory replay gans: Learning to generate new categories without forgetting. In NIPS, 2018. 2

work page 2018

[42] [42]

Large scale in- cremental learning

Yue Wu, Yinpeng Chen, Lijuan Wang, Yuancheng Ye, Zicheng Liu, Yandong Guo, and Yun Fu. Large scale in- cremental learning. In CVPR, 2019. 1, 2

work page 2019

[43] [43]

Lifelong Learning with Dynamically Expandable Networks

Jaehong Yoon, Eunho Yang, Jeongtae Lee, and Sung Ju Hwang. Lifelong Learning with Dynamically Expandable Networks. In ICLR, 2018. 1

work page 2018

[44] [44]

Contin- ual learning of context-dependent processing in neural net- works

Guanxiong Zeng, Yang Chen, Bo Cui, and Shan Yu. Contin- ual learning of context-dependent processing in neural net- works. Nature Machine Intelligence , 1(8):364–372, 2019. 1

work page 2019

[45] [45]

Contin- ual learning through synaptic intelligence

Friedemann Zenke, Ben Poole, and Surya Ganguli. Contin- ual learning through synaptic intelligence. In ICML, pages 3987–3995, 2017. 1

work page 2017

[46] [46]

Lifelong gan: Continual learning for conditional image generation

Mengyao Zhai, Lei Chen, Frederick Tung, Jiawei He, Megha Nawhal, and Greg Mori. Lifelong gan: Continual learning for conditional image generation. In ICCV, pages 2759– 2768, 2019. 2

work page 2019

[47] [47]

Co- Transport for Class-Incremental Learning

Da-Wei Zhou, Han-Jia Ye, and De-Chuan Zhan. Co- Transport for Class-Incremental Learning. InACMMM, page 1645–1654, 2021. 2, 6

work page 2021

[48] [48]

A model or 603 exemplars: Towards memory-efficient class-incremental learning

Da-Wei Zhou, Qi-Wei Wang, Han-Jia Ye, and De-Chuan Zhan. A model or 603 exemplars: Towards memory-efficient class-incremental learning. In ICLR, 2023. 2, 6

work page 2023

[49] [49]

Continual Learning with Pre-Trained Mod- els: A Survey

Da-Wei Zhou, Hai-Long Sun, Jingyi Ning, Han-Jia Ye, and De-Chuan Zhan. Continual Learning with Pre-Trained Mod- els: A Survey. In IJCAI, pages 8363–8371, 2024. 1

work page 2024

[50] [50]

Class-incremental learning: A survey

Da-Wei Zhou, Qi-Wei Wang, Zhi-Hong Qi, Han-Jia Ye, De- Chuan Zhan, and Ziwei Liu. Class-incremental learning: A survey. PAMI, 46(12):9851–9873, 2024. 1, 6, 8 No Forgetting Learning: Memory-free Continual Learning Supplementary Material The supplementary material mainly contains additional ma- terials and experiments that cannot be reported due to the page...

work page 2024

[51] [51]

The crucial aspect of Task-IL is that the model is provided with information about which task it is handling during training and testing

Evaluation Protocol The two main experimental scenarios typically used to eval- uate the performance of methods are the following: • Task Incremental Learning (Task-IL): In Task-IL, the training data is divided into multiple tasks, each with a unique set of classes. The crucial aspect of Task-IL is that the model is provided with information about which t...

work page

[52] [52]

The re- sult is illustrated in Fig

Additional Results We report Class-IL for the CIFAR-100 dataset, where all 100 classes were trained under the configuration of 20 tasks, with each corresponding to 5 incremental classes. The re- sult is illustrated in Fig. 6. This helps us understand how the number of tasks or classes affects performance. By compar- ing it to Fig. 2, we see that performan...

work page 2000