No Forgetting Learning: Buffer-free Continual Learning Classification
Pith reviewed 2026-05-23 00:47 UTC · model grok-4.3
The pith
A buffer-free continual learning method matches memory-based accuracy on sequential image tasks by freezing and distilling overparameterized networks instead of storing examples.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Overparameterized networks contain enough redundancy that a decomposed architecture can support new tasks through stepwise freezing of new heads, distillation-based adaptation of the shared backbone, and joint refinement under dual soft targets, allowing prior task performance to be retained without any stored exemplars.
What carries the argument
The stepwise freezing protocol that isolates new task capabilities in dedicated heads, adapts shared representations under knowledge distillation, and refines all components jointly with dual soft-target anchoring.
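The three-stage order described above can be sketched as a small schedule. This is only an illustration: the group names (`backbone`, `old_heads`, `new_head`), the `Phase` type, and the exact choice of which groups are frozen in each stage are assumptions, since the text here states the order of the stages but not the precise freeze pattern.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    name: str
    trainable: frozenset  # parameter groups that receive gradient updates
    distill: bool         # whether soft targets from the previous model are used


def nfl_schedule():
    """Three phases in the order the summary lists them (freeze pattern assumed)."""
    return [
        # 1. Isolate the new capability: only the new head learns.
        Phase("isolate_new_head", frozenset({"new_head"}), distill=False),
        # 2. Adapt shared representations under distillation; heads stay fixed.
        Phase("adapt_backbone", frozenset({"backbone"}), distill=True),
        # 3. Refine everything jointly, anchored by soft targets.
        Phase("joint_refine",
              frozenset({"backbone", "old_heads", "new_head"}), distill=True),
    ]


def frozen_groups(phase, all_groups=("backbone", "old_heads", "new_head")):
    """Return the parameter groups that are NOT updated in this phase."""
    return sorted(set(all_groups) - phase.trainable)


if __name__ == "__main__":
    for phase in nfl_schedule():
        print(phase.name, "frozen:", frozen_groups(phase), "distill:", phase.distill)
```

In a real training loop, each phase would translate into toggling gradient updates for the listed groups before running that phase's optimizer steps.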
If this is right
- NFL+ outperforms all buffer-free baselines on CIFAR-100, Tiny-ImageNet, and ImageNet-1000 across up to 50 incremental tasks.
- NFL+ matches memory-based methods while requiring only 2.53% of their model size.
- NFL+LoRA keeps backbone memory cost constant regardless of task count when applied to pre-trained Vision Transformers.
- The Plasticity-Stability score offers a balanced metric for evaluating continual learning trade-offs.
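For the NFL+LoRA point in the list above, a rough parameter count shows why a low-rank update is cheap, and a quadratic penalty shows what "Fisher-weighted regularization" usually means. Both functions are a sketch under assumptions: the layer shape and rank are illustrative values, the penalty uses the common EWC-style form rather than anything confirmed by the paper, and constant memory across tasks additionally assumes one adapter is reused rather than one adapter stored per task.

```python
def dense_update_params(d_out, d_in):
    """Parameters touched when a full d_out x d_in weight matrix is updated."""
    return d_out * d_in


def lora_update_params(d_out, d_in, rank):
    """Parameters in a low-rank update B @ A, with B: d_out x rank, A: rank x d_in."""
    return rank * (d_out + d_in)


def fisher_penalty(theta, theta_prev, fisher):
    """Assumed EWC-style penalty: sum_i F_i * (theta_i - theta_prev_i)^2."""
    return sum(f * (t - p) ** 2 for t, p, f in zip(theta, theta_prev, fisher))


if __name__ == "__main__":
    d, r = 768, 8  # hypothetical ViT projection width and LoRA rank
    dense = dense_update_params(d, d)
    low_rank = lora_update_params(d, d, r)
    print(dense, low_rank, low_rank / dense)
    # Only the first coordinate moved, and it has Fisher weight 0.5.
    print(fisher_penalty([1.0, 2.0], [0.0, 2.0], [0.5, 3.0]))
```

With these illustrative numbers the low-rank update trains 12,288 parameters against 589,824 for the dense matrix, about 2% of the layer.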
Where Pith is reading between the lines
- Eliminating the replay buffer removes a source of privacy risk in regulated continual learning settings.
- The same redundancy-based isolation strategy could be tested on sequence models in natural language tasks where storage of prior examples is restricted.
Load-bearing premise
Overparameterized networks contain sufficient inherent redundancy to allow new task capabilities to be isolated via stepwise freezing and distillation without degrading prior task performance.
What would settle it
Running the NFL protocol on ImageNet-1000 split into 50 tasks and finding that accuracy on the earliest tasks falls well below the level maintained by any memory-based baseline.
Original abstract
Most Continual Learning (CL) methods maintain performance on earlier tasks by storing exemplars in a replay buffer, introducing memory overhead that scales with the number of tasks and raising privacy concerns in regulated domains. We propose No Forgetting Learning (NFL), a buffer-free framework for class- and task-incremental learning that instead exploits the inherent redundancy of overparameterized networks. NFL decomposes the network into a shared backbone and task-specific heads, then applies a stepwise freezing protocol: new capabilities are first isolated, shared representations are adapted under knowledge distillation, and all components are jointly refined with dual soft-target anchoring. NFL+ augments this pipeline with an under-complete auto-encoder that preserves informative features from previous tasks and corrects the prediction bias caused by class imbalance. NFL+LoRA further extends the framework to pre-trained Vision Transformers by confining updates to a low-rank subspace with Fisher-weighted regularization, maintaining constant backbone memory cost regardless of the number of tasks. On CIFAR-100, Tiny-ImageNet, and ImageNet-1000 across up to 50 incremental tasks, NFL+ outperforms all buffer-free baselines and matches memory-based methods while requiring only 2.53% of their model size. We also propose a Plasticity-Stability score for more balanced trade-off evaluation.
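The knowledge-distillation step mentioned in the abstract typically relies on temperature-softened targets. The sketch below shows the standard form (KL divergence between softened teacher and student distributions, scaled by the squared temperature). It is not the paper's exact loss: the function names and the default temperature are assumptions, and how NFL combines its two soft targets is not specified here, so only a single teacher is shown.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    lse = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - lse for s in scaled]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """T^2 * KL(teacher || student) on temperature-softened distributions."""
    log_p = log_softmax(teacher_logits, temperature)  # teacher (previous model)
    log_q = log_softmax(student_logits, temperature)  # student (current model)
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return temperature ** 2 * kl


if __name__ == "__main__":
    old_model = [2.0, 0.5, -1.0]
    print(distillation_loss(old_model, old_model))        # no drift from the teacher
    print(distillation_loss([0.0, 0.5, 2.0], old_model))  # drifted student, positive loss
```

The loss is zero when the current model reproduces the previous model's outputs and grows as the outputs drift, which is what lets it stand in for stored exemplars.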
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes No Forgetting Learning (NFL), a buffer-free continual learning method for class- and task-incremental settings. It decomposes the network into a shared backbone plus task-specific heads and applies stepwise freezing with knowledge distillation and dual soft-target anchoring. NFL+ adds an under-complete auto-encoder that preserves features from previous tasks and corrects class-imbalance bias, and NFL+LoRA extends the approach to pre-trained ViTs. The paper reports that NFL+ outperforms buffer-free baselines and matches memory-based methods on CIFAR-100, Tiny-ImageNet, and ImageNet-1000 (up to 50 tasks) while using 2.53% of their model size, and it introduces a Plasticity-Stability score.
Significance. A buffer-free approach that matches memory-based performance on class-incremental benchmarks with constant memory cost would be significant for continual learning, especially in privacy-regulated domains. The exploitation of overparameterization redundancy and the new evaluation metric are potentially valuable contributions if the central claims hold.
major comments (2)
- [Abstract] Abstract: The central claim that NFL supports class-incremental learning (no task ID at inference) is load-bearing yet contradicted by the architecture, which decomposes into task-specific heads whose selection at test time requires task identity. This construction is standard for task-incremental but incompatible with class-incremental protocols, so the reported class-incremental benchmark numbers do not substantiate the headline claim unless an alternative routing mechanism is provided.
- [Method description] Method (stepwise freezing and head isolation protocol): No explicit mechanism is described for merging task-specific heads or performing inference without task labels, which is required to support the class-incremental results across 50 tasks on CIFAR-100, Tiny-ImageNet, and ImageNet-1000. This omission directly affects verifiability of the buffer-free class-incremental performance.
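The distinction behind both major comments can be made concrete with a toy multi-head predictor. The sketch assumes equal-sized tasks and uses hypothetical function names. The class-incremental variant shown here (concatenating all head logits and taking a global argmax) is one common workaround, not a mechanism the manuscript is reported to describe.

```python
def argmax(values):
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def predict_task_il(head_logits, task_id, classes_per_task):
    """Task-incremental: the task ID selects the head; argmax stays within it."""
    return task_id * classes_per_task + argmax(head_logits[task_id])


def predict_class_il(head_logits):
    """Class-incremental (assumed workaround): global argmax over all heads."""
    flat = [z for logits in head_logits for z in logits]
    return argmax(flat)


if __name__ == "__main__":
    # Two heads, two classes each. The sample truly belongs to class 0 (task 0),
    # but the independently trained second head emits a larger raw logit.
    head_logits = [[2.0, 0.1], [0.3, 5.0]]
    print(predict_task_il(head_logits, task_id=0, classes_per_task=2))
    print(predict_class_il(head_logits))
```

With the task ID the prediction is class 0, while the global argmax returns class 3 because logits from separately trained heads are not calibrated against each other. This is why class-incremental results require an explicit routing or merging step.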
minor comments (2)
- [Abstract] Abstract: The claim of 'only 2.53% of their model size' should specify the exact baseline (e.g., total parameters of a memory-based method) and clarify whether this includes the auto-encoder overhead.
- The manuscript provides no implementation details, statistical significance tests, or ablation studies on the distillation and anchoring components, which limits independent verification of the empirical claims.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive comments. The points raised correctly identify a lack of explicit description regarding inference in the class-incremental setting, which requires clarification to support the claims made in the abstract and experiments. We address each comment below and will revise the manuscript accordingly.
- Referee: [Abstract] Abstract: The central claim that NFL supports class-incremental learning (no task ID at inference) is load-bearing yet contradicted by the architecture, which decomposes into task-specific heads whose selection at test time requires task identity. This construction is standard for task-incremental but incompatible with class-incremental protocols, so the reported class-incremental benchmark numbers do not substantiate the headline claim unless an alternative routing mechanism is provided.
  Authors: We agree that the architecture description using task-specific heads creates an inconsistency with the class-incremental claim in the abstract, as no alternative routing or merging mechanism is provided to enable inference without task identity. The manuscript does not describe such a mechanism, so the class-incremental benchmark results cannot be fully substantiated as presented. We will revise the abstract to accurately reflect the supported settings (primarily task-incremental with the described architecture) and clarify or adjust the class-incremental claims and reporting.
  Revision: yes
- Referee: [Method description] Method (stepwise freezing and head isolation protocol): No explicit mechanism is described for merging task-specific heads or performing inference without task labels, which is required to support the buffer-free class-incremental performance.
  Authors: The referee is correct: the method section provides no explicit mechanism for merging task-specific heads or for inference without task labels. This omission means the buffer-free class-incremental performance on the reported benchmarks cannot be verified from the current text. We will revise the method section to add a clear description of the inference procedure (or note its absence and revise the experimental claims if no such procedure was used).
  Revision: yes
Circularity Check
No circularity in derivation chain
The paper introduces an empirical framework (NFL/NFL+) for buffer-free continual learning and reports benchmark results on CIFAR-100, Tiny-ImageNet, and ImageNet-1000. No equations, derivations, fitted parameters, or self-citations appear in the provided text that would reduce any claimed performance or property to a quantity defined by the method itself. All evaluations rest on external baselines and standard datasets, rendering the central claims self-contained.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Overparameterized networks possess inherent redundancy that permits task isolation without catastrophic forgetting when using the described freezing and distillation protocol.
Forward citations
Cited by 1 Pith paper
- Shift Detection and Adaptation for Network Intrusion Detection
NetSight continually detects distribution shifts in network intrusion data and adapts a supervised model using pseudo-labeling and knowledge distillation, achieving up to 11.72% F1 improvement over methods requiring m...
Supplementary Material
No Forgetting Learning: Memory-free Continual Learning. The supplementary material mainly contains additional materials and experiments that cannot be reported due to the page...
Evaluation Protocol
The two main experimental scenarios typically used to evaluate the performance of methods are the following:
• Task Incremental Learning (Task-IL): In Task-IL, the training data is divided into multiple tasks, each with a unique set of classes. The crucial aspect of Task-IL is that the model is provided with information about which t...
Additional Results
We report Class-IL for the CIFAR-100 dataset, where all 100 classes were trained under the configuration of 20 tasks, with each corresponding to 5 incremental classes. The result is illustrated in Fig. 6. This helps us understand how the number of tasks or classes affects performance. By comparing it to Fig. 2, we see that performan...