RECALL: Rehearsal-free Continual Learning for Object Classification
Pith reviewed 2026-05-24 11:11 UTC · model grok-4.3
The pith
The RECALL method lets a network learn new object categories over time without storing any old images by pre-computing and freezing old logits as training targets.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
RECALL is a rehearsal-free continual learning approach in which a network pre-computes logits on old categories before each new training sequence and treats those logits as fixed targets. For each new sequence a fresh classification head is attached. Forgetting is further reduced by replacing the usual classification loss with a regression loss and by adding a Mahalanobis loss on the known categories that incorporates feature variances.
What carries the argument
The logit-recall step that supplies fixed targets for old categories together with the regression-plus-Mahalanobis regularization.
If this is right
- Outperforms prior methods on CORe50 and iCIFAR-100 without any rehearsal buffer.
- Achieves the highest reported accuracy on the new HOWS-CL-25 dataset of 25 household objects.
- Supports incremental addition of output heads for arbitrary numbers of new classes.
- Enables deployment on memory-constrained platforms such as mobile robots.
Where Pith is reading between the lines
- The logit-recall idea could be paired with other regularization schemes to increase stability further.
- Performance on synthetic household objects suggests the method may transfer to real robot vision once domain shift is handled.
- If the Mahalanobis term is essential, similar covariance-aware losses could be tested in other continual-learning domains.
Load-bearing premise
Pre-computed logits from the previous network state, used as fixed targets together with regression and Mahalanobis losses, are sufficient to protect old category performance when no old data is retained.
What would settle it
A training sequence in which accuracy on previously learned categories falls sharply after the network is updated on new categories alone.
Figures
read the original abstract
Convolutional neural networks show remarkable results in classification but struggle with learning new things on the fly. We present a novel rehearsal-free approach, where a deep neural network is continually learning new unseen object categories without saving any data of prior sequences. Our approach is called RECALL, as the network recalls categories by calculating logits for old categories before training new ones. These are then used during training to avoid changing the old categories. For each new sequence, a new head is added to accommodate the new categories. To mitigate forgetting, we present a regularization strategy where we replace the classification with a regression. Moreover, for the known categories, we propose a Mahalanobis loss that includes the variances to account for the changing densities between known and unknown categories. Finally, we present a novel dataset for continual learning, especially suited for object recognition on a mobile robot (HOWS-CL-25), including 150,795 synthetic images of 25 household object categories. Our approach RECALL outperforms the current state of the art on CORe50 and iCIFAR-100 and reaches the best performance on HOWS-CL-25.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces RECALL, a rehearsal-free continual learning method for object classification. For each new data sequence, logits for all previously seen categories are pre-computed on the current model and stored as fixed regression targets; a new classification head is added for the new categories; training replaces cross-entropy with regression loss plus a Mahalanobis-distance term that incorporates per-class covariance to handle density changes. The method is evaluated on CORe50 and iCIFAR-100 (outperforming prior rehearsal-free baselines) and on a newly introduced synthetic dataset HOWS-CL-25 (150k images, 25 household categories), where it reports the highest accuracy.
Significance. If the no-forgetting guarantee holds without data storage or backbone freezing, the approach would be a meaningful contribution to rehearsal-free continual learning, especially for privacy-sensitive or memory-constrained settings such as mobile robotics. The introduction of HOWS-CL-25 is a concrete asset for the community. However, the significance is tempered by the absence of reported quantitative numbers, ablations, or statistical details in the provided abstract and by the unresolved consistency issue between fixed targets and an updating feature extractor.
major comments (2)
- [abstract / §3] Method description (abstract and §3): the claim that pre-computed logits serve as fixed targets sufficient to prevent catastrophic forgetting assumes either that the shared backbone remains frozen or that the targets are re-computed after every update. Neither option is stated; if the backbone parameters change, the stored logits become inconsistent with the current feature space, and the Mahalanobis term alone does not restore decision boundaries on unseen old-class inputs. This is load-bearing for the central rehearsal-free claim.
- [§4 / Table 1] Experimental section: the abstract asserts outperformance on CORe50, iCIFAR-100 and best results on HOWS-CL-25, yet supplies no numerical accuracies, standard deviations, or ablation tables. Without these data the central empirical claim cannot be verified and the free parameters (regression coefficient, Mahalanobis scaling) cannot be assessed for sensitivity.
minor comments (2)
- [§3] Notation: the distinction between “logits” used as regression targets and the output of the new head is not made explicit; a short equation defining the composite loss would remove ambiguity.
- [§4.3] Dataset: HOWS-CL-25 is introduced as synthetic images suited for mobile-robot object recognition; a brief description of the rendering pipeline and domain gap to real images would strengthen the contribution.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below with clarifications and indicate the revisions we will make to strengthen the paper.
read point-by-point responses
-
Referee: [abstract / §3] Method description (abstract and §3): the claim that pre-computed logits serve as fixed targets sufficient to prevent catastrophic forgetting assumes either that the shared backbone remains frozen or that the targets are re-computed after every update. Neither option is stated; if the backbone parameters change, the stored logits become inconsistent with the current feature space, and the Mahalanobis term alone does not restore decision boundaries on unseen old-class inputs. This is load-bearing for the central rehearsal-free claim.
Authors: We agree that the current description in the abstract and §3 is insufficiently precise on this point and does not explicitly state how consistency is preserved when the backbone is updated. In the method, the backbone parameters are updated during training on each new sequence while the pre-computed logits from the prior model serve as fixed regression targets for the old classification heads; the Mahalanobis loss is applied in feature space to regularize the evolving class-conditional densities. We will revise §3 to include a step-by-step training algorithm, an explicit statement that the backbone is not frozen, and a discussion (with supporting equations) of why the combination of regression to fixed logits and the Mahalanobis term maintains old-class decision boundaries without requiring old data or target re-computation. We will also add a short theoretical argument and an empirical check on a held-out old-class validation set to demonstrate that inconsistency does not materially degrade performance. revision: yes
-
Referee: [§4 / Table 1] Experimental section: the abstract asserts outperformance on CORe50, iCIFAR-100 and best results on HOWS-CL-25, yet supplies no numerical accuracies, standard deviations, or ablation tables. Without these data the central empirical claim cannot be verified and the free parameters (regression coefficient, Mahalanobis scaling) cannot be assessed for sensitivity.
Authors: The full manuscript already contains Table 1 with per-sequence and average accuracies (including standard deviations over multiple runs) on CORe50 and iCIFAR-100, plus the corresponding numbers for HOWS-CL-25. However, we acknowledge that the abstract itself provides no numerical values and that the sensitivity analysis for the two hyperparameters is not presented as a dedicated ablation. We will (i) augment the abstract with the key quantitative results (e.g., average accuracy and improvement margins), (ii) expand §4 with an explicit hyperparameter sensitivity table for the regression coefficient and Mahalanobis scaling factor, and (iii) report all results with standard deviations and the number of runs used. revision: yes
Circularity Check
No significant circularity; empirical method with external validation
full rationale
The paper presents RECALL as an empirical construction for rehearsal-free continual learning: pre-compute logits on old categories, add a new head per sequence, replace classification with regression, and apply Mahalanobis loss on known categories. No equations, derivations, or first-principles claims are shown that reduce performance to quantities fitted from the evaluation data by construction. Results are reported on independent benchmarks (CORe50, iCIFAR-100) plus a new dataset (HOWS-CL-25), making the central claims self-contained against external data rather than tautological.
Axiom & Free-Parameter Ledger
free parameters (2)
- regularization coefficient for regression loss
- Mahalanobis covariance scaling factor
Reference graph
Works this paper leans on
-
[1]
Deep residual learning for image recognition,
K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770–778
work page 2016
-
[2]
Implicit 3d orientation learning for 6d object detection from rgb images,
M. Sundermeyer, Z.-C. Marton, M. Durner, M. Brucker, and R. Triebel, “Implicit 3d orientation learning for 6d object detection from rgb images,” in European Conference on Computer Vision (ECCV), 2018, pp. 699–715
work page 2018
-
[3]
Learning less is more-6d camera localization via 3d surface regression,
E. Brachmann and C. Rother, “Learning less is more-6d camera localization via 3d surface regression,” in CVPR. IEEE, 2018
work page 2018
-
[4]
Towards end-to-end speech recognition with recurrent neural networks,
A. Graves and N. Jaitly, “Towards end-to-end speech recognition with recurrent neural networks,” in IEEE International Conference on Machine Learning (ICML) . PMLR, 2014, pp. 1764–1772
work page 2014
-
[5]
Ima- genet: A large-scale hierarchical image database,
J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “Ima- genet: A large-scale hierarchical image database,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR) . IEEE, 2009
work page 2009
-
[6]
Three scenarios for continual learning
G. M. Van de Ven and A. S. Tolias, “Three scenarios for continual learning,” arXiv preprint arXiv:1904.07734 , 2019
work page internal anchor Pith review Pith/arXiv arXiv 1904
-
[7]
icarl: Incremental classifier and representation learning,
S.-A. Rebuffi, A. Kolesnikov, G. Sperl, and C. H. Lampert, “icarl: Incremental classifier and representation learning,” in CVPR, 2017
work page 2017
-
[8]
Gradient episodic memory for contin- ual learning,
D. Lopez-Paz and M. Ranzato, “Gradient episodic memory for contin- ual learning,” in Advances in Neural Information Processing Systems (NeurIPS), 2017, pp. 6467–6476. 2 4 6 80 20 40 60 80 100Accuracy in % Results on CORe50 AR1 results (from [12]) 2 4 6 8 100 20 40 60 80 100Accuracy in % Results on iCIFAR-100 1 2 3 4 50 20 40 60 80 100Accuracy in % Results...
work page 2017
-
[9]
A. A. Rusu, N. C. Rabinowitz, G. Desjardins, H. Soyer, J. Kirkpatrick, K. Kavukcuoglu, R. Pascanu, and R. Hadsell, “Progressive neural networks,” Computing Research Repository (CoRR) , 2016
work page 2016
-
[10]
Core50: a new dataset and benchmark for continuous object recognition,
V . Lomonaco and D. Maltoni, “Core50: a new dataset and benchmark for continuous object recognition,” Proceedings of Machine Learning Research (PMLR), 2017
work page 2017
-
[11]
Z. Li and D. Hoiem, “Learning without forgetting,” IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 40, 2017
work page 2017
-
[12]
Continuous learning in single- incremental-task scenarios,
D. Maltoni and V . Lomonaco, “Continuous learning in single- incremental-task scenarios,” Neural Networks, 2019
work page 2019
-
[13]
Continual learning through synaptic intelligence,
F. Zenke, B. Poole, and S. Ganguli, “Continual learning through synaptic intelligence,” in ICML, vol. 70, 2017, p. 3987
work page 2017
-
[14]
Incremental object learning from contiguous views,
S. Stojanov, S. Mishra, N. A. Thai, N. Dhanda, A. Humayun, C. Yu, L. B. Smith, and J. M. Rehg, “Incremental object learning from contiguous views,” in IEEE CVPR, June 2019
work page 2019
-
[15]
Distilling the Knowledge in a Neural Network
G. Hinton, O. Vinyals, and J. Dean, “Distilling the knowledge in a neural network,” arXiv preprint arXiv:1503.02531 , 2015
work page internal anchor Pith review Pith/arXiv arXiv 2015
-
[16]
Overcoming catastrophic forgetting in neural networks,
J. Kirkpatrick, R. Pascanu, N. Rabinowitz, J. Veness, G. Desjardins, A. A. Rusu, K. Milan, J. Quan, T. Ramalho, A. Grabska-Barwinska, et al., “Overcoming catastrophic forgetting in neural networks,” Pro- ceedings of the national academy of sciences , vol. 114, 2017
work page 2017
-
[17]
Il2m: Class incremental learning with dual memory,
E. Belouadah and A. Popescu, “Il2m: Class incremental learning with dual memory,” in IEEE/CVF International Conference on Computer Vision (ICCV), 2019
work page 2019
-
[18]
Storing encoded episodes as concepts for continual learning,
A. Ayub and A. R. Wagner, “Storing encoded episodes as concepts for continual learning,” ICML Workshop, 2020
work page 2020
-
[19]
Eec: Learning to encode and regenerate im- ages for continual learning,
A. Ayub and A. Wagner, “Eec: Learning to encode and regenerate im- ages for continual learning,” in International Conference on Learning Representations (ICLR), 2021
work page 2021
-
[20]
Persistent anytime learning of objects from unseen classes,
M. Denninger and R. Triebel, “Persistent anytime learning of objects from unseen classes,” in IEEE/RSJ International Conference on Intel- ligent Robots and Systems (IROS) . IEEE, 2018, pp. 4075–4082
work page 2018
-
[21]
Blenderproc: Reducing the reality gap with photorealisitc rendering,
M. Denninger, M. Sundermeyer, D. Winkelbauer, D. Olefir, T. Hoda ˇn, Y . Zidan, M. Elbadrawy, M. Knauer, H. Katam, and A. Lodhi, “Blenderproc: Reducing the reality gap with photorealisitc rendering,” Robotics: Science and Systems (RSS) , 2020
work page 2020
-
[22]
ShapeNet: An Information-Rich 3D Model Repository
A. X. Chang, T. Funkhouser, L. Guibas, P. Hanrahan, Q. Huang, Z. Li, S. Savarese, M. Savva, S. Song, H. Su, et al. , “Shapenet: An information-rich 3d model repository,” arXiv:1512.03012, 2015
work page internal anchor Pith review Pith/arXiv arXiv 2015
-
[23]
The mnist database of handwritten digits,
Y . LeCun, “The mnist database of handwritten digits,” 1998
work page 1998
-
[24]
Openloris-object: A robotic vision dataset and benchmark for lifelong deep learning,
Q. She, F. Feng, X. Hao, Q. Yang, C. Lan, V . Lomonaco, X. Shi, Z. Wang, Y . Guo, Y . Zhang, et al. , “Openloris-object: A robotic vision dataset and benchmark for lifelong deep learning,” in IEEE International Conference on Robotics and Automation (ICRA) , 2020
work page 2020
-
[25]
Visual concepts and compositional voting,
J. Wang, Z. Zhang, C. Xie, Y . Zhou, V . Premachandran, J. Zhu, L. Xie, and A. Yuille, “Visual concepts and compositional voting,” vol. 3
-
[26]
Learning multiple layers of features from tiny images,
A. Krizhevsky, G. Hinton, et al., “Learning multiple layers of features from tiny images,” 2009
work page 2009
-
[27]
Photorealistic image synthesis for object instance detection,
T. Hoda ˇn, V . Vineet, R. Gal, E. Shalev, J. Hanzelka, T. Connell, P. Urbina, S. N. Sinha, and B. Guenter, “Photorealistic image synthesis for object instance detection,” in IEEE International Conference on Image Processing (ICIP) . IEEE, 2019, pp. 66–70
work page 2019
-
[28]
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, “Mobilenets: Efficient convolutional neural networks for mobile vision applications,” arXiv preprint arXiv:1704.04861, 2017
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[29]
Identity mappings in deep residual networks,
K. He, X. Zhang, S. Ren, and J. Sun, “Identity mappings in deep residual networks,” in ECCV. Springer, 2016, pp. 630–645
work page 2016
-
[30]
Re- thinking the inception architecture for computer vision,
C. Szegedy, V . Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Re- thinking the inception architecture for computer vision,” in IEEE Conference on Computer Vision and Pattern Recognition , 2016
work page 2016
-
[31]
Inception-v4, inception-resnet and the impact of residual connections on learning,
C. Szegedy, S. Ioffe, V . Vanhoucke, and A. A. Alemi, “Inception-v4, inception-resnet and the impact of residual connections on learning,” in Conference on Artificial Intelligence (AAAI) , 2017
work page 2017
-
[32]
Rectified linear units improve restricted boltzmann machines,
V . Nair and G. E. Hinton, “Rectified linear units improve restricted boltzmann machines,” in ICML, 2010
work page 2010
-
[33]
Implicit neural representations with periodic activation functions,
V . Sitzmann, J. N. Martel, A. W. Bergman, D. B. Lindell, and G. Wetzstein, “Implicit neural representations with periodic activation functions,” in NeurIPS, 2020
work page 2020
-
[34]
RECALL: Rehearsal-free Continual Learning for Object Classification
C. Szegedy, W. Liu, Y . Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V . Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” in CVPR, 2015, pp. 1–9. - Supplemental Material - RECALL: Rehearsal-free Continual Learning for Object Classification September 30, 2022 1 Introduction With RECALL, we introduce a novel algorithm for rehearsal-free o...
work page internal anchor Pith review Pith/arXiv arXiv 2015
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.