
arxiv: 2209.14774 · v1 · submitted 2022-09-29 · 💻 cs.CV · cs.AI

RECALL: Rehearsal-free Continual Learning for Object Classification

Pith reviewed 2026-05-24 11:11 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords continual learning · rehearsal-free · object classification · catastrophic forgetting · Mahalanobis loss · logit recall · HOWS-CL-25

The pith

The RECALL method lets a network learn new object categories over time without storing any old images by pre-computing and freezing old logits as training targets.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a rehearsal-free continual-learning technique for object classification. Before each new training sequence the network records its current output logits on the old categories and then treats those logits as fixed targets while learning the new classes. A separate output head is added for every new sequence, the usual classification loss is replaced by a regression loss, and a Mahalanobis loss that accounts for changing feature variances is applied to the known categories. If the approach holds, networks could acquire new visual knowledge on robots or edge devices without privacy or memory costs from retaining past data.

Core claim

RECALL is a rehearsal-free continual learning approach in which a network pre-computes logits on old categories before each new training sequence and treats those logits as fixed targets. For each new sequence a fresh classification head is attached. Forgetting is further reduced by replacing the usual classification loss with a regression loss and by adding a Mahalanobis loss on the known categories that incorporates feature variances.
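
The logit-recall step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `build_recall_labels`, the toy model, and the one-hot encoding of the new-head targets are all assumptions made for the example.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def build_recall_labels(
    old_model: Callable[[Sequence[float]], Vector],
    inputs: Sequence[Sequence[float]],
    new_class_ids: Sequence[int],
    num_new_classes: int,
) -> List[Vector]:
    """Concatenate frozen old-head logits with a target for the new head."""
    labels = []
    for x, y in zip(inputs, new_class_ids):
        # Logits of the old categories, recorded once before training starts.
        old_logits = list(old_model(x))
        # Assumption: the new head regresses onto a one-hot target.
        new_target = [0.0] * num_new_classes
        new_target[y] = 1.0
        labels.append(old_logits + new_target)
    return labels


def toy_old_model(x: Sequence[float]) -> Vector:
    # Hypothetical stand-in for the network state before the new sequence,
    # with two old categories.
    return [x[0] + x[1], x[0] - x[1]]


recall_labels = build_recall_labels(
    toy_old_model,
    inputs=[[1.0, 2.0], [0.5, 0.5]],
    new_class_ids=[0, 1],
    num_new_classes=2,
)
print(recall_labels)
```

Only images of the current sequence pass through the old model, so no data from earlier sequences has to be stored.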

What carries the argument

The logit-recall step that supplies fixed targets for old categories together with the regression-plus-Mahalanobis regularization.
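
As a rough illustration of the two loss terms, the sketch below compares a plain regression loss with a variance-weighted one. It assumes a diagonal covariance, so the squared Mahalanobis distance reduces to squared errors divided by per-logit variances. The function names and the sample numbers are hypothetical, and the paper's exact formulation may differ.

```python
from typing import Sequence


def regression_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between predicted logits and regression targets."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def variance_weighted_loss(
    pred: Sequence[float],
    target: Sequence[float],
    variances: Sequence[float],
    eps: float = 1e-8,
) -> float:
    """Squared error with each term divided by that logit's variance.

    Assumption: diagonal covariance, averaged over the logits.
    """
    terms = [(p - t) ** 2 / (v + eps) for p, t, v in zip(pred, target, variances)]
    return sum(terms) / len(terms)


old_logits_now = [1.0, 3.0]   # current outputs of the old heads
recall_targets = [0.0, 1.0]   # logits recorded before training
logit_variances = [1.0, 4.0]  # hypothetical per-logit variances

plain = regression_loss(old_logits_now, recall_targets)
weighted = variance_weighted_loss(old_logits_now, recall_targets, logit_variances)
print(plain, weighted)
```

Dividing by the variance means that a logit with a wide spread contributes less per unit of error than one with a narrow spread.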

If this is right

  • Outperforms prior methods on CORe50 and iCIFAR-100 without any rehearsal buffer.
  • Achieves the highest reported accuracy on the new HOWS-CL-25 dataset of 25 household object categories.
  • Supports incremental addition of output heads for arbitrary numbers of new classes.
  • Enables deployment on memory-constrained platforms such as mobile robots.
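
The per-sequence head mentioned in the list above can be pictured with a small pure-Python sketch. The class `MultiHeadClassifier`, its linear heads, and the random initialisation are assumptions for illustration only, and the shared feature extractor is omitted.

```python
import random
from typing import List, Sequence


class MultiHeadClassifier:
    """Keeps one linear output head per training sequence."""

    def __init__(self, feature_dim: int, seed: int = 0) -> None:
        self.feature_dim = feature_dim
        self.heads: List[List[List[float]]] = []  # one weight matrix per head
        self._rng = random.Random(seed)

    def add_head(self, num_classes: int) -> None:
        """Attach a freshly initialised head for the new categories."""
        head = [
            [self._rng.gauss(0.0, 0.01) for _ in range(self.feature_dim)]
            for _ in range(num_classes)
        ]
        self.heads.append(head)

    def logits(self, features: Sequence[float]) -> List[float]:
        """Concatenate the outputs of all heads, oldest first."""
        out = []
        for head in self.heads:
            for row in head:
                out.append(sum(w * f for w, f in zip(row, features)))
        return out


model = MultiHeadClassifier(feature_dim=3)
model.add_head(2)                      # first sequence: two categories
first = model.logits([1.0, 0.0, -1.0])
model.add_head(3)                      # second sequence: three more
second = model.logits([1.0, 0.0, -1.0])
print(len(first), len(second))
```

Adding a head leaves the existing heads' weights untouched. In a full model, the old logits can still drift once the shared backbone is trained further.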

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The logit-recall idea could be paired with other regularization schemes to increase stability further.
  • Performance on synthetic household objects suggests the method may transfer to real robot vision once domain shift is handled.
  • If the Mahalanobis term is essential, similar covariance-aware losses could be tested in other continual-learning domains.

Load-bearing premise

Pre-computed logits from the previous network state, used as fixed targets together with regression and Mahalanobis losses, are sufficient to protect old category performance when no old data is retained.

What would settle it

A training sequence in which accuracy on previously learned categories falls sharply after the network is updated on new categories alone.
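
One way to run such a check is to compare accuracy on held-out old-category images before and after the update. The sketch below is illustrative only: the helper names and the prediction lists are hypothetical, not results from the paper.

```python
from typing import Sequence


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of samples whose predicted class matches the label."""
    if not labels:
        raise ValueError("labels must not be empty")
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


def forgetting(acc_before: float, acc_after: float) -> float:
    """Accuracy lost on old categories; positive values mean forgetting."""
    return acc_before - acc_after


# Hypothetical predictions on a held-out set of old-category images.
old_labels = [0, 1, 0, 1]
preds_before_update = [0, 1, 0, 1]  # before training on the new sequence
preds_after_update = [0, 2, 3, 2]   # after training on new categories only

drop = forgetting(
    accuracy(preds_before_update, old_labels),
    accuracy(preds_after_update, old_labels),
)
print(drop)
```

A large positive `drop` on old categories would count against the premise. A value near zero across sequences would support it.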

Figures

Figures reproduced from arXiv: 2209.14774 by Markus Knauer, Maximilian Denninger, Rudolph Triebel.

Figure 1: Continual learning without using previously seen …
Figure 2: Calculation of the recall label r: First, we calculate the logits for each training example of the current sequence s. After that, a new head is added (fc[1, 0], fc[1, 1]) and the recall labels r_s are used for training in sequence s.
Figure 3: Logarithmic variance over all logits per sequence and …
Figure 4: All 25 categories used in the HOWS-CL-25 dataset
Figure 5: Results on four datasets. iCaRL and A-GEM are not …
Figure 1: Each RGB image of an object in the HOWS-CL-25 dataset (left) has a corresponding segmentation map (second from left), a normal image (third from left), and a depth image (right).
Figure 2: The results on the CORe50 dataset with our four methods.
Figure 3: The results on the iCIFAR-100 dataset with our four methods.
Figure 4: The results on the HOWS-CL-25 dataset with our four methods.
Figure 5: The results on the HOWS-CL-25 long dataset with our four methods.
Figure 6: Standard version of HOWS-CL-25 dataset with five sequences.
Figure 7: Long version of HOWS-CL-25 dataset with twelve sequences. A particular set of categories is …
Original abstract

Convolutional neural networks show remarkable results in classification but struggle with learning new things on the fly. We present a novel rehearsal-free approach, where a deep neural network is continually learning new unseen object categories without saving any data of prior sequences. Our approach is called RECALL, as the network recalls categories by calculating logits for old categories before training new ones. These are then used during training to avoid changing the old categories. For each new sequence, a new head is added to accommodate the new categories. To mitigate forgetting, we present a regularization strategy where we replace the classification with a regression. Moreover, for the known categories, we propose a Mahalanobis loss that includes the variances to account for the changing densities between known and unknown categories. Finally, we present a novel dataset for continual learning, especially suited for object recognition on a mobile robot (HOWS-CL-25), including 150,795 synthetic images of 25 household object categories. Our approach RECALL outperforms the current state of the art on CORe50 and iCIFAR-100 and reaches the best performance on HOWS-CL-25.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces RECALL, a rehearsal-free continual learning method for object classification. Before training on each new sequence, the current model's logits for all previously seen categories are pre-computed and stored as fixed regression targets, and a new classification head is added for the new categories. Training replaces the classification loss with a regression loss and adds a Mahalanobis loss on the known categories that includes variances to account for changing densities. The method is evaluated on CORe50 and iCIFAR-100, where the abstract reports that it outperforms the current state of the art, and on the newly introduced synthetic dataset HOWS-CL-25 (150,795 images, 25 household object categories), where it reports the best performance.

Significance. If forgetting is mitigated as claimed, without data storage or backbone freezing, the approach would be a meaningful contribution to rehearsal-free continual learning, especially for privacy-sensitive or memory-constrained settings such as mobile robotics. The introduction of HOWS-CL-25 is a concrete asset for the community. However, the significance is tempered by the absence of reported quantitative numbers, ablations, or statistical details in the provided abstract and by the unresolved consistency issue between fixed targets and an updating feature extractor.

major comments (2)
  1. [abstract / §3] Method description (abstract and §3): the claim that pre-computed logits serve as fixed targets sufficient to prevent catastrophic forgetting assumes either that the shared backbone remains frozen or that the targets are re-computed after every update. Neither option is stated; if the backbone parameters change, the stored logits become inconsistent with the current feature space, and the Mahalanobis term alone does not restore decision boundaries on unseen old-class inputs. This is load-bearing for the central rehearsal-free claim.
  2. [§4 / Table 1] Experimental section: the abstract asserts outperformance on CORe50, iCIFAR-100 and best results on HOWS-CL-25, yet supplies no numerical accuracies, standard deviations, or ablation tables. Without these data the central empirical claim cannot be verified and the free parameters (regression coefficient, Mahalanobis scaling) cannot be assessed for sensitivity.
minor comments (2)
  1. [§3] Notation: the distinction between “logits” used as regression targets and the output of the new head is not made explicit; a short equation defining the composite loss would remove ambiguity.
  2. [§4.3] Dataset: HOWS-CL-25 is introduced as synthetic images suited for mobile-robot object recognition; a brief description of the rendering pipeline and domain gap to real images would strengthen the contribution.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below with clarifications and indicate the revisions we will make to strengthen the paper.

  1. Referee: [abstract / §3] Method description (abstract and §3): the claim that pre-computed logits serve as fixed targets sufficient to prevent catastrophic forgetting assumes either that the shared backbone remains frozen or that the targets are re-computed after every update. Neither option is stated; if the backbone parameters change, the stored logits become inconsistent with the current feature space, and the Mahalanobis term alone does not restore decision boundaries on unseen old-class inputs. This is load-bearing for the central rehearsal-free claim.

    Authors: We agree that the current description in the abstract and §3 is insufficiently precise on this point and does not explicitly state how consistency is preserved when the backbone is updated. In the method, the backbone parameters are updated during training on each new sequence while the pre-computed logits from the prior model serve as fixed regression targets for the old classification heads; the Mahalanobis loss is applied in feature space to regularize the evolving class-conditional densities. We will revise §3 to include a step-by-step training algorithm, an explicit statement that the backbone is not frozen, and a discussion (with supporting equations) of why the combination of regression to fixed logits and the Mahalanobis term maintains old-class decision boundaries without requiring old data or target re-computation. We will also add a short theoretical argument and an empirical check on a held-out old-class validation set to demonstrate that inconsistency does not materially degrade performance. revision: yes

  2. Referee: [§4 / Table 1] Experimental section: the abstract asserts outperformance on CORe50, iCIFAR-100 and best results on HOWS-CL-25, yet supplies no numerical accuracies, standard deviations, or ablation tables. Without these data the central empirical claim cannot be verified and the free parameters (regression coefficient, Mahalanobis scaling) cannot be assessed for sensitivity.

    Authors: The full manuscript already contains Table 1 with per-sequence and average accuracies (including standard deviations over multiple runs) on CORe50 and iCIFAR-100, plus the corresponding numbers for HOWS-CL-25. However, we acknowledge that the abstract itself provides no numerical values and that the sensitivity analysis for the two hyperparameters is not presented as a dedicated ablation. We will (i) augment the abstract with the key quantitative results (e.g., average accuracy and improvement margins), (ii) expand §4 with an explicit hyperparameter sensitivity table for the regression coefficient and Mahalanobis scaling factor, and (iii) report all results with standard deviations and the number of runs used. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical method with external validation


The paper presents RECALL as an empirical construction for rehearsal-free continual learning: pre-compute logits on old categories, add a new head per sequence, replace classification with regression, and apply Mahalanobis loss on known categories. No equations, derivations, or first-principles claims are shown that reduce performance to quantities fitted from the evaluation data by construction. Results are reported on independent benchmarks (CORe50, iCIFAR-100) plus a new dataset (HOWS-CL-25), making the central claims self-contained against external data rather than tautological.

Axiom & Free-Parameter Ledger

2 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields limited visibility into hyperparameters; the method description implies at least one tunable regularization coefficient and the Mahalanobis covariance estimation, both of which function as free parameters whose values are not derived from first principles.

free parameters (2)
  • regularization coefficient for regression loss
    Controls the strength of the term that replaces standard classification; must be chosen to balance new learning against retention of old logits.
  • Mahalanobis covariance scaling factor
    Adjusts how variances of known-category features are incorporated; appears fitted to the changing density between known and unknown data.
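
To make the role of the two free parameters concrete, one possible form of a composite loss is written out below. The notation is assumed for illustration and is not taken from the paper: z denotes the network's logits, y the regression targets of the new head, r the pre-computed recall logits, σ_k² the variance of logit k, N and K the index sets of new and known categories, λ the regularization coefficient, and α the covariance scaling factor.

```latex
\mathcal{L}
  = \underbrace{\frac{1}{|N|} \sum_{j \in N} \bigl(z_j - y_j\bigr)^2}_{\text{new head, regression}}
  + \lambda \, \underbrace{\frac{1}{|K|} \sum_{k \in K}
      \frac{\bigl(z_k - r_k\bigr)^2}{\alpha \, \sigma_k^2}}_{\text{known categories, variance-weighted}}
```

In this form λ trades new learning against retention of the old logits, while α rescales how strongly the variances discount each error term.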


