arxiv: 1907.07872 · v1 · pith:PMK7AULXnew · submitted 2019-07-18 · 💻 cs.LG · stat.ML

Autoencoder-Based Incremental Class Learning without Retraining on Old Data

Pith reviewed 2026-05-24 19:48 UTC · model grok-4.3

classification 💻 cs.LG stat.ML
keywords incremental class learning · continual learning · autoencoder · prototypes · metric classification · catastrophic forgetting · CIFAR-100 · CUB-200-2011

The pith

Storing only the mean prototype per class from an autoencoder allows incremental class learning without old data or high memory costs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops an incremental class learning approach in which an autoencoder extracts prototypes from inputs and only the mean of those prototypes is retained for each class. Classification then proceeds by metric comparison to these stored means rather than by expanding a softmax output layer. When a new task arrives, regularization is applied to the model to limit forgetting of earlier classes while no raw previous data is kept or replayed. On the CIFAR-100 and CUB-200-2011 benchmarks the resulting accuracy matches that of rehearsal-based or generative methods yet requires far less additional memory. The work therefore claims that prototype means suffice as a compact representation for sequential disjoint-class learning.

Core claim

An autoencoder can be trained so that its latent representations serve as prototypes; retaining only the per-class mean of these prototypes permits metric-based classification across sequentially presented disjoint classes, and regularization at each new task prevents catastrophic forgetting without any access to prior raw examples or generative reconstruction of them.

What carries the argument

Autoencoder that maps inputs to prototypes whose class-wise means are stored for nearest-prototype metric classification, with task-wise regularization.
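
The nearest-prototype step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the encoder has already produced latent vectors, and the class name `PrototypeMeanClassifier`, the helper `l2_normalize`, and the toy two-dimensional vectors are all hypothetical. Cosine similarity is used because the Figure 1 text mentions a cosine version of NCM.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

class PrototypeMeanClassifier:
    """Hypothetical helper: keeps one mean vector per class and nothing else."""

    def __init__(self):
        self.means = {}  # class label -> mean prototype (1-D array)

    def add_class(self, label, prototypes):
        # prototypes: (n_samples, dim) latent vectors for one new class.
        # Only their mean is retained; the samples themselves are discarded.
        self.means[label] = prototypes.mean(axis=0)

    def predict(self, prototypes):
        labels = list(self.means)
        centers = l2_normalize(np.stack([self.means[c] for c in labels]))
        sims = l2_normalize(prototypes) @ centers.T  # cosine similarities
        return [labels[i] for i in sims.argmax(axis=1)]

# Toy latent vectors (made up for illustration, not from the paper).
clf = PrototypeMeanClassifier()
clf.add_class("cat", np.array([[5.0, 0.2], [4.6, -0.2]]))
clf.add_class("dog", np.array([[0.1, 5.0], [-0.1, 4.0]]))
predictions = clf.predict(np.array([[4.0, 0.5], [0.2, 3.0]]))
print(predictions)  # ['cat', 'dog']
```

Adding a class only inserts one entry into `means`, so no output layer has to grow when new classes arrive.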

If this is right

  • Output layer size stays constant because new classes do not require new units.
  • Memory overhead grows only linearly with the number of classes and the prototype dimension rather than with the size of stored exemplars.
  • Rehearsal buffers and generative replay networks become unnecessary for this class-incremental setting.
  • Regularization alone, when paired with fixed prototype means, is claimed to be enough to control forgetting.
  • The same prototype-mean storage can be used for both classification and any downstream metric task without retraining the encoder on old data.
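
The linear-growth claim in the list above can be made concrete with back-of-the-envelope arithmetic. Every number below is an illustrative assumption (a 256-dimensional float32 prototype and a 2000-image buffer of 32×32 RGB bytes); none is a figure reported by the paper, and the function names are hypothetical.

```python
def prototype_mean_bytes(num_classes, dim, bytes_per_value=4):
    """Storage for one mean vector per class (float32 by default)."""
    return num_classes * dim * bytes_per_value

def exemplar_buffer_bytes(num_exemplars, height, width, channels, bytes_per_value=1):
    """Storage for a rehearsal buffer of raw uint8 images."""
    return num_exemplars * height * width * channels * bytes_per_value

# Assumed sizes, chosen only to show the scaling behaviour.
means_bytes = prototype_mean_bytes(num_classes=100, dim=256)
buffer_bytes = exemplar_buffer_bytes(2000, 32, 32, 3)

print(means_bytes)                  # 102400
print(buffer_bytes)                 # 6144000
print(buffer_bytes // means_bytes)  # 60
```

Under these assumptions the stored means take about 100 KB against roughly 6 MB for the image buffer, and doubling the number of classes exactly doubles the first figure.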

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach may scale to settings where storage is strictly limited, such as embedded devices that must learn new object categories over time.
  • If the mean prototype continues to work when the number of classes grows into the thousands, the method would imply that class-conditional statistics in latent space are unusually stable.
  • Combining the stored means with a small set of synthetic or distilled examples could be tested as a low-cost way to recover any lost accuracy.
  • The same prototype storage could support open-set recognition by flagging inputs whose distance to all stored means exceeds a threshold.
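
The open-set idea in the last item can be sketched as a distance threshold on top of nearest-mean classification. This is an editorial illustration only: the function `classify_or_reject`, the cosine-distance choice, and the threshold value 0.2 are assumptions and do not come from the paper.

```python
import numpy as np

def classify_or_reject(query, class_means, max_distance):
    """Return the nearest class label, or None if every stored mean is too far.

    Distance is cosine distance, i.e. 1 - cosine similarity.
    """
    labels = list(class_means)
    centers = np.stack([class_means[c] for c in labels])
    centers = centers / np.linalg.norm(centers, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    distances = 1.0 - centers @ q
    best = int(distances.argmin())
    return labels[best] if distances[best] <= max_distance else None

# Toy stored means (hypothetical values).
stored = {"a": np.array([1.0, 0.0]), "b": np.array([0.0, 1.0])}
known = classify_or_reject(np.array([0.9, 0.1]), stored, max_distance=0.2)
unknown = classify_or_reject(np.array([-1.0, -1.0]), stored, max_distance=0.2)
print(known, unknown)  # a None
```

In practice the threshold would need to be calibrated on held-out data, which this sketch does not attempt.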

Load-bearing premise

The mean of the autoencoder prototypes for a class remains a sufficient statistic for correct metric classification of future inputs even after many new classes have been added and without any access to the original training images.

What would settle it

A controlled run on CIFAR-100 in which the method, after learning all classes sequentially, yields accuracy more than a few points below the rehearsal-based state-of-the-art baseline while still using only the reported memory budget.

Figures

Figures reproduced from arXiv: 1907.07872 by Euntae Choi, Kiyoung Choi, Kyungmi Lee.

Figure 1: Overall architecture of our model. … experiment on (CIFAR-100 and CUB-200), a VGG-19 [Simonyan and Zisserman, 2014] pretrained on ImageNet is attached in front of the encoder and works as a fixed feature extractor ϕ. This is similar to FearNet, which uses a pretrained ResNet-50 [He et al., 2016] as its feature extractor. Cosine Similarity-Based Classification: The cosine version of NCM is used for classifi…
Original abstract

Incremental class learning, a scenario in continual learning context where classes and their training data are sequentially and disjointedly observed, challenges a problem widely known as catastrophic forgetting. In this work, we propose a novel incremental class learning method that can significantly reduce memory overhead compared to previous approaches. Apart from conventional classification scheme using softmax, our model bases on an autoencoder to extract prototypes for given inputs so that no change in its output unit is required. It stores only the mean of prototypes per class to perform metric-based classification, unlike rehearsal approaches which rely on large memory or generative model. To mitigate catastrophic forgetting, regularization methods are applied on our model when a new task is encountered. We evaluate our method by experimenting on CIFAR-100 and CUB-200-2011 and show that its performance is comparable to the state-of-the-art method with much lower additional memory cost.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes an autoencoder-based incremental class learning method that extracts prototypes from inputs, stores only the per-class mean of these prototypes, and performs nearest-mean metric classification without retraining on old data or using rehearsal buffers. Regularization is applied when new tasks arrive to mitigate catastrophic forgetting. Experiments on CIFAR-100 and CUB-200-2011 are reported to achieve performance comparable to state-of-the-art methods at substantially lower additional memory cost.

Significance. If the empirical results hold under the stated assumptions, the method supplies a simple, low-memory alternative to rehearsal and generative-model approaches in class-incremental learning by combining an autoencoder with off-the-shelf regularization and prototype means. This could be practically relevant for memory-constrained continual-learning settings.

major comments (2)
  1. [Abstract and §4] Abstract and §4 (Experiments): the central claim of 'comparable performance' to SOTA is asserted without any reported accuracy numbers, baseline methods, statistical tests, or ablation results in the abstract; the full experimental section must supply these quantitative details and controls to substantiate the claim.
  2. [§3] §3 (Method): the claim that storing only the per-class mean of autoencoder prototypes remains a sufficient statistic for metric classification after subsequent regularization steps rests on the unverified assumptions that (a) intra-class latent distributions remain compact and unimodal and (b) the encoder does not shift old-class points far from their stored means; no latent-space visualizations, intra-class variance statistics, or ablation replacing means with full prototype sets are provided to test these conditions.
minor comments (1)
  1. [§3] Clarify the precise regularization term (e.g., EWC, SI, or other) and the joint training objective for the autoencoder plus classifier in the incremental phase.
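
For orientation on the minor comment, the regularizers named there (EWC, SI) and MAS from the reference list share one general shape: a quadratic penalty on parameter change, weighted by a per-parameter importance estimate. The sketch below shows only that shared shape. It is not the paper's objective, which this page does not specify, and the function name `quadratic_penalty` and its arguments are hypothetical.

```python
import numpy as np

def quadratic_penalty(params, old_params, importance, strength):
    """(strength / 2) * sum_i importance_i * (params_i - old_params_i)^2.

    params, old_params, importance: dicts mapping a parameter name to an
    array of matching shape. How `importance` is estimated is what
    distinguishes the individual methods; it is taken as given here.
    """
    total = 0.0
    for name in params:
        diff = params[name] - old_params[name]
        total += float(np.sum(importance[name] * diff ** 2))
    return 0.5 * strength * total

# Toy values: only the first weight moved, and it has high importance.
penalty = quadratic_penalty(
    params={"w": np.array([1.0, 2.0])},
    old_params={"w": np.array([0.0, 2.0])},
    importance={"w": np.array([4.0, 1.0])},
    strength=10.0,
)
print(penalty)  # 20.0
```

A term of this kind would be added to the new-task loss, so that parameters judged important for earlier classes are discouraged from drifting.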

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments and the opportunity to clarify and strengthen our manuscript. We address each major comment below and indicate the revisions we will make.

  1. Referee: [Abstract and §4] Abstract and §4 (Experiments): the central claim of 'comparable performance' to SOTA is asserted without any reported accuracy numbers, baseline methods, statistical tests, or ablation results in the abstract; the full experimental section must supply these quantitative details and controls to substantiate the claim.

    Authors: We agree that the abstract would be strengthened by including specific quantitative results. The experimental section (§4) already reports accuracy numbers, baseline comparisons (including rehearsal and generative-model methods), and memory overhead on CIFAR-100 and CUB-200-2011, along with controls for the regularization approach. In the revision we will update the abstract to cite key accuracy figures and memory savings relative to SOTA, and we will ensure §4 explicitly tabulates all baselines, any statistical tests performed, and ablation results. revision: yes

  2. Referee: [§3] §3 (Method): the claim that storing only the per-class mean of autoencoder prototypes remains a sufficient statistic for metric classification after subsequent regularization steps rests on the unverified assumptions that (a) intra-class latent distributions remain compact and unimodal and (b) the encoder does not shift old-class points far from their stored means; no latent-space visualizations, intra-class variance statistics, or ablation replacing means with full prototype sets are provided to test these conditions.

    Authors: We acknowledge that the sufficiency of per-class means after regularization is an assumption that benefits from direct verification. Our regularization term is designed to penalize large shifts in the latent representations of previous classes, which we expect to keep intra-class distributions compact. To substantiate this, the revised manuscript will add (i) t-SNE or PCA visualizations of the latent space before and after new tasks, (ii) per-class variance statistics in the prototype space, and (iii) an ablation that compares nearest-mean classification against storing and using the full set of prototypes. These additions will empirically test assumptions (a) and (b). revision: yes
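
    Items (ii) and (iii) of this response can be sketched as follows. This is a hypothetical diagnostic, not code from the paper: the function names are invented, Euclidean distance is used for simplicity where the paper's classifier is described as cosine-based, and the toy data are built so that one class is deliberately bimodal.

```python
import numpy as np

def intra_class_variance(prototypes_by_class):
    """Mean squared Euclidean distance of each class's prototypes to their mean."""
    return {c: float(np.mean(np.sum((p - p.mean(axis=0)) ** 2, axis=1)))
            for c, p in prototypes_by_class.items()}

def nearest_mean_predict(queries, prototypes_by_class):
    """Classify by distance to the per-class mean only."""
    labels = list(prototypes_by_class)
    means = np.stack([prototypes_by_class[c].mean(axis=0) for c in labels])
    d = np.linalg.norm(queries[:, None, :] - means[None, :, :], axis=2)
    return [labels[i] for i in d.argmin(axis=1)]

def nearest_prototype_predict(queries, prototypes_by_class):
    """Ablation baseline: 1-nearest-neighbour over every stored prototype."""
    labels, points = [], []
    for c, p in prototypes_by_class.items():
        labels += [c] * len(p)
        points.append(p)
    points = np.concatenate(points)
    d = np.linalg.norm(queries[:, None, :] - points[None, :, :], axis=2)
    return [labels[i] for i in d.argmin(axis=1)]

# Class "a" has two far-apart modes; class "b" is compact.
data = {
    "a": np.array([[-4.0, 0.0], [4.0, 0.0]]),
    "b": np.array([[0.0, 1.0], [0.0, 1.2]]),
}
query = np.array([[4.0, 0.6]])  # sits next to one mode of class "a"

variances = intra_class_variance(data)
by_mean = nearest_mean_predict(query, data)
by_prototype = nearest_prototype_predict(query, data)
print(variances["a"], by_mean, by_prototype)
```

    In this constructed case the mean of class "a" falls between its two modes, so the nearest-mean rule assigns the query to "b" while the full-prototype rule assigns it to "a". A large variance for a class is the warning sign that assumption (a) may not hold for it.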

Circularity Check

0 steps flagged

No significant circularity; method is empirically described without self-referential derivations

The provided abstract and description contain no equations, derivations, or self-citations that reduce any claimed result to a fitted parameter or input by construction. The approach is presented as storing per-class prototype means from an autoencoder and applying off-the-shelf regularization, with performance evaluated empirically on CIFAR-100 and CUB-200-2011. No load-bearing steps match the enumerated circularity patterns; the central claim rests on experimental comparison rather than definitional equivalence.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no explicit free parameters, mathematical axioms, or newly postulated entities; the method is described in terms of standard autoencoders and regularization already present in the literature.

Reference graph

Works this paper leans on

30 extracted references · 30 canonical work pages · 8 internal anchors

  1. [1]

    Memory aware synapses: Learning what (not) to forget

    [Aljundi et al., 2018] Rahaf Aljundi, Francesca Babiloni, Mohamed Elhoseiny, Marcus Rohrbach, and Tinne Tuytelaars. Memory aware synapses: Learning what (not) to forget. In Computer Vision – ECCV 2018, pages 144–161,

  2. [2]

    Lof: identifying density-based local outliers

    [Breunig et al., 2000] Markus M Breunig, Hans-Peter Kriegel, Raymond T Ng, and Jörg Sander. Lof: identifying density-based local outliers. In ACM sigmod record, volume 29, pages 93–104. ACM,

  3. [3]

    Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)

    [Clevert et al., 2015] Djork-Arné Clevert, Thomas Unterthiner, and Sepp Hochreiter. Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289,

  4. [4]

    Catastrophic forgetting in connectionist networks

    [French, 1999] Robert M French. Catastrophic forgetting in connectionist networks. Trends in cognitive sciences, 3(4):128–135,

  5. [5]

    A Bio-Inspired Incremental Learning Architecture for Applied Perceptual Problems

    [Gepperth and Karaoguz, 2016] Alexander Gepperth and Cem Karaoguz. A Bio-Inspired Incremental Learning Architecture for Applied Perceptual Problems. Cognitive Computation, 8:924 – 934,

  6. [6]

    Deepncm: Deep nearest class mean classifiers

    [Guerriero et al., 2018] Samantha Guerriero, Barbara Caputo, and Thomas Mensink. Deepncm: Deep nearest class mean classifiers. In International Conference on Learning Representations, Workshop Track,

  7. [7]

    Deep residual learning for image recognition

    [He et al., 2016] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Computer Vision and Pattern Recognition (CVPR), pages 770–778. IEEE,

  8. [8]

    Distilling the knowledge in a neural network

    [Hinton et al., 2015] Geoffrey Hinton, Oriol Vinyals, and Jeffrey Dean. Distilling the knowledge in a neural network. In NIPS Deep Learning and Representation Learning Workshop,

  9. [9]

    Re-evaluating Continual Learning Scenarios: A Categorization and Case for Strong Baselines

    [Hsu et al., 2018] Yen-Chang Hsu, Yen-Cheng Liu, and Zsolt Kira. Re-evaluating continual learning scenarios: A categorization and case for strong baselines. arXiv preprint arXiv:1810.12488,

  10. [10]

    Fearnet: Brain-inspired model for incremental learning

    [Kemker and Kanan, 2018] Ronald Kemker and Christopher Kanan. Fearnet: Brain-inspired model for incremental learning. In International Conference on Learning Representations,

  11. [11]

    Measuring catastrophic forgetting in neural networks

    [Kemker et al., 2018] Ronald Kemker, Marc McClure, Angelina Abitino, Tyler L Hayes, and Christopher Kanan. Measuring catastrophic forgetting in neural networks. In Thirty-Second AAAI Conference on Artificial Intelligence,

  12. [12]

    Overcoming catastrophic forgetting in neural networks

    [Kirkpatrick et al., 2017] James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, et al. Overcoming catastrophic forgetting in neural networks. Proceedings of the national academy of sciences, page 201611835,

  13. [13]

    Learning multiple layers of features from tiny images

    [Krizhevsky and Hinton, 2009] Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. Technical report, Citeseer,

  14. [14]

    What learning systems do intelligent agents need? complementary learning systems theory updated

    [Kumaran et al., 2016] Dharshan Kumaran, Demis Hassabis, and James L McClelland. What learning systems do intelligent agents need? complementary learning systems theory updated. Trends in cognitive sciences, 20(7):512–534,

  15. [15]

    Learning without forgetting

    [Li and Hoiem, 2018] Zhizhong Li and Derek Hoiem. Learning without forgetting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(12):2935–2947,

  16. [16]

    Gradient episodic memory for continual learning

    [Lopez-Paz and Ranzato, 2017] David Lopez-Paz and Marc Aurelio Ranzato. Gradient episodic memory for continual learning. In Advances in Neural Information Processing Systems 30, pages 6467–6476. Curran Associates, Inc.,

  17. [17]

    Continuous Learning in Single-Incremental-Task Scenarios

    [Maltoni and Lomonaco, 2018] Davide Maltoni and Vincenzo Lomonaco. Continuous learning in single-incremental-task scenarios. arXiv preprint arXiv:1806.08568,

  18. [18]

    Why there are complementary learning systems in the hippocampus and neocortex: insights from the successes and failures of connectionist models of learning and memory

    [McClelland et al., 1995] James L McClelland, Bruce L McNaughton, and Randall C O'Reilly. Why there are complementary learning systems in the hippocampus and neocortex: insights from the successes and failures of connectionist models of learning and memory. Psychological review, 102(3):419,

  19. [19]

    Catastrophic interference in connectionist networks: The sequential learning problem

    [McCloskey and Cohen, 1989] Michael McCloskey and Neal J Cohen. Catastrophic interference in connectionist networks: The sequential learning problem. In Psychology of learning and motivation, volume 24, pages 109–165

  20. [20]

    Distance-based image classification: Generalizing to new classes at near-zero cost

    [Mensink et al., 2013] T. Mensink, J. Verbeek, F. Perronnin, and G. Csurka. Distance-based image classification: Generalizing to new classes at near-zero cost. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(11):2624–2637, Nov

  21. [21]

    Continual Lifelong Learning with Neural Networks: A Review

    [Parisi et al., 2018] German I Parisi, Ronald Kemker, Jose L Part, Christopher Kanan, and Stefan Wermter. Continual lifelong learning with neural networks: A review. arXiv preprint arXiv:1802.07569,

  22. [22]

    Semi-supervised learning with ladder networks

    [Rasmus et al., 2015] Antti Rasmus, Mathias Berglund, Mikko Honkala, Harri Valpola, and Tapani Raiko. Semi-supervised learning with ladder networks. In Advances in Neural Information Processing Systems, pages 3546–3554,

  23. [23]

    [Rebuffi et al., 2017] Sylvestre-Alvise Rebuffi, Alexander Kolesnikov, Georg Sperl, and Christoph H. Lampert. icarl: Incremental classifier and representation learning. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July

  24. [24]

    On the convergence of adam and beyond

    [Reddi et al., 2018] Sashank J. Reddi, Satyen Kale, and Sanjiv Kumar. On the convergence of adam and beyond. In International Conference on Learning Representations,

  25. [25]

    Overcoming catastrophic forgetting with hard attention to the task

    [Serra et al., 2018] Joan Serra, Dídac Surís, Marius Miron, and Alexandros Karatzoglou. Overcoming catastrophic forgetting with hard attention to the task. arXiv preprint arXiv:1801.01423,

  26. [26]

    Very Deep Convolutional Networks for Large-Scale Image Recognition

    [Simonyan and Zisserman, 2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556,

  27. [27]

    [Wah et al., 2011] C. Wah, S. Branson, P. Welinder, P. Perona, and S. Belongie. The Caltech-UCSD Birds-200-2011 Dataset. Technical Report CNS-TR-2011-001, California Institute of Technology,

  28. [28]

    Deep Discriminant Analysis for i-vector Based Robust Speaker Recognition

    [Wang et al., 2018] Shuai Wang, Zili Huang, Yanmin Qian, and Kai Yu. Deep discriminant analysis for i-vector based robust speaker recognition. arXiv preprint arXiv:1805.01344,

  29. [29]

    Incremental Classifier Learning with Generative Adversarial Networks

    [Wu et al., 2018] Yue Wu, Yinpeng Chen, Lijuan Wang, Yuancheng Ye, Zicheng Liu, Yandong Guo, Zhengyou Zhang, and Yun Fu. Incremental classifier learning with generative adversarial networks. arXiv preprint arXiv:1802.00853,

  30. [30]

    Continual learning through synaptic intelligence

    [Zenke et al., 2017] Friedemann Zenke, Ben Poole, and Surya Ganguli. Continual learning through synaptic intelligence. In Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 3987–3995. PMLR, 06–11 Aug 2017